Question: It is possible to define a regularizer to minimize ∑e (loss(Ŷ(e), Y(e)) + λ ∗ regularizer(Ŷ)) rather than formula (7.5) (page 303). How is this different from the existing regularizer? [Hint: Think about how this affects multiple datasets or cross validation.]
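A minimal sketch of the difference, assuming a squared-error loss and an L2 regularizer on a weight vector w (these specifics are illustrative, not the book's definitions). Because regularizer(Ŷ) does not depend on the example e, moving it inside the sum multiplies it by the number of examples:

    import numpy as np

    def objective_7_5(w, X, y, lam):
        # Formula (7.5): sum of per-example losses, plus the regularizer
        # added once, outside the sum.
        losses = (X @ w - y) ** 2
        return np.sum(losses) + lam * np.sum(w ** 2)

    def objective_alternative(w, X, y, lam):
        # Alternative: the regularizer is added inside the sum, once per
        # example. Since it does not depend on e, this equals
        # np.sum(losses) + len(y) * lam * np.sum(w ** 2).
        losses = (X @ w - y) ** 2
        return np.sum(losses + lam * np.sum(w ** 2))

    rng = np.random.default_rng(0)
    X, y, w = rng.normal(size=(50, 3)), rng.normal(size=50), np.ones(3)
    # The two objectives differ by (len(y) - 1) * lam * np.sum(w ** 2),
    # here 49 * 0.1 * 3.0 = 14.7:
    print(objective_alternative(w, X, y, 0.1) - objective_7_5(w, X, y, 0.1))

So on n examples the alternative behaves like formula (7.5) with λ replaced by n ∗ λ: the effective amount of regularization grows with the size of the dataset.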
Suppose λ is set by k-fold cross validation, and then the model is learned for the whole dataset. How would the algorithm differ between the original way(s) of defining a regularizer and this alternative way? [Hint: A different number of examples is used for the regularization during cross validation than in the full dataset; does this matter?] Which works better in practice?
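A hedged sketch of why the dataset size matters here (the numbers below are assumptions for illustration, not from the book). With k-fold cross validation, λ is chosen while training on roughly (k−1)/k of the data; when the model is then relearned on all n examples under formula (7.5), the summed loss gains more terms but the penalty does not, so the cross-validated λ effectively regularizes less. The per-example form keeps the penalty's relative weight constant:

    n, k = 1000, 10
    m = n * (k - 1) // k  # examples in each cross-validation training split
    lam = 0.1             # lambda selected by cross validation (assumed)
    reg = 5.0             # regularizer value of a candidate model (assumed)
    avg_loss = 2.0        # assumed average per-example loss

    def penalty_share_7_5(num_examples):
        # Formula (7.5): one penalty term against num_examples loss terms,
        # so its relative weight shrinks as the dataset grows.
        return lam * reg / (num_examples * avg_loss)

    def penalty_share_alternative(num_examples):
        # Per-example penalty: num_examples copies of the penalty, so its
        # relative weight is independent of the dataset size.
        return num_examples * lam * reg / (num_examples * avg_loss)

    print(penalty_share_7_5(m), penalty_share_7_5(n))                  # shrinks
    print(penalty_share_alternative(m), penalty_share_alternative(n))  # unchanged

Under formula (7.5) one would therefore rescale the cross-validated λ (e.g. by n/m) before learning on the whole dataset, whereas with the per-example regularizer the same λ can be reused directly; that convenience is one reason the per-example form can be preferable in practice.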
