Question:
15. It is possible to define a regularizer to minimize ∑_e (error_h(e) + λ * regularizer(h)) rather than Formula 7.4. How is this different from the existing regularizer? [Hint: Think about how this affects multiple data sets or cross validation.]
Suppose λ is set by k-fold cross validation, and then the model is learned for the whole data set. How would the algorithm differ between the original way(s) of defining a regularizer and this alternative way? [Hint: The number of examples used during cross validation differs from the number in the full data set; does this matter?] Which works better in practice?
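To make the hint concrete, here is a minimal sketch (not the book's step-by-step answer) comparing the two objectives for a linear hypothesis with an L2 regularizer. The function names, the toy data, and the choice of squared error are illustrative assumptions. The key observation is that the alternative form adds λ * regularizer(h) once per example, so it equals the standard objective with the penalty scaled by the number of examples |E|.

```python
import numpy as np

# Illustrative assumption: linear hypothesis h(x) = w @ x, squared error, L2 regularizer.
def objective_standard(w, X, y, lam):
    # Formula-7.4 style: sum of per-example errors plus a single regularization term.
    return ((X @ w - y) ** 2).sum() + lam * np.dot(w, w)

def objective_alternative(w, X, y, lam):
    # Alternative from the exercise: the regularizer is added inside the sum,
    # once for every example, so its total weight grows with |E| = len(X).
    return ((X @ w - y) ** 2 + lam * np.dot(w, w)).sum()

# The two coincide only when lambda is rescaled by the data-set size:
rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3)
assert np.isclose(objective_alternative(w, X, y, 0.1),
                  objective_standard(w, X, y, 0.1 * len(X)))
```

Under this reading, a λ chosen by k-fold cross validation on roughly (k-1)/k of the examples corresponds, in the alternative form, to a different effective penalty once the model is retrained on the whole data set, which is the data-set-size issue the hint points to.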
Related Book:
David L. Poole and Alan K. Mackworth, Artificial Intelligence: Foundations of Computational Agents, 2nd Edition. ISBN 9781107195394.