3 Regularization: Ridge and Lasso
Frequently, we are interested in finding 'simpler' models over more complex ones. This is essentially Occam's Razor (simpler models are better models), but complexity can mean different things. Two possible examples of this in regression:
When there are multiple sets of weights that produce the same output, we generally prefer models with smaller weights, so as not to amplify any noise in the data, and to not depend too heavily on any single component of the data. Simpler: smaller weights.
When a model has weights that are close to zero, it may indicate that the components those weights are scaling are negligible, possibly random noise, and should be excluded. We could consider 'pruning' those components or features, or pushing those weights all the way to zero. Simpler: weights that are too small are deleted / set to zero.
To achieve both of these goals, we utilize what's known as regularization. In this case, we augment the loss function so that we penalize models that are more complex and reward models that are simpler.
Ridge Regression: In this case, we take the loss function to be the usual mean squared error plus a penalty on the squared size of the weights,

    L_ridge(w) = (1/N) Σ_i ( y_i − ŷ(x_i) )² + λ Σ_{j≥1} w_j²

where ŷ(x_i) is the model's prediction and the penalty sum starts at j = 1, leaving w_0 out.
This loss will be smaller when not only is the loss on the data set small, but the weights are also small. Models with smaller weights will be preferred over models with the same error but larger weights. The constant λ > 0 determines exactly how strong the pressure toward small weights will be.
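A minimal PyTorch sketch of this penalized loss, assuming placeholder names (the tensors X, y, the weight vector w, the bias b, and the argument lam are illustrative, not part of the assignment; the bias is kept out of the penalty, anticipating the bonus question below):

```python
import torch

def ridge_loss(w, b, X, y, lam):
    """Mean squared error plus lam * (sum of squared weights); the bias b is not penalized."""
    pred = X @ w + b                       # linear model predictions
    mse = torch.mean((y - pred) ** 2)      # data-fit term
    return mse + lam * torch.sum(w ** 2)   # L2 (ridge) penalty
```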
Lasso Regression: In this case, we take the loss function to be the same mean squared error, but with an absolute-value (L1) penalty,

    L_lasso(w) = (1/N) Σ_i ( y_i − ŷ(x_i) )² + λ Σ_{j≥1} |w_j|
By adding this penalty term, weights are pressured to be smaller - but weights that would be large without it are pressured to be only slightly smaller, and weights that would be small without it are pressured all the way to zero. The threshold for this behavior is set by λ: the larger λ is, the more weights are pushed all the way to zero. We'll talk about this more in class and recitation, but it serves to automatically 'prune' features or components of x that are below a certain significance.
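The lasso version differs only in the penalty term; a minimal sketch with the same placeholder names as above:

```python
import torch

def lasso_loss(w, b, X, y, lam):
    """Mean squared error plus lam * (sum of absolute weights); the bias b is not penalized."""
    pred = X @ w + b
    mse = torch.mean((y - pred) ** 2)
    return mse + lam * torch.sum(torch.abs(w))   # L1 (lasso) penalty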
Regularization Bonus: Why is it worth keeping w0 out of the regularization penalty terms?
Problem 6: For N=300 training data and N=200 testing data, consider fitting a model by minimizing the ridge regression loss for various values of λ > 0. Plot the testing loss (without the ridge penalty) as a function of λ. Are you able to improve the generalization of your model? What range of λ seems useful?
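One possible shape for this experiment is sketched below. The synthetic stand-in data, the fit_ridge helper, the learning rate, and the λ grid are all illustrative assumptions, not the assignment's actual dataset or required settings:

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(0)

# Synthetic stand-in data; the assignment's real training/testing sets go here.
d = 101
X_train, X_test = torch.randn(300, d), torch.randn(200, d)
true_w = torch.zeros(d)
true_w[:5] = torch.tensor([3.0, -2.0, 1.5, 0.8, -1.0])   # only a few features matter
y_train = X_train @ true_w + 0.5 * torch.randn(300)
y_test = X_test @ true_w + 0.5 * torch.randn(200)

def fit_ridge(X, y, lam, lr=1e-2, steps=2000):
    """Gradient descent on MSE + lam * ||w||^2 (bias not penalized)."""
    w = torch.zeros(X.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = X @ w + b
        loss = torch.mean((y - pred) ** 2) + lam * torch.sum(w ** 2)
        loss.backward()
        opt.step()
    return w.detach(), b.detach()

lams = torch.logspace(-4, 1, 20)
test_losses = []
for lam in lams:
    w, b = fit_ridge(X_train, y_train, lam.item())
    test_mse = torch.mean((y_test - (X_test @ w + b)) ** 2)   # no penalty term
    test_losses.append(test_mse.item())

plt.semilogx(lams.numpy(), test_losses)
plt.xlabel("lambda")
plt.ylabel("test MSE (no penalty)")
plt.show()
```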
Problem 7: For N=300 training data and N=200 testing data, consider fitting a model by minimizing the lasso regression loss for various values of λ > 0. Show that as λ goes up, the number of non-zero terms in the fitted model goes down. What features or components of the data are most persistent (surviving across a range of λ)? How does that compare with what we know about how y actually depends on x? (One way to run this sweep is sketched after the plot list below.)
For this problem, you should generate at least these two plots:
As a function of λ > 0, plot the number of non-zero weights that survive after training.
For each i = 0, 1, 2, 3, ..., 100, plot the λ at which that weight goes to 0, killing that feature.
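A sketch of the λ sweep referenced above. It assumes the synthetic stand-in tensors X_train and y_train from the ridge sketch; the threshold for calling a weight "zero" is also an assumption, needed because plain gradient descent on the L1 penalty makes a dead weight oscillate near zero rather than land exactly on it (a proximal / soft-thresholding update, or scikit-learn's Lasso, would give exact zeros):

```python
import torch

def fit_lasso(X, y, lam, lr=1e-2, steps=2000):
    """Gradient descent on MSE + lam * ||w||_1 (bias not penalized)."""
    w = torch.zeros(X.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = X @ w + b
        loss = torch.mean((y - pred) ** 2) + lam * torch.sum(torch.abs(w))
        loss.backward()
        opt.step()
    return w.detach()

lr = 1e-2
for lam in torch.logspace(-3, 1, 15):
    w = fit_lasso(X_train, y_train, lam.item(), lr=lr)
    # A "dead" weight under plain SGD bounces around 0 with amplitude roughly lr * lam,
    # so anything below that scale is counted as zero.
    thresh = 2 * lr * lam.item() + 1e-6
    surviving = torch.nonzero(w.abs() > thresh).flatten().tolist()
    print(f"lambda = {lam.item():8.4f}   non-zero weights: {len(surviving)}   indices: {surviving}")
```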
Bonus: Note that the lasso regression term does not have a well defined gradient everywhere, because of the sharp point of the absolute value function. How does PyTorch handle this when calculating gradients? Do some research.
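As a starting point for that research, a quick experiment is shown below. The exact value returned at the kink is an implementation detail of the autograd engine; in recent PyTorch versions the backward of torch.abs uses sign(x), which amounts to picking the subgradient 0 at x = 0:

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
torch.abs(x).sum().backward()
print(x.grad)   # expected: tensor([-1., 0., 1.]) -- a subgradient is chosen at the kink
```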

