
Question


4 Under-Parameterization and Over-Parameterization
In the previous section, we had more data points than features in our data, i.e., we were looking at N > 100. This tends to be the ideal situation: we need to find an unknown weight for each feature, and having more data points than unknowns gives us enough information to determine each weight (just as two data points are enough to determine the slope and intercept, the two unknowns, of a line).
Sometimes, however, we may have fewer data points than features, which makes it difficult to determine how the underlying model should depend on each feature: we simply don't have enough data. In the following problems, consider a training data set of size N = 50 and a test data set of size N = 50.
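For intuition, here is a minimal sketch (in Python/NumPy, using made-up synthetic data standing in for the course data set) of what goes wrong when there are fewer data points than features: ordinary least squares becomes underdetermined, so it can fit the training set exactly while telling us little about the true weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 50 points, 101 features (more unknowns than equations).
N, d = 50, 101
X = rng.standard_normal((N, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(N)

# lstsq returns the minimum-norm solution when the system is underdetermined.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("training MSE:", np.mean((X @ w_hat - y) ** 2))  # essentially zero
print("rank of X:", np.linalg.matrix_rank(X))          # 50 < 101: underdetermined
```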
Problem 8: Let A be a matrix of random values, with k rows and 101 columns, where each entry is sampled from an N(0,1) distribution. Note that for any input vector x, Ax will be a vector of k values. We could then consider performing linear regression on the data points (Ax, y) rather than (x, y). Note that if k ≤ 50, this transformed data set will have no more input features than we have data points, and thus we restore linear regression to working order.
Plot, over k from 1 to 50, the testing error when, for a given k, you pick a random A to transform the input vectors by, then do linear regression on the result. You'll need to repeat the experiment over a number of choices of A, for each k, to get a good plot. What do you notice? Does this seem to be a reasonable trend?
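One way this experiment could be set up (a sketch only: the data below is synthetic, and the helper name test_error_for_k is made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Assumed synthetic stand-in for the course data: 50 train / 50 test points, 101 features.
d = 101
X_train = rng.standard_normal((50, d))
X_test = rng.standard_normal((50, d))
w_true = rng.standard_normal(d)
y_train = X_train @ w_true + rng.standard_normal(50)
y_test = X_test @ w_true + rng.standard_normal(50)

def test_error_for_k(k, trials=100):
    """Average test MSE over random projections A with k rows."""
    errs = []
    for _ in range(trials):
        A = rng.standard_normal((k, d))   # random N(0,1) projection matrix
        Z_train = X_train @ A.T           # each row is Ax for one training point
        Z_test = X_test @ A.T
        w, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)
        errs.append(np.mean((Z_test @ w - y_test) ** 2))
    return np.mean(errs)

ks = range(1, 51)
plt.plot(list(ks), [test_error_for_k(k) for k in ks])
plt.xlabel("k (projected dimension)")
plt.ylabel("average test MSE")
plt.show()
```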
Problem 9: Notice that there's nothing stopping us from continuing to increase k. This puts us in a regime of over-parameterization (we have more features in our data than data points), and in fact increasing over-parameterization if we were bold enough to take k > 100. One possible solution, when performing linear regression on the transformed Ax data, is to do ridge regression, introducing the ridge penalty λ||w||² into the loss we are minimizing.
Continue the experiment for k = 50, 51, 52, ..., 200, plotting the resulting testing error (averaged over multiple choices of A). How did you choose a good value of λ? (Note that the number of weights we need to find changes with k; should this influence λ?) What do you notice?
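A sketch of how the ridge variant could be run, using the same assumed synthetic setup as above (λ = 1.0 is an arbitrary placeholder here; choosing it well is the point of the problem):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Same assumed synthetic stand-in data as in the previous sketch.
d = 101
X_train, X_test = rng.standard_normal((50, d)), rng.standard_normal((50, d))
w_true = rng.standard_normal(d)
y_train = X_train @ w_true + rng.standard_normal(50)
y_test = X_test @ w_true + rng.standard_normal(50)

def ridge_fit(Z, y, lam):
    """Closed-form ridge regression: argmin_w ||Zw - y||^2 + lam * ||w||^2."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

def avg_test_error(k, lam, trials=50):
    """Test MSE averaged over random choices of the projection A."""
    errs = []
    for _ in range(trials):
        A = rng.standard_normal((k, d))
        w = ridge_fit(X_train @ A.T, y_train, lam)
        errs.append(np.mean((X_test @ A.T @ w - y_test) ** 2))
    return np.mean(errs)

ks = range(50, 201)
plt.plot(list(ks), [avg_test_error(k, lam=1.0) for k in ks])
plt.xlabel("k (projected dimension)")
plt.ylabel("average test MSE (ridge, lam = 1.0)")
plt.show()
```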
Bonus: Why does this happen?
