Cross-Validation Lab and Programming Assignment
This lab and programming assignment begins with the Boston data set, to which we apply linear, ridge and lasso regression, eventually using cross-validation.
The Boston data set records the values of houses in different suburbs of Boston. It contains 506 observations and 14 columns. The last column (feature) is the dependent variable y; thus, as a result, we have 13 independent variables x.
set.seed(1)            # any fixed seed gives a reproducible split; the original seed value was not specified
library(MASS)          # Boston data set
library(glmnet)        # lasso and ridge regression
n <- nrow(Boston)      # number of rows n
p <- ncol(Boston)      # number of columns p
The sample function allows us to randomly select numbers from the vector 1:n without replacement; using the indices created, we can build the training and testing sets. For the test set we use the sampled row numbers as the indices. For the train set we take all the rows excluding those indices: adding a minus sign tells R that you would like to exclude the rows in test.index.
test.index <- sample(n, round(n / 4))  # test-set size is user-defined; a quarter of the data is one common choice
test  <- Boston[test.index, ]          # test set
train <- Boston[-test.index, ]         # train set
We will now fit a linear regression, a lasso regression and a ridge regression.
For lasso, the first argument is the X variables in the train set and the second argument is the Y variable in the train set. Remember, your last column is the y variable, or dependent variable, which is the median value of houses in the different suburbs of Boston; the rest of the columns in the Boston data set are your X variables. Note that for the glmnet function your X variables must be a matrix, hence the as.matrix function to convert the train set from a data frame to a matrix.
We set alpha = 1 for lasso, and if you want to run a ridge regression you set alpha = 0. The family argument is set to "gaussian" because we are doing a regression. For a classification problem, the family should be set to "binomial" or "multinomial".
Lambda is set to the same value for both lasso and ridge. However, the lambda value is user-defined, so you can try whatever value you want, as long as it is a positive real number.
linear <- lm(medv ~ ., data = train)  # all features (independent variables) included here
lasso  <- glmnet(as.matrix(train[, 1:(p - 1)]), train[, p], alpha = 1, family = "gaussian", lambda = 1)  # lambda = 1 is a placeholder; the value is user-defined
ridge  <- glmnet(as.matrix(train[, 1:(p - 1)]), train[, p], alpha = 0, family = "gaussian", lambda = 1)  # lambda = 1 is a placeholder; the value is user-defined
We can now use the predict function to make predictions. Simply input the model that you have just saved and the X variables in your test set. Our X variables are the first p - 1 columns of the test set.
For the glmnet models, the X variables need to be input as a matrix.
pred.linear <- predict(linear, test[, 1:(p - 1)])
pred.lasso  <- predict(lasso, as.matrix(test[, 1:(p - 1)]))
pred.ridge  <- predict(ridge, as.matrix(test[, 1:(p - 1)]))
After you have made your predictions, go ahead and calculate the Sum of Squared Residuals (SSR).
Question
Calculate the SSRs for the linear, lasso and ridge regressions. Use the outputs SSRlinear, SSRlasso and SSRridge and the sum function to calculate the SSRs between test[, p] and pred.linear, pred.lasso and pred.ridge respectively.
# YOUR CODE HERE
# SSRlinear
# SSRlasso
# SSRridge
# your code here
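A sketch of one possible answer, assuming the objects defined above (test, p and the three prediction vectors) are in scope; the SSR is simply the sum of squared differences between observed and predicted values:

```r
# SSR = sum((observed - predicted)^2) on the held-out test set
SSRlinear <- sum((test[, p] - pred.linear)^2)
SSRlasso  <- sum((test[, p] - pred.lasso)^2)
SSRridge  <- sum((test[, p] - pred.ridge)^2)
```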
Which model performs the best? Use the cat function to display the output from all three models. Comment on your results.
cat(SSRlinear, SSRlasso, SSRridge)
Question
We can use the cross-validation method to determine a good value for lambda. Using the cv.glmnet function, carry out cross-validation for lasso and ridge to determine the lambda value that minimises the cross-validated error. Use the outputs lam.lasso and lam.ridge. The input values to the function are the same as those used to carry out lasso and ridge with the glmnet function; i.e. for lam.lasso, along with the as.matrix function, your input should be cv.glmnet(as.matrix(train[, 1:(p - 1)]), train[, p], alpha = 1, family = "gaussian")$lambda.min, and the same for lam.ridge, except this time your alpha value is going to be zero.
The great advantage of the cv.glmnet function is that you don't need to input a range of values for lambda: the function automatically chooses a range of lambda values to test for you. The best lambda value is then available as $lambda.min. Finally, you fit the lasso and ridge regressions on the entire train set with the best lambda value.
# YOUR CODE HERE
# lam.lasso
# lam.ridge
# your code here
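One way to fill this in, assuming train and p from the setup above; cv.glmnet performs k-fold cross-validation (10 folds by default) over its own lambda grid and stores the minimising value in $lambda.min:

```r
# Cross-validated search for the best lambda (lasso: alpha = 1, ridge: alpha = 0)
lam.lasso <- cv.glmnet(as.matrix(train[, 1:(p - 1)]), train[, p],
                       alpha = 1, family = "gaussian")$lambda.min
lam.ridge <- cv.glmnet(as.matrix(train[, 1:(p - 1)]), train[, p],
                       alpha = 0, family = "gaussian")$lambda.min
```

Because the folds are assigned randomly, the selected lambda can vary slightly from run to run unless the seed is fixed.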
Question
Using the lambda values from lam.lasso and lam.ridge, refit the lasso and ridge models. Again use the outputs lasso and ridge, the glmnet function and the as.matrix function.
# YOUR CODE HERE
# lasso
# ridge
# your code here
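A sketch of the refit, assuming lam.lasso and lam.ridge from the previous step; the only change from the earlier glmnet calls is that the cross-validated lambda replaces the user-defined one:

```r
# Refit on the full training set with the cross-validated lambda values
lasso <- glmnet(as.matrix(train[, 1:(p - 1)]), train[, p],
                alpha = 1, family = "gaussian", lambda = lam.lasso)
ridge <- glmnet(as.matrix(train[, 1:(p - 1)]), train[, p],
                alpha = 0, family = "gaussian", lambda = lam.ridge)
```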
Question
Recalculate pred.linear, pred.lasso and pred.ridge using the same outputs.
# YOUR CODE HERE
# pred.linear
# pred.lasso
# pred.ridge
# your code here
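The predictions are recomputed exactly as before; since linear was not refit, pred.linear is unchanged, while pred.lasso and pred.ridge now use the cross-validated models:

```r
pred.linear <- predict(linear, test[, 1:(p - 1)])
pred.lasso  <- predict(lasso, as.matrix(test[, 1:(p - 1)]))
pred.ridge  <- predict(ridge, as.matrix(test[, 1:(p - 1)]))
```

Recomputing the SSRs on these predictions shows whether the cross-validated lambda improves on the hand-picked value.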