Cross-Validation Lab and Programming Assignment
This lab and programming assignment begins with the Boston data set, to which we apply linear, ridge, and lasso regression, eventually using cross-validation.
The Boston data set records the value of houses in different suburbs of Boston. It contains 506 observations and 14 features (columns). The last column is the dependent variable y, which leaves 13 independent variables x.
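Before splitting the data, it can help to confirm the shape described above. A quick look (MASS ships with standard R distributions):

```r
library(MASS) # provides the Boston data set

dim(Boston)       # 506 14
names(Boston)[14] # "medv" -- the dependent variable y
```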
set.seed(2021)
library(MASS) # Boston Data Set
library(glmnet) # Lasso and Ridge Regression
n <- nrow(Boston) # number of rows (n = 506)
p <- ncol(Boston) # number of columns (p = 14)
The sample function randomly selects 100 numbers from the vector 1, ..., n without replacement. Using the resulting indices, we create the training and test sets: for the test set we keep only the sampled rows, and for the train set we keep all rows except those indices (the minus sign tells R to exclude the rows in the test.index vector).
test.index <- sample(n,100)
test <- Boston[test.index,] # test set
train <- Boston[-test.index,] # train set
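A quick sanity check on the split (self-contained, repeating the steps above): the two pieces should be disjoint and together cover all 506 rows.

```r
library(MASS)
set.seed(2021)
n <- nrow(Boston)
test.index <- sample(n, 100)
test <- Boston[test.index, ]
train <- Boston[-test.index, ]

nrow(test) + nrow(train) == n                           # TRUE
length(intersect(rownames(test), rownames(train))) == 0 # TRUE: no shared rows
```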
We will now do a linear regression, lasso regression and ridge regression.
For lasso, the first argument is the X-variables in the train set, and the second argument is the Y-variable in the train set. Remember, your last column is the y variable, or dependent variable: the median value of houses in different suburbs of Boston. The remaining 13 columns of the Boston data set are your X-variables. Note that for the glmnet function, your X-variables must be a matrix, hence the as.matrix function to convert the train set from a data frame to a matrix.
We set alpha = 1 for lasso; if you want to run a ridge regression, set alpha = 0. The family argument is set to gaussian because we are doing a regression. For a classification problem, the family should be set to binomial or multinomial.
Lambda is set to 0.2 for both lasso and ridge. However, the lambda value is user-defined, so you can try whatever value you want, as long as it is a positive real number.
linear <- lm(medv ~ ., data = train) # all 13 features/independent variables included here.
lasso <- glmnet(as.matrix(train[,1:(p-1)]), train[,p], alpha = 1, family = "gaussian", lambda = 0.2)
ridge <- glmnet(as.matrix(train[,1:(p-1)]), train[,p], alpha = 0, family = "gaussian", lambda = 0.2)
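For intuition about what the ridge penalty does, ridge regression on centered data has the closed form beta = (X'X + lambda*I)^(-1) X'y. The base-R sketch below is an illustration only: glmnet solves the penalized problem with its own standardization conventions, so the numbers will not match glmnet's coefficients exactly.

```r
# Closed-form ridge on standardized predictors -- illustration only.
library(MASS)
X <- scale(as.matrix(Boston[, 1:13])) # standardize the 13 predictors
y <- Boston[, 14]
lambda <- 0.2

# (X'X + lambda*I)^(-1) X'y on the centered response
beta.ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% (y - mean(y)))
head(beta.ridge, 3)
```

Increasing lambda shrinks the coefficient vector toward zero, which is the whole point of the penalty.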
We can now use the predict function to make predictions. Simply input the model name that you have just saved and the x-variables in your test set. Our X-variables are the first 13 columns of the test set.
For the glmnet model, the x-variables need to be input as a matrix.
pred.linear = predict(linear, test[,1:(p-1)])
pred.lasso = predict(lasso, as.matrix(test[,1:(p-1)]))
pred.ridge = predict(ridge, as.matrix(test[,1:(p-1)]))
After you have done your predictions, go ahead and calculate the Sum of Squared Residuals (SSR).
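The SSR is the sum of squared differences between observed and predicted values: sum((y - yhat)^2). On toy vectors:

```r
y    <- c(3, 5, 7) # observed values
yhat <- c(2, 5, 9) # predicted values

SSR <- sum((y - yhat)^2)
SSR # 1 + 0 + 4 = 5
```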
Question 1
Calculate the SSRs for the linear, lasso, and ridge regressions. Use the outputs SSR.linear, SSR.lasso, and SSR.ridge and the sum function to calculate the SSRs between test[,p] and pred.linear/pred.lasso/pred.ridge.
# YOUR CODE HERE
# SSR.linear <-
# SSR.lasso <-
# SSR.ridge <-
# your code here
Which model performs the best? Use the cat function to display the output from all three models. Comment on your results.
cat(SSR.linear, SSR.lasso, SSR.ridge)
Question 2
We can use cross-validation to determine a good value for lambda. Using the cv.glmnet function, carry out cross-validation for lasso and ridge to find the lambda value that minimizes the cross-validated error. Use the outputs lam.lasso and lam.ridge. The inputs are the same as those used to carry out lasso and ridge with the glmnet function: for lam.lasso, along with the as.matrix function, your input should be (as.matrix(train[,1:(p-1)]), train[,p], alpha = 1, family = "gaussian")$lambda.min, and the same for lam.ridge, except this time your alpha value is going to be zero.
The great advantage of the cv.glmnet function is that you don't need to input a range of values for lambda: the function automatically locates a range of lambda values to test for you. The best lambda value is stored as $lambda.min. You then rerun the lasso and ridge regressions with the entire train set and the best lambda value.
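What cv.glmnet does internally can be sketched by hand: split the training rows into k folds, fit on k-1 folds for each candidate lambda, score on the held-out fold, and pick the lambda with the smallest average error. The base-R sketch below uses 5 folds, a small hand-picked lambda grid, and the closed-form ridge solution instead of glmnet, so it only illustrates the idea.

```r
# Hand-rolled 5-fold cross-validation over a small lambda grid,
# using closed-form ridge -- an illustration of what cv.glmnet automates.
library(MASS)
set.seed(2021)

X <- scale(as.matrix(Boston[, 1:13]))
y <- Boston[, 14] - mean(Boston[, 14])
folds <- sample(rep(1:5, length.out = nrow(X))) # assign each row to a fold
grid  <- c(0.01, 0.1, 1, 10, 100)               # hand-picked candidate lambdas

cv.err <- sapply(grid, function(lam) {
  mean(sapply(1:5, function(k) {
    tr <- folds != k # fit on all folds except k
    b  <- solve(t(X[tr, ]) %*% X[tr, ] + lam * diag(ncol(X)),
                t(X[tr, ]) %*% y[tr])
    mean((y[!tr] - X[!tr, ] %*% b)^2) # held-out mean squared error
  }))
})

lam.best <- grid[which.min(cv.err)] # analogue of $lambda.min
```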
# YOUR CODE HERE
# lam.lasso <-
# lam.ridge <-
# your code here
Question 3
Using the lambda values from lam.lasso and lam.ridge, recalculate the lasso and ridge models. Again use the outputs lasso and ridge, the glmnet function, and the as.matrix function.
# YOUR CODE HERE
# lasso <-
# ridge <-
# your code here
Question 4
Recalculate pred.linear, pred.lasso and pred.ridge using the same outputs.
# YOUR CODE HERE
# pred.linear <-
# pred.lasso <-
# pred.ridge <-
# your code here
