Cross-Validation Lab and Programming Assignment
This lab and programming assignment begins with the Boston data set, to which we apply linear, ridge, and lasso regression, eventually using cross-validation.
The Boston data set records the value of houses in different suburbs of Boston. It contains 506 observations and 14 features (columns). The last column is the dependent variable y, which leaves 13 independent variables x.
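Before splitting the data, it can help to confirm the shape described above. A quick look (MASS ships with standard R distributions):

```r
library(MASS) # provides the Boston data set

dim(Boston)       # 506 14
names(Boston)[14] # "medv" -- the dependent variable y
```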
set.seed(2021)
library(MASS) # Boston Data Set
library(glmnet) # Lasso and Ridge Regression
n <- nrow(Boston) # number of rows (n = 506)
p <- ncol(Boston) # number of columns (p = 14)
The sample function randomly selects 100 numbers from the vector 1, ..., n without replacement. Using the resulting indices, we create the training and test sets: for the test set we keep only the sampled rows, and for the train set we keep all rows except those indices (the minus sign tells R to exclude the rows in the test.index vector).
test.index <- sample(n,100)
test <- Boston[test.index,] # test set
train <- Boston[-test.index,] # train set
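A quick sanity check on the split (self-contained, repeating the steps above): the two pieces should be disjoint and together cover all 506 rows.

```r
library(MASS)
set.seed(2021)
n <- nrow(Boston)
test.index <- sample(n, 100)
test <- Boston[test.index, ]
train <- Boston[-test.index, ]

nrow(test) + nrow(train) == n                           # TRUE
length(intersect(rownames(test), rownames(train))) == 0 # TRUE: no shared rows
```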
We will now do a linear regression, lasso regression and ridge regression.
For lasso, the first argument is the X-variables in the train set, and the second argument is the Y-variable in the train set. Remember, your last column is the y variable, or dependent variable: the median value of houses in different suburbs of Boston. The remaining 13 columns of the Boston data set are your X-variables. Note that for the glmnet function, your X-variables must be a matrix, hence the as.matrix function to convert the train set from a data frame to a matrix.
We set alpha = 1 for lasso; if you want to run a ridge regression, set alpha = 0. The family argument is set to gaussian because we are doing a regression. For a classification problem, the family should be set to binomial or multinomial.
Lambda is set to 0.2 for both lasso and ridge. However, the lambda value is user-defined, so you can try whatever value you want, as long as it is a positive real number.
linear <- lm(medv ~ ., data = train) # all 13 features/independent variables included here.
lasso <- glmnet(as.matrix(train[,1:(p-1)]), train[,p], alpha = 1, family = "gaussian", lambda = 0.2)
ridge <- glmnet(as.matrix(train[,1:(p-1)]), train[,p], alpha = 0, family = "gaussian", lambda = 0.2)
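For intuition about what the ridge penalty does, ridge regression on centered data has the closed form beta = (X'X + lambda*I)^(-1) X'y. The base-R sketch below is an illustration only: glmnet solves the penalized problem with its own standardization conventions, so the numbers will not match glmnet's coefficients exactly.

```r
# Closed-form ridge on standardized predictors -- illustration only.
library(MASS)
X <- scale(as.matrix(Boston[, 1:13])) # standardize the 13 predictors
y <- Boston[, 14]
lambda <- 0.2

# (X'X + lambda*I)^(-1) X'y on the centered response
beta.ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% (y - mean(y)))
head(beta.ridge, 3)
```

Increasing lambda shrinks the coefficient vector toward zero, which is the whole point of the penalty.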
We can now use the predict function to make predictions. Simply input the model name that you have just saved and the x-variables in your test set. Our X-variables are the first 13 columns of the test set.
For the glmnet model, the x-variables need to be input as a matrix.
pred.linear = predict(linear, test[,1:(p-1)])
pred.lasso = predict(lasso, as.matrix(test[,1:(p-1)]))
pred.ridge = predict(ridge, as.matrix(test[,1:(p-1)]))
After you have done your predictions, go ahead and calculate the Sum of Squared Residuals (SSR).
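The SSR is the sum of squared differences between observed and predicted values: sum((y - yhat)^2). On toy vectors:

```r
y    <- c(3, 5, 7) # observed values
yhat <- c(2, 5, 9) # predicted values

SSR <- sum((y - yhat)^2)
SSR # 1 + 0 + 4 = 5
```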
Question 1
Calculate the SSRs for the linear, lasso, and ridge regressions. Use the outputs SSR.linear, SSR.lasso, and SSR.ridge and the sum function to calculate the SSRs between test[,p] and pred.linear/pred.lasso/pred.ridge.
# YOUR CODE HERE
# SSR.linear <-
# SSR.lasso <-
# SSR.ridge <-
# your code here
Which model performs the best? Use the cat function to display the output from all three models. Comment on your results.
cat(SSR.linear, SSR.lasso, SSR.ridge)
Question 2
We can use cross-validation to determine a good value for lambda. Using the cv.glmnet function, carry out cross-validation for lasso and ridge to find the lambda value that minimizes the cross-validated error. Use the outputs lam.lasso and lam.ridge. The inputs are the same as those used to carry out lasso and ridge with the glmnet function: for lam.lasso, along with the as.matrix function, your input should be (as.matrix(train[,1:(p-1)]), train[,p], alpha = 1, family = "gaussian")$lambda.min, and the same for lam.ridge, except this time your alpha value is going to be zero.
The great advantage of the cv.glmnet function is that you don't need to input a range of values for lambda: the function automatically locates a range of lambda values to test for you. The best lambda value is stored as $lambda.min. You then rerun the lasso and ridge regressions with the entire train set and the best lambda value.
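What cv.glmnet does internally can be sketched by hand: split the training rows into k folds, fit on k-1 folds for each candidate lambda, score on the held-out fold, and pick the lambda with the smallest average error. The base-R sketch below uses 5 folds, a small hand-picked lambda grid, and the closed-form ridge solution instead of glmnet, so it only illustrates the idea.

```r
# Hand-rolled 5-fold cross-validation over a small lambda grid,
# using closed-form ridge -- an illustration of what cv.glmnet automates.
library(MASS)
set.seed(2021)

X <- scale(as.matrix(Boston[, 1:13]))
y <- Boston[, 14] - mean(Boston[, 14])
folds <- sample(rep(1:5, length.out = nrow(X))) # assign each row to a fold
grid  <- c(0.01, 0.1, 1, 10, 100)               # hand-picked candidate lambdas

cv.err <- sapply(grid, function(lam) {
  mean(sapply(1:5, function(k) {
    tr <- folds != k # fit on all folds except k
    b  <- solve(t(X[tr, ]) %*% X[tr, ] + lam * diag(ncol(X)),
                t(X[tr, ]) %*% y[tr])
    mean((y[!tr] - X[!tr, ] %*% b)^2) # held-out mean squared error
  }))
})

lam.best <- grid[which.min(cv.err)] # analogue of $lambda.min
```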
# YOUR CODE HERE
# lam.lasso <-
# lam.ridge <-
# your code here
Question 3
Using the lambda values from lam.lasso and lam.ridge, recalculate the lasso and ridge models. Again use the outputs lasso and ridge, the glmnet function, and the as.matrix function.
# YOUR CODE HERE
# lasso <-
# ridge <-
# your code here
Question 4
Recalculate pred.linear, pred.lasso and pred.ridge using the same outputs.
# YOUR CODE HERE
# pred.linear <-
# pred.lasso <-
# pred.ridge <-
# your code here
