
Linear Model Selection and Regularization
This programming assignment will use the tidymodels platform. It takes a look at regularization models and hyperparameter tuning; these models contain a regularization term. The assignment uses parsnip for model fitting, recipes and workflows to perform the transformations, and tune and dials to tune the hyperparameters of the model.
You will be using the Hitters data set from the ISLR2 package. You wish to predict baseball players' Salary based on several different characteristics which are included in the data set.
Since you wish to predict Salary, you need to remove any missing data from that column; otherwise, you won't be able to fit the models.
Store the result as Hitters.
library(tidymodels)
library(ISLR2)
# Your code here
# Hitters <-
# Load the data and remove rows with a missing Salary
Hitters <- ISLR2::Hitters %>%
  drop_na(Salary)
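A quick sanity check (a sketch; it assumes nothing beyond the assignment's own data) confirms that no missing Salary values remain after the drop:

```r
library(tidymodels)  # loads tidyr, which provides drop_na()
library(ISLR2)

Hitters <- ISLR2::Hitters %>% drop_na(Salary)

# Count remaining missing values in the response column
sum(is.na(Hitters$Salary))
stopifnot(!anyNA(Hitters$Salary))
```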
── Attaching packages ─────────────── tidymodels 1.0.0 ──
broom     1.0.4     recipes      1.0.5
dials     1.1.0     rsample      1.1.1
dplyr     1.1.0     tibble       3.2.0
ggplot2   3.4.1     tidyr        1.3.0
infer     1.0.4     tune         1.0.1
modeldata 1.1.0     workflows    1.1.3
parsnip   1.0.4     workflowsets 1.0.0
purrr     1.0.1     yardstick    1.1.0
── Conflicts ────────────────── tidymodels_conflicts() ──
purrr::discard() masks scales::discard()
dplyr::filter()  masks stats::filter()
dplyr::lag()     masks stats::lag()
recipes::step()  masks stats::step()
Use suppressPackageStartupMessages() to eliminate package startup messages
# Hidden Tests
You will use the glmnet package to perform ridge regression. parsnip does not have a dedicated function to create a ridge regression model specification, so you need to use linear_reg() and set mixture = 0 to specify a ridge model. The mixture argument specifies the mix between the two types of regularization: mixture = 0 specifies only ridge regularization and mixture = 1 specifies only lasso regularization.
Setting mixture to a value between 0 and 1 lets you use both. When using the glmnet engine you also need to set a penalty to be able to fit the model. You will set this value to 0 for now; it is not the best value, but you will look at how to select the best value in a little bit.
ridge_spec <- linear_reg(mixture = 0, penalty = 0) %>%
  set_mode("regression") %>%
  set_engine("glmnet")
Once the specification is created you can fit it to your data. You will use all the predictors. Use the fit() function here.
ridge_fit <- fit(ridge_spec, Salary ~ ., data = Hitters)
The glmnet package fits the model for all values of penalty at once, so you can now see what the parameter estimates for the model are with penalty = 0. You can use the tidy() function to accomplish this specific task.
tidy(ridge_fit)
Loading required package: Matrix
Attaching package: Matrix
The following objects are masked from package:tidyr:
expand, pack, unpack
Loaded glmnet 4.1-6
# A tibble: 20 × 3
term          estimate       penalty
(Intercept)    8.112693e+01  0
AtBat         -6.815959e-01  0
Hits           2.772312e+00  0
HmRun         -1.365680e+00  0
Runs           1.014826e+00  0
RBI            7.130224e-01  0
Walks          3.378558e+00  0
Years         -9.066800e+00  0
CAtBat        -1.199478e-03  0
CHits          1.361029e-01  0
CHmRun         6.979958e-01  0
CRuns          2.958896e-01  0
CRBI           2.570711e-01  0
CWalks        -2.789666e-01  0
LeagueN        5.321272e+01  0
DivisionW     -1.228345e+02  0
PutOuts        2.638876e-01  0
Assists        1.698796e-01  0
Errors        -3.685645e+00  0
NewLeagueN    -1.810510e+01  0
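Because glmnet fits the entire regularization path in one go, you do not have to refit the model to inspect coefficients at a different penalty: parsnip's tidy() method for glmnet fits accepts a penalty argument. A minimal sketch, assuming the ridge_fit object from above (the penalty value 50 is arbitrary, chosen only for illustration):

```r
library(tidymodels)
library(ISLR2)

Hitters <- ISLR2::Hitters %>% drop_na(Salary)

ridge_spec <- linear_reg(mixture = 0, penalty = 0) %>%
  set_mode("regression") %>%
  set_engine("glmnet")

ridge_fit <- fit(ridge_spec, Salary ~ ., data = Hitters)

# Coefficients along the already-fitted path at a different penalty,
# without refitting the model
tidy(ridge_fit, penalty = 50)
```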
Let us instead see what the estimates would be if the penalty were 11498. Store your output as tidy2. What do you notice?
# Your code here
# tidy2 <-
tidy2 <- tidy(
  linear_reg(penalty = 11498, mixture = 0) %>%
    set_mode("regression") %>%
    set_engine("glmnet") %>%
    fit(Salary ~ ., data = Hitters)
)
# Print the parameter estimates for penalty = 11498
tidy2
# A tibble: 20 × 3
term          estimate       penalty
(Intercept)   407.205936774  11498
AtBat           0.037003083  11498
Hits            0.138357552  11498
HmRun           0.525195508  11498
Runs            0.230978290  11498
RBI             0.240114775  11498
Walks           0.289971555  11498
Years           1.108832399  11498
CAtBat          0.003135215  11498
CHits           0.011666684  11498
CHmRun          0.087642789  11498
CRuns           0.023406258  11498
CRBI            0.024165723  11498
CWalks          0.025042117  11498
LeagueN         0.086629234  11498
DivisionW      -6.225431332  11498
PutOuts         0.016506596  11498
Assists         0.002616335  11498
Errors         -0.020564158  11498
NewLeagueN      0.302922899  11498
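The shrinkage you just observed between penalty = 0 and penalty = 11498 can also be visualized across the whole path. A sketch, assuming the same ridge fit as above: extract_fit_engine() returns the underlying glmnet object, whose base plot() method draws the coefficient paths.

```r
library(tidymodels)
library(ISLR2)

Hitters <- ISLR2::Hitters %>% drop_na(Salary)

ridge_fit <- linear_reg(mixture = 0, penalty = 0) %>%
  set_mode("regression") %>%
  set_engine("glmnet") %>%
  fit(Salary ~ ., data = Hitters)

# Plot coefficient estimates against log(lambda): as the penalty grows,
# every coefficient is shrunk toward zero
ridge_fit %>%
  extract_fit_engine() %>%
  plot(xvar = "lambda")
```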
# Hidden Tests
Look below at the parameter estimates for penalty = 705. Store your output as tidy3. Once again, use the tidy() function to accomplish this task.
# Your code here
# tidy3 <-
tidy3 <- tidy(
  linear_reg(penalty = 705, mixture = 0) %>%
    set_mode("regression") %>%
    set_engine("glmnet") %>%
    fit(Salary ~ ., data = Hitters)
)
# Print the parameter estimates for penalty = 705
tidy3
# A tibble: 20 × 3
term   estimate   penalty
…
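The assignment's introduction mentions tune and dials for selecting the penalty, and the text above promised a look at how to choose the best value. A hedged sketch of how that selection could proceed with cross-validation (names such as ridge_wf and penalty_grid are illustrative, and the grid range is an assumption, not part of the original assignment):

```r
library(tidymodels)
library(ISLR2)

Hitters <- ISLR2::Hitters %>% drop_na(Salary)

set.seed(1)
Hitters_folds <- vfold_cv(Hitters, v = 10)

# Dummy-code factors and normalize predictors so the penalty acts evenly
ridge_rec <- recipe(Salary ~ ., data = Hitters) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Mark the penalty for tuning instead of fixing it
ridge_tune_spec <- linear_reg(mixture = 0, penalty = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")

ridge_wf <- workflow() %>%
  add_recipe(ridge_rec) %>%
  add_model(ridge_tune_spec)

# penalty() from dials varies on a log10 scale; this range is an assumption
penalty_grid <- grid_regular(penalty(range = c(-2, 5)), levels = 50)

tune_res <- tune_grid(ridge_wf, resamples = Hitters_folds, grid = penalty_grid)

# Pick the penalty with the best cross-validated RMSE and refit on all data
best_penalty <- select_best(tune_res, metric = "rmse")
final_fit <- finalize_workflow(ridge_wf, best_penalty) %>%
  fit(data = Hitters)
tidy(final_fit)
```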
