Question
In this exercise, you will use RStudio to generate simulated data and then use that data to perform best subset selection.
1. Use the following code to generate a predictor X of length n = 100, as well as a noise vector ε of length n = 100.
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
2. Generate a response vector Y of length n=100 according to the model:
Y = β0 + β1X + β2X^2 + β3X^3 + ε
where β0, β1, β2, and β3 are constants of your choice.
Sample code (replace the b0, b1, b2, b3 values with values of your choice):
b0 <- 2
b1 <- 3
b2 <- -1
b3 <- 0.5
y <- b0 + b1 * x + b2 * x^2 + b3 * x^3 + eps
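As a quick sanity check on the simulated response, the following minimal sketch fits the true cubic model with lm() and places the estimates next to the chosen constants. It assumes the sample values b0 = 2, b1 = 3, b2 = -1, b3 = 0.5 from above; the names fit_cubic and comparison are illustrative and not part of the exercise.

```r
# Regenerate the data exactly as in the exercise
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)

# Constants assumed from the sample code; replace with your own choices
b0 <- 2
b1 <- 3
b2 <- -1
b3 <- 0.5
y <- b0 + b1 * x + b2 * x^2 + b3 * x^3 + eps

# Fit the cubic model that actually generated y
fit_cubic <- lm(y ~ x + I(x^2) + I(x^3))

# Side-by-side table of true constants and least squares estimates
comparison <- cbind(true = c(b0, b1, b2, b3), estimate = coef(fit_cubic))
print(round(comparison, 3))
```

With only 100 observations and unit-variance noise, the estimates should be close to the true constants but not equal to them, which is worth keeping in mind when judging which powers of X best subset selection keeps.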
3. Use the regsubsets() function to perform best subset selection in order to choose the best model containing the predictors X, X^2, ..., X^10. What is the best model obtained according to Cp, BIC, and adjusted R^2? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained. Note that you will need to use the data.frame() function to create a single data set containing both X and Y (sample code is provided below).
if (!requireNamespace("leaps", quietly = TRUE)) install.packages("leaps")  # install once if missing
library(leaps)
data.full <- data.frame(y = y, x = x)
regfit.full <- regsubsets(y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) + I(x^6) + I(x^7) + I(x^8) + I(x^9) + I(x^10), data = data.full, nvmax = 10)
reg.summary <- summary(regfit.full)
par(mfrow = c(2, 2))
plot(reg.summary$cp, xlab = "Number of variables", ylab = "C_p", type = "l")
points(which.min(reg.summary$cp), reg.summary$cp[which.min(reg.summary$cp)], col = "red", cex = 2, pch = 20)
plot(reg.summary$bic, xlab = "Number of variables", ylab = "BIC", type = "l")
points(which.min(reg.summary$bic), reg.summary$bic[which.min(reg.summary$bic)], col = "red", cex = 2, pch = 20)
plot(reg.summary$adjr2, xlab = "Number of variables", ylab = "Adjusted R^2", type = "l")
points(which.max(reg.summary$adjr2), reg.summary$adjr2[which.max(reg.summary$adjr2)], col = "red", cex = 2, pch = 20)
coef(regfit.full, which.max(reg.summary$adjr2))
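The sample code above reports coefficients only for the model chosen by adjusted R^2, while the question asks about all three criteria. The sketch below is one possible way to collect the selected model size and its coefficients under Cp, BIC, and adjusted R^2. It assumes the leaps package is already installed and uses the sample constants from step 2; the names powers, data_poly, best_size, and best_coefs are illustrative.

```r
library(leaps)  # assumes the leaps package is already installed

# Regenerate the data with the sample constants from step 2
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100)
y <- 2 + 3 * x - 1 * x^2 + 0.5 * x^3 + eps

# One column per power of x, named x1 ... x10 (illustrative names)
powers <- sapply(1:10, function(k) x^k)
colnames(powers) <- paste0("x", 1:10)
data_poly <- data.frame(y = y, powers)

# Best subset selection over all ten powers
fit <- regsubsets(y ~ ., data = data_poly, nvmax = 10)
fit_summary <- summary(fit)

# Model size preferred by each criterion
best_size <- c(
  cp = which.min(fit_summary$cp),
  bic = which.min(fit_summary$bic),
  adjr2 = which.max(fit_summary$adjr2)
)
print(best_size)

# Coefficients of the selected model under each criterion
best_coefs <- lapply(best_size, function(k) coef(fit, id = k))
print(best_coefs)
```

The three criteria need not agree on a model size, so it is worth reporting each selection separately and comparing the retained powers against the true cubic model.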