Question
Q4. In the homework folder you have access to a data file named PhysicsLaw.csv. The file has 12 features V1, V2, ..., V12 and a response column Y. The data correspond to measurements of a series of physical quantities (features) in relation to a sensor observation (response). Read the data file and split it into two sets. Set 1 includes the first 500 rows of the data (do not count the header row with the feature/response names), and set 2 includes rows 501 to 1500 of the data. Name the first set train and the second set test. (For better accuracy we pick a large test set.)

(a) Run a linear regression with Y as the response and V1, V2, ..., V12 as the features. Are all the features in good health with regard to the p-value? Calculate the test error through

$$E = \frac{1}{1000} \sum_{i=1}^{1000} \left( y^{\text{pred}}_{\text{test},i} - y_{\text{test},i} \right)^2$$

(b) We decide to find the most important features. For this purpose we use the forward selection approach discussed in class. We begin with p = 12 simple linear regressions, each with only one of the features, and pick the one with the lowest test error E, as calculated above. We then add to that model the variable that results in the lowest value of E for the new two-variable model. Continue this process until the best E value does not improve (or increases) from a model of size m to one of size m + 1. Which variables were picked using this approach? Are they the same as the ones with small p-values in part (a)?