Question
# A (50 marks)

Consider the data given in the CSV file HW6DataA and the following data description:

Table 1: Data Description

| Field | Description |
| --- | --- |
| StdID | Student ID (index) |
| Statistical background | Whether the student has a background in statistics |
| Python background | The student's background in Python (Excellent, Good, Fair) |
| Gender | The student's gender (Male or Female) |
| Class level | The student's class level (Freshman, Sophomore, Junior, Senior) |
| Weekly studying hours | Average number of hours the student studies per week |
| Previous exams | Number of previous exams solved |
| Absences | Number of absences throughout the semester |
| Class size | Number of students in the class |
| Mid | Midterm score |
| Project score | Project score |
| Final | Final score (output variable) |

Note: Solve all the questions using Python. Use libraries such as Pandas, Seaborn, and Sklearn for the analysis.

Do the following tasks using the data given in HW6DataA and Table 1:

- A-1: Regression. Given a regression problem along with the input columns and output column, describe the steps to build a regression model. Explain how the regression model can be used for predicting the output column values.
- A-2: Regularization. Discuss in detail the potential use of both Ridge and LASSO regression. How are they different from OLS regression?
- A-3: Cross-Validation. In both Ridge and LASSO regression, which technique do we use to select the best value for $\alpha$?
- A-4: Given Data. Read and display the data given in HW6DataA. Refer to Table 1 for the data description.
- A-5: OLS Regression. Build an OLS regression model for predicting the Final score of each student. Consider the following:
  - All the variables except StdID, Gender, and Final shall be considered as input variables.
  - Train the model using 70% of the data and use the rest for testing. Set the random state to 42.
- A-6: LASSO and Ridge. Using the same training data as the OLS model (task A-5), estimate the coefficients (betas) using LASSO and Ridge regression.
  - Obtain the best value of $\alpha$ among $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$ using 10-fold cross-validation.
  - Compare and comment on the coefficients of the three models.
  - Compare the performance of the OLS model against the LASSO and Ridge models on the testing data.
- A-7: SISO Regression. Using the closed-form method (formula), build a SISO regression model to predict the Final score. Use the variable with the highest regression coefficient obtained by LASSO as the input variable (say, the top variable). Using the corresponding testing data, compare the performance of the SISO model (top variable vs Final score) with that of LASSO reported in A-6. Also, depict the top variable vs the Final score.
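Tasks A-5 to A-7 could be approached along the lines of the sketch below. It is a minimal illustration under stated assumptions, not a verified solution. HW6DataA is not available here, so the data frame is a synthetic stand-in with only a subset of the Table 1 columns, and its values are invented for illustration. Standardizing the inputs, one-hot encoding the categorical column, using mean squared error as the metric, and picking the top variable by largest absolute LASSO coefficient are all choices of this sketch that the question does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for HW6DataA (values are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "StdID": np.arange(n),
    "Gender": rng.choice(["Male", "Female"], n),
    "Python background": rng.choice(["Excellent", "Good", "Fair"], n),
    "Weekly studying hours": rng.uniform(0, 20, n),
    "Absences": rng.integers(0, 10, n),
    "Mid": rng.uniform(40, 100, n),
})
df["Final"] = (0.7 * df["Mid"] + 0.5 * df["Weekly studying hours"]
               - 1.0 * df["Absences"] + rng.normal(0, 3, n))

# A-5: inputs are all columns except StdID, Gender, and Final.
# Categorical inputs are one-hot encoded (an assumption of this sketch).
X = pd.get_dummies(df.drop(columns=["StdID", "Gender", "Final"]),
                   drop_first=True, dtype=float)
y = df["Final"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize with training statistics so the penalty treats
# all coefficients on a comparable scale (a choice of this sketch).
scaler = StandardScaler().fit(X_train)
Xs_train = scaler.transform(X_train)
Xs_test = scaler.transform(X_test)

# A-6: pick alpha from the candidate grid with 10-fold cross-validation.
alphas = [1e-3, 1e-2, 1e-1, 1, 10, 1e2, 1e3]
models = {
    "OLS": LinearRegression(),
    "LASSO": LassoCV(alphas=alphas, cv=10, max_iter=10000),
    "Ridge": RidgeCV(alphas=alphas, cv=10),
}
for model in models.values():
    model.fit(Xs_train, y_train)

# Coefficients side by side, and test-set mean squared error per model.
coefs = pd.DataFrame({name: m.coef_ for name, m in models.items()},
                     index=X.columns)
test_mse = {name: mean_squared_error(y_test, m.predict(Xs_test))
            for name, m in models.items()}

# A-7: SISO model on the top LASSO variable, fitted in closed form:
# b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# b0 = mean(y) - b1 * mean(x)
top = coefs["LASSO"].abs().idxmax()
x_tr = X_train[top].to_numpy()
y_tr = y_train.to_numpy()
b1 = (np.sum((x_tr - x_tr.mean()) * (y_tr - y_tr.mean()))
      / np.sum((x_tr - x_tr.mean()) ** 2))
b0 = y_tr.mean() - b1 * x_tr.mean()
siso_pred = b0 + b1 * X_test[top].to_numpy()
test_mse["SISO"] = mean_squared_error(y_test, siso_pred)

print(coefs.round(3))
print("best alpha:", {k: models[k].alpha_ for k in ("LASSO", "Ridge")})
print("top variable:", top)
print("test MSE:", {k: round(v, 3) for k, v in test_mse.items()})
```

With the real file, the synthetic frame would presumably be replaced by something like `pd.read_csv(...)` on HW6DataA, with the actual column names substituted. The requested depiction of the top variable against Final could then be drawn with, for example, `seaborn.scatterplot` on the test split.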