Question

Please note:
Though I have mentioned all the Data Mining steps below, you are only required
to answer Step 4: Data Modeling and Step 5: Data Evaluation.
I prefer that you use Python for the project. Excel is fine too.
Where to Submit: Blackboard
What to submit?
1. (35 points) Model (Step 4 of the Data Mining process). See the detailed
instructions in Step 4 below.
2. (15 points) Model Evaluation (Step 5 of the Data Mining process). See the
detailed instructions in Step 5 below.
Step 1: Business Understanding/Problem:
Are there any indicators that predict a school's average SAT score, and can we use what we
learn to help schools improve their students' SAT scores?
Step 2: Data Understanding:
You are provided with AP and SAT data. In the CSV file AP_SAT_Data.csv, there are
3 independent variables/attributes
1. No_AP_TestTakers
2. Total_Exam_Taken
3. No_Exam_Passed
1 Dependent variable/ Target
1. SAT_Math_Score
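A minimal sketch of reading these columns in Python, using only the standard library. It assumes the CSV header row uses exactly the attribute names listed above; the function name `load_ap_sat` is hypothetical and not part of the assignment.

```python
import csv

# Column names as listed in the assignment; assumed to match the CSV header exactly.
FEATURES = ["No_AP_TestTakers", "Total_Exam_Taken", "No_Exam_Passed"]
TARGET = "SAT_Math_Score"


def load_ap_sat(path):
    """Read the CSV at `path` and return (X, y) as lists of floats.

    X has one row per record, with the three features in FEATURES order.
    y holds the matching SAT_Math_Score values.
    """
    X, y = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            X.append([float(row[name]) for name in FEATURES])
            y.append(float(row[TARGET]))
    return X, y
```

Calling `load_ap_sat("AP_SAT_Data.csv")` would then give the feature rows and the target values, provided every cell in those columns is numeric.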
Step 3: Data Preparation:
For this project, I took care of it. I performed the following steps:
1. Excluded categorical attributes that are difficult to transform into numeric ones
2. Added dummy values for some missing ones
3. Deleted rows with more than a few missing attribute values
Step 4: Modeling (Linear Regression)
Use Excel or Python (preferred) to perform the modeling. Use the AP_SAT_Data.csv file
to create the models.
Model 1: Use the following 2 independent variables to build a predictive model for the
target variable SAT_Math_Score
1. No_AP_TestTakers
2. Total_Exam_Taken
Model 2: Use all 3 independent variables to build a predictive model for the target
variable SAT_Math_Score
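One way to fit the two models in Python is ordinary least squares with an intercept, sketched below with NumPy. The helper name `fit_linear` and the demo rows are made up for illustration; the real inputs would come from AP_SAT_Data.csv, with columns assumed to be in the order No_AP_TestTakers, Total_Exam_Taken, No_Exam_Passed.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    Returns (intercept, coefficients) that minimize the sum of squared errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


# Made-up rows, for illustration only (not values from the assignment data).
X_demo = np.array(
    [[10, 15, 8], [20, 30, 12], [35, 50, 30], [50, 80, 41], [60, 85, 55], [80, 120, 60]],
    dtype=float,
)
y_demo = np.array([410, 430, 470, 495, 520, 540], dtype=float)

# Model 1 uses the first two columns; Model 2 uses all three.
intercept_m1, coefs_m1 = fit_linear(X_demo[:, :2], y_demo)
intercept_m2, coefs_m2 = fit_linear(X_demo, y_demo)
```

scikit-learn's `LinearRegression` would be an equally reasonable choice in a notebook; plain NumPy is used here only to keep the sketch dependency-light.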
Submit the following:
1. The work (10 points for each model. Total 20 points)
a. If you used Excel, submit the regression output in Excel for both
models
b. If you used Python, submit the Jupyter notebook. The code has to run.
2. (10 points) What is the regression equation for each of these models?
3. (5 points) Compare the 2 models you created. Which one is the better model
based on MSE? Provide the MSE.
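For the regression equation and the MSE, the sketch below shows one possible set of helpers. MSE here is the mean of the squared differences between actual and predicted scores. The names `predict`, `mse`, and `equation` are hypothetical, and the intercept and coefficients are assumed to come from whatever fitting step was used.

```python
import numpy as np


def predict(intercept, coefs, X):
    """Apply y_hat = intercept + X @ coefs to each row of X."""
    return intercept + np.asarray(X, dtype=float) @ np.asarray(coefs, dtype=float)


def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def equation(intercept, coefs, names, target="SAT_Math_Score"):
    """Format a fitted model as a readable regression equation string."""
    terms = " ".join(f"{c:+.4f}*{n}" for c, n in zip(coefs, names))
    return f"{target} = {intercept:.4f} {terms}"
```

When both models are fit by least squares on the same rows, Model 2's training MSE cannot exceed Model 1's, because Model 1 is the special case of Model 2 with a zero coefficient on No_Exam_Passed. A lower training MSE alone is therefore weak evidence that the extra variable helps.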
Step 5: Evaluation:
I have also provided testing.csv. Use the data in this file to evaluate your models.
Submit the following:
(15 points) Test your two models from Step 4 using the test data. Which one is the better
model now (use MSE)? Has your answer changed from Step 4.2?
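The evaluation step could be sketched as follows: fit each model on the training rows only, then compute MSE separately on the training and test rows. This assumes testing.csv has the same columns as AP_SAT_Data.csv and that the feature columns are ordered No_AP_TestTakers, Total_Exam_Taken, No_Exam_Passed. The function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares weights; the first entry is the intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def mse_for(beta, X, y):
    """MSE of the fitted weights `beta` on the rows (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((y - A @ beta) ** 2))


def compare_on_test(X_train, y_train, X_test, y_test):
    """Fit Model 1 and Model 2 on the training rows and score both data sets."""
    X_train, y_train = np.asarray(X_train, float), np.asarray(y_train, float)
    X_test, y_test = np.asarray(X_test, float), np.asarray(y_test, float)
    # Model 1 uses the first two feature columns; Model 2 uses all three.
    column_sets = {"model_1": [0, 1], "model_2": [0, 1, 2]}
    results = {}
    for name, cols in column_sets.items():
        beta = fit_ols(X_train[:, cols], y_train)  # the test rows are never used for fitting
        results[name] = {
            "train_mse": mse_for(beta, X_train[:, cols], y_train),
            "test_mse": mse_for(beta, X_test[:, cols], y_test),
        }
    return results
```

Comparing the `test_mse` values for the two models shows which one generalizes better to unseen schools. Unlike the training comparison, either model can come out ahead here.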
Step 6: Deployment:
Think through how you would use the model findings.
Are there important ethical considerations? Nothing to submit.
