Question
You are a superintendent of a school district in Massachusetts and have been asked to analyze the following dataset, MASchools. The dataset contains data on test performance, school characteristics, and student demographic backgrounds for over school districts in Massachusetts. Any of the variables just mentioned would be appropriate for analyzing a school's performance. You would like to run a kNN regression analysis to determine which characteristics of schools and student bodies lead to a good prediction of the dependent variable of your choosing.
Note: Professor Giunta would find it highly suspicious given the randomization of choice with the question.
Import the dataset, MASchools, as well as the appropriate libraries and any other additional files needed to answer the question.
What type of data must the independent variables be? What type of data must the dependent variable be?
When viewing the dataset, you will need to make a data cleaning decision. Please remove the NA observations and clean the data in any way necessary to answer your question.
Analyze the data structure and make appropriate manipulations. If you decide to remove any columns, comment why you are removing those columns. The removal of these columns must be appropriate for the given statistical test.
Create two data sets: the first will transform the independent variables into z-scores (standardization), and the second will transform the values using the range method (normalization).
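The two transformations can be sketched as follows. This is a minimal illustration in Python of the arithmetic only, not the R code the assignment asks for; the function names and the sample values are hypothetical and do not come from MASchools.

```python
import statistics

def zscore(values):
    """Standardize: subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return [(v - mean) / sd for v in values]

def min_max(values):
    """Range-normalize: rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical predictor values, made up for illustration.
ratios = [14.0, 16.0, 18.0, 20.0, 22.0]
z = zscore(ratios)   # centered on 0 with unit sample standard deviation
r = min_max(ratios)  # bounded to the interval [0, 1]
```

Because kNN relies on distances, both versions put predictors on comparable scales; they differ in how strongly extreme values stretch the resulting range.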
Appropriately partition your data using a seed of
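A seeded partition can be sketched as below, in Python for illustration only. The seed value `0` and the 70/30 split are placeholders chosen for this sketch, since the assignment's own seed is not shown here, and `train_test_split` is a hypothetical helper name.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle row indices with a fixed seed, then cut into train and test sets."""
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Placeholder rows standing in for cleaned observations.
rows = list(range(10))
train, test = train_test_split(rows, train_fraction=0.7, seed=0)
```

Using the same seed for the standardized and the normalized data sets keeps the same observations in each partition, so the later comparison reflects the scaling method rather than the split.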
Determine an appropriate size of k using knnCrossVal for both data sets.
What value did you decide was appropriate for k for each dataset, and why did you make this decision?
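One common way to justify a choice of k is to compare cross-validated error across candidate values and pick the smallest. The sketch below uses leave-one-out cross-validation in Python; it is an assumption-laden illustration of the idea, not a reimplementation of `knnCrossVal`, whose exact behavior is not described here. The helper names and toy data are hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k):
    """Average the responses of the k training points closest to the query."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in nearest) / k

def loocv_rmse(X, y, k):
    """Leave-one-out CV: predict each point from all the others, then take the RMSE."""
    squared_errors = []
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        pred = knn_predict(rest_X, rest_y, X[i], k)
        squared_errors.append((y[i] - pred) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy data with two well-separated clusters (made up for illustration).
X = [(0.0,), (0.1,), (0.2,), (0.8,), (0.9,), (1.0,)]
y = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]

scores = {k: loocv_rmse(X, y, k) for k in (1, 2, 3, 4, 5)}
best_k = min(scores, key=scores.get)  # candidate k with the lowest CV error
```

In practice the error curve is often fairly flat near its minimum, so a slightly larger k with similar error may be a reasonable choice for stability.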
Now, run the model.
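At its core, kNN regression predicts a new observation's response as the average response of its k nearest training observations. A minimal Python sketch of that prediction step follows, assuming Euclidean distance on already-scaled predictors; the function name and the toy values are hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k):
    """Average the responses of the k training points closest to the query."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in nearest) / k

# Made-up scaled predictors and test scores, for illustration only.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [700.0, 710.0, 720.0, 650.0]

prediction = knn_predict(train_X, train_y, query=(0.1, 0.1), k=3)
```

The distant fourth point does not influence the prediction at k = 3, which is why unscaled predictors with large ranges can dominate the neighbor search.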
What are the MAPE and RMSE of your model for both data sets?
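For reference, the two error measures can be computed as in the following Python sketch. The actual and predicted values are invented for illustration and are not results from MASchools.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the response."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value equals zero."""
    n = len(actual)
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Invented test-set values for illustration.
actual = [700.0, 710.0, 720.0]
predicted = [707.0, 710.0, 713.0]

model_rmse = rmse(actual, predicted)
model_mape = mape(actual, predicted)
```

RMSE penalizes large misses more heavily, while MAPE expresses error relative to the size of each actual value, so the two can rank the standardized and normalized models differently.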
Compare your results and explain which method you feel is more appropriate for predicting your chosen dependent variable.
Upload your R file with the code for this question.