Question

You are a superintendent of a school district in Massachusetts and have been asked to analyze the following dataset, MASchools. The dataset contains data on test performance, school characteristics, and student demographic backgrounds for over 200 school districts in Massachusetts. Any of the variables just mentioned would be appropriate for analyzing a school's performance. You would like to run a kNN regression analysis to determine which characteristics of schools and student bodies lead to a good prediction of a dependent variable of your choosing.
Note: Because the choice of dependent variable is left open, Professor Giunta would find matching submissions highly suspicious.
Import the dataset, MASchools, as well as the appropriate libraries and any other additional files needed to answer the question.
What type of data must the independent variables be? What type of data must the dependent variables be?
When viewing the dataset, you will need to make a data cleaning decision. Please remove the NA observations and clean the data in any way necessary to answer your question.
Analyze the data structure and make appropriate manipulations. If you decide to remove any columns, comment why you are removing those columns. The removal of these columns must be appropriate for the given statistical test.
Create two data sets: the first transforms the independent variables into z-scores (standardization), and the second transforms the values using a range method (normalization).
Appropriately partition your data using a seed of 12.
Determine an appropriate size of k using knnCrossVal for both data sets.
What value did you decide was appropriate for k for each dataset, and why did you make this decision?
Now, run the model.
What is the MAPE and RMSE of your model (both sets)?
Compare your results and explain which method you feel is more appropriate for predicting your chosen dependent variable.
Upload your R file with the code for this question.
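
The steps above rely on two scaling methods, a kNN prediction, and two error metrics. A minimal sketch of what each one computes is shown below. It is written in plain Python purely as an illustration; the assignment itself expects R and the course's knnCrossVal function, neither of which is reproduced here. All function names (z_scores, min_max, knn_predict, rmse, mape) are hypothetical, and the sketch assumes the sample standard deviation for z-scores, Euclidean distance for neighbors, and an unweighted mean of the k nearest targets.

```python
import math


def z_scores(values):
    """Standardization: subtract the mean, divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def min_max(values):
    """Range normalization: rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def knn_predict(train_x, train_y, query, k):
    """kNN regression: average the targets of the k training rows closest to query."""
    by_distance = sorted(
        (math.dist(row, query), target) for row, target in zip(train_x, train_y)
    )
    nearest_targets = [target for _, target in by_distance[:k]]
    return sum(nearest_targets) / k


def rmse(actual, predicted):
    """Root mean squared error, in the units of the dependent variable."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(actual)
```

Because kNN measures distance across all independent variables at once, a variable on a large scale would dominate the distance if left unscaled, which is why the assignment asks for both transformed data sets before the model is fit. RMSE is reported in the units of the dependent variable, while MAPE is a percentage, so the two can rank the standardized and normalized models differently.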
