Question


Open the Excel workbook. There are several tabs in this workbook. The first two tabs contain historical user-song ratings, randomly partitioned into training data and test data, respectively. The third tab contains a partially populated template for generating k-NN predictions. The remaining tabs contain pairwise distance calculations between each test observation and each training data observation. For example, the tab named 29-167 contains the pairwise distance calculations between test observation 29-167 and every training data observation.
On the k-NN predictions tab, you will find that three sets of predictions have been pre-populated for you. These include i) a popularity-based predictor, i.e., the average rating that has been provided in the training data for the song ID in question (this is a common, intuitive approach, but it is also unsophisticated), ii) a continuous k-Nearest Neighbor prediction (i.e., kNN regression) and iii) a discrete k-Nearest Neighbor prediction (kNN classification). All three sets of predictions are provided for a set of 20 test observations that were randomly drawn from the available rating data. The kNN predictions are based on the k nearest neighbors of each test observation, where k is the number of neighbors to consider and near versus far is defined in terms of Euclidean distance. As you modify k, you will see the kNN predictions change for each test observation, while the popularity-based predictions remain fixed.
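The two kNN predictors described above can be sketched in a few lines of Python. This is a minimal illustration, not the workbook's formulas: the function names, the feature tuples, and the toy ratings are all hypothetical, and the classifier is assumed to take a majority vote among the k nearest ratings.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest_ratings(train, query, k):
    """Return the ratings of the k training rows closest to `query`.

    `train` is a list of (features, rating) pairs (a hypothetical layout).
    """
    ranked = sorted(train, key=lambda row: euclidean(row[0], query))
    return [rating for _, rating in ranked[:k]]

def knn_regress(train, query, k):
    """Continuous prediction: mean rating of the k nearest neighbors."""
    ratings = k_nearest_ratings(train, query, k)
    return sum(ratings) / len(ratings)

def knn_classify(train, query, k):
    """Discrete prediction: most common rating among the k nearest neighbors.

    Ties go to the rating seen first, i.e. the one held by the nearer neighbor.
    """
    ratings = k_nearest_ratings(train, query, k)
    return Counter(ratings).most_common(1)[0][0]

# Toy data, invented for illustration only.
toy_train = [
    ((0.0, 0.0), 1),
    ((1.0, 0.0), 1),
    ((0.0, 1.0), 5),
    ((5.0, 5.0), 5),
    ((6.0, 5.0), 4),
]
toy_query = (0.1, 0.1)

print(knn_regress(toy_train, toy_query, k=3))   # mean of ratings 1, 1, 5
print(knn_classify(toy_train, toy_query, k=3))  # majority rating among 1, 1, 5
```

Changing `k` changes which neighbors are averaged or counted, which is why the kNN columns move when k is modified while a popularity-based average, which never looks at neighbors, stays fixed.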
In addition to the predictions, placeholders have been provided for you to capture performance (error) metrics for all three approaches: MAE and RMSE for the continuous predictions (popularity-based and kNN regression), and accuracy, error, and a confusion matrix for the discrete predictions (kNN classification).
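For reference, the two continuous error measures can be computed as follows. This is a small sketch using the standard definitions of mean absolute error and root mean squared error; the `toy_actual` and `toy_predicted` values are invented and do not come from the workbook.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical ratings and predictions, for illustration only.
toy_actual = [5, 3, 4, 1]
toy_predicted = [4.0, 3.0, 2.0, 2.0]

print(mae(toy_actual, toy_predicted))   # absolute errors are 1, 0, 2, 1
print(rmse(toy_actual, toy_predicted))  # squared errors are 1, 0, 4, 1
```

RMSE penalizes large individual errors more heavily than MAE, so the two measures can disagree about which predictor is better.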
As you adjust the value of k, you will see the predictions change, as well as the individual error values for each test observation.
Question 5 (5 points)
Vary the value of k from 1 through 10. Based on the continuous (kNN regression) prediction error measures, what is the optimal number of nearest neighbors to employ?
1
2
3
4
5
6
7
8
9
10
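The search over k can also be sketched in code. The example below assumes that, as in the workbook's pairwise distance tabs, each test observation already comes with a list of (distance, training rating) pairs. It selects k by lowest RMSE; that choice, the function names, and the toy numbers are assumptions made for illustration, and nothing here reproduces the workbook's data or its answer.

```python
import math

def rmse_for_k(neighbors, actual, k):
    """RMSE of kNN regression over all test observations for a given k.

    `neighbors[i]` holds (distance, rating) pairs for test observation i.
    """
    squared_errors = []
    for pairs, truth in zip(neighbors, actual):
        nearest = sorted(pairs, key=lambda p: p[0])[:k]
        prediction = sum(rating for _, rating in nearest) / len(nearest)
        squared_errors.append((truth - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def best_k(neighbors, actual, k_values):
    """Return (k with the lowest RMSE, dict of RMSE per k).

    When RMSE ties, the first k in `k_values` wins.
    """
    scores = {k: rmse_for_k(neighbors, actual, k) for k in k_values}
    return min(scores, key=scores.get), scores

# Two hypothetical test observations with three training neighbors each.
toy_neighbors = [
    [(0.1, 5), (0.2, 3), (0.9, 1)],
    [(0.3, 2), (0.4, 2), (0.5, 5)],
]
toy_actual = [4, 2]

chosen, scores = best_k(toy_neighbors, toy_actual, range(1, 4))
print(chosen, scores)
```

With the workbook's data the same loop would run over k = 1 through 10, and MAE could be compared alongside RMSE before settling on a value.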
Question 6 (5 points)
Based on the confusion matrix you observe when k = 5, calculate the prediction accuracy of the kNN classifier in cell B30. (Hint: overall accuracy is the proportion of the 20 predictions that were correct, i.e., on the diagonal of the confusion matrix.)
65%
35%
40%
We cannot answer this question without more information.
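The hint's calculation, diagonal count divided by total count, can be sketched as below. The matrix shown is hypothetical (three classes, ten predictions) and is not the workbook's confusion matrix, so its result says nothing about the correct option above.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correct predictions (the diagonal) over all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def error_from_confusion(matrix):
    """Overall error: off-diagonal predictions over all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total

# Hypothetical confusion matrix: rows are actual classes, columns are predicted.
toy_matrix = [
    [3, 1, 0],
    [1, 2, 1],
    [0, 0, 2],
]

print(accuracy_from_confusion(toy_matrix))  # 7 of 10 predictions on the diagonal
print(error_from_confusion(toy_matrix))     # 3 of 10 predictions off the diagonal
```

Accuracy and error always sum to 1, so either one determines the other.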
