Question

1 Approved Answer

Posted on Jul 05, 2024

The supplied Excel workbook implements kNN classifications and numeric predictions for a set of 20 test observations based on a large sample of training observations. Your task in this assignment is to:

1. Tune the model to identify an optimal value of k for kNN regression (continuous prediction);
2. Tune the model to identify an optimal value of k for kNN classification (discrete prediction);
3. Calculate accuracy measures associated with a kNN classification (discrete prediction);
4. Calculate a measure of the informational value of the kNN approach over a naive, popularity-based prediction;
5. Finally, answer conceptual questions about how the resulting models might be used in practice.

Exercise | Ratings Exercise Assignment Excel Workbook

Instructions

1. Open the Excel workbook.
2. Review the elements of this workbook before you begin.
   a. There are several tabs in this workbook.
      i. The first two tabs contain historical user-song ratings, randomly partitioned into training data and test data, respectively.
      ii. The third tab contains a partially populated template for generating kNN predictions.
      iii. The remaining tabs contain pairwise distance calculations between each test observation and each training data observation. For example, the tab named "29-167" contains the pairwise distance calculations between test observation 29-167 and every training data observation.
   b. On the kNN predictions tab, you will find that three sets of predictions have been pre-populated for you. These include:
      i. A popularity-based predictor, i.e., the average rating that has been provided in the training data for the song ID in question (this is a common, intuitive approach, but it is also unsophisticated),
      ii. A continuous k-nearest neighbors prediction (i.e., kNN regression), and
      iii. A discrete k-nearest neighbors prediction (kNN classification).
   c. All three sets of predictions are provided for a set of 20 test observations that were randomly drawn from the available rating data.
      i. The kNN predictions are based on the k nearest neighbors of each test observation, where k is the number of neighbors to consider and "near" versus "far" is defined in terms of Euclidean distance.
   d. As you modify k, you will see the kNN predictions change for each test observation. You will also see that the popularity-based predictions remain fixed.
   e. In addition to the predictions, placeholders have been provided for you to capture performance (error) metrics for all three approaches: MAE and RMSE for the continuous popularity-based and kNN predictions, and accuracy, error, and a confusion matrix for the discrete kNN predictions.
3. Experiment by adjusting the value of k. You will see the predictions change as well as the individual error values for each test observation.
4. Use this workbook to answer the nine questions in the kNN classification assignment that follows.
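For readers who want to see the workbook's logic outside of Excel, the following is a minimal Python sketch of the same ideas: Euclidean distance, kNN regression (mean rating of the k nearest neighbors), kNN classification (majority vote), the MAE, RMSE and accuracy metrics, and a simple loop that tries several values of k. It is not the workbook's actual implementation. The toy data, the two-feature layout, the tie-breaking rules, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest_targets(train_X, train_y, query, k):
    """Targets of the k training rows closest to query (ties keep training order)."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return [train_y[i] for i in order[:k]]

def knn_regress(train_X, train_y, query, k):
    """Continuous prediction: average rating of the k nearest neighbors."""
    nearest = k_nearest_targets(train_X, train_y, query, k)
    return sum(nearest) / len(nearest)

def knn_classify(train_X, train_y, query, k):
    """Discrete prediction: most common rating among the k nearest neighbors.
    Assumption: vote ties go to the label of the nearer neighbor."""
    nearest = k_nearest_targets(train_X, train_y, query, k)
    return Counter(nearest).most_common(1)[0][0]

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def confusion_counts(actual, predicted):
    """Counts keyed by (actual, predicted) pairs, i.e. confusion-matrix cells."""
    return Counter(zip(actual, predicted))

def tune_k_regression(train_X, train_y, test_X, test_y, candidate_ks):
    """Return the candidate k with the lowest test MAE, plus all scores."""
    scores = {}
    for k in candidate_ks:
        preds = [knn_regress(train_X, train_y, q, k) for q in test_X]
        scores[k] = mae(test_y, preds)
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Hypothetical toy data (NOT taken from the workbook): two features per row,
# integer ratings from 1 to 5 as the target.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_y = [1, 2, 1, 5, 4, 5]
test_X = [(0.2, 0.1), (5.1, 5.2)]
test_y = [1, 5]

best_k, mae_by_k = tune_k_regression(train_X, train_y, test_X, test_y, [1, 2, 3])
class_preds = [knn_classify(train_X, train_y, q, 3) for q in test_X]
print("best k by MAE:", best_k)
print("MAE by k:", mae_by_k)
print("classification accuracy at k=3:", accuracy(test_y, class_preds))
print("confusion counts at k=3:", dict(confusion_counts(test_y, class_preds)))
```

In the workbook, changing the k cell plays the role of the loop in `tune_k_regression`: each candidate k yields a new set of predictions and a new error value, and the k with the lowest error (or highest accuracy, for classification) is the tuned choice. Scoring on only 20 test observations is noisy, so the "optimal" k found this way should be treated as approximate.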