
Question:

A credit score is a number, based on the analysis of a person's credit files, that represents the creditworthiness of that person.

A consumer services agency is interested in providing a service in which an individual can estimate their own credit score. The Excel file RawData.xlsx contains data on individuals' credit scores and other variables. The worksheet VariableDescription describes these nine (9) variables.

Make a Standard Partition of the data into Training, Validation, and Test sets. Include all nine (9) variables in the partition, use 12345 as the seed for the randomized sampling, and assign 50% of the observations to the training set, 35% to the validation set, and 15% to the test set.
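The partition step amounts to shuffling the row indices with a fixed seed and cutting the shuffled list at the 50% and 85% marks. The sketch below assumes NumPy's `default_rng` as the random generator; XLMiner uses its own generator, so the same seed 12345 will not reproduce XLMiner's actual split. The function name `standard_partition` is hypothetical.

```python
import numpy as np

def standard_partition(n_rows, seed=12345, fractions=(0.50, 0.35, 0.15)):
    """Randomly split row indices into training, validation, and test sets.

    Assumption: NumPy's generator stands in for XLMiner's, so the rows
    chosen here will differ from XLMiner's partition even with seed 12345.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)           # random order of all row indices
    n_train = int(round(fractions[0] * n_rows))  # 50% of rows
    n_valid = int(round(fractions[1] * n_rows))  # 35% of rows
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]          # remaining rows (about 15%)
    return train, valid, test

# Usage with a hypothetical data set of 200 rows
train_idx, valid_idx, test_idx = standard_partition(200)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Fixing the seed makes the split repeatable, which is why the assignment specifies one: everyone working from the same seed in XLMiner gets the same three sets.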

Predict the individuals' credit scores using k-Nearest Neighbors with up to k = 15. Use CreditScore as the output variable and all the other variables as input variables. In Step 2 of XLMiner's k-Nearest Neighbors Prediction procedure, be sure to select Normalize input data and Score on best k between 1 and specified value. Select Summary Report for both Score Training Data and Score Validation Data. Select Detailed Report, Summary Report, and Lift Charts for Score Test Data.
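The k-NN prediction procedure can be sketched as: normalize the inputs, predict each record as the average CreditScore of its k nearest training records (Euclidean distance), and keep the k from 1 to 15 with the lowest validation RMSE. This is a minimal sketch, not XLMiner's implementation: it assumes z-score normalization using the training set's mean and standard deviation, and XLMiner's normalization details and tie-breaking may differ. The function names `normalize`, `knn_predict`, and `best_k` are hypothetical, and the data in the usage example is synthetic.

```python
import numpy as np

def normalize(train_X, other_X):
    """Z-score both matrices using the training set's mean and standard deviation."""
    mean = train_X.mean(axis=0)
    std = train_X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid dividing by zero for a constant column
    return (train_X - mean) / std, (other_X - mean) / std

def knn_predict(train_X, train_y, query_X, k):
    """Predict each query row as the mean output of its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest rows
    return train_y[nearest].mean(axis=1)

def best_k(train_X, train_y, valid_X, valid_y, max_k=15):
    """Return the k in 1..max_k with the lowest validation RMSE, plus all RMSEs."""
    tr, va = normalize(train_X, valid_X)
    rmse = {}
    for k in range(1, max_k + 1):
        pred = knn_predict(tr, train_y, va, k)
        rmse[k] = float(np.sqrt(np.mean((valid_y - pred) ** 2)))
    return min(rmse, key=rmse.get), rmse

# Usage with synthetic data: 8 inputs and one output, as in the assignment
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = 600 + 40 * X[:, 0] + rng.normal(scale=10, size=120)
k, scores = best_k(X[:60], y[:60], X[60:], y[60:])
print(k, round(scores[k], 1))
```

Normalizing matters here because the inputs are on different scales; without it, the variable with the largest numeric range would dominate the distance calculation.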

Based on the results from XLMiner, answer the following questions.

a. What is the best k chosen? What does it mean?

b. Compare the RMSE on the validation set to the RMSE on the test set. Please comment.
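Both RMSE values in the Summary Reports come from the same formula, applied to different partitions. For a partition with $n$ records, where $y_i$ is an individual's actual CreditScore and $\hat{y}_i$ is the k-NN prediction:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Because the best k is the one that minimizes the validation RMSE, the validation set has already influenced the model, while the test set has not.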

c. What is the average error on the test set? What does it suggest (e.g., an underestimate or an overestimate in the prediction)?
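The average error is the mean of the residuals, so its sign shows the direction of any systematic bias. This sketch assumes the convention residual = actual − predicted; confirm it against the residual column in XLMiner's Detailed Report for the test data, because the opposite convention flips the interpretation. The function names and the three sample scores are hypothetical.

```python
import numpy as np

def average_error(actual, predicted):
    """Mean residual, assuming residual = actual - predicted."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(actual - predicted))

def bias_direction(avg_err):
    """Interpret the sign of the average error under the actual - predicted convention."""
    if avg_err > 0:
        return "underestimate"  # predictions are, on average, below actual scores
    if avg_err < 0:
        return "overestimate"   # predictions are, on average, above actual scores
    return "no net bias"

# Usage with three hypothetical test records
err = average_error([700, 650, 720], [690, 655, 700])
print(err, bias_direction(err))
```

Unlike RMSE, the average error lets positive and negative residuals cancel, so a value near zero indicates little net bias but says nothing about the size of individual errors.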

d. Using the best k, predict the CreditScore (rounded to the nearest integer) for two individuals with the following information: