Campaign organizers for both the Republican and Democratic parties are interested in identifying individual undecided voters who
Question:
Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Classify the data using k-nearest neighbors with up to k = 20. Use Age, Home Owner, Female, Married, HouseholdSize, Income, and Education as input variables and Undecided as the output variable. In Step 2 of XLMiner's k-nearest neighbors Classification procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data.
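For readers working outside XLMiner, the setup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reproduction of XLMiner's procedure: the 200 records are synthetic stand-ins for the voter data (two made-up inputs instead of the seven listed), the helper names (`partition`, `fit_scaler`, `scale`, `knn_prob`, `error_rate`) are hypothetical, normalization is taken to mean z-scores computed from the training set, and a record is classified as Class 1 when its estimated probability is at least the cutoff.

```python
import math
import random

def partition(rows, seed=0):
    """Shuffle, then split 50/30/20 into training, validation, and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 50 // 100
    n_valid = len(rows) * 30 // 100
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]

def fit_scaler(train):
    """Column means and standard deviations, computed from training inputs only."""
    cols = list(zip(*(x for x, _ in train)))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(data, means, stds):
    """Apply z-score normalization so no input dominates the distance."""
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in data]

def knn_prob(train, x, k):
    """Fraction of the k nearest training records (Euclidean) that are Class 1."""
    nearest = sorted(train, key=lambda r: math.dist(r[0], x))[:k]
    return sum(y for _, y in nearest) / k

def error_rate(train, data, k, cutoff=0.5):
    """Overall error rate: share of records whose predicted class is wrong."""
    wrong = sum((knn_prob(train, x, k) >= cutoff) != bool(y) for x, y in data)
    return wrong / len(data)

# Synthetic records: ([age, income], undecided). The labeling rule is made up.
rng = random.Random(42)
rows = []
for _ in range(200):
    age = rng.uniform(18, 90)
    income = rng.uniform(10, 200)
    p_undecided = 0.7 if age < 40 else 0.3
    rows.append(([age, income], 1 if rng.random() < p_undecided else 0))

train, valid, test = partition(rows)
means, stds = fit_scaler(train)
train_s, valid_s, test_s = (scale(d, means, stds) for d in (train, valid, test))

# "Score on best k": pick the k in 1..20 with the lowest validation error.
valid_errors = {k: error_rate(train_s, valid_s, k) for k in range(1, 21)}
best_k = min(valid_errors, key=valid_errors.get)

train_error_k1 = error_rate(train_s, train_s, 1)
print("k = 1 training error:", train_error_k1)
print("best k on validation:", best_k, "error:", valid_errors[best_k])
print("test error at best k:", error_rate(train_s, test_s, best_k))
```

When the training set is scored against itself with k = 1, each record's nearest neighbor is the record itself, which is worth keeping in mind for part (a). Validation and test records are not in the training set, so their nearest neighbors are different records.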
a. For k = 1, why is the overall error rate equal to 0 percent on the training set? Why isn't the overall error rate equal to 0 percent on the validation set?
b. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data? Explain the difference in the overall error rate on the training, validation, and test data.
c. Examine the decile-wise lift chart. What is the first decile lift on the test data? Interpret this value.
d. In the effort to identify undecided voters, a campaign is willing to accept an increase in the misclassification of decided voters as undecided if it can correctly classify more undecided voters. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, what are the corresponding Class 1 error rates and Class 0 error rates on the validation data?
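The quantities asked for in parts (c) and (d) can be computed from a list of predicted probabilities and actual classes. The sketch below is illustrative only: the ten probabilities and labels are invented, not XLMiner output, and the function names are hypothetical. It assumes the Class 1 error rate is the share of actual Class 1 records predicted as Class 0, the Class 0 error rate is the share of actual Class 0 records predicted as Class 1, and the first decile lift is the Class 1 rate among the top 10 percent of records ranked by probability, divided by the overall Class 1 rate.

```python
def class_error_rates(probs, actual, cutoff):
    """Return (Class 1 error rate, Class 0 error rate) at the given cutoff."""
    pred = [1 if p >= cutoff else 0 for p in probs]
    n1 = sum(actual)
    n0 = len(actual) - n1
    missed_1 = sum(1 for yhat, y in zip(pred, actual) if y == 1 and yhat == 0)
    missed_0 = sum(1 for yhat, y in zip(pred, actual) if y == 0 and yhat == 1)
    return missed_1 / n1, missed_0 / n0

def first_decile_lift(probs, actual):
    """Class 1 rate in the top 10% by probability, relative to the overall rate."""
    ranked = sorted(zip(probs, actual), key=lambda t: t[0], reverse=True)
    top = ranked[:max(1, len(ranked) // 10)]
    top_rate = sum(y for _, y in top) / len(top)
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate

# Invented scores for ten records (1 = undecided, 0 = decided).
probs = [0.9, 0.8, 0.6, 0.45, 0.4, 0.35, 0.3, 0.25, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for cutoff in (0.5, 0.4, 0.3, 0.2):
    e1, e0 = class_error_rates(probs, actual, cutoff)
    print(f"cutoff {cutoff}: Class 1 error {e1:.3f}, Class 0 error {e0:.3f}")

print("first decile lift:", first_decile_lift(probs, actual))
```

On this toy data, lowering the cutoff reduces the Class 1 error rate while raising the Class 0 error rate, which is the tradeoff part (d) asks about.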
Related Book For
Essentials Of Business Analytics
ISBN: 611
1st Edition
Authors: Jeffrey Camm, James Cochran, Michael Fry, Jeffrey Ohlmann, David Anderson, Dennis Sweeney, Thomas Williams