Question
Campaign organizers for both the Republican and Democratic parties are interested in identifying individual undecided voters who would consider voting for their party in an upcoming election. The file BlueOrRed contains data on a sample of voters with tracked variables, including whether or not they are undecided regarding their candidate preference, age, whether they own a home, gender, marital status, household size, income, years of education, and whether they attend church.
Create a standard partition of the data with all the tracked variables and 50% of observations in the training set, 30% in the validation set, and 20% in the test set. Classify the data using k-Nearest Neighbors with up to k = 20. Use Age, HomeOwner, Female, HouseholdSize, Income, Education, and Church as input variables and Undecided as the output variable. In Step 2 of XLMiner’s k-Nearest Neighbors Classification procedure, be sure to Normalize Input Data, Score on best k between 1 and specified value, and assign prior class probabilities According to relative occurrences in training data.
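As a rough illustration of the procedure described above (not XLMiner's own implementation), the following Python sketch partitions a data set 50/30/20, normalizes the inputs with z-scores computed from the training set, and picks the k between 1 and 20 with the lowest validation error at a 0.5 cutoff. The data are synthetic stand-ins for the BlueOrRed file, and the helper names (`partition`, `fit_scaler`, `scale`, `knn_prob`, `error_rate`) are hypothetical. The sketch uses the raw fraction of Class 1 neighbors with no reweighting of class probabilities.

```python
import math
import random

def partition(rows, fractions=(0.5, 0.3, 0.2), seed=1):
    """Shuffle rows and split them into training, validation, and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(fractions[0] * len(rows))
    n_valid = int(fractions[1] * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]

def fit_scaler(rows):
    """Column means and standard deviations of the inputs, from the given rows."""
    cols = list(zip(*(x for x, _ in rows)))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(rows, means, stds):
    """Apply z-score normalization to every input column."""
    return [([(v - m) / s for v, m, s in zip(x, means, stds)], y) for x, y in rows]

def knn_prob(train, x, k):
    """Fraction of the k nearest training rows (Euclidean distance) in Class 1."""
    nearest = sorted(train, key=lambda r: math.dist(r[0], x))[:k]
    return sum(y for _, y in nearest) / k

def error_rate(train, rows, k, cutoff=0.5):
    """Overall error rate: share of rows whose predicted class is wrong."""
    wrong = sum((knn_prob(train, x, k) >= cutoff) != bool(y) for x, y in rows)
    return wrong / len(rows)

# Synthetic data (assumption): two inputs and a 0/1 Undecided flag.
rng = random.Random(0)
rows = []
for _ in range(200):
    age = rng.uniform(18, 80)
    income = rng.uniform(20, 150)
    undecided = 1 if rng.random() < (0.7 if age < 40 else 0.2) else 0
    rows.append(([age, income], undecided))

train, valid, test = partition(rows)
means, stds = fit_scaler(train)  # statistics come from the training set only
train, valid, test = [scale(s, means, stds) for s in (train, valid, test)]

valid_errors = {k: error_rate(train, valid, k) for k in range(1, 21)}
best_k = min(valid_errors, key=valid_errors.get)
print("best k:", best_k, "validation error:", valid_errors[best_k])
print("k = 1 training error:", error_rate(train, train, 1))
```

With k = 1, each training row's nearest neighbor is the row itself, so the training error in this sketch is 0 whenever no two rows share identical inputs, while the validation error is typically higher because validation rows were not used to build the classifier.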
a. For k = 1, what is the overall error rate on the training set and the validation set, respectively? Explain the difference in magnitude of these two measures.
b. For the cutoff probability value of 0.5, what value of k minimizes the overall error rate on the validation data? Explain the difference in the overall error rate on the training, validation, and test sets.
c. Examine the decile-wise lift chart for the test set. What is the first decile lift? Interpret this value.
d. In the effort to identify undecided voters, a campaign is willing to accept an increase in the misclassification of decided voters as undecided if it can correctly classify more undecided voters. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, what are the corresponding Class 1 error rates and Class 0 error rates on the validation data?
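The quantities asked for in parts c and d can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the probabilities and labels are made up rather than taken from BlueOrRed, the function names `decile_lifts` and `class_error_rates` are hypothetical, and a record is assigned to Class 1 when its probability is at or above the cutoff (XLMiner's tie handling may differ). Decile lift is taken here as the Class 1 rate within a decile of records ranked by predicted probability, divided by the overall Class 1 rate.

```python
def decile_lifts(probs, actuals):
    """Lift per decile after ranking records by predicted Class 1 probability."""
    ranked = [y for _, y in sorted(zip(probs, actuals),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    overall = sum(ranked) / n  # overall proportion of Class 1 records
    lifts = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        lifts.append((sum(chunk) / len(chunk)) / overall)
    return lifts

def class_error_rates(probs, actuals, cutoff):
    """Return (Class 1 error rate, Class 0 error rate) at the given cutoff."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    n1 = sum(actuals)
    n0 = len(actuals) - n1
    missed_1 = sum(1 for p, y in zip(preds, actuals) if y == 1 and p == 0)
    missed_0 = sum(1 for p, y in zip(preds, actuals) if y == 0 and p == 1)
    return missed_1 / n1, missed_0 / n0

# Part c, hypothetical test-set scores: 20 records, 5 of them undecided.
lift_probs = [(20 - i) / 20 for i in range(20)]
lift_actuals = [1, 1, 1, 0, 1, 0, 0, 1] + [0] * 12
lifts = decile_lifts(lift_probs, lift_actuals)
print("first decile lift:", lifts[0])

# Part d, hypothetical validation scores: 6 undecided, 6 decided voters.
probs = [0.9, 0.6, 0.45, 0.35, 0.25, 0.1, 0.55, 0.42, 0.33, 0.22, 0.15, 0.05]
actuals = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
for cutoff in (0.5, 0.4, 0.3, 0.2):
    err1, err0 = class_error_rates(probs, actuals, cutoff)
    print(f"cutoff {cutoff}: Class 1 error {err1:.3f}, Class 0 error {err0:.3f}")
```

In the made-up lift data, both records in the top decile are undecided while 25% of all records are, so the first decile lift is 4: the top-ranked 10% contains four times as many undecided voters as a random 10% would. In the made-up cutoff data, lowering the cutoff reduces the Class 1 error rate and raises the Class 0 error rate, which is the trade-off part d describes.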
Step by Step Solution
Step: 1
Overall error rate 026 Class 1 error rate 017 Class ...