Question
Sandhills Bank would like to increase the number of customers who use payroll direct deposit as part of the rollout of its new e-banking platform. Management has proposed offering an increased interest rate on a savings account if customers sign up for direct deposit into a checking account. To determine whether this proposal is a good idea, management would like to estimate how many of the 200 current customers who do not use direct deposit would accept the offer.

The IT company that handles Sandhills Bank's e-banking has provided anonymized data for 1,000 customers from one of its other client banks that ran a similar promotion to increase direct deposit participation. For these 1,000 customers, each observation consists of the average monthly checking account balance and whether the customer signed up for direct deposit. In the file Sandhills, these data are split so that 600 observations are in the training set and 400 observations are in the validation set. Because Sandhills has not yet launched its promotion to any of these 200 customers, it has entered an artificial value of zero (i.e., "No") for whether they have signed up for direct deposit. As some of these 200 customers will be the target of the direct-deposit promotion, Sandhills would like to estimate the likelihood of these customers signing up for direct deposit based on their average monthly balance.

Classify the data using k-nearest neighbors for values of k = 1, ..., 10. Use Balance as the input variable and Direct as the output variable. Refer to the Appendix for instructions on how to perform k-nearest neighbors using the Analytic Solver Platform. In the Parameters tab, select Partition Data and Use partition variable (the variable named Partition), enter 10 for # Neighbors (K), and select Search 1..K for the value of k that achieves the minimum classification error. In the Scoring tab, generate a Detailed Report on the test data.
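The k search described above can be sketched in plain Python to show what is being computed. This is a minimal illustration, not Analytic Solver's implementation: it assumes an unweighted vote among the k nearest training records, distance measured as the absolute difference in Balance, ties in distance broken by training-set order, and a record classified as 1 (Direct = yes) when its estimated probability is at least the cutoff. With a single input variable, standardizing Balance would rescale every distance by the same constant, so it would not change which neighbors are nearest. The function names are hypothetical.

```python
def knn_prob(train_x, train_y, x, k):
    """Estimated P(Direct = 1) for balance x: share of the k nearest training records with Direct = 1."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training records")
    # Rank training records by absolute difference in Balance (the only input).
    # sorted() is stable, so equal distances keep training-set order (assumed tie rule).
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k


def error_rate(probs, actual, cutoff=0.5):
    """Overall misclassification rate, classifying a record as 1 when prob >= cutoff (assumed rule)."""
    wrong = sum((p >= cutoff) != (a == 1) for p, a in zip(probs, actual))
    return wrong / len(actual)


def best_k(train_x, train_y, valid_x, valid_y, max_k=10, cutoff=0.5):
    """Try k = 1..max_k on the validation set; return (best k, its error rate, all (k, error) pairs)."""
    results = []
    for k in range(1, max_k + 1):
        probs = [knn_prob(train_x, train_y, x, k) for x in valid_x]
        results.append((k, error_rate(probs, valid_y, cutoff)))
    # min() keeps the first minimum, so ties in error rate go to the smaller k.
    k_star, err = min(results, key=lambda pair: pair[1])
    return k_star, err, results
```

For example, with hypothetical toy values (not the Sandhills data) train_x = [100, 200, 300, 400, 500, 600], train_y = [0, 0, 0, 1, 1, 1], and validation balances [160, 540] with labels [0, 1], calling best_k with max_k=5 (the toy set has only six records) returns k = 1 with a validation error of 0. Resolving ties toward the smaller k is a choice of this sketch and may differ from the software's rule.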
(a) For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data?
(b) What is the area under the ROC curve on the validation set? If needed, round your answer to three decimal places.
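One way to compute the area under the ROC curve, shown here as a sketch, uses the fact that the AUC equals the probability that a randomly chosen Class 1 record receives a higher estimated probability than a randomly chosen Class 0 record, with ties counted as one half. Tie handling matters for k-NN because the estimated probabilities take only k + 1 distinct values. The inputs would be the validation-set probabilities and actual Direct values; the function name is hypothetical, and this is not necessarily how Analytic Solver computes its reported value.

```python
def auc(probs, actual):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form), ties counted as 1/2."""
    pos = [p for p, a in zip(probs, actual) if a == 1]  # scores of actual Class 1 records
    neg = [p for p, a in zip(probs, actual) if a == 0]  # scores of actual Class 0 records
    if not pos or not neg:
        raise ValueError("need at least one record of each class")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0   # Class 1 record ranked above Class 0 record
            elif p == n:
                score += 0.5   # tied scores count as half
    return score / (len(pos) * len(neg))
```

With hypothetical probabilities [0.8, 0.4, 0.6, 0.2] and actual classes [1, 1, 0, 0], three of the four Class 1/Class 0 pairs are ordered correctly, so the function returns 0.75. The double loop is quadratic, which is fine for a 400-record validation set.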
To achieve a sensitivity of 0.80, what Class 0 error rate must be tolerated? If needed, round your answer to a whole percentage.
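The sensitivity trade-off can be read from the empirical ROC points: lowering the cutoff raises sensitivity (the share of Class 1 records classified as 1) but also raises the Class 0 error rate (the share of Class 0 records misclassified as 1). The sketch below scans the distinct predicted probabilities as candidate cutoffs, from highest to lowest, and stops at the first one whose sensitivity reaches the target; that cutoff has the smallest Class 0 error rate among those meeting the target. It assumes records with probability at least the cutoff are classified as 1 and does not interpolate between ROC points, so a value read from a plotted curve could differ slightly. The function name and the toy data in the usage note are hypothetical.

```python
def class0_error_for_sensitivity(probs, actual, target=0.80):
    """Smallest Class 0 error rate among empirical cutoffs whose sensitivity reaches target.

    Returns (cutoff, sensitivity, class0_error), or None if no cutoff reaches target.
    """
    n_pos = sum(1 for a in actual if a == 1)
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one record of each class")
    # Scan cutoffs from high to low: sensitivity and Class 0 error can only rise as the cutoff falls.
    for c in sorted(set(probs), reverse=True):
        tp = sum(1 for p, a in zip(probs, actual) if p >= c and a == 1)
        fp = sum(1 for p, a in zip(probs, actual) if p >= c and a == 0)
        if tp / n_pos >= target:
            return c, tp / n_pos, fp / n_neg
    return None
```

With hypothetical probabilities [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2] and actual classes [1, 1, 0, 1, 1, 0, 1, 0], the function returns a cutoff of 0.5, a sensitivity of 0.8, and a Class 0 error rate of 1/3 (about 33%).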
(c) Using the default cutoff value of 0.5, how many of Sandhills Bank's 200 customers does k-nearest neighbors classify as enrolling in direct deposit?
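Scoring the 200 new customers amounts to estimating each one's probability from its nearest neighbors and counting those at or above the 0.5 cutoff. The sketch assumes the neighbors come from the training partition and that k is the value chosen on the validation data; the artificial Direct = 0 entries for the new customers play no role, since only Balance is used to find neighbors. The unweighted vote, tie-breaking by training order, the "at least the cutoff" rule, and the function name are assumptions, not Analytic Solver's documented behavior.

```python
def count_predicted_enrollees(train_x, train_y, new_x, k, cutoff=0.5):
    """Count new customers whose k-NN estimate of P(Direct = 1) is at least the cutoff."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training records")
    count = 0
    for x in new_x:
        # Nearest training records by absolute Balance difference; the stable sort keeps
        # training order for equal distances (assumed tie rule).
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
        prob = sum(train_y[i] for i in nearest) / k
        if prob >= cutoff:  # assumed rule: probability at least the cutoff -> Direct = 1
            count += 1
    return count
```

For instance, with hypothetical training values train_x = [100, 200, 300, 400, 500, 600] and train_y = [0, 0, 0, 1, 1, 1], scoring new balances [160, 390, 540] with k = 3 gives probabilities 0, 2/3, and 1, so the function returns 2.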