Question
A bank wants to understand the details of customers who are likely to default on their loans. In order to analyze this, the data from
A bank wants to understand the details of customers who are likely to default on their loans. In order to analyze this, the data from a random sample of 200 customers are given in the file attached. The column variables are: A - average daily balance; B - Age of customer; C - 0=male, 1=female; D - 0=not married, 1 = married; E - 0=not divorced, 1=divorced; F - size of household; G - 0=no default, 1=in default. Compute a logistic regression, using Real Statistics and answer the following questions:
Note that a - f deal with the model with all of the variables.
a. what is the overall accuracy rate of the model with all variables?
b. what is the AUC of the model with all variables?
c. Would it be worse to have a false positive or a false negative in this model (from the perspective of the bank)? Why? Explain in a sentence or two.
d. Given your answer to part c, what error metric should the bank watch? What is the value of this error rate for the model with all the variables?
e. Given a divorced male, 52 years old, currently single, living alone, with an average daily balance of $2350, should you give him a loan? Justify your answer using your model.
f. Can you suggest a better cut-off percentage than 50%? If so, what is it and why is it better? If not, why is a 50/50 split okay moving forward?
g. Can you find a better model with fewer variables? If so, what is it, and what measurement are you using to illustrate its superiority? If not, why do you think that this is a fine model to use moving forward?
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started