Question
1. Consider the "Default" dataset available in the "ISLR" library. The "Default" dataset provides information on 10,000 customers and contains four variables, described below:
Table 1: Variable Description

| Variable | Description |
|---|---|
| Default | A factor with levels "No" and "Yes" indicating whether the customer defaulted on their debt. |
| Student | A factor with levels "No" and "Yes" indicating whether the customer is a student. |
| Balance | The average balance that the customer has remaining on their credit card after making their monthly payment. |
| Income | Income of customer. |
The objective is to predict whether an individual will default on his/her credit card payment. The data set is divided into two parts: a training set consisting of 5000 observations and a test set consisting of the remaining 5000 observations. The following results are available:
Table 2: Logistic Regression Model (Model 1) for predicting "default" using "student" based on the Training Data Set

| | Estimate | Standard Error | z value | p-value |
|---|---|---|---|---|
| Intercept | 3.482 | 0.099 | 35.312 | 0.000 |
| studentYes | 0.307 | 0.166 | 1.845 | 0.065 |
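As a minimal sketch of how the estimates in Table 2 translate into default probabilities, the Python snippet below evaluates the logistic function at the fitted linear predictor, with studentYes coded 1 for a student and 0 otherwise. It assumes the intercept is negative (-3.482): the table as shown carries no minus sign, but a positive intercept would imply a default probability of about 97%, which does not fit the low default rate in the test-set confusion matrix (Table 4). That sign is an assumption, and the names `default_prob_model1`, `INTERCEPT` and `B_STUDENT` are illustrative only.

```python
import math

# Estimates read from Table 2.
# ASSUMPTION: the intercept's minus sign is missing from the table as shown,
# so it is entered here as -3.482.
INTERCEPT = -3.482
B_STUDENT = 0.307


def default_prob_model1(is_student: bool) -> float:
    """Predicted P(default) from Model 1 for a student or a non-student."""
    logit = INTERCEPT + B_STUDENT * (1 if is_student else 0)
    # Logistic (inverse-logit) transform of the linear predictor.
    return 1.0 / (1.0 + math.exp(-logit))


p_student = default_prob_model1(True)
p_non_student = default_prob_model1(False)
print(f"student: {p_student:.4f}, non-student: {p_non_student:.4f}")
```

Because the studentYes estimate is positive, the student's predicted probability is the higher of the two under this sign assumption.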
Note that "studentYes" is a dummy variable which takes the value 1 if the individual is a "student" and 0 otherwise. Furthermore, another logistic regression model using the variables "student" and "balance" is fitted based on the training data set. The details are provided below:
Table 3: Logistic Regression Model (Model 2) for predicting "default" using "balance" and "student" based on the Training Data Set

| | Estimate | Standard Error | z value | p-value |
|---|---|---|---|---|
| Intercept | 11.020 | 0.7130 | 15.446 | 0.000 |
| balance | 0.006 | 0.0003 | 17.383 | 0.000 |
| studentYes | 0.822 | 0.3401 | 2.416 | 0.016 |
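Since the log-odds in Model 2 are linear in balance, the fitted equation can be solved for the balance that corresponds to a chosen probability cutoff. The Python sketch below does this for a cutoff of 0.10. Two things are assumptions rather than facts from Table 3: the intercept is entered as -11.020 (a positive value would push every predicted probability close to 1), and the sign of the studentYes estimate is left as an argument because it cannot be recovered from the table as shown. The function `max_balance` and its parameter names are hypothetical.

```python
import math


def max_balance(p_cutoff, intercept, b_balance, b_student, is_student):
    """Balance at which the predicted default probability equals p_cutoff.

    Solves log(p / (1 - p)) = intercept + b_balance * balance + b_student * student
    for balance. With b_balance > 0, any smaller balance gives a probability
    below p_cutoff.
    """
    target_logit = math.log(p_cutoff / (1.0 - p_cutoff))
    student = 1 if is_student else 0
    return (target_logit - intercept - b_student * student) / b_balance


# Magnitudes from Table 3. ASSUMPTION: the intercept is negative.
INTERCEPT = -11.020
B_BALANCE = 0.006

# The sign of studentYes is uncertain, so both readings are evaluated.
cap_if_negative = max_balance(0.10, INTERCEPT, B_BALANCE, -0.822, True)
cap_if_positive = max_balance(0.10, INTERCEPT, B_BALANCE, 0.822, True)
print(f"studentYes = -0.822: balance below {cap_if_negative:.2f}")
print(f"studentYes = +0.822: balance below {cap_if_positive:.2f}")
```

The balance estimate is rounded to three decimals in the table, so either cutoff is only approximate.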
Based on the fitted logistic regression model (Model 2), a confusion matrix is obtained for the test dataset. The confusion matrix is shown below.
Table 4: Confusion Matrix Based on Test Data Set for Logistic Regression

| Predicted "Default" Status | True "Default" Status: No | True "Default" Status: Yes |
|---|---|---|
| No | 4808 | 120 |
| Yes | 23 | 49 |
Based on the above output, answer the following questions (no need to fit any of the models):
a. Write the equation of the fitted Model 1 (summary provided in Table 2). Calculate the predicted default probabilities for an individual who is a student and for an individual who is not a student. Who is riskier for the credit card company? [3]
b. Consider the fitted Model 2 (summary provided in Table 3). Interpret the coefficients. Discuss the difference between Models 1 and 2. [4]
c. Suppose the credit card company wants to provide a credit card only to those customers whose predicted default probability is below 0.10. Recently, a student has approached the credit card company. Based on the fitted Model 2 (summary provided in Table 3), calculate the maximum allowed "balance" for such an individual. [3]
d. Based on the confusion matrix shown above (in Table 4), compute the sensitivity, specificity and total error rate for the logistic regression model. [3]
e. Discuss how the performance of the logistic regression model can be improved. [3]
f. The logistic regression model here uses only a limited number of predictors. Identify at least 5 more predictors that could be useful in this context. [3]
g. How will you use such a logistic regression model for decision-making in this context? [3]
h. What are the possible downsides of such a model? Discuss in detail. [3]
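For part (d), the three measures follow directly from the four counts in Table 4. Below is a minimal Python sketch, assuming the rows of the table give the predicted status and the columns the true status, and treating "Yes" (default) as the positive class. The variable names are illustrative.

```python
# Counts from Table 4.
# ASSUMPTION: rows are the predicted status, columns are the true status.
tn = 4808  # predicted No,  truly No
fn = 120   # predicted No,  truly Yes
fp = 23    # predicted Yes, truly No
tp = 49    # predicted Yes, truly Yes

total = tn + fn + fp + tp

sensitivity = tp / (tp + fn)     # share of true defaulters that are flagged
specificity = tn / (tn + fp)     # share of true non-defaulters that are cleared
error_rate = (fp + fn) / total   # share of all test cases misclassified

print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
print(f"error rate  = {error_rate:.4f}")
```

The four counts sum to 5000, which matches the stated size of the test set.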