
Posted on Jul 08, 2024

1. Consider the "Default" dataset available in the "ISLR" library. It provides information on 10,000 customers and contains the following four variables:

Table 1: Variable Description

Variable   Description
default    A factor with levels "No" and "Yes" indicating whether the customer defaulted on their debt.
student    A factor with levels "No" and "Yes" indicating whether the customer is a student.
balance    The average balance that the customer has remaining on their credit card after making their monthly payment.
income     Income of the customer.

The objective is to predict whether an individual will default on his/her credit card payment. The data set is divided into two parts: a training set consisting of 5000 observations and a test set consisting of the remaining 5000 observations. The following results are available:

Table 2: Logistic Regression Model (Model 1) for predicting "default" using "student" based on the Training Data Set

             Estimate   Standard Error   z value   p-value
Intercept      -3.482            0.099   -35.312     0.000
studentYes      0.307            0.166     1.845     0.065

Note that "studentYes" is a dummy variable which takes the value 1 if the individual is a "student" and 0 otherwise. Furthermore, another logistic regression model using the variables "student" and "balance" is fitted based on the training data set. The details are provided below:

Table 3: Logistic Regression Model (Model 2) for predicting "default" using "balance" and "student" based on the Training Data Set

             Estimate   Standard Error   z value   p-value
Intercept     -11.020           0.7130   -15.446     0.000
balance         0.006           0.0003    17.383     0.000
studentYes     -0.822           0.3401    -2.416     0.016

Based on the fitted logistic regression model (Model 2), a confusion matrix is obtained for the test dataset. The confusion matrix is shown below.

Table 4: Confusion Matrix Based on Test Data Set for Logistic Regression

True "Default" Status

Predicted "Default" Status

Yes

4808

120

Yes

Based on the above output, answer the following questions (no need to fit any of the models):

a. Write the equation of the fitted Model 1 (summary provided in Table 2). Calculate the predicted default probabilities for an individual who is a student and for an individual who is not a student. Who is riskier for the credit card company? [3]
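A minimal Python sketch of part (a). It takes the Table 2 intercept as -3.482; the negative sign is an assumption, since a positive intercept would put a non-student's default probability near 0.97, far above the roughly 3% default rate in this dataset. The fitted Model 1 is then log(p / (1 - p)) = -3.482 + 0.307 × studentYes, where p is the probability of default.

```python
import math

# Coefficients from Table 2 (Model 1). The negative intercept is an
# assumption (the sign is taken to have been lost in extraction).
B0 = -3.482         # intercept
B_STUDENT = 0.307   # coefficient of the dummy studentYes (1 = student, 0 = not)


def logistic(eta: float) -> float:
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def model1_default_prob(is_student: bool) -> float:
    """Predicted default probability: logit(p) = B0 + B_STUDENT * studentYes."""
    return logistic(B0 + B_STUDENT * int(is_student))


p_student = model1_default_prob(True)
p_nonstudent = model1_default_prob(False)
print(f"student:     {p_student:.4f}")
print(f"non-student: {p_nonstudent:.4f}")
```

Under these assumptions the sketch prints about 0.0401 for a student and 0.0298 for a non-student, so Model 1 rates students as the riskier group.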

b. Consider the fitted Model 2 (summary provided in Table 3). Interpret the coefficients. Discuss the difference between Models 1 and 2. [4]
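For part (b), exponentiating a Model 2 coefficient gives an odds ratio, which is easier to interpret than a change in log-odds. The sketch takes the Table 3 intercept as -11.020 and the studentYes estimate as -0.822; both negative signs are assumptions (a positive intercept of 11.020 would make default almost certain at zero balance, and a negative student effect once balance is held fixed is the pattern ISLR reports for this dataset). The balance of 1500 used for the side-by-side comparison is an arbitrary illustrative value.

```python
import math

# Model 2 coefficients from Table 3. The negative intercept and the negative
# studentYes estimate are assumptions (signs taken to have been lost in extraction).
B0 = -11.020
B_BALANCE = 0.006
B_STUDENT = -0.822


def logistic(eta: float) -> float:
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def model2_default_prob(balance: float, is_student: bool) -> float:
    """logit(p) = B0 + B_BALANCE * balance + B_STUDENT * studentYes."""
    return logistic(B0 + B_BALANCE * balance + B_STUDENT * int(is_student))


# Odds ratios: multiplicative change in the odds of default.
or_balance_100 = math.exp(100 * B_BALANCE)  # per extra 100 units of balance
or_student = math.exp(B_STUDENT)            # student vs non-student, same balance

# Same (illustrative) balance, different student status.
p_s = model2_default_prob(1500, True)
p_ns = model2_default_prob(1500, False)

print(f"odds ratio per +100 balance: {or_balance_100:.3f}")
print(f"odds ratio student vs not:   {or_student:.3f}")
print(f"balance 1500: student {p_s:.4f}, non-student {p_ns:.4f}")
```

With these values the odds of default grow by a factor of about 1.82 for every additional 100 of balance, while a student's odds are about 0.44 times those of a non-student with the same balance (roughly 0.055 versus 0.117 at a balance of 1500). Model 1, which omits balance, gave students a positive coefficient instead; in ISLR's discussion of this dataset, students tend to carry higher balances, so in Model 1 the student dummy absorbs part of the balance effect.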

c. Suppose the credit card company wants to provide a credit card only to those customers whose predicted default probability is below 0.10. A student has recently approached the credit card company. Based on the fitted Model 2 (summary provided in Table 3), calculate the maximum allowed "balance" for such an individual. [3]
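For part (c), set Model 2's predicted probability equal to the 0.10 cut-off and solve for balance: balance_max = (ln(0.10 / 0.90) - b0 - b_student) / b_balance, where ln(0.10 / 0.90) ≈ -2.197. The sketch assumes negative signs for the Model 2 intercept (-11.020) and for the studentYes estimate (-0.822).

```python
import math

# Model 2 coefficients (Table 3). Negative signs on the intercept and on
# studentYes are assumptions (taken to have been lost in extraction).
B0 = -11.020
B_BALANCE = 0.006
B_STUDENT = -0.822
CUTOFF = 0.10  # approve only if the predicted default probability is below this


def max_balance(is_student: bool, cutoff: float = CUTOFF) -> float:
    """Balance at which the predicted default probability equals `cutoff`.

    Because B_BALANCE > 0, any balance below this value gives a predicted
    probability below the cut-off.
    """
    target_logit = math.log(cutoff / (1.0 - cutoff))
    return (target_logit - B0 - B_STUDENT * int(is_student)) / B_BALANCE


b_max = max_balance(is_student=True)
print(f"maximum balance for a student: {b_max:.2f}")
```

Under these assumptions the maximum balance for a student is about 1607. Because the balance coefficient is reported to only three decimal places, this threshold is sensitive to rounding and should be read as approximate.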

d. Based on the confusion matrix in Table 4, compute the sensitivity, specificity, and total error rate of the logistic regression model. [3]
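For part (d), the three measures follow directly from the four cells of a 2 × 2 confusion matrix, with "Yes" (default) treated as the positive class. The counts in the example call are made-up placeholders, not the Table 4 values.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # predicted Yes, truly Yes (defaulters caught)
    fn: int  # predicted No, truly Yes (defaulters missed)
    fp: int  # predicted Yes, truly No (non-defaulters flagged)
    tn: int  # predicted No, truly No

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:
        """Share of actual defaulters that the model flags: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of actual non-defaulters classified as No: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    def error_rate(self) -> float:
        """Overall misclassification rate: (FP + FN) / total."""
        return (self.fp + self.fn) / self.total


# Hypothetical counts for illustration only (not the Table 4 values).
cm = ConfusionMatrix(tp=8, fn=2, fp=5, tn=85)
print(cm.sensitivity(), cm.specificity(), cm.error_rate())
```

For the actual answer, substitute the four counts from Table 4; they should sum to the 5000 test observations.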

e. Discuss how the performance of the logistic regression model can be improved. [3]
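One commonly used lever for part (e) is the classification threshold: with a rare outcome such as default, the usual 0.5 cut-off tends to miss many defaulters, and lowering it trades specificity for sensitivity. The sketch below recomputes both measures at two thresholds; the probabilities and labels are made-up toy values, not model output.

```python
from typing import Sequence, Tuple


def sens_spec_at(probs: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting default (1) if prob >= threshold."""
    tp = fn = fp = tn = 0
    for p, y in zip(probs, labels):
        predicted_yes = p >= threshold
        if y == 1 and predicted_yes:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_yes:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities and true labels (1 = defaulted); illustrative only.
probs = [0.05, 0.15, 0.30, 0.45, 0.60, 0.85]
labels = [0, 0, 1, 0, 1, 1]

for t in (0.5, 0.25):
    sens, spec = sens_spec_at(probs, labels, t)
    print(f"threshold {t}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

The cut-off should be tuned on a validation set or by cross-validation rather than on the test set used to report the final error rate.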

f. The logistic regression model here uses only a limited number of predictors. Identify at least five more predictors that could be useful in this context. [3]

g. How would you use such a logistic regression model for decision-making in this context? [3]
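For part (g), one way to turn predicted probabilities into approve/decline decisions is an expected-value rule. Assuming a hypothetical profit G from a customer who repays and a loss L from one who defaults, approving has positive expected value when (1 - p) × G > p × L, that is, when p < G / (G + L). The figures below are placeholders, chosen so that the cut-off matches the 0.10 used in part (c).

```python
def approval_cutoff(gain_if_repaid: float, loss_if_default: float) -> float:
    """Probability below which approving a customer has positive expected value.

    Approve when (1 - p) * gain > p * loss, i.e. p < gain / (gain + loss).
    """
    return gain_if_repaid / (gain_if_repaid + loss_if_default)


def decide(default_prob: float, cutoff: float) -> str:
    """Approve only applicants whose predicted default probability is below the cut-off."""
    return "approve" if default_prob < cutoff else "decline"


# Hypothetical economics: 100 profit per good customer, 900 lost per defaulter.
cutoff = approval_cutoff(gain_if_repaid=100.0, loss_if_default=900.0)
for p in (0.03, 0.12):
    print(p, decide(p, cutoff))
```

In practice the probabilities fed into `decide` would come from Model 2, and the same rule can be inverted to set a maximum balance, as in part (c).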

h. What are the possible downsides of such a model? Discuss in detail. [3]