Question

Fit a logistic regression model on the "Caravan" data set from the R package "ISLR". This data set, also analyzed in Sec. 4.6.6 of ISLR, has 85 predictors, and the response variable "Purchase" takes the value "Yes" or "No".

We use the first 1000 observations as the test data and the remaining observations as the training data. The test data contain 941 "No" and 59 "Yes". For each of the approaches below, using 0.25 as the predicted-probability cut-off, report the number of misclassified samples among the 941 "No" and the number of misclassified samples among the 59 "Yes". Also use the R package "pROC" to report the corresponding AUC. For the definitions of ROC and AUC, see pp. 146-149 of ISLR.
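Both quantities requested for every approach depend only on the test labels and the predicted probabilities. The sketch below is written in Python rather than R purely to show the arithmetic; `misclassified_by_class` and `empirical_auc` are hypothetical helper names, and it assumes the ISLR lab's convention that a sample is predicted "Yes" when its probability is strictly greater than the cut-off. The AUC here is the empirical (Mann-Whitney) estimate: the fraction of ("Yes", "No") pairs in which the "Yes" sample receives the higher probability, with ties counted as one half. This equals the area under the empirical ROC curve, so it should match what `pROC::auc` reports for an unsmoothed `roc()` curve in which higher probabilities indicate "Yes".

```python
from typing import Sequence, Tuple


def misclassified_by_class(
    y: Sequence[str], p_yes: Sequence[float], cutoff: float = 0.25
) -> Tuple[int, int]:
    """Return (errors among true "No", errors among true "Yes")."""
    wrong_no = wrong_yes = 0
    for label, p in zip(y, p_yes):
        # Assumed convention: predict "Yes" only when p is strictly above the cut-off.
        predicted = "Yes" if p > cutoff else "No"
        if label == "No" and predicted == "Yes":
            wrong_no += 1
        elif label == "Yes" and predicted == "No":
            wrong_yes += 1
    return wrong_no, wrong_yes


def empirical_auc(y: Sequence[str], p_yes: Sequence[float]) -> float:
    """Fraction of (Yes, No) pairs ranked correctly; ties count 1/2."""
    pos = [p for label, p in zip(y, p_yes) if label == "Yes"]
    neg = [p for label, p in zip(y, p_yes) if label == "No"]
    score = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(pos) * len(neg))


# Toy example with made-up labels and probabilities (not Caravan results).
y = ["No", "No", "Yes", "No", "Yes"]
p = [0.10, 0.30, 0.60, 0.20, 0.22]
print(misclassified_by_class(y, p))   # (1, 1)
print(round(empirical_auc(y, p), 3))  # 0.833
```

With 59 "Yes" and 941 "No" test samples, the pairwise loop touches 55,519 pairs, which is cheap; a rank-based formula would scale better for larger test sets.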

Fit a logistic regression model using all 85 predictors, and obtain the predicted probabilities on the test data.

  • If we use 0.25 as the probability cut-off, we misclassify ________[a1] (an integer) samples among the 941 "No" and misclassify ________[b1] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______[c1] (round to 3 digits after the decimal point).
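For the full model, R's `glm(..., family = binomial)` computes the maximum-likelihood coefficients by iteratively reweighted least squares, which for the logistic model coincides with Newton-Raphson. The following is a small from-scratch Python sketch of that iteration, not R's implementation: `fit_logistic`, `predict_proba` and `solve` are hypothetical helper names, the response is assumed coded as 1 = "Yes" and 0 = "No", and the data are assumed not perfectly separable (otherwise the coefficients diverge). Pure Python is far too slow for routine use with 85 predictors; the sketch only shows what the fit computes.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], max_iter: int = 25, tol: float = 1e-10
) -> List[float]:
    """Newton-Raphson (IRLS) fit; returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # X^T (y - p)
        info = [[0.0] * k for _ in range(k)]  # X^T W X with W = diag(p(1-p))
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict_proba(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted P(y = 1) for each row of X."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]


# Toy data with one binary predictor: P(y=1) is 0.25 when x=0 and 0.75 when x=1,
# so the MLE is intercept = log(1/3) and slope = 2 log(3).
X_toy = [[0.0]] * 4 + [[1.0]] * 4
y_toy = [0, 0, 0, 1, 0, 1, 1, 1]
beta = fit_logistic(X_toy, y_toy)
print([round(b, 4) for b in beta])                                 # [-1.0986, 2.1972]
print([round(q, 4) for q in predict_proba(beta, [[0.0], [1.0]])])  # [0.25, 0.75]
```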

Apply forward variable selection using AIC. Use the selected model to obtain the predicted probabilities on the test data.

  • We use a model with ______[d2] (a non-negative integer) non-intercept predictors.
  • If we use 0.25 as the probability cut-off, we misclassify ______ [a2] (an integer) samples among the 941 "No" and misclassify ______ [b2] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______ [c2] (round to 3 digits after the decimal point).
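Forward selection with AIC, as R's `step(..., direction = "forward")` performs it, starts from the intercept-only model, adds at each step the single predictor that lowers AIC the most, and stops when no addition lowers it. For a logistic model with a 0/1 response, AIC equals the residual deviance plus 2 times the number of estimated coefficients (intercept included). The Python sketch below shows only that greedy search: `forward_select` is a hypothetical helper, `deviance_of` is assumed to refit the model on a given predictor set and return its deviance, and the toy deviance table in the example is invented for illustration. Since every Caravan predictor is a numeric column contributing one coefficient, d2 would be the length of the returned list.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    deviance_of: Callable[[FrozenSet[str]], float],
    penalty: float = 2.0,
) -> Tuple[List[str], float]:
    """Greedy forward search minimizing deviance + penalty * (#coefficients).

    penalty = 2 gives AIC; the intercept counts as one coefficient.
    """
    current: FrozenSet[str] = frozenset()
    best = deviance_of(current) + penalty * 1  # intercept-only model
    order: List[str] = []
    while True:
        # Each trial model has len(current) + 1 predictors plus the intercept.
        trials = [
            (deviance_of(current | {v}) + penalty * (len(current) + 2), v)
            for v in candidates
            if v not in current
        ]
        if not trials:
            break
        score, v = min(trials)
        if score >= best:  # no single addition lowers the criterion
            break
        best, current = score, current | {v}
        order.append(v)
    return order, best


# Hypothetical deviance drops per predictor; a real run would refit the model.
drops: Dict[str, float] = {"A": 10.0, "B": 5.0, "C": 1.5}


def toy_deviance(s: FrozenSet[str]) -> float:
    return 100.0 - sum(drops[v] for v in s)


print(forward_select(["A", "B", "C"], toy_deviance))  # (['A', 'B'], 91.0)
```

"C" is rejected because its deviance drop (1.5) is smaller than the AIC charge of 2 for one more coefficient.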

Apply forward variable selection using BIC. Use the selected model to obtain the predicted probabilities on the test data.

  • We use a model with ______ [d3] (a non-negative integer) non-intercept predictors.
  • If we use 0.25 as the probability cut-off, we misclassify ______ [a3] (an integer) samples among the 941 "No" and misclassify ______ [b3] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______ [c3] (round to 3 digits after the decimal point).
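Switching from AIC to BIC changes only the per-coefficient penalty: 2 for AIC versus log(n) for BIC, where n is the number of training observations; in R this corresponds to calling `step()` with `k = log(n)`. Assuming the standard Caravan data with 5,822 rows, the training set has 5822 - 1000 = 4822 rows, so BIC charges about 8.48 per coefficient. Because every candidate at a given forward step adds exactly one coefficient, the penalty does not change which predictor is best at that step; it only changes when the search stops. With log(4822) > 2, forward BIC therefore stops no later than forward AIC (ignoring ties), and its predictors form a leading part of the AIC path. The helper names below are hypothetical, in a Python sketch.

```python
import math


def per_coefficient_penalty(n_train: int, criterion: str) -> float:
    """Penalty added per estimated coefficient: 2 for AIC, log(n) for BIC."""
    if criterion == "AIC":
        return 2.0
    if criterion == "BIC":
        return math.log(n_train)
    raise ValueError(f"unknown criterion: {criterion}")


def worth_adding(deviance_drop: float, n_train: int, criterion: str) -> bool:
    """One extra coefficient lowers the criterion only if the drop beats its penalty."""
    return deviance_drop > per_coefficient_penalty(n_train, criterion)


n_train = 5822 - 1000  # assumed Caravan size minus the 1000 test rows
print(round(per_coefficient_penalty(n_train, "BIC"), 2))                    # 8.48
print(worth_adding(5.0, n_train, "AIC"), worth_adding(5.0, n_train, "BIC"))  # True False
```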

Use an L1 penalty to select a subset of the predictors. Use the glmnet package with lambda = 0.004 and keep the default options, such as standardize = TRUE and intercept = TRUE. Use the selected model to obtain the predicted probabilities on the test data.

  • We use a model with ______ [d4] (a non-negative integer) non-intercept predictors.
  • If we use 0.25 as the probability cut-off, we misclassify ______ [a4] (an integer) samples among the 941 "No" and misclassify ______ [b4] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______ [c4] (round to 3 digits after the decimal point).
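For the L1 approach, `glmnet(x, y, family = "binomial", lambda = 0.004)` minimizes the average negative log-likelihood plus lambda times the sum of absolute coefficients, leaving the intercept unpenalized; with `standardize = TRUE` the penalty is applied after scaling each predictor to unit variance, and the coefficients are reported back on the original scale. The number of non-intercept predictors d4 is then the count of nonzero coefficients. The Python sketch below solves the same kind of objective with plain proximal gradient descent (soft-thresholding), not glmnet's coordinate descent, so its coefficients agree with glmnet only up to convergence; `lasso_logistic` and `soft_threshold` are hypothetical names, the population standard deviation is an assumption about how glmnet scales, and the toy data are made up.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, a: float) -> float:
    """Proximal operator of a*|.|: shrink v toward 0 by a."""
    return math.copysign(max(abs(v) - a, 0.0), v)


def lasso_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float, iters: int = 5000
) -> Tuple[float, List[float]]:
    """Minimize -(1/n) loglik + lam * sum|beta_j| on standardized predictors."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # Population standard deviation (assumed to mirror glmnet's scaling).
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)] for row in X]
    step = 4.0 / (p + 1)  # 1/L, since L <= (p + 1) / 4 for standardized columns
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [sigmoid(b0 + sum(bj * zj for bj, zj in zip(b, z))) - yi for z, yi in zip(Z, y)]
        g0 = sum(resid) / n
        g = [sum(r * z[j] for r, z in zip(resid, Z)) / n for j in range(p)]
        b0 -= step * g0  # intercept is not penalized
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, g)]
    # Map standardized coefficients back to the original predictor scale.
    coefs = [bj / sds[j] if sds[j] > 0 else 0.0 for j, bj in enumerate(b)]
    intercept = b0 - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


# Made-up data: x1 is mildly informative, x2 is uncorrelated with y.
X_toy = [[float(i), float(i % 2 == 0)] for i in range(8)]
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
for lam in (0.3, 0.01):
    _, coefs = lasso_logistic(X_toy, y_toy, lam)
    print(lam, sum(1 for c in coefs if c != 0.0), [round(c, 3) for c in coefs])
```

For this toy data, any lambda above roughly 0.218 (the largest standardized gradient at the null model) keeps every coefficient at zero, so lambda = 0.3 selects no predictors; smaller lambda values let x1 enter first.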

Result for:

a1:

b1:

c1:

d2:

a2:

b2:

c2:

d3:

a3:

b3:

c3:

d4:

a4:

b4:

c4:
