Question
Part 1
For this assignment, you will use the "Default" dataset located in R's ISLR package.
You are an analyst in the credit department at a large bank who has been tasked with building a model to predict whether a cardholder will default on their credit card. To do so, you have some basic information about cardholders: whether or not they are a student, their credit card balance, and their income.
Use kNN to determine how effective these variables are in predicting credit card default by completing the following steps.
Question 1: What are the model assumptions for the k-Nearest Neighbor model? What are the limitations of the kNN model? For what types of business problems would kNN be an appropriate model to use? Use a specific example to support your rationale.
Question 2: Load the "ISLR" and "class" libraries into your R environment. Load the "Default" data into a data frame object called "Default." Check the dimensions of the data set to ensure it is loaded correctly. (You should get a data set with 10,000 observations and 4 variables.)
Question 3: Change the variable "student" into a numeric variable for use in the kNN model. Check to see that the transformation worked as expected by using the 'table' function to show counts of students/nonstudents. How many in the data set are students?
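One way to approach the conversion in Question 3 is sketched below. It uses a small hypothetical factor in place of Default$student (the toy values are an assumption, not the real data), and it compares against the "Yes" label instead of calling as.numeric() directly, since as.numeric() on a factor returns the internal level codes (1 and 2) rather than 0 and 1.

```r
# Hypothetical toy factor standing in for Default$student (levels "No", "Yes")
student <- factor(c("No", "Yes", "No", "No", "Yes"))

# Map the labels to 0/1 explicitly; as.numeric(student) would give 1/2 instead
student_num <- ifelse(student == "Yes", 1, 0)

# Counts of non-students (0) and students (1)
student_counts <- table(student_num)
print(student_counts)
```

On the real data, the same pattern would be applied to Default$student, and the count under 1 gives the number of students.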
Question 4: The knn() function takes four main arguments: 1) train, the predictors/features for the training set; 2) test, the predictors/features for the testing set; 3) cl, the true class labels for the training set (so it can "learn" how to associate the features with the classes); and 4) k, the number of neighbors to consider in making a classification. Therefore, you need to partition your data into training and testing sets and extract the variable "default" into its own vector for use in the model. Run the following lines of code to complete this step. Include the code as part of your answer and be sure to comment on each line using ## to explain what the code is doing.
set.seed(42)
default_idx <- sample(nrow(Default), .7*(nrow(Default)))
default_idx
default_trn <- Default[default_idx, ]
default_tst <- Default[-default_idx, ]
x_default_trn <- default_trn[, -1]
y_default_trn <- default_trn$default
View(x_default_trn)
View(y_default_trn)
x_default_tst <- default_tst[, -1]
y_default_tst <- default_tst$default
Question 5: Create a new object called 'kmod1' that stores the results of a kNN model with a k of 3. How many defaults does the model predict in the testing set? What is the overall predictive accuracy of the model? Do you consider the model to be accurate for predicting credit card default? Explain your answer using the model results.
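A minimal sketch of the Question 5 workflow follows. It runs on a synthetic stand-in data frame (the toy object, its values, and the rule used to label defaults are all assumptions made for illustration, not the ISLR data), so the numbers it prints say nothing about the real Default data. Only knn() from the class package is used.

```r
library(class)  # provides knn()

set.seed(42)
n <- 200
# Hypothetical synthetic stand-in for the Default data (NOT the real ISLR data)
toy <- data.frame(
  student = rbinom(n, 1, 0.3),
  balance = runif(n, 0, 2500),
  income  = runif(n, 10000, 60000)
)
# Assumed labeling rule, chosen only to give knn() some signal to find
toy$default <- factor(ifelse(toy$balance > 1800, "Yes", "No"),
                      levels = c("No", "Yes"))

# 70/30 train/test split, mirroring the partition code in Question 4
idx      <- sample(n, round(0.7 * n))
features <- c("student", "balance", "income")
x_trn <- toy[idx, features]
x_tst <- toy[-idx, features]
y_trn <- toy$default[idx]
y_tst <- toy$default[-idx]

# kNN with k = 3: each test row is classified by a vote of its 3 nearest training rows
kmod1 <- knn(train = x_trn, test = x_tst, cl = y_trn, k = 3)

n_pred_default <- sum(kmod1 == "Yes")   # predicted defaults in the test set
accuracy       <- mean(kmod1 == y_tst)  # share of test rows classified correctly
print(n_pred_default)
print(accuracy)
```

With the real data, x_default_trn, x_default_tst, and y_default_trn from Question 4 would take the place of the toy objects.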
Question 6: Following the code provided in the book (see table 7.3), write the code to find the predictive accuracy of each kNN model with k values from 1 to 14. What is the ideal k value based on predictive accuracy? What is the accuracy rate at the ideal value of k? Be sure to include the R console output as part of your submission.
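The loop over k in Question 6 might look something like the sketch below. This is not the book's code from table 7.3; it is one possible pattern using sapply(), shown on the same kind of synthetic stand-in data (the toy object and its labeling rule are assumptions). Because knn() breaks voting ties at random, the seed is set for reproducibility.

```r
library(class)  # provides knn()

set.seed(42)
n <- 200
# Hypothetical synthetic stand-in for the Default data (NOT the real ISLR data)
toy <- data.frame(
  student = rbinom(n, 1, 0.3),
  balance = runif(n, 0, 2500),
  income  = runif(n, 10000, 60000)
)
toy$default <- factor(ifelse(toy$balance > 1800, "Yes", "No"),
                      levels = c("No", "Yes"))

idx      <- sample(n, round(0.7 * n))
features <- c("student", "balance", "income")
x_trn <- toy[idx, features]
x_tst <- toy[-idx, features]
y_trn <- toy$default[idx]
y_tst <- toy$default[-idx]

# Fit one kNN model per k and record its test-set accuracy
k_values <- 1:14
acc <- sapply(k_values, function(k) {
  pred <- knn(train = x_trn, test = x_tst, cl = y_trn, k = k)
  mean(pred == y_tst)
})

results <- data.frame(k = k_values, accuracy = acc)
print(results)

# which.max() returns the first maximum, so ties go to the smallest k
best_k <- results$k[which.max(results$accuracy)]
print(best_k)
```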
Question 7: Repeat the analysis from question 6, but this time standardize the inputs. What does it mean to "standardize" the variables? How might the results of a kNN model be affected when the inputs are not standardized, and how does standardization avoid this issue? Does the ideal value of k change when the inputs are standardized? Does the predictive accuracy of the model change? If so, how?
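Standardizing usually means subtracting each variable's mean and dividing by its standard deviation, so that a variable measured in large units (such as income) does not dominate the Euclidean distances that kNN relies on. The sketch below shows one way to do this with base R's scale(), again on synthetic stand-in data (the toy object is an assumption). Reusing the training set's means and standard deviations on the test set is a design choice made here, not something the assignment specifies.

```r
library(class)  # provides knn()

set.seed(42)
n <- 200
# Hypothetical synthetic stand-in for the Default data (NOT the real ISLR data)
toy <- data.frame(
  student = rbinom(n, 1, 0.3),
  balance = runif(n, 0, 2500),
  income  = runif(n, 10000, 60000)
)
toy$default <- factor(ifelse(toy$balance > 1800, "Yes", "No"),
                      levels = c("No", "Yes"))

idx      <- sample(n, round(0.7 * n))
features <- c("student", "balance", "income")
x_trn <- toy[idx, features]
x_tst <- toy[-idx, features]
y_trn <- toy$default[idx]
y_tst <- toy$default[-idx]

# Center and scale the training features to mean 0 and standard deviation 1
x_trn_std <- scale(x_trn)

# Apply the TRAINING means and standard deviations to the test features
x_tst_std <- scale(x_tst,
                   center = attr(x_trn_std, "scaled:center"),
                   scale  = attr(x_trn_std, "scaled:scale"))

# Compare test accuracy for raw versus standardized inputs at the same k
acc_raw <- mean(knn(x_trn, x_tst, cl = y_trn, k = 3) == y_tst)
acc_std <- mean(knn(x_trn_std, x_tst_std, cl = y_trn, k = 3) == y_tst)
print(c(raw = acc_raw, standardized = acc_std))
```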
Question 8: Run the final kNN model that maximizes predictive accuracy based on your results from questions 6 and 7. Produce the confusion matrix for this model. Does your model significantly improve predictive accuracy over a naive model that assumes nobody will default? How can you tell?
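A confusion matrix and the naive baseline for Question 8 can be sketched as follows. The data are a synthetic stand-in (the toy object and the choice of k = 3 are assumptions for illustration), so the comparison it prints does not reflect the real Default data. The naive model predicts "No" for everyone, so its accuracy is simply the share of non-defaulters in the test set.

```r
library(class)  # provides knn()

set.seed(42)
n <- 200
# Hypothetical synthetic stand-in for the Default data (NOT the real ISLR data)
toy <- data.frame(
  student = rbinom(n, 1, 0.3),
  balance = runif(n, 0, 2500),
  income  = runif(n, 10000, 60000)
)
toy$default <- factor(ifelse(toy$balance > 1800, "Yes", "No"),
                      levels = c("No", "Yes"))

idx      <- sample(n, round(0.7 * n))
features <- c("student", "balance", "income")
x_trn <- scale(toy[idx, features])
x_tst <- scale(toy[-idx, features],
               center = attr(x_trn, "scaled:center"),
               scale  = attr(x_trn, "scaled:scale"))
y_trn <- toy$default[idx]
y_tst <- toy$default[-idx]

# k = 3 is a placeholder; use the best k found in Questions 6 and 7
pred <- knn(train = x_trn, test = x_tst, cl = y_trn, k = 3)

# Rows are predicted classes, columns are actual classes
conf_mat <- table(predicted = pred, actual = y_tst)
print(conf_mat)

# Model accuracy: correct predictions sit on the diagonal
model_acc <- sum(diag(conf_mat)) / sum(conf_mat)

# Naive baseline: predict "No" for every cardholder
naive_acc <- mean(y_tst == "No")
print(c(model = model_acc, naive = naive_acc))
```

Because defaults are rare in credit data, the naive accuracy tends to be high, so the off-diagonal cells (missed defaults and false alarms) are often more informative than overall accuracy alone.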