Question
Answer the following questions with R code:
Acceptance of Consumer Loan. Universal Bank has begun a program to encourage its existing customers to borrow via a consumer loan program. The bank has promoted the loan to 5000 customers, of whom 480 accepted the offer. The data are available in the file UniversalBank.csv. The bank now wants to develop a model to predict which customers have the greatest probability of accepting the loan, to reduce promotion costs and send the offer only to a subset of its customers.
We will develop several models, then combine them in an ensemble. The models we will use are (1) logistic regression, (2) k-nearest neighbors with k = 3, and (3) classification trees. Preprocess the data as follows:
Zip code can be ignored. Partition the data: 60% training, 40% validation.
a) Fit models to the data for (1) logistic regression, (2) k-nearest neighbors with k = 3, and (3) classification trees. Use Personal Loan as the outcome variable. Report the validation confusion matrix for each of the three models.
1. Load two libraries:
a. library(class)
b. library(rpart)
2. Read UniversalBank.csv into bank.df
3. Drop ID and ZIP Code (columns 1 and 5)
4. Set a seed, then split the data into training (60%) and validation (40%) sets
5. Fit the following models:
a. Logistic regression using glm
b. k-nearest neighbors using class::knn
c. Classification tree using rpart
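The steps above could be sketched roughly as follows. This is one possible reading of the instructions, not an official solution: the helper names fit_models and confusion_matrices, the seed value, and the standardization of predictors before k-NN are all assumptions that the assignment does not specify. The column name Personal.Loan is what read.csv produces from "Personal Loan".

```r
library(class)  # knn()
library(rpart)  # rpart()

# Hypothetical helper (not from the assignment): 60/40 partition plus the
# three model fits. Assumes bank.df has a 0/1 column Personal.Loan and
# only numeric predictors.
fit_models <- function(bank.df, seed = 1) {
  set.seed(seed)  # the seed value is arbitrary
  train.index <- sample(nrow(bank.df), round(0.6 * nrow(bank.df)))
  train.df <- bank.df[train.index, ]
  valid.df <- bank.df[-train.index, ]

  # (1) Logistic regression
  reg <- glm(Personal.Loan ~ ., data = train.df, family = "binomial")

  # (2) k-NN with k = 3. Standardizing with the training means and SDs is
  # an assumption here; the assignment does not ask for it.
  predictors <- setdiff(names(bank.df), "Personal.Loan")
  x.train <- scale(train.df[, predictors])
  x.valid <- scale(valid.df[, predictors],
                   center = attr(x.train, "scaled:center"),
                   scale = attr(x.train, "scaled:scale"))
  kn <- knn(train = x.train, test = x.valid,
            cl = factor(train.df$Personal.Loan), k = 3, prob = TRUE)

  # (3) Classification tree
  tr <- rpart(Personal.Loan ~ ., data = train.df, method = "class")

  list(train.df = train.df, valid.df = valid.df, reg = reg, kn = kn, tr = tr)
}

# Validation confusion matrices (rows = predicted, columns = actual)
confusion_matrices <- function(m) {
  actual <- m$valid.df$Personal.Loan
  logit.pred <- ifelse(predict(m$reg, m$valid.df, type = "response") > 0.5, 1, 0)
  tree.pred <- predict(m$tr, m$valid.df, type = "class")
  list(logistic = table(Predicted = logit.pred, Actual = actual),
       knn = table(Predicted = m$kn, Actual = actual),
       tree = table(Predicted = tree.pred, Actual = actual))
}

# Usage, assuming UniversalBank.csv is in the working directory
if (file.exists("UniversalBank.csv")) {
  bank.df <- read.csv("UniversalBank.csv")
  bank.df <- bank.df[, -c(1, 5)]  # drop ID and ZIP code
  m <- fit_models(bank.df)
  print(confusion_matrices(m))
}
```

Base table() is used here so the sketch depends only on class and rpart; caret's confusionMatrix would report the same counts along with accuracy.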
b) Create a data frame with the actual outcome, predicted outcome, and each of the three models. Report the first 10 rows of this data frame.
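To create the data frame, use the following code, and take time to understand what each line does:
res <- data.frame(Actual = valid.df$Personal.Loan,
LogisticProb = predict(reg, valid.df, type = "response"),
LogisticPred = ifelse(predict(reg, valid.df, type = "response") > 0.5, 1, 0),
# attr(kn, "prob") is the vote share of the winning class (knn must be run with prob = TRUE),
# so it is the probability of class 1 only when kn predicts 1
KNNProb = ifelse(kn == "1", attr(kn, "prob"), 1 - attr(kn, "prob")),
KNNPred = kn,
TREEProb = predict(tr, valid.df)[, 2],
TREEPred = ifelse(predict(tr, valid.df)[, 2] > 0.5, 1, 0))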
To view the data frame, use the following: head(res, 10) # This will allow you to display the first 10 rows.
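One detail worth checking when building the k-NN probability column: with prob = TRUE, class::knn attaches the proportion of votes for the winning class, not for class 1. A small toy example (the data below are made up purely for illustration) shows how that share can be turned into a probability of class 1:

```r
library(class)

# Toy data: one predictor; class 1 for large x (made-up values)
train.x <- matrix(c(1, 2, 3, 10, 11, 12), ncol = 1)
train.y <- factor(c(0, 0, 0, 1, 1, 1))
test.x <- matrix(c(2, 11), ncol = 1)

kn.toy <- knn(train.x, test.x, cl = train.y, k = 3, prob = TRUE)
win.share <- attr(kn.toy, "prob")  # vote share of the predicted class

# Probability of class "1": the winning share when 1 wins, else its complement
p1 <- ifelse(kn.toy == "1", win.share, 1 - win.share)
print(data.frame(pred = kn.toy, win.share = win.share, p1 = p1))
```

Here both test points receive unanimous votes, so the winning share is 1 in both rows, while the probability of class 1 is 0 for the first point and 1 for the second.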
c) Add two columns to this data frame for (1) a majority vote of predicted outcomes, and (2) the average of the predicted probabilities. Using the classifications generated by these two methods, derive a confusion matrix for each method and report the overall accuracy.
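Please use the following for part c:
# KNNPred is a factor: convert via as.character() so the votes are 0/1 rather than the level codes 1/2
res$majority <-
rowMeans(data.frame(res$LogisticPred, as.numeric(as.character(res$KNNPred)), res$TREEPred)) > 0.5
res$avg <- rowMeans(data.frame(res$LogisticProb, res$KNNProb, res$TREEProb))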
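To see what the two ensemble columns compute, here is a small sketch on made-up predictions for four records (the numbers are illustrative only, not results from the bank data). It also shows why a factor of 0/1 labels should go through as.character() before being averaged.

```r
# Toy predictions from three models for four records (values are made up)
toy <- data.frame(
  LogisticPred = c(1, 0, 1, 0),
  KNNPred = factor(c(0, 0, 1, 1)),
  TREEPred = c(0, 0, 1, 1),
  LogisticProb = c(0.60, 0.10, 0.90, 0.40),
  KNNProb = c(0.33, 0.00, 1.00, 0.67),
  TREEProb = c(0.20, 0.05, 0.95, 0.70)
)

# as.numeric() on a factor returns level codes (1, 2), not the labels (0, 1)
codes <- as.numeric(toy$KNNPred)
knn01 <- as.numeric(as.character(toy$KNNPred))

# Majority vote: at least two of the three models predict 1
toy$majority <- rowMeans(cbind(toy$LogisticPred, knn01, toy$TREEPred)) > 0.5

# Average probability, then classify with a 0.5 cutoff
toy$avg <- rowMeans(toy[, c("LogisticProb", "KNNProb", "TREEProb")])
toy$avgPred <- as.numeric(toy$avg > 0.5)
print(toy)
```

In the first record only the logistic model votes 1, so the majority vote is FALSE; averaging the level codes instead of the labels would have counted the k-NN "0" as a vote for 1.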
d) Compare the error rates for the three individual methods and the two ensemble methods.
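Use this to answer part d (confusionMatrix comes from the caret package and expects factors with matching levels; the error rate is 1 minus the reported accuracy):
library(caret)
confusionMatrix(factor(res$majority * 1, levels = c(0, 1)), factor(valid.df[, 8], levels = c(0, 1)))
confusionMatrix(factor((res$avg > 0.5) * 1, levels = c(0, 1)), factor(valid.df[, 8], levels = c(0, 1)))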
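For the comparison in part d, the overall error rate can also be computed directly, without caret. The helper below is hypothetical (error_rate is not part of any package) and the labels are made up for illustration:

```r
# Hypothetical helper: overall error rate = share of misclassified records.
# Comparing as character lets factor ("0"/"1") and numeric (0/1) inputs mix.
error_rate <- function(pred, actual) {
  mean(as.character(pred) != as.character(actual))
}

actual <- c(0, 0, 1, 1, 0)  # made-up labels
pred <- c(0, 1, 1, 0, 0)    # made-up predictions
print(table(Predicted = pred, Actual = actual))
print(error_rate(pred, actual))
```

Applied to the assignment, a call such as error_rate(res$LogisticPred, valid.df$Personal.Loan) could be repeated for the k-NN, tree, majority (as res$majority * 1) and average (as (res$avg > 0.5) * 1) predictions to line up the five error rates side by side.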