
Question

Answer the following questions with R codes:

Acceptance of Consumer Loan. Universal Bank has begun a program to encourage its existing customers to borrow via a consumer loan program. The bank has promoted the loan to 5000 customers, of whom 480 accepted the offer. The data are available in file UniversalBank.csv. The bank now wants to develop a model to predict which customers have the greatest probability of accepting the loan, to reduce promotion costs and send the offer only to a subset of its customers.

We will develop several models, then combine them in an ensemble. The models we will use are (1) logistic regression, (2) k-nearest neighbors with k = 3, and (3) classification trees. Preprocess the data as follows:

Zip code can be ignored. Partition the data: 60% training, 40% validation.

a) Fit models to the data for (1) logistic regression, (2) k-nearest neighbors with k = 3, and (3) classification trees. Use Personal Loan as the outcome variable. Report the validation confusion matrix for each of the three models.

1. Load two libraries

a. library(class)

b. library(rpart)

2. Read bank.df from UniversalBank.csv

3. Drop ID and ZIP code (columns 1 and 5)

4. Set a seed, then create the training and validation datasets

5. Now, fit the following models:

a. Logistic regression using glm

b. k-nearest neighbors using class::knn

c. Classification tree using rpart
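Steps 1 to 5 can be sketched roughly as follows. This is one possible arrangement, not the official solution: the helper name fit_three_models, the seed value, and the decision to standardize the predictors before k-NN are assumptions that the assignment does not specify. The sketch also assumes all predictors are numeric and that Personal.Loan is coded 0/1.

```r
library(class)  # knn()
library(rpart)  # rpart()

# Hypothetical helper: partitions the data and fits the three models.
# Assumes bank.df has numeric predictors and a 0/1 Personal.Loan column.
fit_three_models <- function(bank.df, train.frac = 0.6, seed = 1) {
  set.seed(seed)  # arbitrary seed, only for reproducibility
  train.index <- sample(nrow(bank.df), round(train.frac * nrow(bank.df)))
  train.df <- bank.df[train.index, ]
  valid.df <- bank.df[-train.index, ]

  # (1) Logistic regression
  reg <- glm(Personal.Loan ~ ., data = train.df, family = "binomial")

  # (2) k-NN with k = 3; predictors are standardized with the training
  #     means and standard deviations (an assumption, not stated in the task)
  x.cols <- setdiff(names(bank.df), "Personal.Loan")
  train.x <- scale(train.df[, x.cols])
  valid.x <- scale(valid.df[, x.cols],
                   center = attr(train.x, "scaled:center"),
                   scale = attr(train.x, "scaled:scale"))
  kn <- knn(train = train.x, test = valid.x,
            cl = factor(train.df$Personal.Loan), k = 3, prob = TRUE)

  # (3) Classification tree
  tr <- rpart(Personal.Loan ~ ., data = train.df, method = "class")

  list(train.df = train.df, valid.df = valid.df, reg = reg, kn = kn, tr = tr)
}

# Usage with the assignment's file, assuming it is in the working directory
if (file.exists("UniversalBank.csv")) {
  bank.df <- read.csv("UniversalBank.csv")
  bank.df <- bank.df[, -c(1, 5)]  # drop ID and ZIP code
  fit <- fit_three_models(bank.df)
  valid.df <- fit$valid.df
  reg <- fit$reg
  kn <- fit$kn
  tr <- fit$tr

  # Validation confusion matrices for the three models
  reg.pred <- ifelse(predict(reg, valid.df, type = "response") > 0.5, 1, 0)
  print(table(Predicted = reg.pred, Actual = valid.df$Personal.Loan))
  print(table(Predicted = kn, Actual = valid.df$Personal.Loan))
  print(table(Predicted = predict(tr, valid.df, type = "class"),
              Actual = valid.df$Personal.Loan))
}
```

Passing prob = TRUE to knn matters for part b, which reads the "prob" attribute from the returned factor.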

b) Create a data frame with the actual outcome, predicted outcome, and each of the three models. Report the first 10 rows of this data frame.

To create the data frame, use the following code, and spend time understanding what each line does:

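res <- data.frame(Actual = valid.df$Personal.Loan,
                  LogisticProb = predict(reg, valid.df, type = "response"),
                  LogisticPred = ifelse(predict(reg, valid.df, type = "response") > 0.5, 1, 0),
                  # attr(kn, "prob") is the vote share of the winning class (requires prob = TRUE in knn),
                  # so it is converted here to the probability of class 1
                  KNNProb = ifelse(kn == "1", attr(kn, "prob"), 1 - attr(kn, "prob")),
                  KNNPred = kn,
                  TREEProb = predict(tr, valid.df)[, 2],
                  TREEPred = ifelse(predict(tr, valid.df)[, 2] > 0.5, 1, 0))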

To view the data frame, use the following: head(res, 10) # This will allow you to display the first 10 rows.
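The k-NN probability column deserves a closer look. In class::knn, the "prob" attribute holds the proportion of votes for the winning class, not the probability of class 1, so it has to be flipped only for rows predicted as 0. The toy example below illustrates this with made-up one-dimensional points; none of the values come from UniversalBank.csv.

```r
library(class)

# Made-up training points: three of class 0, three of class 1
train.x <- matrix(c(0, 0.1, 0.4, 1, 1.1, 1.5), ncol = 1)
train.y <- factor(c(0, 0, 0, 1, 1, 1))

# Made-up test points: clearly class 0, a mixed neighborhood, clearly class 1
test.x <- matrix(c(0.05, 0.65, 1.2), ncol = 1)

kn <- knn(train.x, test.x, cl = train.y, k = 3, prob = TRUE)

# Vote share of whichever class won for each test point
win.share <- attr(kn, "prob")

# Probability of class 1: keep the share when 1 won, flip it when 0 won
p.class1 <- ifelse(kn == "1", win.share, 1 - win.share)

print(data.frame(pred = kn, win.share = win.share, p.class1 = p.class1))
```

For the middle point, two of the three nearest neighbors belong to class 1, so the winning share and the class 1 probability are both 2/3. Computing 1 - win.share for every row would instead report 1/3 for that point, contradicting its own predicted class.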

c) Add two columns to this data frame for (1) a majority vote of predicted outcomes, and (2) the average of the predicted probabilities. Using the classifications generated by these two methods, derive a confusion matrix for each method and report the overall accuracy.

Please use the following for part c:

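# KNNPred is a factor; convert through character so the votes are 0/1 rather than the level codes 1/2
res$majority <- rowMeans(data.frame(res$LogisticPred,
                                    as.numeric(as.character(res$KNNPred)),
                                    res$TREEPred)) > 0.5
res$avg <- rowMeans(data.frame(res$LogisticProb, res$KNNProb, res$TREEProb))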
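To see how the two ensemble columns behave, here is a small worked example on five hypothetical customers. All predictions and probabilities below are invented for illustration. It also shows why the k-NN factor has to be converted through character: as.numeric on a factor returns the internal level codes (1 and 2), which would inflate the vote count.

```r
# Hypothetical predictions for five customers (illustrative values only)
res <- data.frame(
  LogisticPred = c(0, 1, 0, 1, 1),
  KNNPred      = factor(c(0, 0, 1, 1, 1), levels = c(0, 1)),
  TREEPred     = c(0, 0, 0, 1, 0),
  LogisticProb = c(0.10, 0.60, 0.30, 0.90, 0.70),
  KNNProb      = c(0, 1/3, 2/3, 1, 2/3),
  TREEProb     = c(0.05, 0.20, 0.40, 0.95, 0.45)
)

# Level codes versus actual 0/1 votes
knn.codes <- as.numeric(res$KNNPred)                # internal codes, not votes
knn.votes <- as.numeric(as.character(res$KNNPred))  # the 0/1 labels

# Majority vote: TRUE when at least two of the three models predict 1
res$majority <- rowMeans(data.frame(res$LogisticPred, knn.votes, res$TREEPred)) > 0.5

# Average of the three predicted probabilities of class 1
res$avg <- rowMeans(res[, c("LogisticProb", "KNNProb", "TREEProb")])

print(res[, c("LogisticPred", "KNNPred", "TREEPred", "majority", "avg")])
```

In the second row only one model votes 1, so the majority is FALSE. With the level codes in place of the votes, that row's mean would be (1 + 1 + 0) / 3, which exceeds 0.5 and would be counted as an acceptance.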

d) Compare the error rates for the three individual methods and the two ensemble methods.

Use this to answer part d:

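library(caret)  # confusionMatrix() expects factors with the same levels

confusionMatrix(factor(res$majority * 1, levels = c(0, 1)), factor(valid.df[, 8], levels = c(0, 1)))

confusionMatrix(factor((res$avg > 0.5) * 1, levels = c(0, 1)), factor(valid.df[, 8], levels = c(0, 1)))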
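If caret is not installed, the error rates for part d can also be computed in base R. The helper below is a minimal sketch; its name conf_and_error is hypothetical, and the example vectors are invented rather than taken from the bank data. It expects predictions and actual outcomes coded 0/1.

```r
# Hypothetical helper: confusion matrix, accuracy, and error rate for 0/1 labels
conf_and_error <- function(pred, actual) {
  pred <- factor(pred, levels = c(0, 1))
  actual <- factor(actual, levels = c(0, 1))
  cm <- table(Predicted = pred, Actual = actual)
  accuracy <- sum(diag(cm)) / sum(cm)
  list(confusion = cm, accuracy = accuracy, error = 1 - accuracy)
}

# Invented example: five cases, three classified correctly
actual <- c(0, 0, 1, 1, 0)
pred   <- c(0, 1, 1, 0, 0)
out <- conf_and_error(pred, actual)
print(out$confusion)
print(out$error)
```

Applied to the assignment, the same helper could be called once per column, for example on res$LogisticPred, res$KNNPred, res$TREEPred, res$majority * 1, and (res$avg > 0.5) * 1, each against valid.df$Personal.Loan, to line up the five error rates side by side.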
