Question
The dataset UniversalBank.csv below contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (PersonalLoan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.
Partition the dataset into training (60%) and validation (40%) sets, then consider the following customer:
Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0, Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1.
Second part of the problem
Consider the following customer:
Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education_1 = 0, Education_2 = 1, Education_3 = 0, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1.
Classify the above customer using the best k.
Repartition the data, this time into training, validation, and test sets (50% : 30% : 20%).
Apply the k-NN method with the k chosen above.
Compare the confusion matrix of the test set with that of the training and validation sets.
Comment on the differences and the reasons for them.
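The repartitioning and comparison steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `dat` is synthetic (it is not UniversalBank.csv), only two predictors are used, `k <- 3` is a placeholder for whatever k was chosen on the validation set, and the helper `confusion()` is a hypothetical name. Only base R and the `class` package are used.

```r
library(class)  # provides knn()

set.seed(1)

# Synthetic stand-in data (assumption: NOT the real UniversalBank.csv)
n <- 1000
dat <- data.frame(Income = rnorm(n, 70, 40), CCAvg = rnorm(n, 2, 1.5))
dat$Personal.Loan <- factor(
  ifelse(dat$Income + 10 * dat$CCAvg + rnorm(n, sd = 15) > 130, 1, 0),
  levels = c(0, 1)
)

# 50% : 30% : 20% split, made by shuffling the row indices once
idx <- sample(n)
n.train <- floor(0.5 * n)
n.valid <- floor(0.3 * n)
train.idx <- idx[1:n.train]
valid.idx <- idx[(n.train + 1):(n.train + n.valid)]
test.idx  <- idx[(n.train + n.valid + 1):n]

predictors <- c("Income", "CCAvg")

# Normalise every partition with the TRAINING means and standard deviations
mu  <- colMeans(dat[train.idx, predictors])
sdv <- apply(dat[train.idx, predictors], 2, sd)
z <- as.data.frame(scale(dat[, predictors], center = mu, scale = sdv))

k <- 3  # placeholder: substitute the k selected on the validation set

# Hypothetical helper: confusion matrix for one set of rows,
# always using the training partition as the reference set
confusion <- function(rows) {
  pred <- knn(train = z[train.idx, ], test = z[rows, ],
              cl = dat$Personal.Loan[train.idx], k = k)
  table(Predicted = pred, Actual = dat$Personal.Loan[rows])
}

cm.train <- confusion(train.idx)
cm.valid <- confusion(valid.idx)
cm.test  <- confusion(test.idx)
print(cm.train)
print(cm.valid)
print(cm.test)

# Accuracy of each partition, for a quick side-by-side comparison
acc <- sapply(list(train = cm.train, valid = cm.valid, test = cm.test),
              function(cm) sum(diag(cm)) / sum(cm))
print(acc)
```

When comparing the three matrices, the training set usually looks best because every training record counts itself among its own nearest neighbours. The validation result can be slightly optimistic because k was tuned on it, so the test set tends to give the most honest estimate of performance on new customers.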
Dataset and my current code (some of it is not working)
Dataset: https://github.com/MyGitHub2120/Personal-Loan-Acceptance
Here is my code:
library("dplyr")
library("tidyr")
library("ggplot2")
library("rpart")
library("rpart.plot")
library("caret")
library("randomForest")
library("tidyverse")
library("glmnet")
library("Hmisc")
library("dummies")
library('tinytex')
library('GGally')
library('gplots')
library("class")  # provides knn(), which is called further down
library("caTools")
library("reshape")
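# read.csv() converts spaces in column names to dots (e.g. Personal.Loan), which the code below relies on
df <- read.csv("C:/Users/andyt/OneDrive/Desktop/UniversalBank.csv")
View(df)  # the data frame is called df; no object named UniversalBank exists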
bank<-df
names(bank)
bank$Education <- as.factor(bank$Education)
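# Issue to resolve: select(bank, -c(Zip.Code, ID)) could not find the ZIP code column.
# R column names are case-sensitive, so the column is matched case-insensitively here instead.
bank_dummy <- dummy.data.frame(as.data.frame(select(bank, -c(ID, matches("zip")))))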
bank_dummy$Personal.Loan = as.factor(bank_dummy$Personal.Loan)
bank_dummy$CCAvg = as.integer(bank_dummy$CCAvg)  # note: this truncates decimals (e.g. 1.6 becomes 1); drop this line to keep them
set.seed(1)
train.index <- sample(row.names(bank_dummy), 0.6*dim(bank_dummy)[1])## need to look at hints
test.index <- setdiff(row.names(bank_dummy), train.index)
train.df <- bank_dummy[train.index, ]
valid.df <- bank_dummy[test.index, ]
new.df = data.frame(Age = as.integer(40), Experience = as.integer(10), Income = as.integer(84), Family = as.integer(2), CCAvg = as.integer(2), Education1 = as.integer(0), Education2 = as.integer(1), Education3 = as.integer(0), Mortgage = as.integer(0), Securities.Account = as.integer(0), CD.Account = as.integer(0), Online = as.integer(1), CreditCard = as.integer(1))
norm.values <- preProcess(train.df[, -c(10)], method=c("center", "scale"))
train.df[, -c(10)] <- predict(norm.values, train.df[, -c(10)])
valid.df[, -c(10)] <- predict(norm.values, valid.df[, -c(10)])
new.df <- predict(norm.values, new.df)
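# Classify the new customer with k = 5; knn() requires library(class)
knn.1 <- knn(train = train.df[,-c(10)],test = new.df, cl = train.df[,10], k=5, prob=TRUE)
knn.attributes <- attributes(knn.1)  # $prob holds the share of votes for the winning class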
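The code above fixes k = 5, but the task asks for the best k. One common way to choose it is to try a range of k values and keep the one with the highest validation accuracy. The sketch below illustrates that idea on synthetic data (an assumption: it does not read UniversalBank.csv); the names `k.values`, `accuracy.df` and `best.k` are hypothetical, and the range 1 to 15 is arbitrary.

```r
library(class)  # provides knn()

set.seed(1)

# Synthetic, already-standardised stand-in for the predictors and outcome
# (assumption: NOT the real bank data)
n <- 500
x <- data.frame(Income = rnorm(n), CCAvg = rnorm(n))
y <- factor(ifelse(x$Income + 0.5 * x$CCAvg + rnorm(n, sd = 0.5) > 1, 1, 0),
            levels = c(0, 1))

# 60% training, 40% validation
train.rows <- sample(n, 0.6 * n)
train.x <- x[train.rows, ]
valid.x <- x[-train.rows, ]
train.y <- y[train.rows]
valid.y <- y[-train.rows]

# Validation accuracy for each candidate k
k.values <- 1:15
accuracy <- sapply(k.values, function(k) {
  pred <- knn(train = train.x, test = valid.x, cl = train.y, k = k)
  mean(pred == valid.y)
})

accuracy.df <- data.frame(k = k.values, accuracy = accuracy)
print(accuracy.df)

# Smallest k that reaches the highest validation accuracy
best.k <- accuracy.df$k[which.max(accuracy.df$accuracy)]
print(best.k)
```

With the real data, the same loop would presumably run on the normalised `train.df` and `valid.df`, and `best.k` would then replace `k=5` in the call that classifies `new.df`.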