
Question



File = Bankdata.csv

URL : https://docs.google.com/spreadsheets/d/1Qn4pdGrhzLjoXy1F871BalEz57rqdsB8OvLd4AbhCbA/edit#gid=1783727779

Data Preprocessing

Q. Explore the dataset to identify the features and the class attribute. In general, scikit-learn doesn't deal well with categorical data, and some classifiers need normalized data. Consider whether there are any missing values, outliers, or attributes that have no predictive power.

Q. Convert the pandas DataFrames into NumPy arrays that can be used by scikit-learn. Show your data after preprocessing. If none of the techniques described below achieves close to 90% accuracy, examine your data again to see whether you can preprocess it in a different way.
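One way the preprocessing steps could look is sketched below. The question does not list the columns of Bankdata.csv, so the small DataFrame here is a hypothetical stand-in: every column name and value (`id`, `age`, `income`, `region`, `married`, `target`) is invented for illustration. Median imputation, one-hot encoding, and min-max scaling are each one option among several, and outlier handling is left out.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for Bankdata.csv: the column names and values below
# are invented for illustration and will differ from the real file.
df = pd.DataFrame({
    "id": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "age": [48.0, 40.0, np.nan, 23.0, 57.0, 36.0],
    "income": [17500.0, 30100.0, 16600.0, 20400.0, 50600.0, 24900.0],
    "region": ["CITY", "TOWN", "CITY", "RURAL", "TOWN", "RURAL"],
    "married": ["NO", "YES", "YES", "YES", "YES", "NO"],
    "target": ["YES", "NO", "NO", "NO", "YES", "NO"],
})

# Inspect column types and count missing values per column.
print(df.dtypes)
print(df.isna().sum())

# An identifier column carries no predictive power, so drop it.
df = df.drop(columns=["id"])

# Fill missing numeric values with the column median (one option among several).
df["age"] = df["age"].fillna(df["age"].median())

# Encode the class attribute as 0/1 and one-hot encode the categorical features.
y = (df["target"] == "YES").astype(int).to_numpy()
features = pd.get_dummies(
    df.drop(columns=["target"]), columns=["region", "married"], dtype=float
)

# Scale every feature to [0, 1]; the result is a NumPy array.
X = MinMaxScaler().fit_transform(features.to_numpy())

print(features.columns.tolist())
print(X.shape, y.shape)
```

With the real file, `pd.read_csv("Bankdata.csv")` would replace the hand-built DataFrame, and the lists of columns to drop and encode would follow from what the exploration step shows.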

Apply the following techniques to your preprocessed data set, and see which one yields the highest accuracy as measured with 10-fold cross validation.

Decision tree

Q. Create a single train/test split of your data. Set aside 75% for training and 25% for testing. Use tree.DecisionTreeClassifier to create a model and fit it to your training data. Measure the accuracy of the resulting decision tree model using your test data. (Hint: you don't have to visualize the tree, and you can use the score method to get the accuracy.)
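A minimal sketch of the single 75/25 split follows. The arrays produced by `make_classification` are synthetic placeholders for the preprocessed Bankdata arrays, and the `random_state` values are assumptions added only to make the run repeatable.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; substitute the preprocessed X and y here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Hold out 25% of the rows for testing and train on the remaining 75%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# Fit the decision tree on the training portion only.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```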

Q. Instead of a single train/test split, use 10-fold cross validation to get a measure of your model's accuracy. (Hint: use model_selection.cross_val_score and use the mean method to find the average.)
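The hint can be sketched as follows, again on synthetic placeholder data rather than the real Bankdata arrays. Passing `cv=10` with a classifier makes scikit-learn use stratified 10-fold splits, and the default score for a classifier is accuracy.

```python
from sklearn import model_selection, tree
from sklearn.datasets import make_classification

# Synthetic placeholder data; substitute the preprocessed X and y here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

clf = tree.DecisionTreeClassifier(random_state=0)

# One accuracy value per fold; the model is refit for every fold.
scores = model_selection.cross_val_score(clf, X, y, cv=10)
mean_accuracy = scores.mean()

print(scores)
print(f"Mean 10-fold accuracy: {mean_accuracy:.3f}")
```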

Random forest

Q. Use ensemble.RandomForestClassifier with n_estimators=10 and use 10-fold cross validation to get a measure of the accuracy. Does it perform better than the decision tree?
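To answer the comparison, both models can be scored on the same data with the same number of folds. This is a sketch on synthetic placeholder data, so the printed numbers say nothing about how the two models rank on Bankdata.csv.

```python
from sklearn import ensemble, model_selection, tree
from sklearn.datasets import make_classification

# Synthetic placeholder data; substitute the preprocessed X and y here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

forest = ensemble.RandomForestClassifier(n_estimators=10, random_state=0)
single_tree = tree.DecisionTreeClassifier(random_state=0)

# Score both models with 10-fold cross validation for a like-for-like comparison.
forest_acc = model_selection.cross_val_score(forest, X, y, cv=10).mean()
tree_acc = model_selection.cross_val_score(single_tree, X, y, cv=10).mean()

print(f"Random forest: {forest_acc:.3f}")
print(f"Decision tree: {tree_acc:.3f}")
```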

KNN

Q. Use neighbors.KNeighborsClassifier with n_neighbors=10 and use 10-fold cross validation to get a measure of the accuracy.
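KNN is distance based, so it is one of the classifiers that tends to benefit from normalized features. The sketch below wraps a `MinMaxScaler` and the classifier in a pipeline so the scaler is refit on each training fold; the pipeline is a design choice added here, not something the question requires, and it is redundant if the data was already scaled during preprocessing. The data is a synthetic placeholder.

```python
from sklearn import model_selection, neighbors
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic placeholder data; substitute the preprocessed X and y here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Scale inside the pipeline so each fold is normalized from its own training rows.
knn = make_pipeline(
    MinMaxScaler(), neighbors.KNeighborsClassifier(n_neighbors=10)
)

knn_acc = model_selection.cross_val_score(knn, X, y, cv=10).mean()
print(f"KNN (K=10) mean 10-fold accuracy: {knn_acc:.3f}")
```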

Q. Try different values of K. Write a for loop to run KNN with K values ranging from 1 to 50 and see if the value of K makes a substantial difference. Make a note of the best performance you could get out of KNN.
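The loop over K might look like the sketch below. The dictionary `results` and the variable `best_k` are names chosen here for illustration, and the data is again a synthetic placeholder.

```python
from sklearn import model_selection, neighbors
from sklearn.datasets import make_classification

# Synthetic placeholder data; substitute the preprocessed X and y here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Map each K to its mean 10-fold accuracy.
results = {}
for k in range(1, 51):  # K = 1, 2, ..., 50
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    results[k] = model_selection.cross_val_score(knn, X, y, cv=10).mean()
    print(f"K={k:2d}  accuracy={results[k]:.3f}")

# Pick the K with the highest mean accuracy.
best_k = max(results, key=results.get)
print(f"Best K: {best_k} (accuracy {results[best_k]:.3f})")
```

The spread between the lowest and highest values in `results` shows whether K makes a substantial difference on the data at hand.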

Naive Bayes

Q. Use naive_bayes.GaussianNB and use 10-fold cross validation to get a measure of the accuracy.

Q. Use naive_bayes.MultinomialNB and use 10-fold cross validation to get a measure of the accuracy. Does it perform better than GaussianNB?
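Both Naive Bayes variants can be compared in one run, as sketched below on synthetic placeholder data. MultinomialNB refuses to fit on negative feature values, so the sketch assumes the features are scaled to [0, 1] first; here that is done with a `MinMaxScaler` inside a pipeline, which is unnecessary if preprocessing already produced non-negative features.

```python
from sklearn import model_selection, naive_bayes
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic placeholder data (contains negative values);
# substitute the preprocessed X and y here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# GaussianNB models each feature as a per-class normal distribution.
gnb = naive_bayes.GaussianNB()
gnb_acc = model_selection.cross_val_score(gnb, X, y, cv=10).mean()

# MultinomialNB needs non-negative inputs, so scale to [0, 1] before fitting.
mnb = make_pipeline(MinMaxScaler(), naive_bayes.MultinomialNB())
mnb_acc = model_selection.cross_val_score(mnb, X, y, cv=10).mean()

print(f"GaussianNB:    {gnb_acc:.3f}")
print(f"MultinomialNB: {mnb_acc:.3f}")
```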
