Question

Posted on Oct 16, 2024

File = Bankdata.csv

URL : https://docs.google.com/spreadsheets/d/1Qn4pdGrhzLjoXy1F871BalEz57rqdsB8OvLd4AbhCbA/edit#gid=1783727779

Data Preprocessing

Q. Explore the dataset to identify the features and the class attribute. In general, scikit-learn doesn't deal with categorical data well, and some classifiers need normalized data. Check whether there are any missing values, outliers, or attributes that have no predictive power.
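A minimal sketch of this exploration step follows. The contents of Bankdata.csv are not shown in the question, so the column names and values below (`id`, `age`, `income`, `married`, `region`, `label`) are invented stand-ins; the same calls would be applied to the real columns after loading the file with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for Bankdata.csv: the column names and values are
# invented for illustration and will differ from the real file.
df = pd.DataFrame({
    "id": ["ID1", "ID2", "ID3", "ID4"],
    "age": [48, 40, None, 23],
    "income": [17546.0, 30085.1, 16575.4, 20375.4],
    "married": ["NO", "YES", "YES", "YES"],
    "region": ["INNER_CITY", "TOWN", "INNER_CITY", "RURAL"],
    "label": ["YES", "NO", "NO", "NO"],
})

# Inspect column types and count missing values per column.
print(df.dtypes)
missing = df.isna().sum()
print(missing)

# An identifier is unique per row, so it has no predictive power: drop it.
df = df.drop(columns=["id"])

# Fill missing numeric values with the column median (one simple option).
df["age"] = df["age"].fillna(df["age"].median())

# Separate the class attribute from the features.
y = df["label"].map({"NO": 0, "YES": 1})
X = df.drop(columns=["label"])

# One-hot encode categorical features so that every column is numeric.
X = pd.get_dummies(X, columns=["married", "region"], dtype=int)
print(X.head())
```

Median imputation and one-hot encoding are only one reasonable choice; ordinal encoding or dropping incomplete rows may suit the real data better.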

Q. Convert pandas DataFrames into numpy arrays that can be used by scikit-learn. Show your data after being preprocessed. If none of the techniques described below is able to achieve close to 90% accuracy, examine your data again to see if you can preprocess the data in a different way.
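The conversion itself could look like the sketch below. The small frame is again a made-up placeholder for the preprocessed bank data, and min-max scaling is shown only as one possible normalization.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical, already-numeric frame standing in for the preprocessed data.
X_df = pd.DataFrame({
    "age": [48, 40, 51, 23],
    "income": [17546.0, 30085.1, 16575.4, 20375.4],
    "married_YES": [0, 1, 1, 1],
})
y_series = pd.Series([1, 0, 0, 0], name="label")

# Convert to numpy arrays that scikit-learn estimators accept.
X = X_df.to_numpy(dtype=float)
y = y_series.to_numpy()

# Rescale every feature to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X)

print(X.shape, y.shape)
print(X_scaled)
```

Scaling the full array before cross validation lets the test folds influence the scaler; wrapping the scaler and the classifier in a `Pipeline` avoids that.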

Apply the following techniques to your preprocessed data set, and see which one yields the highest accuracy as measured with 10-fold cross validation.

Decision tree

Q. Create a single train/test split of your data. Set aside 75% for training and 25% for testing. Use tree.DecisionTreeClassifier to create a model and fit it to your training data. Measure the accuracy of the resulting decision tree model using your test data. (Hint: you don't have to visualize the tree, and you can use the score method to get the accuracy.)
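As a sketch of the single-split workflow, the example below uses synthetic data from `make_classification` in place of the bank data, which is not available here; `X` and `y` would be replaced by the preprocessed arrays. The `random_state` values are arbitrary and only make the run repeatable.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; substitute the preprocessed bank arrays for X and y.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# Hold out 25% of the rows for testing, train on the remaining 75%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```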

Q. Instead of a single train/test split, use 10-fold cross validation to get a measure of your model's accuracy. (Hint: use model_selection.cross_val_score and use the mean method to find the average.)
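A minimal sketch of the cross-validated version, again on synthetic stand-in data rather than the real bank data:

```python
from sklearn import model_selection, tree
from sklearn.datasets import make_classification

# Synthetic stand-in data; substitute the preprocessed bank arrays for X and y.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

clf = tree.DecisionTreeClassifier(random_state=0)

# cv=10 fits and scores the model ten times, once per held-out fold.
scores = model_selection.cross_val_score(clf, X, y, cv=10)
mean_accuracy = scores.mean()
print(f"10-fold accuracy: {mean_accuracy:.3f}")
```

For a classifier, `cross_val_score` reports accuracy by default, so no `scoring` argument is needed here.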

Random forest

Q. Use ensemble.RandomForestClassifier with n_estimators=10 and use 10-fold cross validation to get a measure of the accuracy. Does it perform better than the decision tree?
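One way to make the comparison is to score both models on the same data with the same number of folds, as sketched below. The data is synthetic, so the printed numbers say nothing about how the two models rank on the bank data.

```python
from sklearn import ensemble, model_selection, tree
from sklearn.datasets import make_classification

# Synthetic stand-in data; substitute the preprocessed bank arrays for X and y.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

models = {
    "decision tree": tree.DecisionTreeClassifier(random_state=0),
    "random forest": ensemble.RandomForestClassifier(
        n_estimators=10, random_state=0
    ),
}

# Evaluate each model with the same 10-fold cross validation.
means = {}
for name, model in models.items():
    scores = model_selection.cross_val_score(model, X, y, cv=10)
    means[name] = scores.mean()
    print(f"{name}: {means[name]:.3f}")
```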

KNN

Q. Use neighbors.KNeighborsClassifier with n_neighbors=10 and use 10-fold cross validation to get a measure of the accuracy.

Q. Try different values of K. Write a for loop to run KNN with K values ranging from 1 to 50 and see if the value of K makes a substantial difference. Make a note of the best performance you could get out of KNN.
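The two KNN questions can be sketched together as follows. The data is synthetic, and wrapping the classifier in a pipeline with `StandardScaler` is an assumption added here because KNN is distance-based; it is not something the question prescribes.

```python
from sklearn import model_selection, neighbors
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; substitute the preprocessed bank arrays for X and y.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

def knn_accuracy(k):
    """Mean 10-fold accuracy of a scaled KNN classifier with k neighbors."""
    model = make_pipeline(
        StandardScaler(), neighbors.KNeighborsClassifier(n_neighbors=k)
    )
    return model_selection.cross_val_score(model, X, y, cv=10).mean()

# Single run with K = 10.
print(f"K=10: {knn_accuracy(10):.3f}")

# Sweep K from 1 to 50 and record the accuracy for each value.
results = {}
for k in range(1, 51):
    results[k] = knn_accuracy(k)

best_k = max(results, key=results.get)
print(f"Best K: {best_k} (accuracy {results[best_k]:.3f})")
```

Printing or plotting `results` shows whether accuracy changes substantially with K or stays roughly flat.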

Naive Bayes

Q. Use naive_bayes.GaussianNB and use 10-fold cross validation to get a measure of the accuracy.

Q. Use naive_bayes.MultinomialNB and use 10-fold cross validation to get a measure of the accuracy. Does it perform better than GaussianNB?
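A hedged sketch comparing the two Naive Bayes variants is given below. `MultinomialNB` rejects negative feature values, so this example rescales features to [0, 1] with `MinMaxScaler(clip=True)` inside a pipeline; that scaling step is an assumption of the sketch (the `clip` option needs scikit-learn 0.24 or later), and the data is synthetic.

```python
from sklearn import model_selection, naive_bayes
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; substitute the preprocessed bank arrays for X and y.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# GaussianNB models each feature as a per-class normal distribution.
gaussian = naive_bayes.GaussianNB()
gaussian_acc = model_selection.cross_val_score(gaussian, X, y, cv=10).mean()

# MultinomialNB needs non-negative inputs. clip=True keeps values in the
# held-out fold inside [0, 1] even if they fall outside the training range.
multinomial = make_pipeline(
    MinMaxScaler(clip=True), naive_bayes.MultinomialNB()
)
multinomial_acc = model_selection.cross_val_score(
    multinomial, X, y, cv=10
).mean()

print(f"GaussianNB:    {gaussian_acc:.3f}")
print(f"MultinomialNB: {multinomial_acc:.3f}")
```

MultinomialNB is designed for count-like features, so its accuracy on continuous, rescaled columns may well be lower than GaussianNB's; the real data decides.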
