Question
File = Bankdata.csv
URL : https://docs.google.com/spreadsheets/d/1Qn4pdGrhzLjoXy1F871BalEz57rqdsB8OvLd4AbhCbA/edit#gid=1783727779
Data Preprocessing
Q. Explore the dataset to identify the features and the class attribute. In general, scikit-learn doesn't deal with categorical data well. Some classifiers need normalized data. Consider whether there are any missing values, outliers, or attributes that have no predictive power.
Q. Convert pandas DataFrames into numpy arrays that can be used by scikit-learn. Show your data after being preprocessed. If none of the techniques described below is able to achieve close to 90% accuracy, examine your data again to see if you can preprocess the data in a different way.
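The preprocessing steps above can be sketched as follows. The columns of Bankdata.csv are not shown here, so this minimal example builds a small hypothetical DataFrame instead. The column names (id, age, income, region, married, label) and the choice of median imputation and min-max scaling are assumptions for illustration, not the required approach.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frame standing in for Bankdata.csv; the real columns may differ.
df = pd.DataFrame({
    "id": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"],
    "age": [25, 40, np.nan, 58, 33, 47],
    "income": [18000.0, 32000.0, 27000.0, 61000.0, np.nan, 45000.0],
    "region": ["TOWN", "RURAL", "TOWN", "INNER_CITY", "RURAL", "TOWN"],
    "married": ["YES", "NO", "YES", "YES", "NO", "NO"],
    "label": ["NO", "YES", "NO", "YES", "NO", "YES"],
})

# An identifier column has no predictive power, so drop it.
df = df.drop(columns=["id"])

# Fill missing numeric values with the column median (one possible choice).
numeric_cols = ["age", "income"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Separate the class attribute from the features and encode it as 0/1.
y = (df["label"] == "YES").astype(int).to_numpy()
features = df.drop(columns=["label"])

# One-hot encode the categorical features so every column is numeric.
features = pd.get_dummies(features, columns=["region", "married"], dtype=float)

# Normalize every feature to the [0, 1] range and obtain a numpy array.
X = MinMaxScaler().fit_transform(features)

print(features.columns.tolist())
print(X.shape, y.shape)
```

X and y are now plain numpy arrays that scikit-learn estimators accept directly.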
Apply the following techniques to your preprocessed data set, and see which one yields the highest accuracy as measured with 10-fold cross validation.
Decision tree
Q. Create a single train/test split of your data. Set aside 75% for training, and 25% for testing. Use
tree.DecisionTreeClassifier to create a model and fit it to your training data. Measure the
accuracy of the resulting decision tree model using your test data. (Hint: you don't have to
visualize the tree and you can use score method to get the accuracy.)
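A minimal sketch of the single-split evaluation follows. It uses synthetic data from make_classification as a stand-in for the preprocessed bank data, and the random_state values are assumptions added only to make the run repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed X and y (assumption, not the real data).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Hold out 25% of the rows for testing and train on the remaining 75%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the tree on the training rows only.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# score() returns the fraction of test rows classified correctly.
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```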
Q. Instead of a single train/test split, use 10-fold cross validation to get a measure of your model's
accuracy. (Hint: use model_selection.cross_val_score and use mean method to find the average)
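The cross-validated version might look like the sketch below, again on synthetic stand-in data rather than the real file. Passing cv=10 asks cross_val_score for ten folds, and the mean of the ten scores is the reported accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed X and y (assumption, not the real data).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

clf = DecisionTreeClassifier(random_state=0)

# Each of the 10 folds is used once as the test set; the model is refit each time.
scores = cross_val_score(clf, X, y, cv=10)

mean_accuracy = scores.mean()
print(f"10-fold mean accuracy: {mean_accuracy:.3f}")
```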
Random forest
Q. Use ensemble.RandomForestClassifier with n_estimators=10 and use 10-fold cross validation
to get a measure of the accuracy. Does it perform better than the decision tree?
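One way to set up the comparison is sketched below on synthetic stand-in data. Which model scores higher depends on the data, so the result here says nothing about Bankdata.csv; the random_state values are assumptions for repeatability.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed X and y (assumption, not the real data).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0)

# Evaluate both models with the same 10-fold protocol so the means are comparable.
tree_mean = cross_val_score(tree, X, y, cv=10).mean()
forest_mean = cross_val_score(forest, X, y, cv=10).mean()

print(f"Decision tree: {tree_mean:.3f}")
print(f"Random forest: {forest_mean:.3f}")
```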
KNN
Q. Use neighbors.KNeighborsClassifier with n_neighbors=10 and use 10-fold cross validation to
get a measure of the accuracy.
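A minimal KNN sketch follows, using synthetic stand-in data. KNN relies on distances between rows, so on the real data it would normally be run on the normalized features; that step is omitted here because the synthetic features are already on similar scales.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed X and y (assumption, not the real data).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Classify each row by majority vote among its 10 nearest training rows.
knn = KNeighborsClassifier(n_neighbors=10)

knn_scores = cross_val_score(knn, X, y, cv=10)
knn_mean = knn_scores.mean()
print(f"KNN (K=10) 10-fold mean accuracy: {knn_mean:.3f}")
```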
Q. Try different values of K. Write a for loop to run KNN with K values ranging from 1 to 50 and
see if the value of K makes a substantial difference. Make a note of the best performance you
could get out of KNN.
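The loop over K could be written along these lines. The data is again a synthetic stand-in, and the dictionary named results is a hypothetical helper for keeping track of the mean accuracy per K.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed X and y (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Map each K from 1 to 50 to its 10-fold mean accuracy.
results = {}
for k in range(1, 51):
    knn = KNeighborsClassifier(n_neighbors=k)
    results[k] = cross_val_score(knn, X, y, cv=10).mean()

# Pick the K with the highest mean accuracy (ties resolve to the smallest K).
best_k = max(results, key=results.get)
print(f"Best K = {best_k}, mean accuracy = {results[best_k]:.3f}")
```

Printing or plotting all 50 values in results, rather than only the best one, shows whether K makes a substantial difference or whether the curve is mostly flat.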
Naive Bayes
Q. Use naive_bayes.GaussianNB and use 10-fold cross validation to get a measure of the accuracy.
Q. Use naive_bayes.MultinomialNB and use 10-fold cross validation to get a measure of the
accuracy. Does it perform better than GaussianNB?
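Both Naive Bayes variants can be compared as in the sketch below, on synthetic stand-in data. MultinomialNB rejects negative feature values, so the features are min-max scaled to [0, 1] first. Scaling the whole array before cross validation is a simplifying assumption that mirrors the preprocess-then-evaluate flow of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the raw numeric features (assumption, not the real data).
X_raw, y = make_classification(n_samples=600, n_features=10, random_state=0)

# MultinomialNB requires non-negative inputs, so rescale every feature to [0, 1].
X = MinMaxScaler().fit_transform(X_raw)

gaussian_mean = cross_val_score(GaussianNB(), X, y, cv=10).mean()
multinomial_mean = cross_val_score(MultinomialNB(), X, y, cv=10).mean()

print(f"GaussianNB:    {gaussian_mean:.3f}")
print(f"MultinomialNB: {multinomial_mean:.3f}")
```

MultinomialNB is designed for count-like features, so its ranking against GaussianNB on continuous data varies from dataset to dataset.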