Question

A-3. [10 marks: 2.5 each]:
a. Split the dataset into training and testing sets using the train_test_split function, with 75% for
training and 25% for testing, using random_state=10.
b. Build a decision tree classifier to predict the class label. Fit the classifier on the
training dataset. Set random_state to 100, criterion to entropy, and splitter to best.
c. Draw the decision tree using scikit-learn (sklearn).
d. Test the classifier on the testing data set, and print the confusion matrix and classification
metrics (Accuracy, sensitivity (Recall), Precision) of the decision tree classifier.
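A minimal sketch of A-3 (a)-(d) with scikit-learn follows. HW4_DataA is not reproduced here, so the sketch builds a small synthetic table as a stand-in for the preprocessed output of A-2. Its columns (Age, Na_to_K, BP, Cholesterol, Sex_F, Sex_M) and the made-up class rule are assumptions. Drug has more than two classes, so recall and precision are macro-averaged here, and that averaging choice is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in for the preprocessed HW4_DataA table (made-up values).
rng = np.random.default_rng(0)
n = 40
X = pd.DataFrame({
    "Age": rng.random(n),                  # min-max normalized to [0, 1]
    "Na_to_K": rng.random(n),              # min-max normalized to [0, 1]
    "BP": rng.integers(0, 3, n),           # ordinal codes, e.g. LOW=0, NORMAL=1, HIGH=2
    "Cholesterol": rng.integers(0, 2, n),  # ordinal codes, e.g. NORMAL=0, HIGH=1
    "Sex_F": rng.integers(0, 2, n),        # one-hot column for Sex
})
X["Sex_M"] = 1 - X["Sex_F"]
# Label-encoded class (Drug) from a made-up rule so the tree has something to learn.
y = np.where(X["Na_to_K"] > 0.5, 0, np.where(X["BP"] == 2, 1, 2))

# A-3.a: 75% training / 25% testing with random_state=10.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

# A-3.b: entropy criterion, best splitter, random_state=100.
tree = DecisionTreeClassifier(criterion="entropy", splitter="best", random_state=100)
tree.fit(X_train, y_train)

# A-3.c: text rendering of the fitted tree as nested if/else splits.
print(export_text(tree, feature_names=list(X.columns)))

# A-3.d: evaluate on the held-out 25% (macro averaging is an assumption).
y_pred = tree.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
rec = recall_score(y_test, y_pred, average="macro", zero_division=0)
prec = precision_score(y_test, y_pred, average="macro", zero_division=0)
print(cm)
print(f"Accuracy={acc:.3f}  Recall={rec:.3f}  Precision={prec:.3f}")
```

For a graphical drawing instead of text, sklearn.tree.plot_tree(tree, feature_names=list(X.columns)) renders the same fitted tree with matplotlib.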
A-4. [10 marks: 2.5 each]: Using the same dataset split as in A-3.a:
ISE-291: Homework 04
a. Build a random forest classifier with 4 trees to predict the class label. Fit the classifier
using the training set. Set criterion to entropy and random_state to 62.
b. Draw the trees using scikit-learn (sklearn).
c. Test the classifier on the testing data set, and print the confusion matrix and classification
metrics (Accuracy, sensitivity (Recall), Precision) of the Random forest classifier.
d. Repeat A-4 (a-c) using a random forest with 8 trees instead of 4.
A-5. [10 marks]: Calculate the Information Gain (IG) for the class variable "Drug" when the feature
"BP" is selected as the root node.
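For A-4, the sketch below builds a small hypothetical stand-in table for the preprocessed data and uses the 75/25 split of A-3.a. It then fits forests of 4 and 8 trees with criterion="entropy" and random_state=62. Each fitted tree is stored in forest.estimators_, so drawing "the trees" means rendering each element of that list. The metrics are macro-averaged, which is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import export_text

# Hypothetical stand-in for the preprocessed HW4_DataA table (made-up values).
rng = np.random.default_rng(0)
n = 40
X = pd.DataFrame({
    "Age": rng.random(n),                  # min-max normalized
    "Na_to_K": rng.random(n),              # min-max normalized
    "BP": rng.integers(0, 3, n),           # ordinal codes
    "Cholesterol": rng.integers(0, 2, n),  # ordinal codes
    "Sex_F": rng.integers(0, 2, n),        # one-hot column for Sex
})
X["Sex_M"] = 1 - X["Sex_F"]
y = np.where(X["Na_to_K"] > 0.5, 0, np.where(X["BP"] == 2, 1, 2))  # label-encoded Drug

# Same split as A-3.a: 75% training / 25% testing, random_state=10.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

results = {}
for n_trees in (4, 8):  # A-4.a-c with 4 trees, then A-4.d with 8 trees
    forest = RandomForestClassifier(
        n_estimators=n_trees, criterion="entropy", random_state=62
    )
    forest.fit(X_train, y_train)

    # A-4.b: every fitted tree is stored in forest.estimators_.
    for i, est in enumerate(forest.estimators_, start=1):
        print(f"--- forest with {n_trees} trees: tree {i} ---")
        print(export_text(est, feature_names=list(X.columns)))

    # A-4.c: evaluate on the test set; macro averaging is an assumption.
    y_pred = forest.predict(X_test)
    res = {
        "cm": confusion_matrix(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
        "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "n_fitted_trees": len(forest.estimators_),
    }
    results[n_trees] = res
    print(res["cm"])
    print(f"{n_trees} trees: Accuracy={res['accuracy']:.3f}  "
          f"Recall={res['recall']:.3f}  Precision={res['precision']:.3f}")
```

A graphical drawing works the same way for each tree, for example sklearn.tree.plot_tree(forest.estimators_[0]).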
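For A-5, the information gain of splitting Drug on BP is IG(Drug, BP) = H(Drug) - Σ_v (|S_v| / |S|) H(Drug | BP = v). Here H is the entropy in bits, and S_v is the subset of rows with BP = v. The helper below is a sketch of that formula. The six rows and the drug names are hypothetical and serve only to exercise it.

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Shannon entropy (bits) of a column of class labels."""
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def information_gain(df: pd.DataFrame, feature: str, target: str) -> float:
    """IG(target, feature) = H(target) - sum_v |S_v|/|S| * H(target | feature = v)."""
    weights = df[feature].value_counts(normalize=True)  # |S_v| / |S| for each value v
    conditional = sum(
        w * entropy(df.loc[df[feature] == v, target]) for v, w in weights.items()
    )
    return entropy(df[target]) - conditional

# Hypothetical rows for illustration only.
toy = pd.DataFrame({
    "BP":   ["HIGH", "HIGH", "LOW", "LOW", "NORMAL", "NORMAL"],
    "Drug": ["drugA", "drugA", "drugC", "drugC", "drugX", "drugY"],
})
ig = information_gain(toy, "BP", "Drug")
print(f"IG(Drug, BP) = {ig:.4f}")  # prints IG(Drug, BP) = 1.5850
```

On these toy rows only the NORMAL branch is impure (1 bit, weight 1/3). H(Drug) is log2(3) + 1/3, so IG = log2(3), about 1.585 bits. With the homework data, the HW4_DataA frame would be passed in instead.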
A-6. [10 marks]: From the decision tree built in A-3, write three classification rules, first using the
normalized values and then converting them back to the original values.
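A threshold read off a tree trained on min-max normalized data can be mapped back by inverting the scaling: x = x' (max - min) + min. Here min and max are the column's values before normalization. The sketch below assumes hypothetical Age values and a hypothetical normalized split "Age <= 0.5". The MinMaxScaler check only shows that both routes agree.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def denormalize(threshold_norm: float, col_min: float, col_max: float) -> float:
    """Invert min-max scaling: x = x' * (max - min) + min."""
    return threshold_norm * (col_max - col_min) + col_min

ages = np.array([[15.0], [30.0], [45.0], [74.0]])  # hypothetical raw Age values
scaler = MinMaxScaler().fit(ages)

rule_threshold = 0.5  # hypothetical split "Age <= 0.5" read off the normalized tree
original = denormalize(rule_threshold, ages.min(), ages.max())
print(f"Age <= {rule_threshold} (normalized)  ->  Age <= {original} (original units)")

# The fitted scaler's inverse gives the same value.
print(scaler.inverse_transform([[rule_threshold]])[0, 0])
```

With min 15 and max 74, the normalized split 0.5 maps back to 15 + 0.5 * 59 = 44.5.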
A-7. [10 marks]: Write the association rules of the form "BP -> Cholesterol". Which rule has the highest
accuracy? Write its corresponding support and accuracy.
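For A-7, each rule BP = b -> Cholesterol = c can be scored two ways. Its support is count(b, c) / N. Its accuracy is taken here as the rule's confidence, count(b, c) / count(b); that reading of "accuracy" is an assumption. A pandas sketch over hypothetical rows:

```python
import pandas as pd

# Hypothetical rows for illustration; the homework would use the HW4_DataA columns.
df = pd.DataFrame({
    "BP":          ["HIGH", "HIGH", "HIGH", "LOW", "LOW", "NORMAL", "NORMAL", "NORMAL"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "NORMAL", "NORMAL", "NORMAL", "HIGH"],
})

# One row per observed rule BP=b -> Cholesterol=c with its co-occurrence count.
rules = df.groupby(["BP", "Cholesterol"]).size().reset_index(name="count")
rules["support"] = rules["count"] / len(df)                                # count(b, c) / N
rules["accuracy"] = rules["count"] / rules["BP"].map(df["BP"].value_counts())  # count(b, c) / count(b)
print(rules)

best = rules.loc[rules["accuracy"].idxmax()]
print(f"Best rule: BP={best['BP']} -> Cholesterol={best['Cholesterol']}, "
      f"support={best['support']:.3f}, accuracy={best['accuracy']:.3f}")
```

On these rows the top rule is BP = HIGH -> Cholesterol = HIGH, with support 3/8 = 0.375 and accuracy 3/3 = 1.0.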
A-8. [10 marks]: Repeat parts b, c, and d of A-3 using the Naïve Bayes (GaussianNB) classifier.
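Below is a sketch of A-8 with GaussianNB on a small hypothetical stand-in table, using the 75/25 split of A-3.a. A Naïve Bayes model has no tree structure, so the drawing step of A-3.c has no direct counterpart. The per-class feature means in nb.theta_ are printed instead as a rough view of what the model learned. Macro averaging of the metrics is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the preprocessed HW4_DataA table (made-up values).
rng = np.random.default_rng(0)
n = 40
X = pd.DataFrame({
    "Age": rng.random(n),
    "Na_to_K": rng.random(n),
    "BP": rng.integers(0, 3, n),
    "Cholesterol": rng.integers(0, 2, n),
    "Sex_F": rng.integers(0, 2, n),
})
X["Sex_M"] = 1 - X["Sex_F"]
y = np.where(X["Na_to_K"] > 0.5, 0, np.where(X["BP"] == 2, 1, 2))  # label-encoded Drug

# Same 75/25 split as A-3.a.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

# A-8 (b): GaussianNB has no criterion, splitter, or random_state settings.
nb = GaussianNB()
nb.fit(X_train, y_train)

# Per-class mean of each feature (one row per class).
print(pd.DataFrame(nb.theta_, columns=X.columns, index=nb.classes_))

# A-8 (d): confusion matrix and macro-averaged metrics.
y_pred = nb.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
rec = recall_score(y_test, y_pred, average="macro", zero_division=0)
prec = precision_score(y_test, y_pred, average="macro", zero_division=0)
print(cm)
print(f"Accuracy={acc:.3f}  Recall={rec:.3f}  Precision={prec:.3f}")
```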
A-9. Compare the performance of the Naïve Bayes classifier against the decision tree and random forest
classifiers built earlier, using their confusion matrices. Based on the comparison, which one is best to use
with the given dataset?

Problem A [100 Marks]: Solve all the questions using Python. Use Pandas, Seaborn, Sklearn, and similar
libraries for all the analysis. Consider the data given in the Excel file HW4_DataA and the following
data description:
Table 1. Data description
Do the following tasks (in exact sequence) using the "HW4_DataA" data:
A-1. [5 marks]: Read and display the data given in HW4_DataA. Describe both the numeric and
categorical attributes. Refer to Table 1 for the data description.
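Below is a sketch of A-1 using hypothetical rows. Table 1 is not reproduced here, so the numeric column names Age and Na_to_K are assumptions. With the real file, pd.read_excel("HW4_DataA.xlsx") would replace the hypothetical rows; the .xlsx extension is assumed, and reading .xlsx requires the openpyxl package.

```python
import pandas as pd

# Hypothetical rows; the real data would come from pd.read_excel("HW4_DataA.xlsx").
df = pd.DataFrame({
    "Age": [23, 47, 61, 28, 35, 52],
    "Sex": ["F", "M", "F", "F", "M", "M"],
    "BP": ["HIGH", "LOW", "NORMAL", "LOW", "HIGH", "NORMAL"],
    "Cholesterol": ["HIGH", "NORMAL", "HIGH", "HIGH", "NORMAL", "HIGH"],
    "Na_to_K": [25.4, 13.1, 10.1, 7.8, 18.0, 8.6],
    "Drug": ["drugA", "drugB", "drugC", "drugB", "drugA", "drugC"],
})
print(df)

# Numeric attributes: count, mean, std, min, quartiles, max.
numeric_summary = df.describe(include="number")
# Categorical attributes: count, number of unique values, most frequent value and its frequency.
categorical_summary = df.describe(exclude="number")
print(numeric_summary)
print(categorical_summary)
```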
A-2. [10 marks: 2.5 each]: Do the necessary preprocessing. Specifically, do the following:
a. Normalize the numeric attributes using min-max normalization scheme.
b. Perform ordinal (label) encoding for the ordinal attributes (BP and Cholesterol). Use a dictionary
for the ordinal encoding.
c. Perform one-hot encoding for the categorical attribute (Sex).
d. Perform label encoding for the class (Drug).
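Below is a minimal sketch of A-2 (a)-(d) on hypothetical rows. The numeric column names (Age, Na_to_K) are assumptions, as is the LOW < NORMAL < HIGH ordering in the dictionaries; both should be adjusted to Table 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw rows for illustration only.
df = pd.DataFrame({
    "Age": [20, 40, 60, 30],
    "Sex": ["F", "M", "F", "M"],
    "BP": ["LOW", "HIGH", "NORMAL", "HIGH"],
    "Cholesterol": ["HIGH", "NORMAL", "HIGH", "HIGH"],
    "Na_to_K": [10.0, 20.0, 15.0, 30.0],
    "Drug": ["drugA", "drugB", "drugA", "drugC"],
})

# a. Min-max normalization: x' = (x - min) / (max - min), mapping each column to [0, 1].
for col in ["Age", "Na_to_K"]:
    col_min, col_max = df[col].min(), df[col].max()
    df[col] = (df[col] - col_min) / (col_max - col_min)

# b. Ordinal encoding with explicit dictionaries (category order is an assumption).
bp_order = {"LOW": 0, "NORMAL": 1, "HIGH": 2}
chol_order = {"NORMAL": 0, "HIGH": 1}
df["BP"] = df["BP"].map(bp_order)
df["Cholesterol"] = df["Cholesterol"].map(chol_order)

# c. One-hot encoding of Sex: adds Sex_F and Sex_M columns and drops Sex.
df = pd.get_dummies(df, columns=["Sex"], dtype=int)

# d. Label encoding of the class; codes follow the sorted order of the labels.
drug_encoder = LabelEncoder()
df["Drug"] = drug_encoder.fit_transform(df["Drug"])
print(df)
```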