Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Jul 30, 2024

A - 3 . [ 1 0 marks: 2 . 5 each ] : a . Split the dataset into training and testing sets using

- 3 . [10

marks:

2.5

each

]

.

Split the dataset into training and testing sets using train

_

test

_

split function with

75 %

for

training and

25 %

for training using random state

= 10 .

.

Build a decision tree classifier for predicting the class label. Fit the classifier using the

training dataset. Set random state to

100,

criterion to entropy, and splitter to best.

.

Draw the decision tree using scikit

-

learn

(

sklearn

)

.

Test the classifier on the testing data set, and print the confusion matrix and classification

metrics

(

Accuracy

,

sensitivity

(

Recall

),

Precision

)

of the decision tree classifier.

- 4 . [10

marks:

2.5

each : Using the same dataset split in A

- 3 .

Page

2

9

ISE

- 291

: Homework

04

.

Build a Random forest classifier for predicting the class label with

4

trees. Fit the classifier

using the training set. Set criterion to entropy and random

_

state to

62 .

.

Draw the trees using sci

-

kit learn

(

sklearn

)

.

Test the classifier on the testing data set, and print the confusion matrix and classification

metrics

(

Accuracy

,

sensitivity

(

Recall

),

Precision

)

of the Random forest classifier.

.

Repeat A

- 4 (

-

)

using a Random forest with

8

trees instead of

4 .

- 5 . [10

marks

]

: Calculate the Information Gain

(

)

for the class variable "Drug" given the feature

selected

"

"

as a root node.

- 6 . [10

marks

]

: From the decision tree built in A

- 3,

write three classification rules using the

normalized values first then return it to the original values.

- 7 . [10

marks

]

: Write an association rule for

"

- >

Cholestrol", which rule has the highest

accuracy? Write the corresponding support and accuracy.

- 8 . [10

marks

]

: Repeat parts b

,

,

and d in A

- 3

using the Na

ve Bayes GaussianNB classifier.

- 9 .

Compare the performance of the Na

ve Bayes against the built decision tree and random forest

classifiers using confusion matrix. Based on the comparison, which one is the best to use with

the given datat set? Problem A

[100

Marks

]

: Solve all the questions using Python. Use Pandas, Seaborn, Sklearn, etc.,

libraries for all the analysis. Consider the data given in Excel file HW

4_

DataA. Consider the following

data description:

Table

1 .

Data description

Do the following tasks

(

in exact sequence

)

using the

"

4_

DataA" data:

- 1 . [5

marks

]

: Read and display the data given in HW

4_

DataA. Describe both the numeric and

categorical attributes. Refer to Table

1

for the data description.

- 2 . [10

marks:

2.5

each

]

: Do the necessary pre

-

processing. In specific do the following:

.

Normalize the numeric attributes using min

-

max normalization scheme.

.

Perform ordinal

(

label

)

encoding for ordinal attributes

(

,

and Cholestrol

) .

Use dictionary

for the ordinal encoding.

.

Perform one hot encoding for the categorical attribute

(

Sex

)

.

Perorfm label encoding for the class

(

drug

) .

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Pro PowerShell For Database Developers

Authors: Bryan P Cafferky

1st Edition

1484205413, 9781484205419

More Books

Students also viewed these Databases questions

Question

★★★★★

For the data in Exercise 5 in this section, construct a histogram, frequency polygon, and give using relative frequencies. What proportion of the railroad crossing accidents are less than 87? In...

Answered: 1 week ago

Question

★★★★★

A recent GSS asked, If the wife in a family wants children, but the husband decides that he does not want any children, is it all right for the husband to refuse to have children? Of 598 respondents,...

Answered: 1 week ago

Question

★★★★★

What evidence is provided of the quality of the data collection instruments in terms of the following? a. Relia bility or dependability b. Validity or credibility c. Objectivity or confirmability d....

Answered: 1 week ago

Question

★★★★★

Transaction data and journal entries For Crofoot Real Estate Agency are presented in E3-7 and E3-8. (a) Post the transactions to T-accounts. (b) Prepare a trial balance at October 31, 2014.

Answered: 1 week ago

Question

★★★★★

Be careful when signing a work-for-hire agreement if you are a songwriter. You may lose all rights to your song in the copyright law world. If your song sells thousands on ITunes, you do not get to...

Answered: 1 week ago

Question

★★★★★

Suppose the cutoff z score on the null distribution is Zcrit = +1.84 . You could reject the null hypothesis if the sample z score was

Answered: 1 week ago

Question

★★★★★

In the commercial section of the newspaper you come across an ad for a pizza delivery business for sale. upon inquiry, you discover that the owner , who wants to sell the business and then retire,...

Answered: 1 week ago

Question

★★★★★

An applied researcher is developing a new measure of organizational commitment for a study. In order to compare it to an existing measure of organizational commitment, the researcher would compute a...

Answered: 1 week ago

Question

★★★★★

7.5 A large lot of manufactured items contains 10% with exactly one defect, 5% with more than one defect, and the remainder with no defects. Ten items are randomly selected from this lot for sale. If...

Answered: 1 week ago

Question

★★★★★

A composite wall of a furnace has 3 layers of equal thickness having thermal conductivities in the ratio of 1:2:4. What will be the temperature drop ratio across the three respective layers? A. 12:4...

Answered: 1 week ago

Question

★★★★★

Assuming that the daily changes in a portfolio's value follow a normal distribution with a mean of zero and a standard deviation of R6 million. Calculate the following risk metrics: a) Determine the...

Answered: 1 week ago

Question

★★★★★

Explain the global implications for recruitment.

Answered: 1 week ago

Question

★★★★★

Describe what competencies and competency modeling are.

Answered: 1 week ago

Question

★★★★★

Summarize job design concepts.

Answered: 1 week ago

Previous Question Next Question