Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Aug 29, 2024

Problem 2 : Decision Tree, post - pruning and cost complexity parameter using sklearn 0 . 2 2 We will use a pre - processed

Problem

2

: Decision Tree, post

-

pruning and cost complexity parameter using sklearn

0.22

We will use a pre

-

processed natural language dataset in the CSV file "spamdata.csv

"

to classify emails as spam or not. Each row contains the word frequency for

54

words plus statistics on the longest "run" of captial letters. Word frequency is given by: fi

=

/

N Where fi is the frequency for word i

,

mi is the number of times word i appears in the email, and N is the total number of words in the email. We will use decision trees to classify the emails. TO DO

1

: Complete the function get

_

spam

_

dataset to read in values from the dataset and split the data into train and test sets. def get

_

spam

_

dataset

(

filepath

=

"data

/

spamdata

.

csv

",

test

_

split

= 0.1)

'''

get

_

spam

_

dataset Loads csv file located at "filepath". Shuffles the data and splits it so that the you have

(1 -

test

_

split

) * 100 %

training examples and

(

test

_

split

) * 100 %

testing examples. Args: filepath: location of the csv file test

_

split: percentage

/ 100

of the data should be the testing split Returns: X

_

train, X

_

test, y

_

train, y

_

test, feature

_

names

(

in that order

)

first four are np

.

ndarray

'''

# complete your code here return

0

TO DO

2

: Import the data set into five variables: X

_

train, X

_

test, y

_

train, y

_

test, label

_

names # Uncomment and edit the line below to complete this task. test

_

split

= 0.1

# default test

_

split; change it if you'd like; ensure that this variable is used as an argument to your function # your code here # X

_

train, X

_

test, y

_

train, y

_

test, label

_

names

=

.

arange

(5)

TO DO

3

: Build a decision tree classifier using the sklearn toolbox. Then compute metrics for performance like precision and recall. This is a binary classification problem, therefore we can label all points as either positive

(

SPAM

)

or negative

(

NOT SPAM

) .

def build

_

(

data

_

,

data

_

,

max

_

depth

=

None, max

_

leaf

_

nodes

=

None

)

'''

This function builds the decision tree classifier and fits it to the provided data. Arguments data

_

-

a np

.

ndarray data

_

-

.

ndarray max

_

depth

-

None if unrestricted, otherwise an integer for the maximum depth the tree can reach. Returns: A trained DecisionTreeClassifier

'''

# complete your code here

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Design Using Entity Relationship Diagrams

Authors: Sikha Saha Bagui, Richard Walsh Earp

3rd Edition

103201718X, 978-1032017181

More Books

Students also viewed these Databases questions

Question

★★★★★

Karl Knapps, Inc., has received the following orders: Period 1 2 3 4 5 6 7 8 9 10 Order size 0 40 30 40 10 70 40 10 30 60 The entire fabrication for these units is scheduled on one machine. There are...

Answered: 1 week ago

Question

★★★★★

For the code shown below, the line numbers are added for your reference: 1 . def maybeAdd ( a , b ) : 2 . for let in b: 3 . if let not in a: 4 . a . append ( let ) After the code above runs, which of...

Answered: 1 week ago

Question

★★★★★

4. Describe the evolution of thinking regarding individual zone of optimal functioning theory. Which approach do you favor from an applied perspective and why?

Answered: 1 week ago

Question

★★★★★

Activity. In this problem, we calculate the pH of the intermediate form of a diprotic acid, taking activities into account. (a) Including activity coefficients, derive Equation 9-11 for potassium...

Answered: 1 week ago

Question

★★★★★

E10-10 At May 31, 2016, FedEx Corporation reported the following amounts (in millions) in its financial statements: Required: (1) Compute the debt-to-assets ratio and times interest earned ratio (to...

Answered: 1 week ago

Question

★★★★★

Gamer Co. has no debt. Its cost of capital is 10.4 percent. Suppose the company converts to a debtequity ratio of 1. The interest rate on the debt is 7.5 percent. Ignore taxes for this problem. What...

Answered: 1 week ago

Question

★★★★★

For our first paper you will have the option of writing about one of three topics: Abortion, Euthanasia, or the Death Penalty. Each topic will have two articles which you must fully read, understand,...

Answered: 1 week ago

Question

★★★★★

7. Greta has experienced severe major depressive disorder for the past 3 years. She calls her primary care physician complaining of stomach pains. Her primary care provider (PCP) explains that her...

Answered: 1 week ago

Question

★★★★★

An investor wants to purchase a stock that sells for $48. What is the value of a one-year put option to buy the stock at $45, if debt currently yields 8% and the standard deviation of stock returns...

Answered: 1 week ago

Question

★★★★★

Feminist Ethics examines various systems and institutions to look at ways that they have unfairly distributed power. Choose TWO out of the following institutions and give and explain a way that each...

Answered: 1 week ago

Question

★★★★★

The following expression gives what polynomial? Where I is the identity matrix, A is given as, 11-3 A = [ 03 118] ] |sl-A| = ? B = = [2] C = [10] D = [0]

Answered: 1 week ago

Question

★★★★★

3. Describe the three parts of an interview: opening, questions, and conclusion

Answered: 1 week ago

Question

★★★★★

1. Define the nature of interviews

Answered: 1 week ago

Question

★★★★★

2. Outline the different types of interviews

Answered: 1 week ago

Previous Question Next Question