Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Jul 31, 2024

Problem 2 : Decision Tree, post - pruning and cost complexity parameter using sklearn 0 . 2 2 We will use a pre - processed

Problem

2

: Decision Tree, post

-

pruning and cost complexity parameter using sklearn

0.22

We will use a pre

-

processed natural language dataset in the CSV file "spamdata.csv

"

to classify emails as spam or not. Each row contains the word frequency for

54

words plus statistics on the longest "run" of captial letters. Word frequency is given by: fi

=

/

N Where fi is the frequency for word i

,

mi is the number of times word i appears in the email, and N is the total number of words in the email. We will use decision trees to classify the emails. TO DO

1

: Complete the function get

_

spam

_

dataset to read in values from the dataset and split the data into train and test sets. def get

_

spam

_

dataset

(

filepath

=

"data

/

spamdata

.

csv

",

test

_

split

= 0.1)

'''

get

_

spam

_

dataset Loads csv file located at "filepath". Shuffles the data and splits it so that the you have

(1 -

test

_

split

) * 100 %

training examples and

(

test

_

split

) * 100 %

testing examples. Args: filepath: location of the csv file test

_

split: percentage

/ 100

of the data should be the testing split Returns: X

_

train, X

_

test, y

_

train, y

_

test, feature

_

names

(

in that order

)

first four are np

.

ndarray

'''

# complete your code here return

0

TO DO

2

: Import the data set into five variables: X

_

train, X

_

test, y

_

train, y

_

test, label

_

names # Uncomment and edit the line below to complete this task. test

_

split

= 0.1

# default test

_

split; change it if you'd like; ensure that this variable is used as an argument to your function # your code here # X

_

train, X

_

test, y

_

train, y

_

test, label

_

names

=

.

arange

(5)

TO DO

3

: Build a decision tree classifier using the sklearn toolbox. Then compute metrics for performance like precision and recall. This is a binary classification problem, therefore we can label all points as either positive

(

SPAM

)

or negative

(

NOT SPAM

) .

def build

_

(

data

_

,

data

_

,

max

_

depth

=

None, max

_

leaf

_

nodes

=

None

)

'''

This function builds the decision tree classifier and fits it to the provided data. Arguments data

_

-

a np

.

ndarray data

_

-

.

ndarray max

_

depth

-

None if unrestricted, otherwise an integer for the maximum depth the tree can reach. Returns: A trained DecisionTreeClassifier

'''

# complete your code here

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Big Data Systems A 360-degree Approach

Authors: Jawwad ShamsiMuhammad Khojaye

1st Edition

0429531575, 9780429531576

Students also viewed these Databases questions

Question

★★★★★

The chapter suggested that because large increases in oil prices transfer income to countries that cannot rapidly increase their consumption or investment and therefore must save their windfalls,...

Answered: 1 week ago

Question

★★★★★

Soldner Health Care Products Inc. expects to maintain the same inventories at the end of 2010 as at the beginning of the year. The total of all production costs for the year is therefore assumed to...

Answered: 1 week ago

Question

★★★★★

Explain the limits of confidentiality for clients receiving services from a clinical psychologist.

Answered: 1 week ago

Question

★★★★★

The United States recently purchased $1 billion of 30-year zero-coupon bonds from a struggling foreign nation. The bonds yield 41/z% per year interest. The zero-coupon bonds pay no interest during...

Answered: 1 week ago

Question

★★★★★

1. No es importante que las personas ahorren para el retiro cuando pueden cobrar el Seguro Social. Verdadero o falso 2. Un gerente de teora X piensa que sus empleados son valiosos y motivados....

Answered: 1 week ago

Question

★★★★★

For Amazon years 2017, 2018, and 2019, calculate the cost of each capital component, after-tax cost of debt, cost of preferred, and cost of equity with the DCF method and CAPM method. What do you...

Answered: 1 week ago

Question

★★★★★

Shares in the Canadian maker of BlackBerry smart phones peaked in August 2007, at two hundred and thirty six dollars. In retrospect, the company was facing an inflection point and was completely...

Answered: 1 week ago

Question

★★★★★

5. (18 points) Solve the following integrals: (a) (4 points) | [+1 + (1.43) 50 + (r)] +=(1.43) - 50+ sec(x) dr. 2 (b) (4 points) (23 2w-cos(w)+ 3 w2 + 1 ) dw. (c) (5 points) [9t(1 + 1)- dt. (d) (5...

Answered: 1 week ago

Question

★★★★★

a) Give an example of a discrete random variable and report its domain. b) Give an example of a continuous random variable and report its domain. c) How many X values can your random variable example...

Answered: 1 week ago

Question

★★★★★

? Learn By Doing Adult male height (X) follows (approximately) a normal distribution with a mean of 69 inches and a standard deviation of 2.8 inches. a. What proportion of males are less than 65...

Answered: 1 week ago

Question

★★★★★

Under the Age Discrimination in Employment Act, how long must employers maintain specific information ( name , address, date of birth, occupation, rate of pay ) for each employee and applicant?

Answered: 1 week ago

Question

★★★★★

3.105 Tiger Funds Ltd. operates a number of mutual funds in high technology and in financial sectors. Hussein Roberts is a fund manager who runs a major fund that includes a wide variety of...

Answered: 1 week ago

Question

★★★★★

3.104 A consignment of 12 electronic components contains 1 component that is faulty. Two components are chosen randomly from this consignment for testing. a. How many different combinations of 2...

Answered: 1 week ago

Question

★★★★★

3.107 Subscriptions to a particular magazine are classified as gift, previous renewal, direct mail, and subscription service. In January 8% of expiring subscriptions were gifts; 41%, previous...

Answered: 1 week ago

Previous Question Next Question