
Question


Your task for this question is to build a spam classifier using the UCI email spam dataset, available at https://archive.ics.uci.edu/ml/datasets/Spambase. Please download the data from that website. The collection of spam e-mails came from the postmaster and individuals who had filed spam. The collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. You are free to use any package and any language for this homework.

One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter. Load the data. You will see there are a total of 4601 instances and 57 features. Note: there may be some missing values; you can just fill them in with zero.
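The loading step with zero-filling can be sketched as follows. This is a minimal sketch, not the course's reference solution. It assumes the downloaded file is comma-separated with the 57 features first and a 0/1 label (1 = spam) in the last column, and that a missing value appears as an empty field or `?`. The function name `load_spambase` is hypothetical, and the in-memory sample uses three features only to keep the example short.

```python
import csv
import io


def load_spambase(lines):
    """Parse Spambase-style CSV rows into (features, labels).

    Assumes each row holds numeric features followed by a 0/1 label.
    Empty or '?' fields are treated as missing and replaced by 0.0.
    """
    features, labels = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        values = [float(v) if v.strip() not in ("", "?") else 0.0 for v in row]
        features.append(values[:-1])
        labels.append(int(values[-1]))
    return features, labels


# Tiny made-up sample with 3 features instead of 57 (illustrative only).
sample = io.StringIO("0.1,,3,1\n0,0.5,?,0\n")
X, y = load_spambase(sample)
print(X, y)
```

With the real file, the same function could be called as `with open("spambase.data") as f: X, y = load_spambase(f)`, assuming that is the name of the downloaded data file. It is worth checking afterwards that `len(X)` is 4601 and `len(X[0])` is 57.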

(a) Build a classification tree model (also known as the CART model). In your answer, you should report the fitted tree model as a tree plot, similar to what is shown in the "Random forest" lecture.
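To make the tree model in part (a) more concrete, here is a rough sketch of how a CART-style classification tree could choose a single split on one feature by minimizing the weighted Gini impurity. It does not build or plot a full tree; in practice a library such as scikit-learn's `DecisionTreeClassifier` or R's `rpart` would do that. The helper names and the toy data are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(labels)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Made-up frequencies of one word in six e-mails, with made-up labels (1 = spam).
freq = [0.0, 0.1, 0.2, 0.9, 1.1, 1.5]
spam = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(freq, spam)
print(threshold, impurity)
```

A full tree repeats this search over all 57 features at every node and recurses into the two child nodes until a stopping rule is met.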


(b) Also build a random forest model. Recall that in a random forest, each decision tree is grown on a bootstrapped dataset, with p of the input variables selected at random as candidates for each split. Comment on a rule of thumb for choosing this number.
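One widely used rule of thumb, offered here as a hedged illustration and not as the lecture's official answer, is to try about the square root of the total number of features at each split for classification, and about one third of them for regression. The function names below are hypothetical; the arithmetic uses the 57 features stated in the question.

```python
import math


def mtry_classification(n_features):
    """Common default: floor(sqrt(n_features)) candidates per split, at least 1."""
    return max(1, math.isqrt(n_features))


def mtry_regression(n_features):
    """Common default for regression forests: floor(n_features / 3), at least 1."""
    return max(1, n_features // 3)


n_features = 57
print(mtry_classification(n_features))  # 7, since 7 * 7 = 49 <= 57 < 64
print(mtry_regression(n_features))      # 19
```

These values are starting points only; the number of candidate variables is usually tuned, for example by comparing out-of-bag error across a few settings.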

(c) Now partition the data to use the first 80% for training and the remaining 20% for testing. Compare and report the test error of your classification tree and random forest models on the testing data. Plot the curve of test (OOB) error versus the number of trees used in the random forest, similar to our lecture.
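The partition and the test-error computation in part (c) can be sketched as below. This is only a sketch: the function names are hypothetical, the data is a placeholder with the same number of rows as the dataset, and a trivial "always predict non-spam" baseline stands in for a fitted tree or forest.

```python
def split_first_fraction(X, y, train_fraction=0.8):
    """Split without shuffling: the first part trains, the rest tests."""
    n_train = int(train_fraction * len(y))
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


def error_rate(y_true, y_pred):
    """Fraction of misclassified examples."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Placeholder data with 4601 rows; the real features and labels would go here.
X = [[i] for i in range(4601)]
y = [i % 2 for i in range(4601)]
X_train, y_train, X_test, y_test = split_first_fraction(X, y)
print(len(y_train), len(y_test))

# Stand-in for model predictions: always predict class 0 (non-spam).
baseline_pred = [0] * len(y_test)
print(error_rate(y_test, baseline_pred))
```

Because the split is not shuffled, it is worth checking whether the rows in the downloaded file are ordered by class; if they are, the last 20% may contain only one class, and the reported test error should be interpreted with that in mind.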
