Question
Your task for this question is to build a spam classifier using the UCI Spambase email dataset
https://archive.ics.uci.edu/ml/datasets/Spambase. The collection of spam e-mails came from the postmaster and
individuals who had filed spam. Please download the data from that website. The collection
of non-spam e-mails came from filed work and personal e-mails, and hence the word
'george' and the area code '650' are indicators of non-spam. These are useful when
constructing a personalized spam filter. You are free to use any package and any language
for this homework.
One would either have to blind such non-spam indicators or get a very wide collection of
non-spam to generate a general-purpose spam filter. Load the data. You will see there are
a total of 4601 instances and 57 features. Note: there may be some missing values; you can
just fill them in with zero.
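A minimal loading sketch using only the standard library, assuming the downloaded file is `spambase.data` (comma-separated, no header row, 57 numeric features followed by a 0/1 label where 1 = spam) and assuming missing values would show up as blank fields or `?`. The names `load_spambase` and `parse_field` are hypothetical helpers, not part of any library.

```python
import csv
from typing import Iterable, List, Tuple

N_FEATURES = 57  # 57 numeric attributes; the 58th column is the 0/1 spam label


def parse_field(field: str) -> float:
    """Convert one CSV field to float; blank or '?' fields are treated as missing -> 0.0."""
    field = field.strip()
    if field in ("", "?"):
        return 0.0  # fill missing values with zero, as the question suggests
    return float(field)


def load_spambase(lines: Iterable[str]) -> Tuple[List[List[float]], List[int]]:
    """Parse comma-separated Spambase rows into features X and labels y (1 = spam)."""
    X: List[List[float]] = []
    y: List[int] = []
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines, e.g. a trailing newline
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} columns, got {len(row)}")
        values = [parse_field(f) for f in row]
        X.append(values[:N_FEATURES])
        y.append(int(values[N_FEATURES]))
    return X, y
```

With the downloaded file, `with open("spambase.data") as f: X, y = load_spambase(f)` should give 4601 rows of 57 features each, matching the counts stated in the question. Because `csv.reader` accepts any iterable of strings, the same function works on an open file or an in-memory list of lines.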
(a) Build a classification tree model (also known as the CART model). In
your answer, you should report the fitted tree model, similar to what is shown in the
"Random forest" lecture, including the tree plot.
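One way to approach part (a), assuming scikit-learn: its `DecisionTreeClassifier` implements an optimized version of CART (binary splits, Gini impurity by default). A fully grown tree on 4601 rows is too large to read, so this sketch exposes `max_depth` and cost-complexity pruning (`ccp_alpha`) to keep the plotted tree small; the values are yours to choose. `fit_cart`, `describe_tree`, and `plot_cart` are hypothetical helper names, and the plot assumes matplotlib is installed.

```python
from typing import List, Optional

from sklearn.tree import DecisionTreeClassifier, export_text


def fit_cart(X, y, max_depth: Optional[int] = None, ccp_alpha: float = 0.0,
             random_state: int = 0) -> DecisionTreeClassifier:
    """Fit a CART-style classification tree (binary splits, Gini impurity)."""
    tree = DecisionTreeClassifier(
        criterion="gini",
        max_depth=max_depth,  # cap depth so the plot stays readable
        ccp_alpha=ccp_alpha,  # > 0 enables minimal cost-complexity pruning
        random_state=random_state,
    )
    return tree.fit(X, y)


def describe_tree(tree: DecisionTreeClassifier,
                  feature_names: Optional[List[str]] = None) -> str:
    """Indented text rendering of the fitted splits and leaf classes."""
    return export_text(tree, feature_names=feature_names)


def plot_cart(tree: DecisionTreeClassifier, feature_names=None,
              path: str = "cart_tree.png") -> None:
    """Save a plot of the fitted tree (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so fitting works without it
    from sklearn.tree import plot_tree

    fig, ax = plt.subplots(figsize=(20, 10))
    # Class order follows the sorted labels: 0 = non-spam, 1 = spam
    plot_tree(tree, feature_names=feature_names,
              class_names=["non-spam", "spam"], filled=True, ax=ax)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

For example, `tree = fit_cart(X, y, max_depth=4)`, then `print(describe_tree(tree))` for a text report of the splits and `plot_cart(tree)` for the figure.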
(b) Also build a random forest model. Recall that in a random forest, each
decision tree is grown on a bootstrapped dataset, and at each split m of the p
input variables are selected at random as candidates for splitting. Comment on a
rule of thumb for choosing m.
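A common rule of thumb is m = floor(sqrt(p)) for classification (and floor(p/3) for regression); these are the defaults in R's `randomForest`, and scikit-learn's `max_features="sqrt"` gives the same value for classification. With p = 57 this is m = 7. Smaller m decorrelates the trees more, while m = p reduces the forest to plain bagging; m can also be tuned by comparing OOB error across a few values. The sketch below assumes scikit-learn; `mtry_rule_of_thumb` and `fit_random_forest` are hypothetical helper names.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def mtry_rule_of_thumb(p: int, task: str = "classification") -> int:
    """Candidate variables per split: floor(sqrt(p)) for classification, floor(p/3) otherwise."""
    if task == "classification":
        return max(1, math.floor(math.sqrt(p)))
    return max(1, math.floor(p / 3))


def fit_random_forest(X, y, n_trees: int = 500, random_state: int = 0) -> RandomForestClassifier:
    """Fit a forest whose trees see bootstrap samples and m random candidates per split."""
    p = np.asarray(X).shape[1]
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=mtry_rule_of_thumb(p),  # m = 7 when p = 57
        bootstrap=True,  # each tree is grown on a bootstrap sample of the rows
        oob_score=True,  # accuracy estimated on the rows each tree did not see
        random_state=random_state,
    )
    return forest.fit(X, y)
```

After fitting on the full data, `1 - forest.oob_score_` is the forest's out-of-bag misclassification error.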
(c) Now partition the data, using the first 80% for training and the remaining
20% for testing. Compare and report the test error of your classification tree
and random forest models on the testing data. Plot the curve of test (OOB) error
versus the number of trees used in the random forest, similar to our lecture.
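A minimal sketch for part (c), assuming scikit-learn and NumPy; `first_80_split`, `misclassification_rate`, `compare_models`, `error_vs_trees`, and `plot_error_curves` are hypothetical helper names. The split takes rows in file order, as the question asks (for 4601 rows, `int(0.8 * 4601)` gives 3680 training rows and 921 test rows). The error curve grows a single forest incrementally with `warm_start=True`, recording OOB error (`1 - oob_score_`) and test error after each batch of trees.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def first_80_split(X, y, train_frac: float = 0.8):
    """Split by row order (no shuffling): first train_frac of rows train, the rest test."""
    X, y = np.asarray(X), np.asarray(y)
    n_train = int(train_frac * len(y))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def misclassification_rate(model, X_test, y_test) -> float:
    """Fraction of held-out rows whose predicted label is wrong."""
    return float(np.mean(model.predict(X_test) != np.asarray(y_test)))


def compare_models(X_train, y_train, X_test, y_test, n_trees: int = 500,
                   random_state: int = 0) -> Dict[str, float]:
    """Test error of a single CART tree versus a random forest."""
    cart = DecisionTreeClassifier(random_state=random_state).fit(X_train, y_train)
    forest = RandomForestClassifier(n_estimators=n_trees, max_features="sqrt",
                                    random_state=random_state).fit(X_train, y_train)
    return {"cart": misclassification_rate(cart, X_test, y_test),
            "random_forest": misclassification_rate(forest, X_test, y_test)}


def error_vs_trees(X_train, y_train, X_test, y_test, tree_counts: Sequence[int],
                   random_state: int = 0) -> Tuple[List[float], List[float]]:
    """OOB and test error after each forest size; tree_counts must be strictly increasing."""
    forest = RandomForestClassifier(
        max_features="sqrt",  # m = floor(sqrt(p)) candidate variables per split
        oob_score=True,
        warm_start=True,  # keep already-fitted trees and only add new ones
        random_state=random_state,
    )
    oob_err: List[float] = []
    test_err: List[float] = []
    for n in tree_counts:
        forest.set_params(n_estimators=n)
        forest.fit(X_train, y_train)
        oob_err.append(1.0 - forest.oob_score_)
        test_err.append(misclassification_rate(forest, X_test, y_test))
    return oob_err, test_err


def plot_error_curves(tree_counts, oob_err, test_err,
                      path: str = "rf_error_vs_trees.png") -> None:
    """Save OOB and test error against the number of trees (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the helpers above work without it

    fig, ax = plt.subplots()
    ax.plot(list(tree_counts), oob_err, label="OOB error")
    ax.plot(list(tree_counts), test_err, label="Test error")
    ax.set_xlabel("Number of trees")
    ax.set_ylabel("Misclassification error")
    ax.legend()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

A typical call is `counts = range(25, 501, 25)`, then `plot_error_curves(counts, *error_vs_trees(X_tr, y_tr, X_te, y_te, counts))`; starting at a few dozen trees avoids OOB estimates for rows that were never left out. One caution: the UCI copy of `spambase.data` appears to list the spam rows before the non-spam rows, so an unshuffled first-80% split may leave the test set dominated by a single class. Checking the label balance of each split before interpreting the test error is worthwhile.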