Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

( 1 6 marks ) In this question, you will predict Orange Juice sales with the OJ dataset ? 1 . Each line in this

(16 marks) In this question, you will predict Orange Juice sales with the OJ dataset ?1.
Each line in this dataset is a purchase of either a Minute Maid (MM) or Citrus Hill (CH)
brand orange juice collected in five different stores over some period of time ?2. We will
now fit various tree-based classification algorithms to predict Purchase (i.e., CH or MM)
from the remaining columns.
(a) Do some exploratory data analysis to get a feeling for the dataset. Answer the
following questions:
Which brand tends to be more expensive?
Is one brand bought more often than the other one?
(b) Randomly split the dataset into a training and a test dataset with 80% and 20%
relative size. In all following tasks, train on the training set. We will use the test
set in the last step only.
(c) Using the tree package, fit a single classification tree and use 10-fold cross-validation
to prune it to optimal size. Visualise the tree and record its misclassification error
on the training set. Since trees allow relatively easy interpretation, give a one-or-
two-sentence insight in what can be learned from the tree about the structure of
the data.
(d) Using the randomForest package, use bagging of 500 trees to predict Purchase.
Record the misclassification rate obtained.
(e) Same as the previous task, but use a random forest instead of bagging.
(f) Same as the previous task, but use boosting via the gbm package. You will need
to specify the option distribution = "bernoulli" because we are considering
a binary classification problem. You will also need to encode Purchase as a 0-
1-variable in order to be able to apply gbm. You can do this by creating a new
feature Purchase_01 and removing the original feature Purchase afterwards.
(g) Compare all methods so far by predicting them on the test set, and computing the
misclassification rate. Which method performs best?
image text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access with AI-Powered Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Students also viewed these Databases questions