Question
(16 marks) In this question, you will predict Orange Juice sales with the OJ dataset.
Each line in this dataset is a purchase of either a Minute Maid (MM) or Citrus Hill (CH)
brand orange juice, collected in five different stores over some period of time. We will
now fit various tree-based classification algorithms to predict Purchase (i.e. CH or MM)
from the remaining columns.
(a) Do some exploratory data analysis to get a feeling for the dataset. Answer the
following questions:
Which brand tends to be more expensive?
Is one brand bought more often than the other one?
(b) Randomly split the dataset into a training and a test dataset with and
relative size. In all following tasks, train on the training set. We will use the test
set in the last step only.
(c) Using the tree package, fit a single classification tree and use fold cross-validation
to prune it to optimal size. Visualise the tree and record its misclassification error
on the training set. Since trees allow relatively easy interpretation, give a one- or
two-sentence insight into what can be learned from the tree about the structure of
the data.
(d) Using the randomForest package, use bagging of trees to predict Purchase.
Record the misclassification rate obtained.
(e) Same as the previous task, but use a random forest instead of bagging.
(f) Same as the previous task, but use boosting via the gbm package. You will need
to specify the option distribution = "bernoulli" because we are considering
a binary classification problem. You will also need to encode Purchase as a
variable in order to be able to apply gbm. You can do this by creating a new
feature Purchase and removing the original feature Purchase afterwards.
(g) Compare all methods so far by predicting with them on the test set and computing the
misclassification rate. Which method performs best?
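
One possible way to work through parts (a) to (g) in R is sketched below. It is a minimal outline under stated assumptions, not a model answer. Several specifics are not given in the question text above and are placeholders here: that OJ is loaded from the ISLR package, the 70/30 split, the 10 cross-validation folds, the boosting settings (n.trees, interaction.depth, shrinkage), the OOB error as the recorded rate for bagging and the random forest, and the names PurchaseMM, misclass and gbm_class, which are hypothetical.

```r
# Assumption: OJ is taken from the ISLR package (the question does not name it).
library(ISLR)
library(tree)
library(randomForest)
library(gbm)

set.seed(1)  # arbitrary seed, only for reproducibility

# (a) Exploratory analysis: average price per brand and purchase counts.
print(colMeans(OJ[, c("PriceCH", "PriceMM")]))
print(table(OJ$Purchase))

# (b) Random split; the 70/30 proportion is a placeholder assumption.
n <- nrow(OJ)
train_idx <- sample(n, size = round(0.7 * n))
train <- OJ[train_idx, ]
test  <- OJ[-train_idx, ]

# Hypothetical helper: share of misclassified observations.
misclass <- function(pred, truth) mean(pred != truth)

# (c) Single tree, pruned by cross-validation (K = 10 is an assumption).
fit_tree  <- tree(Purchase ~ ., data = train)
cv_res    <- cv.tree(fit_tree, FUN = prune.misclass, K = 10)
best_size <- max(2, cv_res$size[which.min(cv_res$dev)])  # keep at least one split
pruned    <- prune.misclass(fit_tree, best = best_size)
plot(pruned); text(pruned, pretty = 0)
train_err_tree <- misclass(predict(pruned, train, type = "class"), train$Purchase)

# (d) Bagging: a random forest that considers all predictors at every split.
p   <- ncol(train) - 1
bag <- randomForest(Purchase ~ ., data = train, mtry = p, ntree = 500)
oob_err_bag <- bag$err.rate[bag$ntree, "OOB"]  # out-of-bag error estimate

# (e) Random forest with the default mtry (about sqrt(p) for classification).
rf <- randomForest(Purchase ~ ., data = train, ntree = 500)
oob_err_rf <- rf$err.rate[rf$ntree, "OOB"]

# (f) Boosting: gbm with distribution = "bernoulli" expects a 0/1 response.
# PurchaseMM is a hypothetical name for the recoded response.
train_gbm <- train
train_gbm$PurchaseMM <- as.integer(train$Purchase == "MM")
train_gbm$Purchase <- NULL
boost <- gbm(PurchaseMM ~ ., data = train_gbm, distribution = "bernoulli",
             n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)

# Hypothetical helper: turn predicted probabilities into class labels.
gbm_class <- function(model, newdata) {
  prob <- predict(model, newdata = newdata, n.trees = 1000, type = "response")
  ifelse(prob > 0.5, "MM", "CH")
}

# (g) Misclassification rate of every method on the held-out test set.
test_err <- c(
  tree     = misclass(predict(pruned, test, type = "class"), test$Purchase),
  bagging  = misclass(predict(bag, test), test$Purchase),
  forest   = misclass(predict(rf, test), test$Purchase),
  boosting = misclass(gbm_class(boost, test), test$Purchase)
)
print(sort(test_err))
```

The numbers depend on the seed and the split, so the ranking in (g) should be read from an actual run rather than assumed. For (d) and (e) the sketch records the out-of-bag error, because the training error of a forest is usually close to zero and says little.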