Question

Provide your answer and a concise explanation for each of the following questions.
(a) Using k-means, a statistician wants to investigate the presence of clusters in
a dataset with 2722 observations.
A clustering of the data points with K=4 was found for this data, with a
total sum of squares of 457293 and cluster-specific within sums of squares of
6955, 11329, 10298, and 11411 respectively.
Another clustering of the observations with K=3 was found for the same
dataset, with a total sum of squares of 457293 and cluster-specific within sums
of squares of 12123, 7205, and 13027 respectively.
Compare the two clustering solutions using an appropriate index. Which one
is preferred? Justify your answer.
(b) A classification tree algorithm is applied to a given data set with a target
binary variable y and a numerical input variable x. Consider the two following
splits, Split A and Split B, reported in the two tables below.
(a) Split B
(b) Split A
On the basis of the Gini measure, which split would be chosen by a
classification tree algorithm? Justify your answer.
(c) An SVM classifier with GRBF kernel, a classification tree classifier, and a
logistic regression model are employed to deploy a system for malware detection
based on numerical features extracted from webpages. The SVM is tuned
considering a grid of hyperparameter values constructed using the set of cost
values C={50,100,200,500} and the set of kernel parameter values
{0.01,0.1,0.2,0.5}.
The classification tree is tuned using a set of complexity parameters
cp={0.1,0.05,0.01}. No tuning is performed for logistic regression, which is
implemented using all the input variables available. A 10-fold cross-validation
procedure with 30 replications is implemented for tuning. How many classifiers
need to be trained in total? Justify your answer.
(d) Consider two competing classifiers, a logistic regression classifier and a random
forest classifier, both trained and evaluated on the same data splits (training
and validation). Is it reasonable to expect that the random forest classifier
will always outperform on average the logistic regression classifier? Justify
your answer.
(e) Four hundred labeled samples are used to train two classifiers M1 and M2. For
classifier M1, the dataset is divided into training and validation sets of 200
samples each and the classifier is trained on the training set. The performance
of M1 on this validation set provides a 95% accuracy. For classifier M2, the
dataset is divided into a training set of 350 samples and a validation set of
50 samples, and the classifier is trained on the training set. The performance
of M2 on the corresponding validation set provides an accuracy of 95%. Is
it appropriate to consider classifier M2 as having an equivalent predictive
performance relative to the predictive performance of classifier M1? Justify
your answer.
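For the k-means question in part (a), one possible way to compare the two solutions is sketched below. It is not necessarily the index the question setter had in mind. It computes two candidate indices from the figures quoted in the question: the share of the total sum of squares explained by the clustering (between SS / total SS) and the Calinski-Harabasz index, which also accounts for the number of clusters. The function names are illustrative.

```python
# Figures taken from the question text.
N_OBS = 2722
TOTAL_SS = 457293
WITHIN_K4 = [6955, 11329, 10298, 11411]
WITHIN_K3 = [12123, 7205, 13027]


def explained_ratio(total_ss, within_ss):
    """Between-cluster SS as a fraction of total SS (higher is better)."""
    wss = sum(within_ss)
    return (total_ss - wss) / total_ss


def calinski_harabasz(total_ss, within_ss, n_obs):
    """Calinski-Harabasz index: [BSS / (K - 1)] / [WSS / (n - K)]."""
    k = len(within_ss)
    wss = sum(within_ss)
    bss = total_ss - wss
    return (bss / (k - 1)) / (wss / (n_obs - k))


for label, within in (("K=4", WITHIN_K4), ("K=3", WITHIN_K3)):
    print(
        label,
        "within SS =", sum(within),
        "| between/total =", round(explained_ratio(TOTAL_SS, within), 4),
        "| CH =", round(calinski_harabasz(TOTAL_SS, within, N_OBS), 1),
    )
```

With these figures the K=3 solution has the smaller total within sum of squares (32355 against 39993), so both indices come out higher for K=3 even though it uses fewer clusters.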

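For the model-count question in part (c), the arithmetic can be sketched as follows. The result depends on two conventions the question does not spell out, so both are exposed as hypothetical flags: whether the untuned logistic regression is also fitted on every resample (so that it can be compared on the same splits), and whether one final refit per model on the full training data is counted. The variable and function names are illustrative.

```python
from itertools import product

# Hyperparameter sets quoted in the question.
cost_values = [50, 100, 200, 500]
kernel_values = [0.01, 0.1, 0.2, 0.5]
cp_values = [0.1, 0.05, 0.01]

# The SVM grid is the Cartesian product of the two sets: 4 x 4 = 16 settings.
svm_grid = list(product(cost_values, kernel_values))

n_folds, n_reps = 10, 30
fits_per_config = n_folds * n_reps  # each setting is trained once per fold per replication


def total_fits(include_logistic=True, include_final_refits=False):
    """Count trained classifiers under the stated (assumed) conventions."""
    n_configs = len(svm_grid) + len(cp_values) + (1 if include_logistic else 0)
    total = n_configs * fits_per_config
    if include_final_refits:
        total += 3  # assumption: one final fit each for SVM, tree, logistic regression
    return total


print("SVM settings:", len(svm_grid))
print("Fits per setting:", fits_per_config)
print("Tuned models only:", total_fits(include_logistic=False))
print("Including logistic regression:", total_fits())
```

Under the convention that all three model types are evaluated on the same resamples, this gives (16 + 3 + 1) x 10 x 30 = 6000 trained classifiers. Counting only the tuned models gives 19 x 300 = 5700.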