Question
There exist several methods to measure the impurity of a set of labelled observations, e.g. $D = \{(x_n, y_n)\}_{n=1}^{N}$, by looking at the object labels, $y_n$. Two examples of impurity measures are the Gini index and the entropy.
(a) Consider the case where $y_n \in \{0, 1\}$ and $x_n \in \{i\}_{i=1}^{10}$, for all $n = 1, \dots, N$, and the task of predicting the label, $y$, given the object attribute, $x$. Is this a regression problem, a binary classification problem, or a multi-class classification problem? Please justify the answer.
(b) Describe the process of growing a classification tree using the Gini index.
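One possible reading of (b) is the greedy, CART-style procedure: at each node, try candidate thresholds on the attribute, keep the one with the lowest size-weighted Gini index, split, and recurse until a node is pure or a stopping rule fires. The sketch below illustrates that idea only. The function names, the `max_depth` stopping rule, the dictionary tree layout, and the toy data are all assumptions made for illustration, not part of the question.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a list of labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(data):
    """Return (score, tau) with the lowest size-weighted Gini, or None if no split exists."""
    best = None
    # Skip the largest attribute value: it would leave the right-hand side empty.
    for tau in sorted({x for x, _ in data})[:-1]:
        left = [y for x, y in data if x <= tau]
        right = [y for x, y in data if x > tau]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, tau)
    return best

def grow(data, depth=0, max_depth=3):
    """Recursively grow a tree; max_depth is an assumed stopping rule."""
    labels = [y for _, y in data]
    split = best_threshold(data)
    # Stop on a pure node, at the depth limit, or when no split is possible.
    if gini(labels) == 0.0 or depth == max_depth or split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, tau = split
    return {
        "tau": tau,
        "left": grow([(x, y) for x, y in data if x <= tau], depth + 1, max_depth),
        "right": grow([(x, y) for x, y in data if x > tau], depth + 1, max_depth),
    }

def predict(tree, x):
    """Follow the thresholds down to a leaf and return its majority label."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["tau"] else tree["right"]
    return tree["leaf"]

# Hypothetical toy data, chosen so that a single split separates the classes.
toy = [(1, 0), (2, 0), (3, 1), (4, 1)]
tree = grow(toy)
```

On this toy set the root splits once and both children are pure leaves. Real implementations usually add further stopping rules (minimum node size, minimum impurity decrease) and pruning.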
(c) Consider the following set of data points: $D = \{(1, 0), (3, 1), (4, 1), (4, 0), (2, 0), (1, 0), (1, 0), (7, 1), (10, 1), (2, 1)\}$, where, for each observation $(x_n, y_n) \in D$ ($n = 1, \dots, 10$), $x_n$ is the object attribute and $y_n$ the object label. What are the Gini index, $G(D)$, and the entropy, $S(D)$, of $D$?
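The quantities in (c) can be checked numerically. The sketch below assumes the common definitions $G = 1 - \sum_k p_k^2$ and $S = -\sum_k p_k \log_2 p_k$, where $p_k$ is the fraction of observations with label $k$. The question does not state the logarithm base, so base 2 (entropy in bits) is an assumption, and the function names are illustrative.

```python
from collections import Counter
from math import log2

# Data set from part (c): (attribute, label) pairs.
D = [(1, 0), (3, 1), (4, 1), (4, 0), (2, 0),
     (1, 0), (1, 0), (7, 1), (10, 1), (2, 1)]

def class_proportions(data):
    """Fraction of observations carrying each label."""
    n = len(data)
    counts = Counter(y for _, y in data)
    return [c / n for c in counts.values()]

def gini(data):
    """Gini index, assuming the definition 1 - sum_k p_k^2."""
    if not data:
        return 0.0
    return 1.0 - sum(p * p for p in class_proportions(data))

def entropy(data):
    """Entropy in bits (base-2 logarithm is an assumption)."""
    return -sum(p * log2(p) for p in class_proportions(data) if p > 0)

print(gini(D))     # 0.5
print(entropy(D))  # 1.0
```

The data set contains five observations with label 0 and five with label 1, so both measures take their maximum value for a two-class problem. With natural logarithms the entropy would be $\ln 2 \approx 0.693$ instead.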
(d) Which of the following thresholds, $\tau = 2$, $\tau = 3$, or $\tau = 6$, is preferable for splitting $D$ from a classification perspective? Choose the threshold that minimises
$G_{\text{split}}(\tau, D) = |R_1(\tau)| \, G(R_1(\tau)) + |R_2(\tau)| \, G(R_2(\tau))$,
where $R_1(\tau) = \{(x, y) \in D \mid x \le \tau\}$, $R_2(\tau) = \{(x, y) \in D \mid x > \tau\}$, and $G(A)$ is the Gini index of the set $A$.
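The split criterion in (d) can be evaluated directly for the three candidate thresholds. The sketch below is a minimal check, assuming the Gini index is defined as $1 - \sum_k p_k^2$ and that the first region collects the points with attribute less than or equal to the threshold. The names `gini`, `gini_split`, and `tau` are illustrative choices, not notation taken from the question.

```python
from collections import Counter

# Data set from part (c): (attribute, label) pairs.
D = [(1, 0), (3, 1), (4, 1), (4, 0), (2, 0),
     (1, 0), (1, 0), (7, 1), (10, 1), (2, 1)]

def gini(data):
    """Gini index 1 - sum_k p_k^2 of the labels in data (0.0 if empty)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(y for _, y in data)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_split(tau, data):
    """Size-weighted Gini of the two regions induced by threshold tau."""
    r1 = [(x, y) for x, y in data if x <= tau]  # attribute at or below tau
    r2 = [(x, y) for x, y in data if x > tau]   # attribute above tau
    return len(r1) * gini(r1) + len(r2) * gini(r2)

scores = {tau: gini_split(tau, D) for tau in (2, 3, 6)}
best_tau = min(scores, key=scores.get)

for tau, score in scores.items():
    print(f"tau = {tau}: G_split = {score:.4f}")
print("preferred threshold:", best_tau)
```

Under these assumptions the scores are 3.2 for threshold 2, about 4.1667 for threshold 3, and 3.75 for threshold 6, so threshold 2 gives the smallest value. The weights here are region sizes rather than fractions, as in the formula above; dividing by $|D|$ would rescale every score equally and leave the ranking unchanged.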