Question
You have a decision tree algorithm and you are trying to figure out which attribute is the best to test on first. You are using the information gain metric.
You are given a set of 128 examples, with 64 positively labeled and 64 negatively labeled.
There are three attributes: Homeowner (H), In Debt (ID), and Rich (R).
For 64 examples, Homeowner is true. Of the Homeowner=true examples, 3/4 are positive and 1/4 are negative.
For 96 examples, In Debt is true. Of the In Debt=true examples, 1/2 are positive and 1/2 are negative.
For 32 examples, Rich is true. Of the Rich=true examples, 3/4 are positive and 1/4 are negative.
You must show all mathematical calculations/steps to get full points for each of subparts (a)-(d) below. Just writing the final answer in a subpart (correct or not) will get zero points.
a) What is the entropy of the initial set of examples?
b) What is the information gain of splitting on the Homeowner attribute as the root node?
c) What is the information gain of splitting on the In Debt attribute as the root node?
d) What is the information gain of splitting on the Rich attribute as the root node?
e) Which attribute do you split on?
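One way to check the arithmetic for parts (a) through (e) is the short Python sketch below. It is not an official solution, only a sketch under stated assumptions: the function names `entropy` and `information_gain` are hypothetical helpers, entropy is measured in bits (log base 2), and the counts for each attribute=false branch are not given in the question but are derived by subtracting the attribute=true counts from the 64 positive and 64 negative totals.

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy, in bits, of a set with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # a class with zero examples contributes 0 to the entropy
            p = count / total
            result -= p * log2(p)
    return result

def information_gain(parent, branches):
    """Parent entropy minus the size-weighted entropy of the branches.

    parent and every branch are (pos, neg) count pairs.
    """
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(*parent) - remainder

parent = (64, 64)  # 128 examples: 64 positive, 64 negative

# (pos, neg) counts for the attribute=true branch, then the attribute=false branch.
# The false-branch counts are derived (an assumption of this sketch), e.g.
# Homeowner=true has 48 positive and 16 negative, so Homeowner=false has
# 64 - 48 = 16 positive and 64 - 16 = 48 negative.
splits = {
    "Homeowner": [(48, 16), (16, 48)],  # 64 true, 64 false
    "In Debt": [(48, 48), (16, 16)],    # 96 true, 32 false
    "Rich": [(24, 8), (40, 56)],        # 32 true, 96 false
}

print(f"initial entropy: {entropy(*parent):.4f}")
gains = {name: information_gain(parent, branches) for name, branches in splits.items()}
for name, gain in gains.items():
    print(f"gain({name}): {gain:.4f}")

best = max(gains, key=gains.get)
print("split on:", best)
```

Under these assumptions the initial entropy is 1 bit, the gains come out to roughly 0.189 for Homeowner, 0 for In Debt, and 0.062 for Rich, so Homeowner would be the attribute to test first. The written answer still needs each entropy and weighted-average step shown by hand.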