Question:
3. The table below lists a sample of data from a census.
There are four descriptive features and one target feature in this dataset:
AGE, a continuous feature listing the age of the individual
EDUCATION, a categorical feature listing the highest education award achieved by the individual (high school, bachelors, doctorate)
MARITAL STATUS (never married, married, divorced)
OCCUPATION (transport = works in the transportation industry; professional = doctors, lawyers, etc.; agriculture = works in the agricultural industry; armed forces = is a member of the armed forces)
ANNUAL INCOME, the target feature with 3 levels (<25K, 25K-50K, >50K)
a. Calculate the entropy for this dataset.
b. Calculate the Gini index for this dataset.
c. When building a decision tree, the easiest way to handle a continuous feature is to define a threshold around which splits will be made. What would be the optimal threshold to split the continuous AGE feature (use information gain based on entropy as the feature selection measure)?
d. Calculate information gain (based on entropy) for the EDUCATION, MARITAL STATUS, and OCCUPATION features.
e. Calculate the information gain ratio (based on entropy) for EDUCATION, MARITAL STATUS, and OCCUPATION features.
f. Calculate information gain using the Gini index for the EDUCATION, MARITAL STATUS, and OCCUPATION features.
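For reference, the measures named in parts a to f are commonly defined as below. The notation is an assumption chosen for this note: t is the target feature, d a descriptive feature, D the dataset, and D_{d=l} the subset of D in which d takes level l.

```latex
\begin{align*}
H(t, \mathcal{D}) &= -\sum_{l \in \mathit{levels}(t)} P(t = l)\,\log_2 P(t = l) \\
\mathit{Gini}(t, \mathcal{D}) &= 1 - \sum_{l \in \mathit{levels}(t)} P(t = l)^2 \\
\mathit{rem}(d, \mathcal{D}) &= \sum_{l \in \mathit{levels}(d)} \frac{|\mathcal{D}_{d=l}|}{|\mathcal{D}|}\, H(t, \mathcal{D}_{d=l}) \\
\mathit{IG}(d, \mathcal{D}) &= H(t, \mathcal{D}) - \mathit{rem}(d, \mathcal{D}) \\
\mathit{GR}(d, \mathcal{D}) &= \frac{\mathit{IG}(d, \mathcal{D})}{-\sum_{l \in \mathit{levels}(d)} P(d = l)\,\log_2 P(d = l)}
\end{align*}
```

For part f, the same remainder and gain expressions apply with Gini substituted for H. For part c, the usual candidate thresholds are the midpoints between adjacent AGE values (after sorting by AGE) whose target levels differ; the candidate with the highest information gain is selected.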
Step by Step Answer:
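The census table is not reproduced in this text, so the numeric answers cannot be derived here. As a minimal sketch of how parts a to f could be computed, the Python below uses a tiny made-up sample (not the census data). The row layout, the `"target"` key, and all function names are hypothetical choices for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, feature):
    """Group the target labels by the value each row has for `feature`."""
    parts = {}
    for row in rows:
        parts.setdefault(row[feature], []).append(row["target"])
    return list(parts.values())

def gain(rows, feature, impurity=entropy):
    """Impurity of the full set minus the weighted impurity after splitting."""
    n = len(rows)
    remainder = sum(len(p) / n * impurity(p) for p in partition(rows, feature))
    return impurity([r["target"] for r in rows]) - remainder

def gain_ratio(rows, feature):
    """Entropy-based gain divided by the entropy of the feature's own values."""
    split_info = entropy([r[feature] for r in rows])
    return gain(rows, feature) / split_info if split_info > 0 else 0.0

def best_threshold(rows, feature):
    """Test midpoints between adjacent sorted values where the target changes."""
    ordered = sorted(rows, key=lambda r: r[feature])
    best = (None, -1.0)
    for a, b in zip(ordered, ordered[1:]):
        if a["target"] != b["target"] and a[feature] != b[feature]:
            t = (a[feature] + b[feature]) / 2
            # Recode the continuous feature as a boolean "at or above threshold".
            binned = [{"side": r[feature] >= t, "target": r["target"]} for r in rows]
            g = gain(binned, "side")
            if g > best[1]:
                best = (t, g)
    return best

# Hypothetical toy rows, NOT the census sample from the question.
rows = [
    {"age": 20, "education": "high school", "marital": "married", "target": "low"},
    {"age": 30, "education": "high school", "marital": "divorced", "target": "low"},
    {"age": 40, "education": "bachelors", "marital": "married", "target": "high"},
    {"age": 50, "education": "bachelors", "marital": "divorced", "target": "high"},
]

targets = [r["target"] for r in rows]
print("entropy:", entropy(targets))
print("gini:", gini(targets))
print("age threshold, gain:", best_threshold(rows, "age"))
for f in ("education", "marital"):
    print(f, gain(rows, f), gain_ratio(rows, f), gain(rows, f, impurity=gini))
```

With the actual table, each census row would become one dictionary, and the same calls would give the values asked for in parts a to f.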
Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies
ISBN: 9780262029445
1st Edition
Authors: John D. Kelleher, Brian Mac Namee, Aoife D'Arcy