
Question


Answer the questions (e) and (f) only using Python code.


1. Decision trees

As part of this question you will implement and compare the Information Gain, Gini Index and CART evaluation measures for splits in decision tree construction.

Let $D = (X, y)$, $|D| = n$, be a dataset with $n$ samples. The entropy of the dataset is defined as

$$H(D) = -\sum_{i=1}^{k} P(c_i \mid D) \log_2 P(c_i \mid D),$$

where $P(c_i \mid D)$ is the fraction of samples in class $i$ and $k$ is the number of classes.

A split on an attribute of the form $X_i \le c$ partitions the dataset into two subsets $D_Y$ and $D_N$ based on whether samples satisfy the split predicate or not, respectively. The split entropy is the weighted average entropy of the resulting datasets $D_Y$ and $D_N$:

$$H(D_Y, D_N) = \frac{n_Y}{n} H(D_Y) + \frac{n_N}{n} H(D_N),$$

where $n_Y$ is the number of samples in $D_Y$ and $n_N$ is the number of samples in $D_N$.

The Information Gain (IG) of a split is defined as the difference between the entropy and the split entropy:

$$IG(D, D_Y, D_N) = H(D) - H(D_Y, D_N).$$

The higher the information gain, the better.

The Gini index of a dataset is defined as

$$G(D) = 1 - \sum_{i=1}^{k} P(c_i \mid D)^2,$$

and the Gini index of a split is defined as the weighted average of the Gini indices of the resulting partitions:

$$G(D_Y, D_N) = \frac{n_Y}{n} G(D_Y) + \frac{n_N}{n} G(D_N).$$

The lower the Gini index, the better.

Finally, the CART measure of a split is defined as

$$CART(D_Y, D_N) = 2 \, \frac{n_Y}{n} \, \frac{n_N}{n} \sum_{i=1}^{k} \left| P(c_i \mid D_Y) - P(c_i \mid D_N) \right|.$$

The higher the CART measure, the better.

You will need to fill in the implementation of the three measures in the provided Python code as part of the homework. Note: you are not allowed to use existing implementations of the measures.

The homework includes two data files, train.trt and test.trt. The first consists of 100 observations to use to train your classifiers; the second has 10 observations to test them. Each file is comma-separated, and each row contains 11 values: the first 10 are attributes (a mix of numeric and categorical values translated to numeric, e.g. T, F mapped to 0, 1), and the last is the true class of that observation. You will need to separate attributes and class in your load(filename) function.
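The measures and the load(filename) function described above can be sketched in plain Python as follows. This is a minimal sketch, not the homework's provided skeleton: the helper names (class_probs, partition, weighted) and the function signatures are assumptions made for illustration, the split predicate is assumed to be "attribute value <= c", entropy is assumed to use log base 2, class labels are assumed to be integers, and CART is assumed to follow the common textbook form 2 (n_Y/n)(n_N/n) Σ |P(c_i|D_Y) − P(c_i|D_N)|, since the formula is cut off in the question text.

```python
import csv
import math
from collections import Counter


def load(filename):
    """Read a comma-separated file; the last value of each row is the class.

    Assumes every value is numeric and the class label is an integer.
    """
    X, y = [], []
    with open(filename, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            values = [float(v) for v in row]
            X.append(values[:-1])
            y.append(int(values[-1]))
    return X, y


def class_probs(labels):
    """Map each class to its fraction of the samples (hypothetical helper)."""
    n = len(labels)
    return {c: count / n for c, count in Counter(labels).items()}


def entropy(labels):
    # sum of p * log2(1/p), which equals -sum of p * log2(p)
    return sum(p * math.log2(1 / p) for p in class_probs(labels).values())


def gini(labels):
    return 1.0 - sum(p * p for p in class_probs(labels).values())


def partition(X, y, attr, c):
    """Split the labels by the assumed predicate X[attr] <= c."""
    y_yes = [label for row, label in zip(X, y) if row[attr] <= c]
    y_no = [label for row, label in zip(X, y) if row[attr] > c]
    return y_yes, y_no


def weighted(measure, y_yes, y_no):
    """Size-weighted average of a measure over the two partitions."""
    n = len(y_yes) + len(y_no)
    return (len(y_yes) * measure(y_yes) + len(y_no) * measure(y_no)) / n


def info_gain(X, y, attr, c):
    y_yes, y_no = partition(X, y, attr, c)
    return entropy(y) - weighted(entropy, y_yes, y_no)


def gini_split(X, y, attr, c):
    return weighted(gini, *partition(X, y, attr, c))


def cart(X, y, attr, c):
    y_yes, y_no = partition(X, y, attr, c)
    n = len(y)
    p_yes, p_no = class_probs(y_yes), class_probs(y_no)
    # Classes missing from one side contribute probability 0 there.
    diff = sum(abs(p_yes.get(k, 0.0) - p_no.get(k, 0.0)) for k in set(y))
    return 2 * (len(y_yes) / n) * (len(y_no) / n) * diff
```

On a toy dataset with one attribute, X = [[1.0], [2.0], [3.0], [4.0]] and y = [0, 0, 1, 1], the split "attribute 0 <= 2.0" separates the classes perfectly, so info_gain and cart both return 1.0 and gini_split returns 0.0. A natural way to compare the measures is to evaluate every candidate (attr, c) pair on the training data and keep the split with the highest IG, the lowest Gini index, or the highest CART value.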
