Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Jun 07, 2022

1. Decision trees As part of this question you will implement and compare the Information Gain, Gini Index and CART evaluation measures for splits

1. Decision trees As part of this question you will implement and compare the Information Gain, Gini Index and CART evaluation measures for splits in decision tree construction.Let D = (X, y), |D|= n be a dataset with n samples. The entropy of the dataset is defined as 2 H(D)=P(c|D)log2 P(ci|D), i=1 where P(c|D) is the fraction of samples in class i. A split on an attribute of the form X, c partitions the dataset into two subsets Dy and DN based on whether samples satisfy the split predicate or not respectively. The split Entropy is the weighted average Entropy of the resulting datasets Dy and Dx: ny H(Dy, DN) = -H(Dy)+ H(DN), n nN n where ny are the number of samples in Dy and ny are the number of samples in DN. The Information Gain (IG) of a split is defined as the the difference of the Entropy and the split entropy: IG(D, Dy, DN) = H(D) - H(Dy, DN). The higher the information gain the better. The Gini index of a data set is defined as G(D) = 1-- P(c;D) and the Gini index of a split is defined as the weighted average of the Gini indices of the resulting partitions: nN G(DY, DN) = G(DY) + TYG(DY) + N G(DN). n The lower the Gini index the better. Finally, the CART measure of a split is defined as: (1) 2 CART (DY, DN) = 2Y NP(ci Dy) - P(c|DN)|. n n i=1 (2) (3) The higher the CART the better. You will need to fill in the implementation of the three measures in the provided Python code as part of the homework. Note: You are not allowed to use existing implementations of the measures. The homework includes two data files, train.txt and test.txt. The first consists of 100 observations to use to train your classifiers; the second has 10 to test. Each file is comma-separated, and each row contains 11 values - the first 10 are attributes (a mix of numeric and categorical translated to numeric, e.g. {T,F} = {0,1}), and the final being the true class of that observation. You will need to separate attributes and class in your load(filename) function. (a) [10 pts.] Implement the IG(D, inder, value) function according to equation 1, where D is a dataset, index is the index of an attribute and value is the split value such that the split is of the form X, value. The function should return the value of the information gain. (b) [10 pts.] Implement the G(D, index, value) function according to equation 2, where D is a dataset, inder is the index of an attribute and value is the split value such that the split is of the form X < value. The function should return the value of the gini index value.

Step by Step Solution

★★★★★

3.49 Rating (152 Votes )

There are 3 Steps involved in it

Step: 1

importing libraries import numpy as nm import matplotlibpyplot as mtp import pandas as pd importing ... blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Advertising & IMC Principles & Practice

Authors: Sandra Moriarty, Nancy Mitchell, William Wells

9th Edition

9780132998208, 0132163640, 132998203, 978-0132163644

More Books

Students also viewed these Mathematics questions

Question

★★★★★

please solve each part of this question using excel Simulate a six-sided dice for 100 trials a. Count the number of 1's,2's,3's,4's,5's and 6's and plot as a histogram, does this make sense b....

Answered: 1 week ago

Question

★★★★★

As part of this activity, you will consider the concept of "how," as defined by Jeremy Moon. Moon (2017) examines the significance of "how" businesses develop a long-term strategy in their research....

Answered: 1 week ago

Question

★★★★★

You will implement and test a small class called statistician, which is similar to some of the small classes in Chapter 2 of the text. iman Purposes: Ensure that you can write a small class that...

Answered: 1 week ago

Question

★★★★★

Under California Law, does a client Carmen Leake have an affirmative defense to inability to pay child support where a parent's inability to pay is due to unwillingness to work, if that parent is...

Answered: 1 week ago

Question

★★★★★

Elements in the same family often form oxyanions of the same general formula. The anions are named in a similar fashion. What are the names of the oxyanions of selenium and tellurium: SeO2-, SeO32-,...

Answered: 1 week ago

Question

★★★★★

=+1. A cafeteria plan a. provides on-site lunches as a fringe benefit to employees b. sets the same fringe benefits for all employees in a company c. pays a higher salary instead of fringe benefits...

Answered: 1 week ago

Question

★★★★★

your ultimate goal upon graduation (i.e., career goals).

Answered: 1 week ago

Question

★★★★★

Calculating Payoffs Use the option quote information shown here to answer the questions that follow. The stock is currently selling for $114. a. Suppose you buy 10 contracts of the February 110 call...

Answered: 1 week ago

Question

★★★★★

20.00 points XS Supply Company is developing its annual financial statements at Decamber 31. The statements are complete exoept for the statement of cash flows. The completed comparative balance...

Answered: 1 week ago

Question

★★★★★

write a title Understanding the roots of modern educational practices can provide valuable insights into their effectiveness and potential for improvement. One such root influencing contemporary...

Answered: 1 week ago

Question

★★★★★

aded ime Taken 0 35 27 Ava golke Attempt 1 Use the graph of the function f plotted with a solid line to sketch the graph of the given function g g x f x 2 1 M y f x

Answered: 1 week ago

Question

★★★★★

Create a function (could be piecewise) that demonstrates the following discontinuities. (a) jump discontinuity (b) point discontinuity (c) infinite discontinuity Prove that f(x) is continuous at x =...

Answered: 1 week ago

Question

★★★★★

e) Alice wishes to sign a document and send it to Bob using RSA signatures. Suppose she generates two primes p = 7, q = 11, and a public exponent e =43, and publishes (n. e) as her public key (6...

Answered: 1 week ago

Question

★★★★★

yellow boxes NAME Suppose you administer a certain aptitude test to a random sample of 9 students in your school, and that the average A new sneaker claims that it can make male athletes jump higher....

Answered: 1 week ago

Question

★★★★★

You have a firm that starts out with $70,000 in cash in the bank. You have three investment opportunities. You can undertake any or all of these investment opportunities, but you may only invest in...

Answered: 1 week ago

Question

★★★★★

Base on the diagram 2a, what is the MySQL version? Wireshark - Follow HTTP Stream (tcp.stream eq 4).SQL_Lab.pcap ID: 1 or 1-1 union select null, version ()# First name: admin Surname: admin ID: 1' or...

Answered: 1 week ago

Question

★★★★★

ANALYZING, DESCRIBING, AND REDESIGNING JOBS Part 1: Conduct a Job Analysis Choose a friend or family member with whom you feel comfortable. Develop 1020 questions for a job analysis interview....

Answered: 1 week ago

Question

★★★★★

Could the owner of a business prepare a statement of financial position on 9 December or 23 June or today?

Answered: 1 week ago

Question

★★★★★

Why is it important that advertising be regulated as a business?

Answered: 1 week ago

Question

★★★★★

You have been asked to develop a media plan for a new reality show that you have created. Focus on the Internet as a primary medium for this launch. Go to both www.google.com/adsense. Indicate how...

Answered: 1 week ago

Question

★★★★★

Describe the various copy elements of a print ad.

Answered: 1 week ago

Question

★★★★★

Calculate the missing values for the promissory notes described

Answered: 1 week ago

Question

★★★★★

Calculate the maturity value of a 120-day, $1000 face value note dated November 30, 2011, and earning interest at 10.75%.

Answered: 1 week ago

Question

★★★★★

A 90-day non-interest-bearing note for $3300 is dated August 1. What would be a fair selling price of the note on September 1 if money can earn 7.75%?

Answered: 1 week ago

Previous Question Next Question