
Question

2. [15 points] In this problem you'll see why simple feature-wise (i.e. coordinate-wise) splitting of the data isn't always the best approach to classification. Throughout the problem, assume that each feature can be used for splitting the data multiple times in a decision tree. Suppose you are given n non-overlapping points in the unit square [0, 1] x [0, 1], each labeled either + or -.

(a) Prove that there exists a decision tree of depth at most log_2 n that correctly labels all n points. At each node the decision tree should only perform a binary threshold split on a single coordinate. (Note that a binary tree of depth log_2 n can have as many as 2^(log_2 n) = n internal nodes, i.e. splits.)

(b) Describe (either mathematically, or in a few concise sentences) a set of n points in [0, 1] x [0, 1], along with corresponding + or - labels, so that the smallest decision tree that correctly labels them all has at least n - 1 splits. (Hint: if you can do it with n = 3, you can do it with arbitrary n.)

(c) Describe n points and corresponding labels that, as in part (b), can only be correctly labeled by a tree with at least n - 1 splits, with the additional condition that the points labeled + and the points labeled - must be separable by a straight line. In other words, there must exist a line segment splitting the unit square in two (not necessarily parallel to either axis), so that all points labeled + are in one part, and all points labeled - are in the other. (You will soon see classifiers that would have had a much easier time with this type of data.)

2. Maximum Likelihood Estimation [25 pt, Avi]

Figure 2 shows a system S which takes two deterministic inputs x_1, x_2 and outputs a linear combination of those two inputs, c_1 x_1 + c_2 x_2, plus an additive error eps, a random variable following some distribution. Thus the output y that you observe is given by equation (1). Assume that you have n > 2 instances (x_{j1}, x_{j2}, y_j), j = 1, ..., n, where

    y = c_1 x_1 + c_2 x_2 + eps    (1)

In other words, having n instances in hand is equivalent to having n equations of the following form:

    y_j = c_1 x_{j1} + c_2 x_{j2} + eps_j,    j = 1, ..., n.

The goal is to estimate c_1, c_2 from those n equations.
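One way to see part (a) is a median split: sort the points by one coordinate and threshold at the median, which halves the set, so the recursion reaches single-label leaves after at most ceil(log_2 n) levels. Below is a minimal Python sketch of that construction; it is not part of the original problem, the function names are hypothetical, and it assumes for simplicity that all points have distinct x_1 values.

```python
import math

def build_tree(points):
    """points: list of ((x1, x2), label), with distinct x1 values (assumed).
    Returns a nested dict; each internal node is a threshold split on x1."""
    labels = {lab for _, lab in points}
    if len(labels) == 1:                          # pure subset -> leaf
        return {"leaf": labels.pop()}
    pts = sorted(points, key=lambda p: p[0][0])
    mid = len(pts) // 2                           # median split on x1
    thresh = (pts[mid - 1][0][0] + pts[mid][0][0]) / 2
    return {"axis": 0, "thresh": thresh,
            "left": build_tree(pts[:mid]),        # x1 <= thresh
            "right": build_tree(pts[mid:])}       # x1 >  thresh

def depth(tree):
    if "leaf" in tree:
        return 0
    return 1 + max(depth(tree["left"]), depth(tree["right"]))

# Alternating labels along the diagonal, the kind of configuration part (b)
# hints at: the depth stays around log2(n), but every correct threshold tree
# needs on the order of n - 1 splits.
n = 8
pts = [((i / n, i / n), "+" if i % 2 == 0 else "-") for i in range(n)]
print(depth(build_tree(pts)), "<=", math.ceil(math.log2(n)))
```

For the estimation problem, the statement leaves the error distribution open. Under the common extra assumption that eps is zero-mean Gaussian, maximizing the likelihood over c_1, c_2 is the same as minimizing the sum of squared residuals, so the maximum-likelihood estimate is the ordinary least-squares solution. A small NumPy sketch under that assumption (the data below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.uniform(size=(n, 2))                      # rows are (x_j1, x_j2)
c_true = np.array([2.0, -1.0])                    # made-up "unknown" coefficients
y = X @ c_true + rng.normal(scale=0.1, size=n)    # y_j = c1*x_j1 + c2*x_j2 + eps_j

# With Gaussian eps, maximizing the likelihood of (c1, c2) is equivalent to
# minimizing sum_j (y_j - c1*x_j1 - c2*x_j2)^2, i.e. ordinary least squares.
c_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(c_hat)                                      # close to c_true
```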

