2. [15 points] In this problem you'll see why simple feature-wise (i.e. coordinate-wise) splitting of the data isn't always the best approach to classification. Throughout the problem, assume that each feature can be used for splitting the data multiple times in a decision tree. Suppose you are given n non-overlapping points in the unit square [0, 1] x [0, 1], each labeled either + or -. (a) Prove that there exists a decision tree of depth at most log, n that correctly labels all n points. At each node the decision tree should only perform a binary threshold split on a single coordinate. (Note that a binary tree of depth log, n can have as many as 2log, " = n internal nodes, i.e. splits.) (b) Describe (either mathematically, or in a few concise sentences) a set of n points in 0, 1] x [0, 1], along with corresponding + or - labels, so that the smallest decision tree that correctly labels them all has at least n - 1 splits. (Hint: if you can do it with n = 3, you can do it with arbitrary n) (c) Describe n points and corresponding labels that, as in part (b), can only be correctly labeled by a tree with at least n - 1 splits, with the additional condition that the points labeled + and the points labeled - must be separable by a straight line. In other words, there must exist a line segment splitting the unit square in two (not necessarily parallel to either axis), so that all points labeled + are in one part, and all points labeled - are in the other. (You will soon see classifiers that would have had a much easier time with this type of data.) 2 Maximum Likelihood Estimation, [25pt, Avi] Figure 2 shows a system S which takes two inputs T1, 12 (which are deterministic ) and outputs a linear combination of those two inputs, C +6242, introduces an additive error which is a random variable following some distribution. Thus the output y that you observe is given by equation 1. Assume that you have n > 2 instances ;-1..., or equivalently , where y = GT1 + 252 + (1) In other words having n equations in your hand is equivalent to having n equations of the following form: y; = city + Cat;2 + j, j = 1...n. The goal is to estimate c1, 2 from those 2