Question:

Suppose you need to define a system that, given data about a person’s TV-watching likes, recommends other TV shows the person may like. Each show has features specifying whether it is a comedy, whether it features doctors, whether it features lawyers, and whether it has guns. You are given the fictitious examples of Figure 7.23 about whether the person likes various TV shows. We want to use this dataset to learn the value of Likes (i.e., to predict which TV shows the person would like based on the attributes of the TV show).

This exercise is designed to be small enough to do manually; however, you may find the AIPython (aipython.org) code useful for checking your answers.

(a) Suppose the error is the sum of absolute errors. Give the optimal decision tree with only one node (i.e., with no splits). What is the error of this tree?

(b) Do the same as in part (a), but with the squared error.
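For parts (a) and (b), a tree with no splits predicts a single constant for every example. A standard fact is that the sum of absolute errors is minimized by a median of the target values, and the sum of squared errors by their mean. The sketch below checks this arithmetic. The label list is a hypothetical placeholder, not the data of Figure 7.23, and the function name `best_constant` is an illustrative assumption.

```python
from statistics import mean, median

def best_constant(labels):
    """Return the error-minimizing constant prediction and its error
    under the sum of absolute errors and the sum of squared errors."""
    med = median(labels)   # a minimizer of the sum of absolute errors
    avg = mean(labels)     # the minimizer of the sum of squared errors
    abs_err = sum(abs(y - med) for y in labels)
    sq_err = sum((y - avg) ** 2 for y in labels)
    return {"absolute": (med, abs_err), "squared": (avg, sq_err)}

# Hypothetical 0/1 values of Likes (placeholder only, NOT Figure 7.23).
placeholder_likes = [1, 0, 0, 1, 0]
result = best_constant(placeholder_likes)
print(result)
```

To check your own answers, replace `placeholder_likes` with the Likes column from the figure. When exactly half of the labels are 1, every prediction between 0 and 1 gives the same sum of absolute errors, so the median is one optimal choice among several.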

(c) Suppose the error is the sum of absolute errors. Give the optimal decision tree of depth 2 (i.e., the root node is the only node with children). For each leaf in the tree, give the examples that are filtered to that node. What is the error of this tree?

(d) Do the same as in part (c), but with the squared error.
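For parts (c) and (d), one way to check a hand-computed answer is to try each feature as the root split, give each leaf its best constant prediction, and add up the leaf errors. The following is a minimal sketch under stated assumptions: the four examples are hypothetical placeholders (not Figure 7.23), only two of the features are included, and the helper names `leaf_error` and `best_single_split` are illustrative.

```python
from statistics import mean, median

def leaf_error(labels, kind):
    """Error of the best constant prediction for the labels at one leaf."""
    if not labels:
        return 0.0
    if kind == "absolute":
        c = median(labels)
        return sum(abs(y - c) for y in labels)
    c = mean(labels)
    return sum((y - c) ** 2 for y in labels)

def best_single_split(examples, features, target, kind):
    """Return (total_error, feature) for the best tree with one split."""
    scored = []
    for f in features:
        true_side = [e[target] for e in examples if e[f]]
        false_side = [e[target] for e in examples if not e[f]]
        total = leaf_error(true_side, kind) + leaf_error(false_side, kind)
        scored.append((total, f))
    return min(scored)

# Hypothetical examples (placeholder only, NOT Figure 7.23).
placeholder = [
    {"Comedy": False, "Guns": True,  "Likes": 0},
    {"Comedy": True,  "Guns": False, "Likes": 1},
    {"Comedy": True,  "Guns": True,  "Likes": 1},
    {"Comedy": False, "Guns": False, "Likes": 0},
]
abs_best = best_single_split(placeholder, ["Comedy", "Guns"], "Likes", "absolute")
sq_best = best_single_split(placeholder, ["Comedy", "Guns"], "Likes", "squared")
print(abs_best, sq_best)
```

The lists `true_side` and `false_side` are the examples filtered to each leaf, which is what the question asks you to report.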

(e) What is the smallest tree that correctly classifies all training examples? Does a top-down decision tree that optimizes the information gain at each step represent the same function?
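For part (e), a top-down learner that optimizes information gain picks, at each node, the feature whose split most reduces the entropy of Likes. The sketch below computes that quantity for Boolean features and a 0/1 target. The examples are hypothetical placeholders (not Figure 7.23), and the function names are illustrative assumptions.

```python
from math import log2

def entropy(labels):
    """Entropy in bits of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def information_gain(examples, feature, target):
    """Entropy of the target minus the expected entropy after the split."""
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    for value in (True, False):
        subset = [e[target] for e in examples if e[feature] == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Hypothetical examples (placeholder only, NOT Figure 7.23).
placeholder = [
    {"Comedy": False, "Guns": True,  "Likes": 0},
    {"Comedy": True,  "Guns": False, "Likes": 1},
    {"Comedy": True,  "Guns": True,  "Likes": 1},
    {"Comedy": False, "Guns": False, "Likes": 0},
]
gain_comedy = information_gain(placeholder, "Comedy", "Likes")
gain_guns = information_gain(placeholder, "Guns", "Likes")
print(gain_comedy, gain_guns)
```

A greedy learner can pick a root that looks best locally yet leads to a larger tree than the smallest one, so it is worth comparing the gains of all four features at the root against the smallest tree you found by hand.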

(f) Give two instances not appearing in the examples of Figure 7.23 and show how they are classified using the smallest decision tree. Use this to explain the bias inherent in the tree. (How does the bias give you these particular predictions?)

(g) Is this dataset linearly separable? Explain why or why not.
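For part (g), one hedged way to probe linear separability is to run the perceptron learning rule. If it completes a full pass with no mistakes, the weights it found are a separating hyperplane, so the data is linearly separable. Hitting the epoch cap does not prove the opposite; it only suggests it, so the explanation still has to come from the structure of the data. The points below are hypothetical two-feature placeholders (not Figure 7.23), and `perceptron_separates` is an illustrative name.

```python
def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if the perceptron rule finds a separating hyperplane
    within max_epochs passes over the data, False otherwise."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            target = 1 if y else -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * activation <= 0:      # misclassified or on the boundary
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# Hypothetical Boolean inputs (placeholder only, NOT Figure 7.23).
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_ok = perceptron_separates(corners, [0, 0, 0, 1])   # conjunction
xor_ok = perceptron_separates(corners, [0, 1, 1, 0])   # exclusive-or
print(and_ok, xor_ok)
```

The exclusive-or labels are the classic case with no separating hyperplane, which is the kind of pattern to look for among the features that the smallest tree splits on.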
