Question:

Consider the code for decision trees in Example 10.7 (page 472) and the Bayesian information criterion (BIC) (page 473) for decision trees. Consider three cases: the BIC, the decision tree code with a 32-bit representation for probabilities, and the decision tree code that uses log2(|Es|) bits to represent a probability.

(a) For each case, how many extra bits does introducing a split incur?

(b) Which method has the biggest preference for smaller trees?

(c) For each of the three methods, is there a value of γ in the decision tree learner (Figure 7.9, page 284) that corresponds to that method? If so, give it; if not, explain why not.
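A minimal sketch of the comparison, in Python (this is not the book's code). It assumes a split replaces one leaf, which stores a single probability, with an internal node and two leaves; that the tree shape costs a hypothetical 1 bit per node to mark leaf versus internal; and that naming the test at the new node costs log2(n_features) bits. The function names and the example numbers are illustrative, not from the text, and Example 10.7's exact encoding of the tree structure may differ.

import math

def extra_bits_per_split(n_examples, n_features, scheme):
    """Extra description-length bits from replacing one leaf (which
    stores a single probability) with an internal node plus two leaves.

    Assumed encoding (for illustration only):
      - tree shape: 1 bit per node marking leaf vs. internal; a split
        turns 1 node into 3, so 2 extra shape bits
      - the test at the new internal node: log2(n_features) bits
      - probabilities: two leaf probabilities replace one, i.e. one
        net extra parameter, charged per the chosen scheme
    """
    structure = 2 + math.log2(n_features)
    per_parameter = {
        "bic": 0.5 * math.log2(n_examples),  # BIC: (1/2) log2(|Es|) per parameter
        "32bit": 32.0,                       # fixed 32-bit probability
        "log": math.log2(n_examples),        # log2(|Es|) bits per probability
    }[scheme]
    return structure + per_parameter

def should_split(data_bits_saved, n_examples, n_features, scheme):
    """MDL-style test: split only if the bits saved in encoding the data
    exceed the extra model bits the split costs. A constant split cost
    plays the role of gamma in Figure 7.9's sufficient-improvement test."""
    return data_bits_saved > extra_bits_per_split(n_examples, n_features, scheme)

# Example: 1000 training examples, 20 candidate input features.
for scheme in ("bic", "log", "32bit"):
    print(f"{scheme:>5}: {extra_bits_per_split(1000, 20, scheme):6.2f} extra bits")

Under these assumptions, for any training set smaller than 2^32 examples the 32-bit code charges the most per split, so it has the strongest preference for small trees, and the BIC charges the least. One point that bears on part (c): if |Es| is taken to mean the examples reaching the current node rather than the whole training set, the two log-based costs vary from node to node, which matters for whether a single constant γ can reproduce each method.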
