Question:

Consider the code for decision trees in Example 10.7 (page 472) and the Bayesian information criterion (BIC) (page 473) for decision trees. Consider three cases: the BIC, the decision tree code with a 32-bit representation for probabilities, and the decision tree code that uses log2(|Es|) bits to represent a probability.

(a) For each case, how many extra bits does introducing a split incur?

(b) Which method has the biggest preference for smaller trees?

(c) For each of the three methods, is there a value of γ in the decision tree learner (Figure 7.9, page 284) that corresponds to that method? If so, give it; if not, explain why not.
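A minimal sketch of the comparison, in Python (this is not the book's code). It assumes a split replaces one leaf, which stores a single probability, with an internal node and two leaves; that the tree shape costs a hypothetical 1 bit per node to mark leaf versus internal; and that naming the test at the new node costs log2(n_features) bits. The function names and the example numbers are illustrative, not from the text, and Example 10.7's exact encoding of the tree structure may differ.

import math

def extra_bits_per_split(n_examples, n_features, scheme):
    """Extra description-length bits from replacing one leaf (which
    stores a single probability) with an internal node plus two leaves.

    Assumed encoding (for illustration only):
      - tree shape: 1 bit per node marking leaf vs. internal; a split
        turns 1 node into 3, so 2 extra shape bits
      - the test at the new internal node: log2(n_features) bits
      - probabilities: two leaf probabilities replace one, i.e. one
        net extra parameter, charged per the chosen scheme
    """
    structure = 2 + math.log2(n_features)
    per_parameter = {
        "bic": 0.5 * math.log2(n_examples),  # BIC: (1/2) log2(|Es|) per parameter
        "32bit": 32.0,                       # fixed 32-bit probability
        "log": math.log2(n_examples),        # log2(|Es|) bits per probability
    }[scheme]
    return structure + per_parameter

def should_split(data_bits_saved, n_examples, n_features, scheme):
    """MDL-style test: split only if the bits saved in encoding the data
    exceed the extra model bits the split costs. A constant split cost
    plays the role of gamma in Figure 7.9's sufficient-improvement test."""
    return data_bits_saved > extra_bits_per_split(n_examples, n_features, scheme)

# Example: 1000 training examples, 20 candidate input features.
for scheme in ("bic", "log", "32bit"):
    print(f"{scheme:>5}: {extra_bits_per_split(1000, 20, scheme):6.2f} extra bits")

Under these assumptions, for any training set smaller than 2^32 examples the 32-bit code charges the most per split, so it has the strongest preference for small trees, and the BIC charges the least. One point that bears on part (c): if |Es| is taken to mean the examples reaching the current node rather than the whole training set, the two log-based costs vary from node to node, which matters for whether a single constant γ can reproduce each method.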
