
Question:

Exercise 7.1 The aim of this exercise is to fill in the table of Figure 7.3 (page 295).

(a) Prove the optimal prediction for the training data. To do this, find the minimum value of the absolute error, the sum-of-squares error, the entropy, and the value that gives the maximum likelihood. The maximum or minimum value is either an end point or where the derivative is zero.
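As a sketch of one of these minimizations (writing p for the predicted probability, and assuming n1 training examples with value 1 and n0 with value 0): the sum-of-squares error is n1(1 - p)^2 + n0 p^2, and setting its derivative -2 n1 (1 - p) + 2 n0 p to zero gives p = n1/(n0 + n1). The absolute error n1(1 - p) + n0 p is linear in p, so its minimum is at an end point: p = 1 when n1 > n0 and p = 0 when n0 > n1, i.e., the mode. The other criteria can be handled in the same way.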

(b) To determine the best prediction for the test data, assume that the data cases are generated stochastically according to some true parameter p0. Try the following for a number of different values for p0 ∈ [0, 1]. Generate k training examples (try various values for k, some small, say 5, and some large, say 1,000) by sampling with probability p0; from these generate n0 and n1. Generate a test set that contains many test cases using the same parameter p0.

For each of the optimality criteria – sum of absolute values, sum of squares, and likelihood (or entropy) – which of the following gives a lower error on the test set:

i) the mode

ii) n1/(n0 + n1)

iii) if n1 = 0, use 0.001, if n0 = 0, use 0.999, else use n1/(n0 + n1). (Try this for different numbers when the counts are zero.)

iv) (n1 + 1)/(n0 + n1 + 2)

v) (n1 + α)/(n0 + n1 + 2α) for different values of α > 0

vi) another predictor that is a function of n0 and n1.

You may have to generate many different training sets for each parameter.
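A minimal Python sketch of the simulation in part (b) is given below; it is one possible setup, not part of the exercise. The predictor set, the number of trials, the test-set size, and the clamping constant used to avoid log(0) are all illustrative choices.

import math
import random

def errors(p_hat, test):
    """Sum of absolute errors, squared errors, and negative log likelihood
    when probability p_hat is predicted for each binary test case."""
    abs_err = sum(abs(x - p_hat) for x in test)
    sq_err = sum((x - p_hat) ** 2 for x in test)
    p = min(max(p_hat, 1e-12), 1 - 1e-12)   # avoid log(0) for predictions of 0 or 1
    log_loss = sum(-(x * math.log(p) + (1 - x) * math.log(1 - p)) for x in test)
    return abs_err, sq_err, log_loss

def predictors(n0, n1, alpha=3.0):
    """Candidate predictors (i)-(v); ties for the mode are broken towards 0."""
    n = n0 + n1
    return {
        "mode": 1.0 if n1 > n0 else 0.0,
        "n1/(n0+n1)": n1 / n if n > 0 else 0.5,
        "clipped to [0.001, 0.999]": 0.001 if n1 == 0 else (0.999 if n0 == 0 else n1 / n),
        "(n1+1)/(n0+n1+2)": (n1 + 1) / (n + 2),
        "(n1+a)/(n0+n1+2a), a=3": (n1 + alpha) / (n + 2 * alpha),
    }

def run(p0, k, num_test=1000, num_trials=100, seed=0):
    """Average test-set errors over several training sets drawn with parameter p0."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(num_trials):
        train = [1 if rng.random() < p0 else 0 for _ in range(k)]
        test = [1 if rng.random() < p0 else 0 for _ in range(num_test)]
        n1 = sum(train)
        n0 = k - n1
        for name, p_hat in predictors(n0, n1).items():
            a, s, l = errors(p_hat, test)
            ta, ts, tl = totals.get(name, (0.0, 0.0, 0.0))
            totals[name] = (ta + a, ts + s, tl + l)
    return totals

if __name__ == "__main__":
    for p0 in (0.1, 0.5, 0.9):
        for k in (5, 1000):
            print(f"p0={p0}, k={k}")
            for name, (a, s, l) in run(p0, k).items():
                print(f"  {name:26s} abs={a:10.1f}  sq={s:10.1f}  logloss={l:10.1f}")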

(For the mathematically sophisticated, can you prove what the optimal predictor is for each criterion?)
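One hedged observation for that last question: for a single test case generated with probability p0, the expected squared error of predicting p is p0(1 - p)^2 + (1 - p0)p^2 and the expected log loss is -p0 log p - (1 - p0) log(1 - p); both are minimized at p = p0. The expected absolute error p0(1 - p) + (1 - p0)p is linear in p, so it is minimized at an end point: p = 1 when p0 > 1/2 and p = 0 when p0 < 1/2. So for sum of squares and likelihood the comparison comes down to which predictor best estimates p0, while for absolute error only the side of 1/2 on which the prediction falls matters.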
