Question:
Consider minimizing Equation (7.1) (page 289), which gives the error of a linear prediction. This can be solved by finding the zero(s) of its derivative. The general case involves solving linear equations, for which there are many techniques, but it is instructive to do a simple case by hand.
(a) Give a formula for the weights that minimize the error for the case where n = 2 (i.e., when there are only two input features). [Hint: For each weight, differentiate with respect to that weight and set to zero. Solve the resulting equations.]
(b) Write pseudocode to implement this.
(c) Why is it hard to minimize the error analytically when using a sigmoid function as an activation function, for n = 2? (Why doesn’t the same method as in part (a) work?)
Step by Step Answer:
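Below is a hedged sketch of one way to approach parts (a)–(c). It assumes Equation (7.1) is the sum-of-squares error of a linear predictor over a set of training examples e, with prediction w1*X1(e) + w2*X2(e) for the two input features (any bias term can be absorbed as an extra constant feature); the exact notation in the book may differ.

For part (a), differentiate the error with respect to each weight, set both derivatives to zero, and solve the resulting pair of linear ("normal") equations, for instance by Cramer's rule:

```latex
\begin{align*}
E(w_1,w_2) &= \sum_{e} \bigl(Y(e) - w_1 X_1(e) - w_2 X_2(e)\bigr)^2 \\
\frac{\partial E}{\partial w_1} &= -2\sum_{e} X_1(e)\bigl(Y(e) - w_1 X_1(e) - w_2 X_2(e)\bigr) = 0 \\
\frac{\partial E}{\partial w_2} &= -2\sum_{e} X_2(e)\bigl(Y(e) - w_1 X_1(e) - w_2 X_2(e)\bigr) = 0 \\
\intertext{Writing $S_{11}=\sum_e X_1(e)^2$, $S_{22}=\sum_e X_2(e)^2$, $S_{12}=\sum_e X_1(e)X_2(e)$, $S_{1Y}=\sum_e X_1(e)Y(e)$, $S_{2Y}=\sum_e X_2(e)Y(e)$, these become two linear equations:}
w_1 S_{11} + w_2 S_{12} &= S_{1Y}, \qquad w_1 S_{12} + w_2 S_{22} = S_{2Y} \\
\intertext{Solving by Cramer's rule:}
w_1 &= \frac{S_{1Y} S_{22} - S_{2Y} S_{12}}{S_{11} S_{22} - S_{12}^2}, \qquad
w_2  = \frac{S_{2Y} S_{11} - S_{1Y} S_{12}}{S_{11} S_{22} - S_{12}^2}
\end{align*}
```

For part (b), a minimal Python sketch of the same closed-form computation (the function name, the (x1, x2, y) example format, and the sample data are illustrative assumptions, not the book's notation):

```python
def fit_two_feature_linear(examples):
    """Least-squares weights (w1, w2) for the prediction w1*x1 + w2*x2.

    `examples` is assumed to be an iterable of (x1, x2, y) triples;
    this mirrors the closed-form solution sketched above.
    """
    s11 = s22 = s12 = s1y = s2y = 0.0
    for x1, x2, y in examples:
        s11 += x1 * x1   # sum of X1^2
        s22 += x2 * x2   # sum of X2^2
        s12 += x1 * x2   # sum of X1*X2
        s1y += x1 * y    # sum of X1*Y
        s2y += x2 * y    # sum of X2*Y

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("Features are (nearly) linearly dependent; "
                         "the normal equations have no unique solution.")

    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2


# Tiny usage example with made-up data: y = 2*x1 + 3*x2 exactly,
# so the recovered weights should be (2.0, 3.0).
if __name__ == "__main__":
    data = [(1, 0, 2), (0, 1, 3), (1, 1, 5), (2, 1, 7)]
    print(fit_two_feature_linear(data))
```

For part (c), the same method fails because a sigmoid wraps the weights in a nonlinearity: setting the partial derivatives to zero no longer yields equations that are linear in w1 and w2, so there is no comparable closed-form solution, and iterative methods such as gradient descent are used instead.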
Artificial Intelligence: Foundations Of Computational Agents
ISBN: 9781009258197
3rd Edition
Authors: David L. Poole, Alan K. Mackworth