Question:
11.9 Leave-one-out error. In general, the computation of the leave-one-out error can be very costly since, for a sample of size m, it requires training the algorithm m times. The objective of this problem is to show that, remarkably, in the case of kernel ridge regression (KRR), the leave-one-out error can be computed efficiently by training the algorithm only once.
Let $S = ((x_1, y_1), \ldots, (x_m, y_m))$ denote a training sample of size $m$ and, for any $i \in [m]$, let $S_i$ denote the sample of size $m - 1$ obtained from $S$ by removing $(x_i, y_i)$: $S_i = S - \{(x_i, y_i)\}$. For any sample $T$, let $h_T$ denote a hypothesis obtained by training on $T$. By definition (see definition 5.2), for the squared loss, the leave-one-out error with respect to $S$ is defined by
$$\widehat{R}_{\mathrm{LOO}}(\mathrm{KRR}) = \frac{1}{m} \sum_{i=1}^{m} \big( h_{S_i}(x_i) - y_i \big)^2.$$
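To make the cost of the definition concrete, here is a minimal numpy sketch (not from the book) that computes the leave-one-out error literally: it retrains dual KRR m times, once per held-out point. The Gaussian kernel, the toy data, and the choice of regularization parameter `lam` are all illustrative assumptions.

```python
import numpy as np

def loo_krr_naive(K, y, lam):
    """Leave-one-out error of KRR computed from the definition:
    retrain on S_i for every i, i.e. m separate trainings."""
    m = len(y)
    errs = []
    for i in range(m):
        idx = [j for j in range(m) if j != i]
        # dual KRR on S_i: alpha = (K_i + lam*I)^{-1} y_i
        alpha = np.linalg.solve(K[np.ix_(idx, idx)] + lam * np.eye(m - 1), y[idx])
        # squared residual (h_{S_i}(x_i) - y_i)^2
        errs.append((K[i, idx] @ alpha - y[i]) ** 2)
    return np.mean(errs)

# illustrative toy data with a Gaussian kernel (assumed, not from the exercise)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = np.sin(X[:, 0])
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
loo = loo_krr_naive(K, y, 0.5)
```

Each iteration solves an $(m-1) \times (m-1)$ linear system, which is exactly the overhead the rest of the exercise eliminates.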
(a) Let $S'_i = ((x_1, y_1), \ldots, (x_i, h_{S_i}(x_i)), \ldots, (x_m, y_m))$, that is, the sample obtained from $S$ by replacing the label $y_i$ with $h_{S_i}(x_i)$. Show that $h_{S_i} = h_{S'_i}$.
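The claim in (a) can be checked numerically: training KRR on the full sample with $y_i$ replaced by $h_{S_i}(x_i)$ should yield the same function as training on the $m-1$ points of $S_i$. A sketch (numpy, with an assumed Gaussian kernel and toy data) comparing the two models on fresh test points:

```python
import numpy as np

rng = np.random.default_rng(1)
m, lam = 6, 0.3
X = rng.normal(size=(m, 1))
y = np.sin(X[:, 0])

def gauss(A, B):
    # Gaussian kernel matrix between row-sets A and B (illustrative choice)
    return np.exp(-np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1))

K = gauss(X, X)
X_test = rng.normal(size=(5, 1))
K_test = gauss(X_test, X)                 # cross-kernel to the training points

i = 2
idx = [j for j in range(m) if j != i]
# h_{S_i}: KRR trained on the m-1 points of S_i
alpha_i = np.linalg.solve(K[np.ix_(idx, idx)] + lam * np.eye(m - 1), y[idx])
h_Si_xi = K[i, idx] @ alpha_i             # h_{S_i}(x_i)

# S'_i: full sample, with y_i replaced by h_{S_i}(x_i)
y_prime = y.copy()
y_prime[i] = h_Si_xi
alpha_prime = np.linalg.solve(K + lam * np.eye(m), y_prime)

pred_Si = K_test[:, idx] @ alpha_i        # predictions of h_{S_i}
pred_Spi = K_test @ alpha_prime           # predictions of h_{S'_i}
```

The two prediction vectors agree up to numerical precision, consistent with the uniqueness of the KRR minimizer: $h_{S_i}$ already has zero loss on the replaced $i$th point, so it also minimizes the objective over $S'_i$.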
(b) Define $\mathbf{y}_i = \mathbf{y} - y_i e_i + h_{S_i}(x_i) e_i$, that is, the vector of labels with the $i$th component replaced by $h_{S_i}(x_i)$. Prove that for KRR, $h_{S_i}(x_i) = \mathbf{y}_i^\top (K + \lambda I)^{-1} K e_i$.
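The identity in (b) can likewise be verified numerically: the left-hand side retrains on $S_i$, while the right-hand side uses a single solve on the full Gram matrix with the modified label vector. A sketch under the same illustrative assumptions (numpy, Gaussian kernel, toy data):

```python
import numpy as np

rng = np.random.default_rng(2)
m, lam = 7, 0.4
X = rng.normal(size=(m, 2))
y = np.cos(X[:, 0]) + X[:, 1]
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))

i = 3
idx = [j for j in range(m) if j != i]
# left-hand side: retrain on S_i and evaluate at x_i
alpha_i = np.linalg.solve(K[np.ix_(idx, idx)] + lam * np.eye(m - 1), y[idx])
h_Si_xi = K[i, idx] @ alpha_i

# right-hand side: y_i^T (K + lam*I)^{-1} K e_i, with label vector
# y_i = y - y_i e_i + h_{S_i}(x_i) e_i
y_vec = y.copy()
y_vec[i] = h_Si_xi
e_i = np.zeros(m)
e_i[i] = 1.0
rhs = y_vec @ np.linalg.solve(K + lam * np.eye(m), K @ e_i)
```

Agreement follows from part (a): $h_{S_i} = h_{S'_i}$, and $h_{S'_i}(x_i)$ is the prediction of full-sample KRR trained on the labels $\mathbf{y}_i$.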
(c) Prove that the leave-one-out error admits the following simple expression in terms of $h_S$:
$$\widehat{R}_{\mathrm{LOO}}(\mathrm{KRR}) = \frac{1}{m} \sum_{i=1}^{m} \left( \frac{h_S(x_i) - y_i}{1 - e_i^\top (K + \lambda I)^{-1} K e_i} \right)^2. \qquad (11.34)$$
(d) Suppose that the diagonal entries of the matrix $M = (K + \lambda I)^{-1} K$ are all equal to $\gamma$. How do the empirical error $\widehat{R}_S$ of the algorithm and the leave-one-out error $\widehat{R}_{\mathrm{LOO}}$ relate? Is there any value of $\gamma$ for which the two errors coincide?
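Under this assumption the denominator in (11.34) is the constant $1 - \gamma$, so the sum collapses. A sketch of the resulting relation (an outline of the answer, not a full proof):

```latex
\widehat{R}_{\mathrm{LOO}}(\mathrm{KRR})
  = \frac{1}{m} \sum_{i=1}^{m} \left( \frac{h_S(x_i) - y_i}{1 - \gamma} \right)^{2}
  = \frac{1}{(1 - \gamma)^{2}} \, \widehat{R}_S(\mathrm{KRR}).
```

The two errors coincide when $(1 - \gamma)^2 = 1$; since for $\lambda > 0$ the eigenvalues of the symmetric matrix $M$ lie in $[0, 1)$, its diagonal entries do too, which leaves $\gamma = 0$ as the only admissible value.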
Foundations of Machine Learning
ISBN: 9780262351362
2nd Edition
Authors: Mehryar Mohri, Afshin Rostamizadeh