Question:
14.4 Kernel stability. Suppose an approximation of the kernel matrix $\mathbf{K}$, denoted $\mathbf{K}'$, is used to train the hypothesis $h'$ (and let $h$ denote the non-approximate hypothesis). At test time, no approximation is made, so if we let $\mathbf{k}_x = \big(K(x, x_1), \ldots, K(x, x_m)\big)^\top$, we can write $h(x) = \boldsymbol{\alpha}^\top \mathbf{k}_x$ and $h'(x) = \boldsymbol{\alpha}'^\top \mathbf{k}_x$. Show that if $\forall x, x' \in \mathcal{X}$, $K(x, x') \le r$, then
\[
|h'(x) - h(x)| \le \frac{r m M}{\lambda^2}\, \big\|\mathbf{K}' - \mathbf{K}\big\|_2 ,
\]
where $\lambda$ is the regularization parameter and $M$ the bound on the labels, as in the chapter's kernel ridge regression setting.
(Hint: use exercise 10.3.)
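The bound lends itself to a quick numerical sanity check. The sketch below assumes kernel ridge regression with dual solution $\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}$; the kernel choice, perturbation scale, and the helper name gaussian_kernel are illustrative assumptions, not taken from the book.

import numpy as np

# Hedged sanity check of exercise 14.4's bound, assuming KRR:
# alpha = (K + lam * I)^{-1} y; names and constants are illustrative.
rng = np.random.default_rng(0)
m, lam = 50, 1.0

X = rng.uniform(-1.0, 1.0, size=(m, 2))
y = rng.uniform(-1.0, 1.0, size=m)
M = np.abs(y).max()  # labels are bounded by M

def gaussian_kernel(A, B, sigma=1.0):
    # Gaussian kernel; its values lie in (0, 1], so r = 1 works here.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

r = 1.0
K = gaussian_kernel(X, X)

# A small symmetric perturbation stands in for the approximation K'.
E = rng.normal(scale=1e-3, size=(m, m))
K_approx = K + (E + E.T) / 2.0

alpha = np.linalg.solve(K + lam * np.eye(m), y)
alpha_approx = np.linalg.solve(K_approx + lam * np.eye(m), y)

# No approximation at test time: both hypotheses use the exact k_x.
X_test = rng.uniform(-1.0, 1.0, size=(20, 2))
k_x = gaussian_kernel(X_test, X)  # row i holds the vector k_{x_i}
gap = np.abs(k_x @ (alpha_approx - alpha)).max()

bound = r * m * M / lam**2 * np.linalg.norm(K_approx - K, ord=2)
print(f"max |h'(x) - h(x)| = {gap:.2e} <= bound = {bound:.2e}")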
14.5 Stability of relative-entropy regularization.
(a) Consider an algorithm that selects a distribution $g$ over a hypothesis class parameterized by $\theta \in \Theta$. Given a point $z = (x, y)$, the expected loss is defined as
\[
H(g, z) = \int_{\Theta} L(h_\theta(x), y)\, g(\theta)\, d\theta ,
\]
with respect to a base loss function $L$. Assuming the loss function $L$ is bounded by $M$, show that the expected loss $H$ is $M$-admissible, i.e. show
\[
|H(g, z) - H(g', z)| \le M \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta .
\]
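For orientation, the inequality in (a) follows in one line (a sketch of ours, not the book's solution): the two expected losses differ only through $g - g'$, and $|L| \le M$, so
\[
|H(g, z) - H(g', z)| = \Big| \int_{\Theta} L(h_\theta(x), y)\, \big(g(\theta) - g'(\theta)\big)\, d\theta \Big| \le M \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta .
\]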
(b) Consider an algorithm that minimizes the entropy-regularized objective over the choice of distribution $g$:
\[
F_S(g) = \underbrace{\frac{1}{m} \sum_{i=1}^{m} H(g, z_i)}_{\widehat{R}_S(g)} + \lambda\, K(g, f_0) .
\]
Here, $K$ is the Kullback-Leibler divergence (or relative entropy) between two distributions,
\[
K(g, f_0) = \int_{\Theta} g(\theta) \log \frac{g(\theta)}{f_0(\theta)}\, d\theta , \qquad (14.14)
\]
and $f_0$ is some fixed distribution. Show that such an algorithm is stable by performing the following steps:
i. First use the fact that $\frac{1}{2} \big( \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta \big)^2 \le K(g, g')$ (Pinsker's inequality) to show
\[
\Big( \int_{\Theta} |g_S(\theta) - g_{S'}(\theta)|\, d\theta \Big)^2 \le B_{K(\cdot, f_0)}(g_S \,\|\, g_{S'}) + B_{K(\cdot, f_0)}(g_{S'} \,\|\, g_S) ,
\]
where $B_{K(\cdot, f_0)}$ denotes the Bregman divergence of the convex function $g \mapsto K(g, f_0)$.
ii. Next, let $g$ be the minimizer of $F_S$ and $g'$ the minimizer of $F_{S'}$, where $S$ and $S'$ differ only at the index $m$. Show that
\[
\lambda \big[ B_{K(\cdot, f_0)}(g \,\|\, g') + B_{K(\cdot, f_0)}(g' \,\|\, g) \big] \le \frac{1}{m} \big[ H(g', z_m) - H(g, z_m) + H(g, z'_m) - H(g', z'_m) \big] \le \frac{2M}{m} \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta .
\]
iii. Finally, combine the results above to show that the entropy-regularized algorithm is $\frac{2M^2}{\lambda m}$-stable (a sketch of how the steps chain together follows below).
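For orientation, here is how steps i-iii chain together (our reconstruction, not the book's official solution; it assumes the coefficient $\lambda$ on the regularizer as written in (b)). The key observation behind step i is that the terms linear in $\log f_0$ cancel in the Bregman divergence, so $B_{K(\cdot, f_0)}(g \,\|\, g') = K(g, g')$; applying Pinsker's inequality in both directions and summing gives step i. Writing $d = \int_{\Theta} |g_S(\theta) - g_{S'}(\theta)|\, d\theta$, steps i and ii combine as
\[
d^2 \;\le\; B_{K(\cdot, f_0)}(g_S \,\|\, g_{S'}) + B_{K(\cdot, f_0)}(g_{S'} \,\|\, g_S) \;\le\; \frac{2M}{\lambda m}\, d \quad\Longrightarrow\quad d \le \frac{2M}{\lambda m} ,
\]
and $M$-admissibility from part (a) then gives $|H(g_S, z) - H(g_{S'}, z)| \le M d \le \frac{2M^2}{\lambda m}$ for every $z$, which is the claimed stability.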
Source: Foundations of Machine Learning, 2nd edition, Mehryar Mohri and Afshin Rostamizadeh, ISBN 9780262351362.