Question:
14.4 Kernel stability. Suppose an approximation of the kernel matrix $\mathbf{K}$, denoted $\mathbf{K}'$, is used to train the hypothesis $h'$ (and let $h$ denote the non-approximate hypothesis). At test time, no approximation is made, so if we let $\mathbf{k}_x = \big(K(x, x_1), \ldots, K(x, x_m)\big)^\top$, we can write $h(x) = \boldsymbol{\alpha}^\top \mathbf{k}_x$ and $h'(x) = \boldsymbol{\alpha}'^\top \mathbf{k}_x$. Show that if $\forall x, x' \in \mathcal{X}$, $K(x, x') \le r$, then
\[
|h'(x) - h(x)| \le \frac{r m M}{\lambda^2}\, \big\|\mathbf{K}' - \mathbf{K}\big\|_2 ,
\]
where $\lambda$ is the regularization parameter and $M$ the bound on the labels, as in the chapter's kernel ridge regression setting.
(Hint: use exercise 10.3.)
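The bound lends itself to a quick numerical sanity check. The sketch below assumes kernel ridge regression with dual solution $\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}$; the kernel choice, perturbation scale, and the helper name gaussian_kernel are illustrative assumptions, not taken from the book.

import numpy as np

# Hedged sanity check of exercise 14.4's bound, assuming KRR:
# alpha = (K + lam * I)^{-1} y; names and constants are illustrative.
rng = np.random.default_rng(0)
m, lam = 50, 1.0

X = rng.uniform(-1.0, 1.0, size=(m, 2))
y = rng.uniform(-1.0, 1.0, size=m)
M = np.abs(y).max()  # labels are bounded by M

def gaussian_kernel(A, B, sigma=1.0):
    # Gaussian kernel; its values lie in (0, 1], so r = 1 works here.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

r = 1.0
K = gaussian_kernel(X, X)

# A small symmetric perturbation stands in for the approximation K'.
E = rng.normal(scale=1e-3, size=(m, m))
K_approx = K + (E + E.T) / 2.0

alpha = np.linalg.solve(K + lam * np.eye(m), y)
alpha_approx = np.linalg.solve(K_approx + lam * np.eye(m), y)

# No approximation at test time: both hypotheses use the exact k_x.
X_test = rng.uniform(-1.0, 1.0, size=(20, 2))
k_x = gaussian_kernel(X_test, X)  # row i holds the vector k_{x_i}
gap = np.abs(k_x @ (alpha_approx - alpha)).max()

bound = r * m * M / lam**2 * np.linalg.norm(K_approx - K, ord=2)
print(f"max |h'(x) - h(x)| = {gap:.2e} <= bound = {bound:.2e}")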
14.5 Stability of relative-entropy regularization.
(a) Consider an algorithm that selects a distribution $g$ over a hypothesis class parameterized by $\theta \in \Theta$. Given a point $z = (x, y)$, the expected loss is defined as
\[
H(g, z) = \int_{\Theta} L(h_\theta(x), y)\, g(\theta)\, d\theta ,
\]
with respect to a base loss function $L$. Assuming the loss function $L$ is bounded by $M$, show that the expected loss $H$ is $M$-admissible, i.e. show
\[
|H(g, z) - H(g', z)| \le M \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta .
\]
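For orientation, the inequality in (a) follows in one line (a sketch of ours, not the book's solution): the two expected losses differ only through $g - g'$, and $|L| \le M$, so
\[
|H(g, z) - H(g', z)| = \Big| \int_{\Theta} L(h_\theta(x), y)\, \big(g(\theta) - g'(\theta)\big)\, d\theta \Big| \le M \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta .
\]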
(b) Consider an algorithm that minimizes the entropy-regularized objective over the choice of distribution $g$:
\[
F_S(g) = \underbrace{\frac{1}{m} \sum_{i=1}^{m} H(g, z_i)}_{\widehat{R}_S(g)} + \lambda\, K(g, f_0) .
\]
Here, $K$ is the Kullback-Leibler divergence (or relative entropy) between two distributions,
\[
K(g, f_0) = \int_{\Theta} g(\theta) \log \frac{g(\theta)}{f_0(\theta)}\, d\theta , \qquad (14.14)
\]
and $f_0$ is some fixed distribution. Show that such an algorithm is stable by performing the following steps:
i. First use the fact that $\frac{1}{2} \big( \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta \big)^2 \le K(g, g')$ (Pinsker's inequality) to show
\[
\Big( \int_{\Theta} |g_S(\theta) - g_{S'}(\theta)|\, d\theta \Big)^2 \le B_{K(\cdot, f_0)}(g_S \,\|\, g_{S'}) + B_{K(\cdot, f_0)}(g_{S'} \,\|\, g_S) ,
\]
where $B_{K(\cdot, f_0)}$ denotes the Bregman divergence of the convex function $g \mapsto K(g, f_0)$.
ii. Next, let $g$ be the minimizer of $F_S$ and $g'$ the minimizer of $F_{S'}$, where $S$ and $S'$ differ only at the index $m$. Show that
\[
\lambda \big[ B_{K(\cdot, f_0)}(g \,\|\, g') + B_{K(\cdot, f_0)}(g' \,\|\, g) \big] \le \frac{1}{m} \big[ H(g', z_m) - H(g, z_m) + H(g, z'_m) - H(g', z'_m) \big] \le \frac{2M}{m} \int_{\Theta} |g(\theta) - g'(\theta)|\, d\theta .
\]
iii. Finally, combine the results above to show that the entropy-regularized algorithm is $\frac{2M^2}{\lambda m}$-stable (a sketch of how the steps chain together follows below).
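For orientation, here is how steps i-iii chain together (our reconstruction, not the book's official solution; it assumes the coefficient $\lambda$ on the regularizer as written in (b)). The key observation behind step i is that the terms linear in $\log f_0$ cancel in the Bregman divergence, so $B_{K(\cdot, f_0)}(g \,\|\, g') = K(g, g')$; applying Pinsker's inequality in both directions and summing gives step i. Writing $d = \int_{\Theta} |g_S(\theta) - g_{S'}(\theta)|\, d\theta$, steps i and ii combine as
\[
d^2 \;\le\; B_{K(\cdot, f_0)}(g_S \,\|\, g_{S'}) + B_{K(\cdot, f_0)}(g_{S'} \,\|\, g_S) \;\le\; \frac{2M}{\lambda m}\, d \quad\Longrightarrow\quad d \le \frac{2M}{\lambda m} ,
\]
and $M$-admissibility from part (a) then gives $|H(g_S, z) - H(g_{S'}, z)| \le M d \le \frac{2M^2}{\lambda m}$ for every $z$, which is the claimed stability.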
Source: Foundations of Machine Learning, 2nd edition, Mehryar Mohri and Afshin Rostamizadeh, ISBN 9780262351362.