3.31 Generalization bound based on covering numbers

Question:

3.31 Generalization bound based on covering numbers. Let $H$ be a family of functions mapping $X$ to a subset of real numbers $Y \subseteq \mathbb{R}$. For any $\epsilon > 0$, the covering number $N(H, \epsilon)$ of $H$ for the $L_\infty$ norm is the minimal $k \in \mathbb{N}$ such that $H$ can be covered with $k$ balls of radius $\epsilon$, that is, there exists $\{h_1, \dots, h_k\} \subseteq H$ such that, for all $h \in H$, there exists $i \le k$ with $\|h - h_i\|_\infty = \max_{x \in X} |h(x) - h_i(x)| \le \epsilon$.

In particular, when $H$ is a compact set, a finite covering can be extracted from a covering of $H$ with balls of radius $\epsilon$, and thus $N(H, \epsilon)$ is finite.
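For intuition (this example is not part of the original exercise), consider the class of constant functions on $X$ indexed by $[0, 1]$:
\[
H = \{\, h_c : x \mapsto c \mid c \in [0, 1] \,\}, \qquad \|h_c - h_{c'}\|_\infty = |c - c'| .
\]
A ball of radius $\epsilon$ centered at $h_{c_i}$ covers exactly the constants in $[c_i - \epsilon, c_i + \epsilon] \cap [0, 1]$, an interval of length at most $2\epsilon$, so $N(H, \epsilon) = \lceil 1/(2\epsilon) \rceil$: the covering number grows as $\epsilon$ shrinks, and a richer class (say, constants in $[0, 10]$) would need proportionally more balls at the same radius.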

Covering numbers provide a measure of the complexity of a class of functions:

the larger the covering number, the richer the family of functions. The objective of this problem is to illustrate this by proving a learning bound in the case of the squared loss. Let $D$ denote a distribution over $X \times Y$ according to which labeled examples are drawn. Then, the generalization error of $h \in H$ for the squared loss is defined by $R(h) = \mathbb{E}_{(x,y) \sim D}[(h(x) - y)^2]$ and its empirical error for a labeled sample $S = ((x_1, y_1), \dots, (x_m, y_m))$ by $\widehat{R}_S(h) = \frac{1}{m} \sum_{i=1}^{m} (h(x_i) - y_i)^2$.
We will assume that $H$ is bounded, that is, there exists $M > 0$ such that $|h(x) - y| \le M$ for all $(x, y) \in X \times Y$. The following is the generalization bound proven in this problem:
\[
\mathbb{P}_{S \sim D^m}\!\left[ \sup_{h \in H} \big| R(h) - \widehat{R}_S(h) \big| \ge \epsilon \right]
\;\le\; N\!\left(H, \frac{\epsilon}{8M}\right) 2 \exp\!\left( \frac{-m \epsilon^2}{2 M^4} \right). \tag{3.55}
\]
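To get a quantitative feel for (3.55), the short sketch below inverts the bound: it sets the right-hand side equal to a confidence level $\delta$ and solves for the sample size $m$. This is only an illustration under the assumption that a value (or upper bound) for the covering number $N(H, \epsilon/(8M))$ is available; the function name and arguments are placeholders, not notation from the book.

import math

def sample_size_bound(covering_number, epsilon, M, delta):
    # Smallest m for which N(H, eps/(8M)) * 2 * exp(-m * eps^2 / (2 * M^4)) <= delta,
    # obtained by solving the right-hand side of (3.55) for m.
    return math.ceil((2 * M**4 / epsilon**2) * math.log(2 * covering_number / delta))

# Example with a hypothetical cover of size 1000:
print(sample_size_bound(covering_number=1000, epsilon=0.1, M=1.0, delta=0.05))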
The proof is based on the following steps.

(a) Let $L_S(h) = R(h) - \widehat{R}_S(h)$. Then show that, for all $h_1, h_2 \in H$ and any labeled sample $S$, the following inequality holds:
\[
|L_S(h_1) - L_S(h_2)| \le 4M \|h_1 - h_2\|_\infty .
\]
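One way to argue (a), sketched here rather than claimed as the book's official solution: for a fixed pair $(x, y)$, factor the difference of squared losses,
\[
(h_1(x) - y)^2 - (h_2(x) - y)^2
= \big( h_1(x) - h_2(x) \big) \big( h_1(x) + h_2(x) - 2y \big),
\]
and note that $|h_1(x) + h_2(x) - 2y| \le |h_1(x) - y| + |h_2(x) - y| \le 2M$, so the pointwise loss difference is at most $2M \|h_1 - h_2\|_\infty$ in absolute value. Taking expectations bounds $|R(h_1) - R(h_2)|$ by $2M \|h_1 - h_2\|_\infty$, averaging over the sample bounds $|\widehat{R}_S(h_1) - \widehat{R}_S(h_2)|$ by the same quantity, and the triangle inequality combines the two into the stated $4M \|h_1 - h_2\|_\infty$ bound.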

(b) Assume that $H$ can be covered by $k$ subsets $B_1, \dots, B_k$, that is, $H = B_1 \cup \dots \cup B_k$. Then, show that, for any $\epsilon > 0$, the following upper bound holds:
\[
\mathbb{P}_{S \sim D^m}\!\left[ \sup_{h \in H} |L_S(h)| \ge \epsilon \right]
\le \sum_{i=1}^{k} \mathbb{P}_{S \sim D^m}\!\left[ \sup_{h \in B_i} |L_S(h)| \ge \epsilon \right].
\]
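A sketch for (b): since $H = B_1 \cup \dots \cup B_k$, the supremum over $H$ is the maximum of the suprema over the $B_i$, so the events satisfy
\[
\left\{ \sup_{h \in H} |L_S(h)| \ge \epsilon \right\}
= \bigcup_{i=1}^{k} \left\{ \sup_{h \in B_i} |L_S(h)| \ge \epsilon \right\},
\]
and the union bound yields the sum on the right-hand side.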

(c) Finally, let $k = N\!\left(H, \frac{\epsilon}{8M}\right)$ and let $B_1, \dots, B_k$ be balls of radius $\epsilon/(8M)$ centered at $h_1, \dots, h_k$ covering $H$. Use part (a) to show that, for all $i \in [k]$,
\[
\mathbb{P}_{S \sim D^m}\!\left[ \sup_{h \in B_i} |L_S(h)| \ge \epsilon \right]
\le \mathbb{P}_{S \sim D^m}\!\left[ |L_S(h_i)| \ge \frac{\epsilon}{2} \right],
\]
and apply Hoeffding's inequality (theorem D.2) to prove (3.55).
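A sketch of the last step, offered as one possible route rather than the official solution: for any $h \in B_i$, part (a) and $\|h - h_i\|_\infty \le \epsilon/(8M)$ give
\[
|L_S(h)| \le |L_S(h_i)| + 4M \|h - h_i\|_\infty \le |L_S(h_i)| + \frac{\epsilon}{2},
\]
so the event $\sup_{h \in B_i} |L_S(h)| \ge \epsilon$ implies $|L_S(h_i)| \ge \epsilon/2$, which is the claimed comparison. Since each term $(h_i(x_j) - y_j)^2$ lies in $[0, M^2]$, Hoeffding's inequality applied to their average gives
\[
\mathbb{P}_{S \sim D^m}\!\left[ |L_S(h_i)| \ge \frac{\epsilon}{2} \right]
\le 2 \exp\!\left( \frac{-2m (\epsilon/2)^2}{M^4} \right)
= 2 \exp\!\left( \frac{-m \epsilon^2}{2 M^4} \right),
\]
and combining this with part (b) for $k = N(H, \epsilon/(8M))$ yields (3.55).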


Related Book: Foundations of Machine Learning, 2nd Edition
Authors: Mehryar Mohri, Afshin Rostamizadeh
ISBN: 9780262351362