3.30 VC-dimension generalization bound { realizable case. In this exercise we show that the bound given in

Question:

3.30 VC-dimension generalization bound { realizable case. In this exercise we show that the bound given in corollary 3.19 can be improved to O( d log(m=d)

m ) in the realizable setting. Assume we are in the realizable scenario, i.e. the target concept is included in our hypothesis class H. We will show that if a hypothesis h is consistent with a sample S  Dm then for any  > 0 such that m  8 P[R(h) > ]  2 h2em d

id 2????m=2 : (3.54)

(a) Let HS  H be the subset of hypotheses consistent with the sample S, let bR S(h) denote the empirical error with respect to the sample S and de ne S0 as another independent sample drawn from Dm. Show that the following inequality holds for any h0 2 HS:

P h

sup h2HS jbR S(h) ???? bR S0 (h)j >



2 i

 P h

B(m; ) >

m

2 i

P[R(h0) > ] ;

where B(m; ) is a binomial random variable with parameters (m; ). (Hint:

prove and use the fact that P[bR S(h)  

2 ]  P[bR S(h) > 

2 ^ R(h) > ].)

(b) Prove that P h B(m; ) > m
2 i  1 2 . Use this inequality along with the result from

(a) to show that for any h0 2 HS P h R(h0) > 
i  2 P h sup h2HS jbR S(h) ???? bR S0 (h)j >

2 i :

(c) Instead of drawing two samples, we can draw one sample T of size 2m then uniformly at random split it into S and S0. The right hand side of part (b)
can then be rewritten as:
P h sup h2HS jbR S(h)????bR S0 (h)j >

2 i = P TD2m:
T![S;S0]
h 9h2H: bR S(h) = 0 ^ bR S0 (h) >

2 i :
Let h0 be a hypothesis such that bR T (h0) > 
2 and let l > m
2 be the total number of errors h0 makes on T. Show that the probability of all l errors falling into S0 is upper bounded by 2????l.

(d) Part

(b) implies that for any h 2 H P TD2m:
T!(S;S0)
h bR S(h) = 0 ^ bR S0 (h) >

2 bR T (h0) >

2 i  2????l :
Use this bound to show that for any h 2 H P TD2m:
T!(S;S0)
h bR S(h) = 0 ^ bR S0 (h) >

2 i  2????m 2 :

(e) Complete the proof of inequality (3.54) by using the union bound to upper bound P TD2m:
T!(S;S0)
h 9h 2 H: bR S(h) = 0 ^ bR S0 (h) > 
2 i . Show that we can achieve a high probability generalization bound that is of the order O( d log(m=d)
m ).

Step by Step Answer:

Related Book For  book-img-for-question

Foundations Of Machine Learning

ISBN: 9780262351362

2nd Edition

Authors: Mehryar Mohri, Afshin Rostamizadeh

Question Posted: