3.30 VC-dimension generalization bound – realizable case
Question:
3.30 VC-dimension generalization bound – realizable case. In this exercise we show that the bound given in Corollary 3.19 can be improved to $O\left(\frac{d \log(m/d)}{m}\right)$ in the realizable setting. Assume we are in the realizable scenario, i.e. the target concept is included in our hypothesis class $H$. We will show that if a hypothesis $h$ is consistent with a sample $S \sim D^m$, then for any $\epsilon > 0$ such that $m\epsilon \geq 8$,
$$
P[R(h) > \epsilon] \leq 2 \left[\frac{2em}{d}\right]^d 2^{-m\epsilon/2}. \tag{3.54}
$$
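As a quick sanity check on (3.54) (an illustrative computation with assumed values, not part of the exercise), take $d = 10$, $m = 10^5$ and $\epsilon = 0.01$, so that $m\epsilon = 1000 \geq 8$. The bound is then non-vacuous:
$$
2\left[\frac{2em}{d}\right]^d 2^{-m\epsilon/2} = 2\,(2e \cdot 10^4)^{10}\, 2^{-500} \approx 2 \cdot 10^{47.4} \cdot 10^{-150.5} \approx 10^{-103},
$$
so with overwhelming probability every hypothesis consistent with such a sample has true error at most $0.01$.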
(a) Let $H_S \subseteq H$ be the subset of hypotheses consistent with the sample $S$, let $\widehat{R}_S(h)$ denote the empirical error with respect to the sample $S$, and define $S'$ as another independent sample drawn from $D^m$. Show that the following inequality holds for any $h_0 \in H_S$:
$$
P\left[\sup_{h \in H_S} \big|\widehat{R}_S(h) - \widehat{R}_{S'}(h)\big| > \frac{\epsilon}{2}\right] \geq P\left[B(m, \epsilon) > \frac{m\epsilon}{2}\right] P[R(h_0) > \epsilon],
$$
where $B(m, \epsilon)$ is a binomial random variable with parameters $(m, \epsilon)$. (Hint: prove and use the fact that $P[\widehat{R}_{S'}(h_0) > \frac{\epsilon}{2}] \geq P[\widehat{R}_{S'}(h_0) > \frac{\epsilon}{2} \wedge R(h_0) > \epsilon]$.)
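One possible route for part (a), sketched under the exercise's notation (an outline, not the book's official solution): since $h_0 \in H_S$ is consistent with $S$, $\widehat{R}_S(h_0) = 0$, so the supremum event contains $\{\widehat{R}_{S'}(h_0) > \epsilon/2\}$; moreover $S'$ is independent of $S$, so conditioned on $R(h_0) > \epsilon$ the error count $m\widehat{R}_{S'}(h_0)$ stochastically dominates $B(m, \epsilon)$:
$$
\begin{aligned}
P\Big[\sup_{h \in H_S} \big|\widehat{R}_S(h) - \widehat{R}_{S'}(h)\big| > \tfrac{\epsilon}{2}\Big]
&\geq P\big[\widehat{R}_{S'}(h_0) > \tfrac{\epsilon}{2}\big] \\
&\geq P\big[\widehat{R}_{S'}(h_0) > \tfrac{\epsilon}{2} \wedge R(h_0) > \epsilon\big] \\
&= P\big[\widehat{R}_{S'}(h_0) > \tfrac{\epsilon}{2} \,\big|\, R(h_0) > \epsilon\big]\, P[R(h_0) > \epsilon] \\
&\geq P\big[B(m, \epsilon) > \tfrac{m\epsilon}{2}\big]\, P[R(h_0) > \epsilon].
\end{aligned}
$$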
(b) Prove that $P\left[B(m, \epsilon) > \frac{m\epsilon}{2}\right] \geq \frac{1}{2}$. Use this inequality along with the result from (a) to show that for any $h_0 \in H_S$,
$$
P[R(h_0) > \epsilon] \leq 2\, P\left[\sup_{h \in H_S} \big|\widehat{R}_S(h) - \widehat{R}_{S'}(h)\big| > \frac{\epsilon}{2}\right].
$$
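A sketch for part (b), assuming Chebyshev's inequality is the intended tool (other concentration arguments would also work): with $\mathbb{E}[B(m,\epsilon)] = m\epsilon$ and $\mathrm{Var}[B(m,\epsilon)] = m\epsilon(1-\epsilon)$,
$$
P\Big[B(m,\epsilon) \leq \tfrac{m\epsilon}{2}\Big] \leq P\Big[\big|B(m,\epsilon) - m\epsilon\big| \geq \tfrac{m\epsilon}{2}\Big] \leq \frac{m\epsilon(1-\epsilon)}{(m\epsilon/2)^2} \leq \frac{4}{m\epsilon} \leq \frac{1}{2},
$$
where the last step uses $m\epsilon \geq 8$. Hence $P[B(m,\epsilon) > \frac{m\epsilon}{2}] \geq \frac{1}{2}$, and substituting this into the inequality from (a) and rearranging yields the factor of 2 in part (b).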
(c) Instead of drawing two samples, we can draw one sample $T$ of size $2m$ and then uniformly at random split it into $S$ and $S'$. The right-hand side of part (b) can then be rewritten as:
$$
P\left[\sup_{h \in H_S} \big|\widehat{R}_S(h) - \widehat{R}_{S'}(h)\big| > \frac{\epsilon}{2}\right] = P_{\substack{T \sim D^{2m}:\\ T \to (S, S')}}\left[\exists h \in H \colon \widehat{R}_S(h) = 0 \wedge \widehat{R}_{S'}(h) > \frac{\epsilon}{2}\right].
$$
Let $h_0$ be a hypothesis such that $\widehat{R}_T(h_0) > \frac{\epsilon}{2}$ and let $l > \frac{m\epsilon}{2}$ be the total number of errors $h_0$ makes on $T$. Show that the probability of all $l$ errors falling into $S'$ is upper bounded by $2^{-l}$.
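A counting sketch for part (c), assuming the split is uniform over the $\binom{2m}{m}$ balanced splits as stated: condition on $T$ and on the $l$ points of $T$ that $h_0$ misclassifies, and count the splits that place all $l$ of them in $S'$, i.e. those in which $S$ is drawn entirely from the remaining $2m - l$ points:
$$
\frac{\binom{2m - l}{m}}{\binom{2m}{m}} = \prod_{i=0}^{l-1} \frac{m - i}{2m - i} \leq \prod_{i=0}^{l-1} \frac{1}{2} = 2^{-l},
$$
since $m - i \leq \frac{1}{2}(2m - i)$ for every $i \geq 0$.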
(d) Part (c) implies that for any $h \in H$,
$$
P_{\substack{T \sim D^{2m}:\\ T \to (S, S')}}\left[\widehat{R}_S(h) = 0 \wedge \widehat{R}_{S'}(h) > \frac{\epsilon}{2} \,\Big|\, \widehat{R}_T(h) > \frac{\epsilon}{2}\right] \leq 2^{-l}.
$$
Use this bound to show that for any $h \in H$,
$$
P_{\substack{T \sim D^{2m}:\\ T \to (S, S')}}\left[\widehat{R}_S(h) = 0 \wedge \widehat{R}_{S'}(h) > \frac{\epsilon}{2}\right] \leq 2^{-m\epsilon/2}.
$$
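A sketch for part (d), writing $l(h, T)$ (a name introduced here only for convenience) for the number of errors $h$ makes on $T$: if $\widehat{R}_{S'}(h) > \epsilon/2$ then $l(h,T) > m\epsilon/2$, and if additionally $\widehat{R}_S(h) = 0$ then all $l(h,T)$ errors fell into $S'$. Conditioning on $T$ and applying the bound from part (c),
$$
P_{T \to (S,S')}\Big[\widehat{R}_S(h) = 0 \wedge \widehat{R}_{S'}(h) > \tfrac{\epsilon}{2} \,\Big|\, T\Big]
\leq \mathbf{1}_{\{l(h,T) > m\epsilon/2\}}\, 2^{-l(h,T)} \leq 2^{-m\epsilon/2}.
$$
Taking the expectation over $T \sim D^{2m}$ gives the unconditional bound $2^{-m\epsilon/2}$.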
(e) Complete the proof of inequality (3.54) by using the union bound to upper bound
$$
P_{\substack{T \sim D^{2m}:\\ T \to (S, S')}}\left[\exists h \in H \colon \widehat{R}_S(h) = 0 \wedge \widehat{R}_{S'}(h) > \frac{\epsilon}{2}\right].
$$
Show that we can achieve a high-probability generalization bound that is of the order $O\left(\frac{d \log(m/d)}{m}\right)$.
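A sketch for part (e), assuming Sauer's lemma in the form $\Pi_H(2m) \leq (2em/d)^d$ for $2m \geq d$, where $\Pi_H$ is the growth function and $d$ the VC-dimension: on a fixed $T$ the hypotheses in $H$ induce at most $\Pi_H(2m)$ distinct labelings, so a union bound over these labelings combined with part (d), and then part (b), gives
$$
P_{\substack{T \sim D^{2m}:\\ T \to (S,S')}}\Big[\exists h \in H \colon \widehat{R}_S(h) = 0 \wedge \widehat{R}_{S'}(h) > \tfrac{\epsilon}{2}\Big] \leq \Big(\tfrac{2em}{d}\Big)^d 2^{-m\epsilon/2},
\qquad
P[R(h) > \epsilon] \leq 2\Big(\tfrac{2em}{d}\Big)^d 2^{-m\epsilon/2}.
$$
Setting the right-hand side equal to $\delta$ and solving for $\epsilon$ shows that, with probability at least $1 - \delta$ over $S \sim D^m$, every hypothesis $h$ consistent with $S$ satisfies
$$
R(h) \leq \frac{2}{m \log 2}\Big( d \log \frac{2em}{d} + \log \frac{2}{\delta} \Big) = O\Big(\frac{d \log(m/d)}{m}\Big).
$$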
Source: Foundations of Machine Learning, 2nd Edition, by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar (ISBN 9780262351362).