8.9 General inequality. In this exercise we generalize the result of exercise 8.7 by using a more...

Question:

8.9 General inequality. In this exercise we generalize the result of exercise 8.7 by using a more general inequality: log(1 ???? x)  ????x ???? x2 for some 0 < < 2.

(a) First prove that the inequality is true for x 2 [0; 1 ????

2 ]. What does this imply about the valid range of ?

(b) Give a generalized version of the regret bound derived in exercise 8.7 in terms of , which shows:

RT 

logN 1 ????

+

1 ????

T :

What is the optimal choice of and the resulting bound in this case?

(c) Explain how may act as a regularization parameter. What is the optimal choice of ?

8.10 On-line to batch | non-convex loss.

The on-line to batch result of theorem 8.15 heavily relies on the fact that the loss is convex in order to provide a generalization guarantee for the uniformly averaged hypothesis 1 T

PT i=1 hi. For general losses, instead of using the averaged hypothesis we will use a di erent strategy and try to estimate the best single base hypothesis and show the expected loss of this hypothesis is bounded.

Let mi denote the cumulative loss of hypothesis hi on the points (xi; : : : ; xT ), that is mi =

PT t=i L(hi(xt); yt). Then we de ne the penalized risk estimate of hypothesis hi as, mi T ???? i + 1

+ c(T ???? i + 1) where c(x) =

r 1

2x log T(T + 1)



:

The term c penalizes the empirical error when the test sample is small. De ne bh = hi where i = argminimi=(T ???? i + 1) + c(T ???? i + 1). We will then show under the same conditions of theorem 8.15 (with M = 1 for simplicity), but without requiring the convexity of L, that the following holds with probability at least 1 ???? :

R(bh) 

1 T

XT i=1 L(hi(xi); yi) + 6 r

1 T

log 2(T + 1)



: (8.31)

(a) Prove the following inequality:

min i2[T]

(R(hi) + 2c(T ???? i + 1)) 

1 T

XT i=1 R(hi) + 4 r

1 T

log T + 1



:

(b) Use part

(a) to show that with probability at least 1 ???? , min i2[T]
(R(hi) + 2c(T ???? i + 1))
<
XT i=1 L(hi(xi); yi) +
r 2 T log 1 
+ 4 r 1 T log T + 1 
:

(c) By design, the de nition of c ensures that with probability at least 1 ???? 
R(bh)  min i2[T]
(R(hi) + 2c(T ???? i + 1)) :
Use this property to complete the proof of (8.31).

Fantastic news! We've Found the answer you've been seeking!

Step by Step Answer:

Related Book For  book-img-for-question

Foundations Of Machine Learning

ISBN: 9780262351362

2nd Edition

Authors: Mehryar Mohri, Afshin Rostamizadeh

Question Posted: