Question:
8.9 General inequality. In this exercise we generalize the result of exercise 8.7 by using a more general inequality: $\log(1 - x) \ge -x - \frac{x^2}{\alpha}$, for some $0 < \alpha < 2$.
(a) First prove that the inequality is true for $x \in [0, 1 - \frac{\alpha}{2}]$. What does this imply about the valid range of $\alpha$?
(b) Give a generalized version of the regret bound derived in exercise 8.7 in terms of $\alpha$, which shows
$$R_T \le \frac{\log N}{1 - \beta} + \frac{(1 - \beta)\, T}{\alpha}.$$
What is the optimal choice of $\beta$ and the resulting bound in this case?
(c) Explain how $\alpha$ may act as a regularization parameter. What is the optimal choice of $\alpha$?
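A short numerical check can make the inequality in part (a) concrete. The sketch below is an illustration only, not the requested proof; the grid size and the sample values of $\alpha$ are arbitrary choices. It evaluates the gap $\log(1 - x) - (-x - x^2/\alpha)$ over the claimed range $[0, 1 - \alpha/2]$ and reports its minimum, which should come out nonnegative.

```python
import numpy as np

def min_gap(alpha, num_points=2000):
    """Minimum of log(1 - x) - (-x - x**2 / alpha) over [0, 1 - alpha/2].

    A nonnegative value means log(1 - x) >= -x - x**2 / alpha held at every
    sampled point of the claimed range of validity.
    """
    x = np.linspace(0.0, 1.0 - alpha / 2.0, num_points)
    return np.min(np.log(1.0 - x) - (-x - x**2 / alpha))

for alpha in [0.5, 1.0, 1.5, 1.9]:
    print(f"alpha = {alpha:.1f}: min gap = {min_gap(alpha):+.6f}")
```

Note that $\alpha = 1$ recovers the standard inequality $\log(1 - x) \ge -x - x^2$ on $[0, 1/2]$ used in the randomized weighted majority analysis.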
8.10 On-line to batch: non-convex loss.
The on-line to batch result of theorem 8.15 heavily relies on the fact that the loss is convex in order to provide a generalization guarantee for the uniformly averaged hypothesis $\frac{1}{T}\sum_{i=1}^{T} h_i$. For general losses, instead of using the averaged hypothesis, we will use a different strategy: we will try to estimate the best single base hypothesis and show that the expected loss of this hypothesis is bounded.
Let $m_i$ denote the cumulative loss of hypothesis $h_i$ on the points $(x_i, \ldots, x_T)$, that is, $m_i = \sum_{t=i}^{T} L(h_i(x_t), y_t)$. Then we define the penalized risk estimate of hypothesis $h_i$ as
$$\frac{m_i}{T - i + 1} + c(T - i + 1), \quad \text{where } c(x) = \sqrt{\frac{1}{2x} \log \frac{T(T + 1)}{\delta}}.$$
The term $c$ penalizes the empirical error when the test sample is small. Define $\hat{h} = h_{i^*}$ where $i^* = \operatorname{argmin}_{i \in [T]} \big[\, m_i / (T - i + 1) + c(T - i + 1) \,\big]$. We will then show, under the same conditions as theorem 8.15 (with $M = 1$ for simplicity) but without requiring the convexity of $L$, that the following holds with probability at least $1 - \delta$:
$$R(\hat{h}) \le \frac{1}{T} \sum_{i=1}^{T} L(h_i(x_i), y_i) + 6 \sqrt{\frac{1}{T} \log \frac{2(T + 1)}{\delta}}. \qquad (8.31)$$
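To illustrate the definitions above, here is a minimal sketch of how the penalized risk estimate and $\hat{h}$ could be computed from observed losses. It assumes losses in $[0, 1]$ (the case $M = 1$) and a hypothetical array `losses` with `losses[i-1, t-1]` $= L(h_i(x_t), y_t)$ for $t \ge i$; the synthetic data and the choice $\delta = 0.05$ are placeholders, not part of the exercise.

```python
import numpy as np

def select_hypothesis(losses, delta):
    """Return the (1-based) index i* minimizing m_i / (T - i + 1) + c(T - i + 1).

    losses : (T, T) array, losses[i-1, t-1] = L(h_i(x_t), y_t) for t >= i,
             assumed to lie in [0, 1]; entries with t < i are never read.
    delta  : confidence parameter in (0, 1).
    """
    T = losses.shape[0]

    def c(x):
        # Penalty c(x) = sqrt((1 / (2x)) * log(T (T + 1) / delta)).
        return np.sqrt(np.log(T * (T + 1) / delta) / (2.0 * x))

    scores = []
    for i in range(1, T + 1):
        n = T - i + 1                       # size of the "test" sample x_i, ..., x_T
        m_i = losses[i - 1, i - 1:].sum()   # cumulative loss of h_i on x_i, ..., x_T
        scores.append(m_i / n + c(n))
    return int(np.argmin(scores)) + 1

# Tiny synthetic example with T = 5 rounds and random losses in [0, 1].
rng = np.random.default_rng(0)
print("selected i* =", select_hypothesis(rng.uniform(size=(5, 5)), delta=0.05))
```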
(a) Prove the following inequality:
$$\min_{i \in [T]} \big( R(h_i) + 2 c(T - i + 1) \big) \le \frac{1}{T} \sum_{i=1}^{T} R(h_i) + 4 \sqrt{\frac{1}{T} \log \frac{T + 1}{\delta}}.$$
(b) Use part (a) to show that, with probability at least $1 - \delta$,
$$\min_{i \in [T]} \big( R(h_i) + 2 c(T - i + 1) \big) < \frac{1}{T} \sum_{i=1}^{T} L(h_i(x_i), y_i) + \sqrt{\frac{2}{T} \log \frac{1}{\delta}} + 4 \sqrt{\frac{1}{T} \log \frac{T + 1}{\delta}}.$$
(c) By design, the definition of $c$ ensures that, with probability at least $1 - \delta$,
$$R(\hat{h}) \le \min_{i \in [T]} \big( R(h_i) + 2 c(T - i + 1) \big).$$
Use this property to complete the proof of (8.31).
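For part (a), one natural route (offered here only as a hint, not a full proof) is to use the fact that a minimum over $i \in [T]$ is at most the average over $i$, so it suffices to bound the average penalty $\frac{1}{T}\sum_{i=1}^{T} 2 c(T - i + 1) = \frac{2}{T}\sum_{k=1}^{T} c(k)$. The sketch below checks numerically, for a few values of $T$ and an arbitrary $\delta$, that this average stays below the additive term $4 \sqrt{\frac{1}{T} \log \frac{T + 1}{\delta}}$ appearing in part (a); the specific values of $T$ and $\delta$ are arbitrary choices.

```python
import numpy as np

def average_penalty(T, delta):
    """(2 / T) * sum_{k=1}^{T} c(k), the average of 2 c(T - i + 1) over i in [T]."""
    k = np.arange(1, T + 1)
    c = np.sqrt(np.log(T * (T + 1) / delta) / (2.0 * k))
    return 2.0 * c.mean()

def part_a_term(T, delta):
    """The additive term 4 * sqrt((1 / T) * log((T + 1) / delta)) from part (a)."""
    return 4.0 * np.sqrt(np.log((T + 1) / delta) / T)

delta = 0.05  # arbitrary confidence level for the check
for T in [10, 100, 1000, 10000]:
    print(f"T = {T:>5}: average penalty = {average_penalty(T, delta):.4f}, "
          f"bound = {part_a_term(T, delta):.4f}")
```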
Step by Step Answer:
Foundations Of Machine Learning
ISBN: 9780262351362
2nd Edition
Authors: Mehryar Mohri, Afshin Rostamizadeh