7.12 Empirical margin loss boosting. As discussed in the chapter, AdaBoost can be viewed as coordinate descent applied to a convex upper bound on the empirical error. Here, we consider an algorithm seeking to minimize the empirical margin loss. For any $0 \leq \rho < 1$, let
$$\widehat{R}_{S,\rho}(f) = \frac{1}{m} \sum_{i=1}^m 1_{y_i f(x_i) \leq \rho}$$
denote the empirical margin loss of a function $f$ of the form $f = \frac{\sum_{t=1}^T \alpha_t h_t}{\sum_{t=1}^T \alpha_t}$ for a labeled sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$.
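As a quick numerical illustration (not part of the original exercise), here is a minimal Python sketch of this quantity, under the assumption that the base hypotheses are supplied as a matrix of $\pm 1$ predictions on the sample; the function name and data layout are illustrative only.

    import numpy as np

    def empirical_margin_loss(preds, alphas, y, rho):
        """Empirical margin loss R_hat_{S,rho}(f) for f = sum_t a_t h_t / sum_t a_t.

        preds:  (T, m) array, preds[t, i] = h_t(x_i) in {-1, +1}
        alphas: (T,) array of non-negative coefficients
        y:      (m,) array of labels in {-1, +1}
        rho:    margin parameter, 0 <= rho < 1
        """
        f = preds.T @ alphas / np.sum(alphas)  # normalized ensemble scores f(x_i)
        margins = y * f                        # pointwise margins y_i f(x_i)
        return np.mean(margins <= rho)         # fraction with margin at most rho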

(a) Show that $\widehat{R}_{S,\rho}(f)$ can be upper bounded as follows:
$$\widehat{R}_{S,\rho}(f) \leq \frac{1}{m} \sum_{i=1}^m \exp\left(-y_i \sum_{t=1}^T \alpha_t h_t(x_i) + \rho \sum_{t=1}^T \alpha_t\right).$$
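One way to see this (a sketch of the intended argument, not the book's worked solution): since $\sum_{t=1}^T \alpha_t > 0$, the condition $y_i f(x_i) \leq \rho$ is equivalent to $-y_i \sum_{t=1}^T \alpha_t h_t(x_i) + \rho \sum_{t=1}^T \alpha_t \geq 0$, and the indicator $1_{u \geq 0}$ is upper bounded by $e^u$, so
$$1_{y_i f(x_i) \leq \rho} \leq \exp\left(-y_i \sum_{t=1}^T \alpha_t h_t(x_i) + \rho \sum_{t=1}^T \alpha_t\right),$$
and averaging over $i$ gives the claimed bound.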

(b) For any $\rho > 0$, let $G$ be the objective function defined for all $\alpha \geq 0$ by
$$G(\alpha) = \frac{1}{m} \sum_{i=1}^m \exp\left(-y_i \sum_{j=1}^N \alpha_j h_j(x_i) + \rho \sum_{j=1}^N \alpha_j\right),$$
with $h_j \in H$ for all $j \in [N]$, with the notation used in class in the boosting lecture. Show that $G$ is convex and differentiable.
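A sketch of the standard argument: for each $i$, the exponent
$$\alpha \mapsto -y_i \sum_{j=1}^N \alpha_j h_j(x_i) + \rho \sum_{j=1}^N \alpha_j = \sum_{j=1}^N \alpha_j\left(\rho - y_i h_j(x_i)\right)$$
is linear in $\alpha$, and $x \mapsto e^x$ is convex and nondecreasing, so each summand is convex and infinitely differentiable; $G$, a finite sum of such terms, inherits both properties.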

(c) Derive a boosting-style algorithm $A_\rho$ by applying (maximum) coordinate descent to $G$. Justify the derivation of the algorithm in detail, in particular the choice of the base classifier selected at each round and that of the step. Compare both to their counterparts in AdaBoost.
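One consistent way the derivation can go (a sketch under the assumption that the distributions $D_t$ are defined as in AdaBoost): the directional derivative of $G$ along coordinate $k$ is proportional to $\rho - (1 - 2\epsilon_k)$, so maximum coordinate descent selects the base classifier with the smallest weighted error $\epsilon_t$, exactly as in AdaBoost; setting the derivative along that coordinate to zero yields the step
$$\alpha_t = \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t} - \frac{1}{2}\log\frac{1+\rho}{1-\rho},$$
which is the AdaBoost step shifted by the constant $\frac{1}{2}\log\frac{1+\rho}{1-\rho}$.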

(d) What is the equivalent of the weak learning assumption for $A_\rho$? (Hint: use the non-negativity of the step value.)
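As a pointer toward where the hint leads (an assumption about the intended answer, not a quoted solution): the step above is non-negative if and only if
$$\frac{1-\epsilon_t}{\epsilon_t} \geq \frac{1+\rho}{1-\rho} \iff \epsilon_t \leq \frac{1-\rho}{2},$$
so the natural analogue of the weak learning assumption requires each base classifier to achieve weighted error at most $\frac{1-\rho}{2}$, rather than at most $\frac{1}{2}$ as in AdaBoost.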

(e) Give the full pseudocode of the algorithm $A_\rho$. What can you say about the $A_0$ algorithm?
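For concreteness, a minimal runnable sketch of what such pseudocode might look like, assuming base hypotheses are supplied as a finite matrix of $\pm 1$ predictions (the function name, this "best of a finite pool" weak learner, and the numerical guard are all illustrative assumptions, not taken from the book):

    import numpy as np

    def boost_rho(preds, y, rho, T):
        """Margin-loss boosting A_rho (sketch).

        preds: (N, m) array, preds[j, i] = h_j(x_i) in {-1, +1}
        y:     (m,) labels in {-1, +1};  0 <= rho < 1;  T rounds.
        Returns the selected hypothesis indices and steps alpha_t.
        """
        N, m = preds.shape
        D = np.full(m, 1.0 / m)              # distribution over training points
        chosen, alphas = [], []
        for _ in range(T):
            # weighted errors of all base hypotheses under D
            errs = np.array([np.sum(D * (preds[j] != y)) for j in range(N)])
            k = int(np.argmin(errs))         # best base classifier, as in AdaBoost
            eps = max(errs[k], 1e-12)        # guard against a perfect hypothesis
            if eps >= (1 - rho) / 2:         # weak learning condition violated
                break
            # AdaBoost step shifted by a rho-dependent constant
            alpha = 0.5 * np.log((1 - eps) / eps) - 0.5 * np.log((1 + rho) / (1 - rho))
            chosen.append(k)
            alphas.append(alpha)
            # multiplicative update and renormalization, as in AdaBoost
            D *= np.exp(-alpha * y * preds[k])
            D /= D.sum()
        return chosen, alphas

Note that with $\rho = 0$ the shift term vanishes and the step reduces to the AdaBoost step, so $A_0$ coincides with AdaBoost.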

(f) Provide a bound on $\widehat{R}_{S,\rho}(f)$.

i. Prove the upper bound
$$\widehat{R}_{S,\rho}(f) \leq \exp\left(\rho \sum_{t=1}^T \alpha_t\right) \prod_{t=1}^T Z_t,$$
where the normalization factors $Z_t$ are defined as in the case of AdaBoost (with $\alpha_t$ the step chosen by $A_\rho$ at round $t$).
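A sketch of the standard telescoping argument, assuming the distributions $D_t$ are updated as in AdaBoost: unrolling the update gives
$$D_{T+1}(i) = \frac{\exp\left(-y_i \sum_{t=1}^T \alpha_t h_t(x_i)\right)}{m \prod_{t=1}^T Z_t},$$
so, combining with the bound of part (a),
$$\widehat{R}_{S,\rho}(f) \leq e^{\rho \sum_{t=1}^T \alpha_t} \cdot \frac{1}{m}\sum_{i=1}^m e^{-y_i \sum_{t=1}^T \alpha_t h_t(x_i)} = e^{\rho \sum_{t=1}^T \alpha_t} \prod_{t=1}^T Z_t \sum_{i=1}^m D_{T+1}(i) = e^{\rho \sum_{t=1}^T \alpha_t} \prod_{t=1}^T Z_t.$$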

ii. Give the expression of Zt as a function of  and t, where t is the weighted error of the hypothesis found by A at round t (de ned in the same way as for AdaBoost in class). Use that to prove the following upper bound bR S;

(f) 

u 1+
2 + u????1????
2 T YT t=1 q 1????
t (1 ???? t)1+;
where u = 1????
1+ .
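For reference, a sketch of the computation (again assuming AdaBoost's definitions): as in AdaBoost, $Z_t = \sum_i D_t(i)\,e^{-\alpha_t y_i h_t(x_i)} = \epsilon_t e^{\alpha_t} + (1-\epsilon_t)e^{-\alpha_t}$, and substituting the step $\alpha_t$ from part (c) gives
$$e^{\rho \alpha_t} Z_t = \left(u^{\frac{1+\rho}{2}} + u^{-\frac{1-\rho}{2}}\right)\sqrt{\epsilon_t^{1-\rho}\,(1-\epsilon_t)^{1+\rho}};$$
taking the product over $t$ in the bound of part i then yields the stated inequality.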
iii. Assume that for all $t \in [T]$, $\frac{1-\rho}{2} - \epsilon_t > \gamma > 0$. Use the result of the previous question to show that
$$\widehat{R}_{S,\rho}(f) \leq \exp\left(-\frac{2\gamma^2 T}{1-\rho^2}\right).$$
(Hint: you can use without proof the following identity:
$$\left(u^{\frac{1+\rho}{2}} + u^{-\frac{1-\rho}{2}}\right)\sqrt{\epsilon_t^{1-\rho}\,(1-\epsilon_t)^{1+\rho}} \leq 1 - \frac{2\left(\frac{1-\rho}{2} - \epsilon_t\right)^2}{1-\rho^2},$$
valid for $\frac{1-\rho}{2} - \epsilon_t > 0$.) Show that for $T > \frac{(\log m)(1-\rho^2)}{2\gamma^2}$, all points of the training data have margin at least $\rho$.
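A sketch of the concluding steps, using the hint together with $1 - x \leq e^{-x}$:
$$\widehat{R}_{S,\rho}(f) \leq \prod_{t=1}^T \left(1 - \frac{2\left(\frac{1-\rho}{2}-\epsilon_t\right)^2}{1-\rho^2}\right) \leq \prod_{t=1}^T e^{-\frac{2\gamma^2}{1-\rho^2}} = \exp\left(-\frac{2\gamma^2 T}{1-\rho^2}\right).$$
When $T > \frac{(\log m)(1-\rho^2)}{2\gamma^2}$, this bound is strictly less than $\frac{1}{m}$; since $\widehat{R}_{S,\rho}(f)$ is an integer multiple of $\frac{1}{m}$, it must equal $0$, i.e., every training point has margin greater than $\rho$.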


Source: Foundations of Machine Learning, 2nd Edition, by Mehryar Mohri and Afshin Rostamizadeh. ISBN 9780262351362.
