Question
In this exercise we will implement Adaboost. Recall that Adaboost aims at minimizing the exponential loss:

$$\min_{w} \sum_{i} \exp\Big(-y_i \sum_{j} w_j h_j(x_i)\Big)$$
where the $h_j$ are the so-called weak learners, and the combined classifier is

$$h_w(x) := \sum_{j} w_j h_j(x).$$
Note that we assume $y_i \in \{\pm 1\}$ in this exercise, and we simply take $h_j(x) = \mathrm{sign}(x_j - b_j) \in \{\pm 1\}$ for some $b_j \in \mathbb{R}$. Upon defining $M_{ij} = -y_i h_j(x_i)$, we may simplify our problem further as:

$$\min_{w \in \mathbb{R}^d} \mathbf{1}^\top \exp(Mw)$$

where $\exp$ is applied componentwise and $\mathbf{1}$ is the vector of all 1s. Recall that $(s)_+ := \max(s, 0)$ is the positive part while $(s)_- := \max(-s, 0) = (s)_+ - s$.
Algorithm 1: Adaboost.
Input: $M \in \mathbb{R}^{n \times d}$, $w_0 = \mathbf{0}_d$, $p_0 = \mathbf{1}_n / n$, max_pass
Output: $w$
for $t = 0, 1, \ldots,$ max_pass do
    $p_t \leftarrow p_t / (\mathbf{1}^\top p_t)$    // normalize $p_t$
    $\epsilon_t = (M)_+^\top p_t$    // $(\cdot)_+$ applied componentwise
    $\gamma_t = (M)_-^\top p_t$    // $(\cdot)_-$ applied componentwise
    $\beta_t = \tfrac{1}{2}[\ln \gamma_t - \ln \epsilon_t]$    // $\ln$ applied componentwise
    choose $\alpha_t \in \mathbb{R}^d$    // decided later
    $w_{t+1} = w_t + \alpha_t \odot \beta_t$    // $\odot$: componentwise multiplication
    $p_{t+1} = p_t \odot \exp(M(\alpha_t \odot \beta_t))$    // $\exp$ applied componentwise
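The loop above can be sketched in NumPy. This is a minimal sketch, not the official solution: the sign convention ($\epsilon$ from $(M)_+$, $\gamma$ from $(M)_-$, with $M_{ij} = -y_i h_j(x_i)$) follows the reconstruction above, and `choose_alpha` is a hypothetical callable standing in for the choice of $\alpha_t$ made later in the exercise:

```python
import numpy as np

def adaboost(M, max_pass, choose_alpha):
    """Sketch of Algorithm 1. M is the n x d margin matrix with
    M[i, j] = -y_i * h_j(x_i); choose_alpha(eps, gamma) returns alpha_t."""
    n, d = M.shape
    w = np.zeros(d)
    p = np.full(n, 1.0 / n)
    Mpos, Mneg = np.maximum(M, 0.0), np.maximum(-M, 0.0)  # (M)_+ and (M)_-
    for t in range(max_pass):
        p = p / p.sum()                       # normalize p_t
        eps = Mpos.T @ p                      # weighted error of each weak learner
        gamma = Mneg.T @ p                    # weighted accuracy of each weak learner
        beta = 0.5 * (np.log(gamma) - np.log(eps))
        alpha = choose_alpha(eps, gamma)      # decided by the parallel/sequential variant
        w = w + alpha * beta                  # componentwise update of w
        p = p * np.exp(M @ (alpha * beta))    # up-weight currently misclassified points
    return w
```

With the parallel choice $\alpha_t \equiv \mathbf{1}$ and rows of $M$ scaled so $\sum_j |M_{ij}| \le 1$, the exponential loss $\mathbf{1}^\top \exp(Mw_t)$ is non-increasing in $t$.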
(pts) We claim that Algorithm 1 is indeed the celebrated Adaboost algorithm if the following holds:
- $\alpha_t$ is one-hot, i.e., 1 at some entry and 0 everywhere else; namely, it indicates which weak classifier is chosen at iteration $t$;
- $M \in \{\pm 1\}^{n \times d}$, i.e., all weak classifiers are $\{\pm 1\}$-valued.
With the above conditions, prove (a) that $\gamma_t + \epsilon_t = \mathbf{1}$, and (b) the equivalence between Algorithm 1 and the Adaboost algorithm in class. Note that our labels here are $\{\pm 1\}$ and our $w$ may have nothing to do with the one in class.
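For part (a), a short sketch of the argument under the two conditions above (assuming $p_t$ has just been normalized to sum to 1):

```latex
% Since M \in \{\pm 1\}^{n \times d}, every entry satisfies
% (M_{ij})_+ + (M_{ij})_- = |M_{ij}| = 1, hence
\gamma_t + \epsilon_t
  = (M)_-^\top p_t + (M)_+^\top p_t
  = |M|^\top p_t
  = \mathbf{1}_{d \times n}\, p_t
  = (\mathbf{1}_n^\top p_t)\,\mathbf{1}_d
  = \mathbf{1}_d .
```

So for every weak learner $j$, the weighted accuracy and weighted error sum to 1, exactly as in the classroom Adaboost where $\gamma = 1 - \epsilon$.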
(pts) Let us derive each weak learner $h_j$. Considering each feature in turn, we train $d$ linear classifiers that each aim to minimize the weighted training error:

$$\min_{b_j \in \mathbb{R},\; s_j \in \{\pm 1\}} \sum_{i=1}^{n} p_i \, [\![\, y_i \neq s_j\,\mathrm{sign}(x_{ij} - b_j) \,]\!]$$

where the weights $p_i \ge 0$ and $\sum_i p_i = 1$. Find, with justification, an optimal value for each $b_j$ and $s_j$. If multiple solutions exist, you can use the middle value. If it helps, you may assume $p_i$ is uniform, i.e., $p_i \equiv 1/n$.
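A brute-force way to fit one such decision stump is to sort the feature, try every midpoint threshold and both signs, and keep the pair with smallest weighted error. The helper below (`fit_stump` is a hypothetical name, not from the assignment) is a sketch of that search for a single feature column:

```python
import numpy as np

def fit_stump(x, y, p):
    """Pick b and s in {+1, -1} minimizing the weighted error
    sum_i p_i * 1[y_i != s * sign(x_i - b)] for a single feature x."""
    order = np.argsort(x)
    xs, ys, ps = x[order], y[order], p[order]
    # candidate thresholds: midpoints between consecutive sorted values,
    # plus one below the smallest and one above the largest value
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    best = (np.inf, None, None)
    for b in cands:
        pred = np.where(xs - b > 0, 1.0, -1.0)   # sign(x - b), ties sent to -1
        for s in (1.0, -1.0):
            err = ps[ys != s * pred].sum()       # weighted training error
            if err < best[0]:
                best = (err, b, s)
    return best  # (weighted error, b, s)
```

Since only the ordering of $x$ matters, checking the $n+1$ midpoints is exhaustive; the exercise's "middle value" convention corresponds to taking the midpoint between adjacent sorted samples.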
(pts) Parallel Adaboost. Implement Algorithm 1 with the following choices:
- $\alpha_t \equiv \mathbf{1}$ (the vector of all 1s);
- preprocess $M$ by dividing by a constant so that for all $i$ (each row), $\sum_j |M_{ij}| \le 1$.
Run your implementation on the default dataset available on the course website and report the training loss, training error, and test error w.r.t. the iteration $t$, where

$$\mathrm{error}(w; D) := \frac{1}{|D|} \sum_{(x, y) \in D} [\![\, y \neq h_w(x) \,]\!].$$

Recall that $h_w(x)$ is defined above, while each $h_j$ is decided in the previous exercise. In case you fail to determine $h_j$ there, you may simply use $h_j(x) = \mathrm{sign}(x_j - m_j)$, where $m_j$ is the median value of the $j$-th feature in the training set.
Note that $w_t$ is dense (i.e., using all weak classifiers) even after a single iteration.
Ans: We report all curves in one figure, with clear coloring and legend to indicate which curve is which.
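Two small helpers for this variant, sketched under the sign convention $M_{ij} = -y_i h_j(x_i)$ (the function names are ours, not the assignment's): one scales the rows as required, and one computes $\mathrm{error}(w; D)$ directly from the margin matrix.

```python
import numpy as np

def scale_rows(M):
    """Preprocessing for the parallel variant: divide M by a single
    constant so that every row satisfies sum_j |M_ij| <= 1."""
    c = np.abs(M).sum(axis=1).max()
    return M / c

def error_rate(w, M):
    """error(w; D) from the margin matrix: with M_ij = -y_i h_j(x_i),
    a point is misclassified exactly when (M w)_i >= 0 (ties count as errors)."""
    return float(np.mean(M @ w >= 0.0))
```

Dividing by one global constant (rather than per-row constants) keeps all columns on a common scale, so the comparison between weak learners is unchanged.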
(pts) Sequential Adaboost. Implement Algorithm 1 with the following choices:
- $j_t \in \mathrm{argmax}_j\, |\epsilon_{t,j} - \gamma_{t,j}|$, and $\alpha_t$ has 1 on the $j_t$-th entry and 0 everywhere else;
- preprocess $M$ by dividing by a constant so that for all $i$ and $j$, $|M_{ij}| \le 1$.
Run your implementation on the default dataset available on the course website and report the training loss, training error, and test error w.r.t. the iteration $t$.
Note that $w_t$ has at most $t$ nonzeros (i.e., weak classifiers) after $t$ iterations.
Ans: We report all curves in one figure, with clear coloring and legend to indicate which curve is which.
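The one-hot choice of $\alpha_t$ can be sketched as below. Note the selection rule $j_t \in \mathrm{argmax}_j |\epsilon_{t,j} - \gamma_{t,j}|$ is our reconstruction of the garbled original (picking the weak learner furthest from chance maximizes the per-step loss reduction when $\gamma_t + \epsilon_t = \mathbf{1}$), so treat it as an assumption:

```python
import numpy as np

def choose_alpha_sequential(eps, gamma):
    """One-hot alpha_t for the sequential variant: select the weak learner
    whose weighted error is furthest from its weighted accuracy."""
    j = int(np.argmax(np.abs(eps - gamma)))  # assumed selection rule
    alpha = np.zeros_like(eps)
    alpha[j] = 1.0
    return alpha
```

This plugs directly into Algorithm 1 in place of the all-ones $\alpha_t$ of the parallel variant, so each iteration updates a single coordinate of $w$.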