Question
In this exercise we will implement Adaboost. Recall that Adaboost aims at minimizing the exponential loss:

$$\min_{w} \sum_{i} \exp\Big(-y_i \sum_{j} w_j h_j(x_i)\Big)$$
where the $h_j$ are the so-called weak learners, and the combined classifier is

$$h_w(x) := \sum_{j} w_j h_j(x).$$
Note that we assume $y_i \in \{\pm 1\}$ in this exercise, and we simply take $h_j(x) = \mathrm{sign}(x_j - b_j) \in \{\pm 1\}$ for some $b_j \in \mathbb{R}$. Upon defining $M_{ij} = -y_i h_j(x_i)$, we may simplify our problem further as:

$$\min_{w \in \mathbb{R}^d} \mathbf{1}^\top \exp(Mw)$$

where $\exp$ is applied componentwise and $\mathbf{1}$ is the vector of all 1s. Recall that $(s)_+ := \max(s, 0)$ is the positive part while $(s)_- := \max(-s, 0) = (s)_+ - s$.
Algorithm 1: Adaboost.
Input: $M \in \mathbb{R}^{n \times d}$, $w_0 = \mathbf{0}_d$, $p_0 = \mathbf{1}_n / n$, max_pass
Output: $w$
for $t = 0, 1, \ldots,$ max_pass do
    $p_t \leftarrow p_t / (\mathbf{1}^\top p_t)$    // normalize $p_t$
    $\epsilon_t = (M)_+^\top p_t$    // $(\cdot)_+$ applied componentwise
    $\gamma_t = (M)_-^\top p_t$    // $(\cdot)_-$ applied componentwise
    $\beta_t = \tfrac{1}{2}[\ln \gamma_t - \ln \epsilon_t]$    // $\ln$ applied componentwise
    choose $\alpha_t \in \mathbb{R}^d$    // decided later
    $w_{t+1} = w_t + \alpha_t \odot \beta_t$    // $\odot$: componentwise multiplication
    $p_{t+1} = p_t \odot \exp(M(\alpha_t \odot \beta_t))$    // $\exp$ applied componentwise
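The loop above can be sketched in NumPy. This is a minimal sketch, not the official solution: the sign convention ($\epsilon$ from $(M)_+$, $\gamma$ from $(M)_-$, with $M_{ij} = -y_i h_j(x_i)$) follows the reconstruction above, and `choose_alpha` is a hypothetical callable standing in for the choice of $\alpha_t$ made later in the exercise:

```python
import numpy as np

def adaboost(M, max_pass, choose_alpha):
    """Sketch of Algorithm 1. M is the n x d margin matrix with
    M[i, j] = -y_i * h_j(x_i); choose_alpha(eps, gamma) returns alpha_t."""
    n, d = M.shape
    w = np.zeros(d)
    p = np.full(n, 1.0 / n)
    Mpos, Mneg = np.maximum(M, 0.0), np.maximum(-M, 0.0)  # (M)_+ and (M)_-
    for t in range(max_pass):
        p = p / p.sum()                       # normalize p_t
        eps = Mpos.T @ p                      # weighted error of each weak learner
        gamma = Mneg.T @ p                    # weighted accuracy of each weak learner
        beta = 0.5 * (np.log(gamma) - np.log(eps))
        alpha = choose_alpha(eps, gamma)      # decided by the parallel/sequential variant
        w = w + alpha * beta                  # componentwise update of w
        p = p * np.exp(M @ (alpha * beta))    # up-weight currently misclassified points
    return w
```

With the parallel choice $\alpha_t \equiv \mathbf{1}$ and rows of $M$ scaled so $\sum_j |M_{ij}| \le 1$, the exponential loss $\mathbf{1}^\top \exp(Mw_t)$ is non-increasing in $t$.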
(pts) We claim that Algorithm 1 is indeed the celebrated Adaboost algorithm if the following holds:
- $\alpha_t$ is one-hot, i.e., 1 at some entry and 0 everywhere else; namely, it indicates which weak classifier is chosen at iteration $t$;
- $M \in \{\pm 1\}^{n \times d}$, i.e., all weak classifiers are $\{\pm 1\}$-valued.
With the above conditions, prove (a) that $\gamma_t + \epsilon_t = \mathbf{1}$, and (b) the equivalence between Algorithm 1 and the Adaboost algorithm in class. Note that our labels here are $\{\pm 1\}$ and our $w$ may have nothing to do with the one in class.
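For part (a), a short sketch of the argument under the two conditions above (assuming $p_t$ has just been normalized to sum to 1):

```latex
% Since M \in \{\pm 1\}^{n \times d}, every entry satisfies
% (M_{ij})_+ + (M_{ij})_- = |M_{ij}| = 1, hence
\gamma_t + \epsilon_t
  = (M)_-^\top p_t + (M)_+^\top p_t
  = |M|^\top p_t
  = \mathbf{1}_{d \times n}\, p_t
  = (\mathbf{1}_n^\top p_t)\,\mathbf{1}_d
  = \mathbf{1}_d .
```

So for every weak learner $j$, the weighted accuracy and weighted error sum to 1, exactly as in the classroom Adaboost where $\gamma = 1 - \epsilon$.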
(pts) Let us derive each weak learner $h_j$. Considering each feature in turn, we train $d$ linear classifiers that each aim to minimize the weighted training error:

$$\min_{b_j \in \mathbb{R},\; s_j \in \{\pm 1\}} \sum_{i=1}^{n} p_i \, [\![\, y_i \neq s_j\,\mathrm{sign}(x_{ij} - b_j) \,]\!]$$

where the weights $p_i \ge 0$ and $\sum_i p_i = 1$. Find, with justification, an optimal value for each $b_j$ and $s_j$. If multiple solutions exist, you can use the middle value. If it helps, you may assume $p_i$ is uniform, i.e., $p_i \equiv 1/n$.
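A brute-force way to fit one such decision stump is to sort the feature, try every midpoint threshold and both signs, and keep the pair with smallest weighted error. The helper below (`fit_stump` is a hypothetical name, not from the assignment) is a sketch of that search for a single feature column:

```python
import numpy as np

def fit_stump(x, y, p):
    """Pick b and s in {+1, -1} minimizing the weighted error
    sum_i p_i * 1[y_i != s * sign(x_i - b)] for a single feature x."""
    order = np.argsort(x)
    xs, ys, ps = x[order], y[order], p[order]
    # candidate thresholds: midpoints between consecutive sorted values,
    # plus one below the smallest and one above the largest value
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    best = (np.inf, None, None)
    for b in cands:
        pred = np.where(xs - b > 0, 1.0, -1.0)   # sign(x - b), ties sent to -1
        for s in (1.0, -1.0):
            err = ps[ys != s * pred].sum()       # weighted training error
            if err < best[0]:
                best = (err, b, s)
    return best  # (weighted error, b, s)
```

Since only the ordering of $x$ matters, checking the $n+1$ midpoints is exhaustive; the exercise's "middle value" convention corresponds to taking the midpoint between adjacent sorted samples.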
(pts) Parallel Adaboost. Implement Algorithm 1 with the following choices:
- $\alpha_t \equiv \mathbf{1}$ (the vector of all 1s);
- preprocess $M$ by dividing by a constant so that for all $i$ (each row), $\sum_j |M_{ij}| \le 1$.
Run your implementation on the default dataset available on the course website and report the training loss, training error, and test error w.r.t. the iteration $t$, where

$$\mathrm{error}(w; D) := \frac{1}{|D|} \sum_{(x, y) \in D} [\![\, y \neq h_w(x) \,]\!].$$

Recall that $h_w(x)$ is defined above, while each $h_j$ is decided in the previous exercise. In case you fail to determine $h_j$ there, you may simply use $h_j(x) = \mathrm{sign}(x_j - m_j)$, where $m_j$ is the median value of the $j$-th feature in the training set.
Note that $w_t$ is dense (i.e., using all weak classifiers) even after a single iteration.
Ans: We report all curves in one figure, with clear coloring and legend to indicate which curve is which.
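Two small helpers for this variant, sketched under the sign convention $M_{ij} = -y_i h_j(x_i)$ (the function names are ours, not the assignment's): one scales the rows as required, and one computes $\mathrm{error}(w; D)$ directly from the margin matrix.

```python
import numpy as np

def scale_rows(M):
    """Preprocessing for the parallel variant: divide M by a single
    constant so that every row satisfies sum_j |M_ij| <= 1."""
    c = np.abs(M).sum(axis=1).max()
    return M / c

def error_rate(w, M):
    """error(w; D) from the margin matrix: with M_ij = -y_i h_j(x_i),
    a point is misclassified exactly when (M w)_i >= 0 (ties count as errors)."""
    return float(np.mean(M @ w >= 0.0))
```

Dividing by one global constant (rather than per-row constants) keeps all columns on a common scale, so the comparison between weak learners is unchanged.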
(pts) Sequential Adaboost. Implement Algorithm 1 with the following choices:
- $j_t \in \mathrm{argmax}_j\, |\epsilon_{t,j} - \gamma_{t,j}|$, and $\alpha_t$ has 1 on the $j_t$-th entry and 0 everywhere else;
- preprocess $M$ by dividing by a constant so that for all $i$ and $j$, $|M_{ij}| \le 1$.
Run your implementation on the default dataset available on the course website and report the training loss, training error, and test error w.r.t. the iteration $t$.
Note that $w_t$ has at most $t$ nonzeros (i.e., weak classifiers) after $t$ iterations.
Ans: We report all curves in one figure, with clear coloring and legend to indicate which curve is which.
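The one-hot choice of $\alpha_t$ can be sketched as below. Note the selection rule $j_t \in \mathrm{argmax}_j |\epsilon_{t,j} - \gamma_{t,j}|$ is our reconstruction of the garbled original (picking the weak learner furthest from chance maximizes the per-step loss reduction when $\gamma_t + \epsilon_t = \mathbf{1}$), so treat it as an assumption:

```python
import numpy as np

def choose_alpha_sequential(eps, gamma):
    """One-hot alpha_t for the sequential variant: select the weak learner
    whose weighted error is furthest from its weighted accuracy."""
    j = int(np.argmax(np.abs(eps - gamma)))  # assumed selection rule
    alpha = np.zeros_like(eps)
    alpha[j] = 1.0
    return alpha
```

This plugs directly into Algorithm 1 in place of the all-ones $\alpha_t$ of the parallel variant, so each iteration updates a single coordinate of $w$.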