Question
Problem 2. (BONUS) (20 points) In this problem we aim at generalizing the Logistic Regression algorithm to the multi-class classification problem, the setting where the label space includes three or more classes, i.e., $\mathcal{Y} = \{1, 2, \ldots, K\}$ for $K \geq 3$. To do so, consider training data $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\}$ where the feature vectors are $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \mathcal{Y}$, $i = 1, 2, \ldots, n$. Here we assume that we have $K$ different classes and each input $\mathbf{x}_i$ is a $d$-dimensional vector. We can extend the binary prediction rule to the multiclass setting by letting the prediction function be:

$$f_{\text{multiclass}}(\mathbf{x}) = \operatorname*{arg\,max}_{k \in \{1, 2, \ldots, K\}} \; \mathbb{P}[y = k \mid \mathbf{x}]$$

One way to generalize the binary model so that we can map to the label set $\mathcal{Y}$ is to assign a different set of parameters $\mathbf{w}_k, b_k$ to each label $k \in \mathcal{Y}$. For simplicity we ignore the intercept parameters $b_k$. The posterior probability is given by:

$$\mathbb{P}[y = k \mid \mathbf{x}; \mathbf{w}_1, \ldots, \mathbf{w}_{K-1}] = \frac{\exp(\mathbf{w}_k^\top \mathbf{x})}{1 + \sum_{j=1}^{K-1} \exp(\mathbf{w}_j^\top \mathbf{x})} \quad \text{for } k = 1, \ldots, K-1,$$

$$\mathbb{P}[y = K \mid \mathbf{x}; \mathbf{w}_1, \ldots, \mathbf{w}_{K-1}] = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(\mathbf{w}_j^\top \mathbf{x})}.$$

Note that we need to estimate $\{\mathbf{w}_1, \ldots, \mathbf{w}_{K-1}\}$, where each one is a $d$-dimensional vector. Therefore, we need to estimate $(K-1) \times d$ parameters in total. Note that we do not assign a parameter vector to class $K$, as its probability can be inferred from the rest (similar to binary classification, where we only had a single parameter vector).

a) Write down explicitly the log-likelihood function and simplify it as much as you can.
b) Compute the gradient of the log-likelihood with respect to each $\mathbf{w}_k$ and simplify it.
c) Derive the stochastic gradient descent (SGD) update for multiclass logistic regression.
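Before the solution steps, a concrete illustration of the posterior model may help. The two cases above are exactly a softmax over the $K-1$ scores $\mathbf{w}_k^\top \mathbf{x}$ together with an implicit score of $0$ for class $K$. Below is a minimal NumPy sketch of that computation; it is not part of the original problem, and the function name `posterior` and the `(K-1, d)` weight layout are our own choices.

```python
import numpy as np

def posterior(W, x):
    """Posterior P[y = k | x] for the model above.

    W : (K-1, d) matrix whose rows are w_1, ..., w_{K-1}
    x : (d,) feature vector
    Returns a length-K probability vector; the last entry is class K.
    """
    scores = np.append(W @ x, 0.0)     # class K gets an implicit score of 0
    scores -= scores.max()             # max-shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

For example, with $K = 3$ and all weights zero, `posterior(np.zeros((2, 3)), np.ones(3))` returns the uniform vector `[1/3, 1/3, 1/3]`, as the formulas predict.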
Step by Step Solution
There are 3 steps involved, one for each of parts (a)–(c).
Step 1: The log-likelihood (part a)
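A sketch of part (a), using the posterior model defined in the question. For each example, the weight vector of its observed class appears in the numerator only when $y_i \neq K$, so with the indicator $\mathbb{1}[\cdot]$ both cases collapse into a single expression:

$$\ell(\mathbf{w}_1, \ldots, \mathbf{w}_{K-1}) = \sum_{i=1}^{n} \log \mathbb{P}[y_i \mid \mathbf{x}_i] = \sum_{i=1}^{n} \left[ \sum_{k=1}^{K-1} \mathbb{1}[y_i = k]\, \mathbf{w}_k^\top \mathbf{x}_i - \log\!\left(1 + \sum_{j=1}^{K-1} \exp(\mathbf{w}_j^\top \mathbf{x}_i)\right) \right]$$

The inner sum contributes $\mathbf{w}_{y_i}^\top \mathbf{x}_i$ when $y_i < K$ and $0$ when $y_i = K$, matching the two cases of the posterior.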
Step 2: The gradient (part b)
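Differentiating the expression from Step 1 with respect to a fixed $\mathbf{w}_k$ (only the $k$-th indicator term and the log-partition term depend on it) gives:

$$\nabla_{\mathbf{w}_k} \ell = \sum_{i=1}^{n} \left[ \mathbb{1}[y_i = k] - \frac{\exp(\mathbf{w}_k^\top \mathbf{x}_i)}{1 + \sum_{j=1}^{K-1} \exp(\mathbf{w}_j^\top \mathbf{x}_i)} \right] \mathbf{x}_i = \sum_{i=1}^{n} \big( \mathbb{1}[y_i = k] - \mathbb{P}[y = k \mid \mathbf{x}_i] \big)\, \mathbf{x}_i$$

That is, each example pushes $\mathbf{w}_k$ along $\mathbf{x}_i$ in proportion to the gap between the observed indicator and the predicted probability.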
Step 3: The SGD update (part c)
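SGD maximizes the log-likelihood (equivalently, descends on its negative) using one randomly sampled example $(\mathbf{x}_i, y_i)$ per step. With learning rate $\eta > 0$, the update for each $k = 1, \ldots, K-1$ is:

$$\mathbf{w}_k \leftarrow \mathbf{w}_k + \eta \big( \mathbb{1}[y_i = k] - \mathbb{P}[y = k \mid \mathbf{x}_i; \mathbf{w}_1, \ldots, \mathbf{w}_{K-1}] \big)\, \mathbf{x}_i$$

A minimal, self-contained NumPy sketch of one such step follows; the function name `sgd_step` and the convention that the 0-indexed label `K-1` plays the role of the reference class $K$ are our own assumptions, not part of the original problem.

```python
import numpy as np

def sgd_step(W, x, y, lr=0.1):
    """One SGD step on the negative log-likelihood for a single example.

    W  : (K-1, d) weight matrix (class K has no weight vector)
    x  : (d,) feature vector
    y  : label in {0, ..., K-1}; label K-1 is the reference class K
    lr : learning rate (eta)
    """
    scores = np.append(W @ x, 0.0)     # implicit score 0 for the reference class
    p = np.exp(scores - scores.max())  # max-shift for numerical stability
    p /= p.sum()                       # P[y = k | x] for k = 0, ..., K-1
    indicator = np.zeros_like(p)
    indicator[y] = 1.0
    # gradient of the negative log-likelihood w.r.t. w_k is (p_k - 1[y=k]) x_i
    W -= lr * np.outer(p[:-1] - indicator[:-1], x)
    return W
```

Running this over randomly shuffled examples for several passes is the full SGD training loop; only the $K-1$ parameterized weight vectors are ever updated, consistent with the $(K-1) \times d$ parameter count in the problem statement.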