
Question

Problem 2 (BONUS, 20 points). In this problem we aim to generalize the Logistic Regression algorithm to the multi-class classification problem, the setting where the label space includes three or more classes, i.e., $\mathcal{Y} = \{1, 2, \dots, K\}$ for $K \geq 3$. To do so, consider training data $S = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$ where the feature vectors are $x_i \in \mathbb{R}^d$ and $y_i \in \mathcal{Y}$, $i = 1, 2, \dots, n$. Here we assume that we have $K$ different classes and each input $x_i$ is a $d$-dimensional vector.

We can extend the binary prediction rule to the multiclass setting by letting the prediction function be:

$$f_{\text{multiclass}}(x) = \operatorname*{argmax}_{k \in \{1, 2, \dots, K\}} P[y = k \mid x]$$

One way to generalize the binary model so that we can map to the label set $\mathcal{Y}$ is to assign a different set of parameters $w_k, b_k$ to each label $k \in \mathcal{Y}$. For simplicity we ignore the intercept parameters $b_k$. The posterior probability is given by:

$$P[y = k \mid x; w_1, \dots, w_{K-1}] = \frac{\exp(w_k^\top x)}{1 + \sum_{j=1}^{K-1} \exp(w_j^\top x)} \quad \text{for } k = 1, \dots, K-1$$

$$P[y = K \mid x; w_1, \dots, w_{K-1}] = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j^\top x)}$$

Note that we need to estimate $\{w_1, \dots, w_{K-1}\}$, where each one is a $d$-dimensional vector; therefore we need to estimate $(K - 1) \times d$ parameters in total. We do not introduce a parameter vector for class $K$, as its posterior can be inferred from the rest (similar to binary classification, where we only had a single parameter vector).

a) Write down the log-likelihood function explicitly and simplify it as much as you can.

b) Compute the gradient of the log-likelihood with respect to each $w_k$ and simplify it.

c) Derive the stochastic gradient descent (SGD) update for multiclass logistic regression.
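As an outline of parts (a)–(c), the log-likelihood, its gradient, and the resulting SGD step for this model take the following standard form (a sketch of the usual derivation, not the graded solution):

```latex
% Part (a): log-likelihood of S under the posterior model above.
\ell(w_1,\dots,w_{K-1})
  = \sum_{i=1}^{n} \left[ \sum_{k=1}^{K-1} \mathbb{1}\{y_i = k\}\, w_k^\top x_i
    - \log\!\Bigl(1 + \sum_{j=1}^{K-1} \exp(w_j^\top x_i)\Bigr) \right]

% Part (b): gradient with respect to each w_k, k = 1, ..., K-1.
\nabla_{w_k} \ell
  = \sum_{i=1}^{n} \bigl( \mathbb{1}\{y_i = k\} - P[y = k \mid x_i] \bigr)\, x_i

% Part (c): SGD update on a single sampled example (x_i, y_i) with step size \eta
% (ascent on the log-likelihood; equivalently, descent on its negative).
w_k \leftarrow w_k + \eta \, \bigl( \mathbb{1}\{y_i = k\} - P[y = k \mid x_i] \bigr)\, x_i,
\qquad k = 1, \dots, K-1
```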
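The posterior model and its per-example SGD update can also be sketched numerically. The helper names `posteriors` and `sgd_step` below are hypothetical (not from the problem statement); labels are assumed to be in {1, ..., K}, and `W` stacks the K−1 weight vectors row-wise:

```python
import numpy as np

def posteriors(W, x):
    """Class posteriors under the (K-1)-parameter model.

    W: (K-1, d) array stacking w_1, ..., w_{K-1}.
    x: (d,) feature vector.
    Returns a length-K array of probabilities P[y = k | x].
    """
    scores = W @ x                         # w_k^T x for k = 1, ..., K-1
    denom = 1.0 + np.sum(np.exp(scores))
    p = np.empty(len(scores) + 1)
    p[:-1] = np.exp(scores) / denom        # classes 1, ..., K-1
    p[-1] = 1.0 / denom                    # class K has no weight vector
    return p

def sgd_step(W, x, y, lr=0.1):
    """One SGD (likelihood-ascent) update on example (x, y), y in {1, ..., K}."""
    p = posteriors(W, x)                   # posteriors at the current W
    for k in range(W.shape[0]):            # update each w_k, k = 1, ..., K-1
        indicator = 1.0 if y == k + 1 else 0.0
        # per-example gradient of the log-likelihood: (1{y = k} - P[y = k | x]) x
        W[k] += lr * (indicator - p[k]) * x
    return W
```

With all-zero weights the posteriors are uniform, and a step on an example labeled 1 shifts probability mass toward class 1, as the update rule requires.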

