
Question

2. Which properties of the Lasso path generalize to other loss functions?

Recall we showed the optimality conditions for a Lasso solution:

$$\hat\beta(\lambda)_k \neq 0 \;\Rightarrow\; X_k^T\big(Y - X\hat\beta(\lambda)\big) = \frac{\lambda}{2}\,\mathrm{sgn}\big(\hat\beta(\lambda)_k\big) \tag{1}$$

$$\hat\beta(\lambda)_k = 0 \;\Rightarrow\; \big|X_k^T\big(Y - X\hat\beta(\lambda)\big)\big| \le \frac{\lambda}{2} \tag{2}$$

$$\forall k:\quad \big|X_k^T\big(Y - X\hat\beta(\lambda)\big)\big| \le \frac{\lambda}{2} \tag{3}$$

where, as we noted in class, $X^T\big(Y - X\hat\beta(\lambda)\big) \propto \left.\frac{\partial RSS(\beta)}{\partial \beta}\right|_{\beta=\hat\beta(\lambda)}$ is the derivative of the loss function.

We noted in class the following properties of the set of solutions $\{\hat\beta(\lambda) : 0 \le \lambda \le \infty\}$:

i. All the variables in the solution are "highly correlated" with the current residual, from (1) above, and all the variables with zero coefficients are "less correlated" with the current residual, from (2)-(3) above.

ii. The solution path $\{\hat\beta(\lambda) : 0 \le \lambda \le \infty\}$ as a function of $\lambda$ can be described by a collection of "breakpoints" $\infty > \lambda_1 > \lambda_2 > \dots > \lambda_K > 0$ such that the set $A_k$ of active variables with non-zero coefficients is fixed for all solutions $\hat\beta(\lambda)$ with $\lambda_k > \lambda > \lambda_{k+1}$.

iii. $\hat\beta(\lambda)$ is a piecewise linear function; in other words, for $\lambda$ in this range we have $\hat\beta(\lambda) = \hat\beta(\lambda_k) + v_k(\lambda_k - \lambda)$, for a vector $v_k$ we explicitly derived in class.

Assume now that we want to build a different type of model with a different convex and infinitely differentiable loss function, say a logistic regression model for a binary classification task, and add a lasso penalty to it:

$$\hat\beta(\lambda) = \arg\min_{\beta} \sum_{i=1}^{n} \log\big\{1 + \exp\{-y_i x_i^T \beta\}\big\} + \lambda\|\beta\|_1$$

We would like to investigate which of the properties above still hold for the solution of this problem.

(a) Using simple arguments about derivatives and sub-derivatives, as we used in class for the quadratic loss case, argue that three conditions like (1)-(3) can be written for this case too, with the appropriate derivative replacing the empirical correlation. Derive these expressions explicitly for the logistic case.

(b) Explain clearly why this implies that properties (i) and (ii) still hold (for (ii), you may find the continuity of the derivative useful).

(c) Does the piecewise linearity still hold? A clear intuitive explanation is sufficient here.

Hint: Consider how we obtained the linearity in $\lambda$ for squared loss in class, by decomposing the correlation vector $X^T(Y - X\hat\beta) = X^TY - X^TX\hat\beta$.
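As a sanity check on conditions (1)-(3) in the squared-loss case, the sketch below fits a Lasso by cyclic coordinate descent and measures how far the fitted coefficients are from satisfying them. It is only an illustration, not the method used in class: it assumes the objective is $\|Y - X\beta\|^2 + \lambda\|\beta\|_1$ (the scaling under which the threshold is $\lambda/2$), the data are simulated, and every name in it (`lasso_coordinate_descent`, `kkt_violation`, `true_beta`, and so on) is hypothetical.

```python
import random


def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=500):
    """Minimize ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumed objective scaling: no 1/2 or 1/n factor on the squared loss.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(sweeps):
        for k in range(p):
            # Correlation of column k with the residual that excludes coordinate k.
            rho = sum(X[i][k] * resid[i] for i in range(n)) + col_sq[k] * beta[k]
            new = soft_threshold(rho, lam / 2.0) / col_sq[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][k] * delta
                beta[k] = new
    return beta


def residual_correlations(X, y, beta):
    """Return the vector X^T (y - X beta)."""
    n, p = len(X), len(X[0])
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    return [sum(X[i][k] * resid[i] for i in range(n)) for k in range(p)]


def kkt_violation(X, y, beta, lam):
    """Largest deviation from condition (1) (active) or (2) (inactive)."""
    worst = 0.0
    for b, c in zip(beta, residual_correlations(X, y, beta)):
        if b != 0.0:
            sign = 1.0 if b > 0 else -1.0
            gap = abs(c - sign * lam / 2.0)      # condition (1)
        else:
            gap = max(abs(c) - lam / 2.0, 0.0)   # condition (2)
        worst = max(worst, gap)
    return worst


# Simulated data (hypothetical): three relevant and three irrelevant predictors.
random.seed(0)
n, p, lam = 40, 6, 20.0
true_beta = [3.0, -2.0, 0.0, 0.0, 1.5, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0, 0.5)
     for i in range(n)]

beta_hat = lasso_coordinate_descent(X, y, lam)
print("coefficients:", beta_hat)
print("X^T residual:", residual_correlations(X, y, beta_hat))
print("largest violation of (1)-(2):", kkt_violation(X, y, beta_hat, lam))
```

For part (a), the same kind of check could be repeated with the residual correlation replaced by the gradient of the logistic loss, although the closed-form coordinate update used here is specific to squared loss.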
