1. Convexity of Generalized Linear Models

In this question we will explore and show some nice properties of Generalized Linear Models, specifically those related to their use of Exponential Family distributions to model the output.

Most commonly, GLMs are trained by using the negative log-likelihood (NLL) as the loss function. This is mathematically equivalent to Maximum Likelihood Estimation (i.e., maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood). In this problem, our goal is to show that the NLL loss of a GLM is a convex function w.r.t. the model parameters. As a reminder, this is convenient because a convex function is one for which any local minimum is also a global minimum, and there is extensive research on how to optimize various types of convex functions efficiently with algorithms such as gradient descent or stochastic gradient descent.

To recap, an exponential family distribution is one whose probability density can be represented as

p(y; η) = b(y) exp(η^T T(y) − a(η)),

where η is the natural parameter of the distribution. Moreover, in a Generalized Linear Model, η is modeled as θ^T x, where x ∈ R^d are the input features of the example, and θ ∈ R^d are the learnable parameters. In order to show that the NLL loss is convex for GLMs, we break down the process into sub-parts and approach them one at a time. Our approach is to show that the second derivative (i.e., Hessian) of the loss w.r.t. the model parameters is Positive Semi-Definite (PSD) at all values of the model parameters. We will also show some nice properties of Exponential Family distributions as intermediate steps.

For the sake of convenience we restrict ourselves to the case where η is a scalar. Assume p(Y | X; θ) ∼ ExponentialFamily(η), where η ∈ R is a scalar, and T(y) = y. This makes the exponential family representation take the form

p(y; η) = b(y) exp(ηy − a(η)).

(a) [6 points (Written)] Derive an expression for the mean of the distribution. Show that E[Y; η] = (∂/∂η) a(η) (note that E[Y; η] = E[Y | X; θ] since η = θ^T x). In other words, show that the mean of an exponential family distribution is the first derivative of the log-partition function with respect to the natural parameter.

Hint: Start with observing that (∂/∂η) ∫ p(y; η) dy = ∫ (∂/∂η) p(y; η) dy.
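A sketch of where the hint leads (assuming differentiation under the integral sign is valid, as the hint suggests):

```latex
% Differentiate the normalization identity \int p(y;\eta)\,dy = 1 in \eta:
0 = \frac{\partial}{\partial\eta}\int b(y)\,e^{\eta y - a(\eta)}\,dy
  = \int b(y)\,e^{\eta y - a(\eta)}\bigl(y - a'(\eta)\bigr)\,dy
  = \mathbb{E}[Y;\eta] - a'(\eta),
\qquad\text{hence}\qquad
\mathbb{E}[Y;\eta] = \frac{\partial}{\partial\eta}\,a(\eta).
```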

(b) [6 points (Written)] Next, derive an expression for the variance of the distribution. In particular, show that Var(Y; η) = (∂²/∂η²) a(η) (again, note that Var(Y; η) = Var(Y | X; θ)). In other words, show that the variance of an exponential family distribution is the second derivative of the log-partition function w.r.t. the natural parameter.

Hint: Building upon the result in the previous sub-problem can simplify the derivation.
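A sketch of how part (a) carries over (again assuming differentiation under the integral sign, and using ∂p/∂η = p(y; η)(y − a'(η)) from part (a)):

```latex
% Differentiate \int p(y;\eta)\,(y - a'(\eta))\,dy = 0 once more in \eta:
0 = \int p(y;\eta)\Bigl[\bigl(y - a'(\eta)\bigr)^2 - a''(\eta)\Bigr]\,dy
  = \mathbb{E}\!\left[(Y - a'(\eta))^2;\eta\right] - a''(\eta),
\qquad\text{hence}\qquad
\mathrm{Var}(Y;\eta) = \frac{\partial^2}{\partial\eta^2}\,a(\eta),
% since a'(\eta) = \mathbb{E}[Y;\eta] by part (a).
```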

(c) [6 points (Written)] Finally, write out the loss function ℓ(θ), the NLL of the distribution, as a function of θ. Then, calculate the Hessian of the loss w.r.t. θ, and show that it is always PSD. This concludes the proof that the NLL loss of a GLM is convex.

Hint 1: Use the chain rule of calculus along with the results of the previous parts to simplify your derivations.

Hint 2: Recall that the variance of any probability distribution is non-negative.
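One way the pieces combine (a sketch, for a single example (x, y) with η = θ^T x, using the results of parts (a) and (b)):

```latex
\ell(\theta) = -\log p(y;\eta) = -\log b(y) - y\,\theta^\top x + a(\theta^\top x),
\qquad
\nabla_\theta \ell(\theta) = \bigl(a'(\eta) - y\bigr)\,x,
\qquad
\nabla^2_\theta \ell(\theta) = a''(\eta)\,x x^\top = \mathrm{Var}(Y;\eta)\,x x^\top.
% For any z \in \mathbb{R}^d:
%   z^\top \nabla^2_\theta \ell(\theta)\, z = \mathrm{Var}(Y;\eta)\,(x^\top z)^2 \ge 0,
% so the Hessian is PSD and \ell(\theta) is convex.
```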
