
Question


Need help answering b) and c) from the attached: Consider the problem of classifying $D$-dimensional inputs $x \in \mathbb{R}^D$. Suppose we have 2 classes

Consider the problem of classifying $D$-dimensional inputs $x \in \mathbb{R}^D$. Suppose we have 2 classes and the output is denoted $y \in \{0, 1\}$. If we use a binary logistic regression classifier, then the model is:

$$p_\theta(y \mid x) = \mathrm{Binomial}\left[y \mid \sigma(w^\top x + b)\right] \tag{1}$$

where $\sigma(a) = \frac{1}{1 + e^{-a}}$, and $w \in \mathbb{R}^D$, $b \in \mathbb{R}$ are the parameters of the model.

a) The derivative of the sigmoid function has a special form which can be useful in computing gradients. Show that $\frac{d\sigma}{da} = \sigma(1 - \sigma)$. Use this fact to derive the form of $\frac{d^2\sigma}{da^2}$.

$$\frac{d^2\sigma}{da^2} = \sigma(1 - \sigma)(1 - 2\sigma)$$

b) Mathematically, $\frac{d\sigma}{da}$ is always greater than zero. However, when implemented on a computer with finite precision arithmetic, it can become zero due to underflow. Underflow of derivatives can be particularly dangerous for machine learning models learned by using gradients to update parameters, because overall gradients can become numerically zero, causing learning to fail in a variety of ways. For what range of values of $a$ will $\frac{d\sigma}{da}$ evaluate to zero numerically when computed by the expression $\frac{d\sigma}{da} = \sigma(a)\,[1 - \sigma(a)]$ in single precision floating point arithmetic? What if the mathematically equivalent expression $\frac{d\sigma}{da} = \sigma(a)\,\sigma(-a)$ is used instead? What relevance, if any, does this have on how we should implement logistic regression?

You should assume that the value of $\sigma$ has been computed as accurately as possible in single precision floating point arithmetic. For reference, in single precision the smallest number larger than zero that can be represented is $2^{-126} \approx 1.18 \times 10^{-38}$. A numerical operation which results in a value less than this will be rounded either to $0$ or $2^{-126}$, whichever is closer. Similarly, the largest number less than one that can be represented is $1 - 2^{-24} \approx 0.99999994$. A numerical operation which results in a value between this value and one will be rounded to $1$ or $1 - 2^{-24}$, whichever is closer.

Hint: You will need to use the inverse of the sigmoid, $\sigma^{-1}(p) = \log\frac{p}{1 - p}$.

c) Multiclass logistic regression can also be used when there are only two classes. In that case, the model is

$$p_\theta(y \mid x) = \mathrm{Categorical}\left(y \mid S(Ax + c)\right)$$

where $S(a)_i = \frac{e^{a_i}}{\sum_j e^{a_j}}$ is the softmax function and $A \in \mathbb{R}^{2 \times D}$, $c \in \mathbb{R}^2$ are the parameters of the model. Prove that this multiclass logistic regression model is equivalent to the binary logistic regression model. In particular, show how, given the parameters $w, b$ of any binary logistic regression model, you could construct parameters $A, c$ of a multiclass logistic regression model which would always give the same predictions. Also, show how, given $A, c$, you can compute parameters $w, b$ which would always give the same predictions. Finally, are these transformations unique? Given the values $A, c$ (or $w, b$), is the value of $w, b$ (or $A, c$) unique? If a direction is unique, give an argument why. If it's not unique, give at least one example of a different transformation which would be equivalent.
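
For anyone who wants to sanity-check parts b) and c) numerically before writing them up, here is a minimal sketch using only the Python standard library. It is not a full solution. All helper names (`f32`, `sigmoid32`, `dsigmoid_one_minus`, `dsigmoid_symmetric`, `binary_to_multiclass`, `multiclass_to_binary`, `p1_binary`, `p1_multiclass`) are made up for this illustration. Single precision is emulated by rounding doubles through `struct`, which follows real IEEE 754 behaviour including subnormal numbers, so it is slightly more forgiving near zero than the simplified rounding model stated in the question.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sigmoid(a: float) -> float:
    """Sigmoid in double precision, written so that math.exp never overflows."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def sigmoid32(a: float) -> float:
    """Sigmoid rounded once to single precision ('as accurately as possible')."""
    return f32(sigmoid(a))


def dsigmoid_one_minus(a: float) -> float:
    """sigma(a) * (1 - sigma(a)), rounding each intermediate to single precision."""
    s = sigmoid32(a)
    return f32(s * f32(1.0 - s))


def dsigmoid_symmetric(a: float) -> float:
    """sigma(a) * sigma(-a), rounding each intermediate to single precision."""
    return f32(sigmoid32(a) * sigmoid32(-a))


# Part b): sigma(a) rounds to exactly 1 once it passes the midpoint 1 - 2**-25,
# i.e. once a exceeds sigma^{-1}(1 - 2**-25) = log(2**25 - 1), roughly 17.33.
threshold = math.log(2.0**25 - 1.0)


def binary_to_multiclass(w, b):
    """One valid choice of (A, c): zero logit for class 0, w.x + b for class 1."""
    return [[0.0] * len(w), list(w)], [0.0, b]


def multiclass_to_binary(A, c):
    """Only logit differences matter: w = A[1] - A[0], b = c[1] - c[0]."""
    return [a1 - a0 for a0, a1 in zip(A[0], A[1])], c[1] - c[0]


def p1_binary(w, b, x):
    """P(y = 1 | x) under the binary model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def p1_multiclass(A, c, x):
    """P(y = 1 | x) under the two-class softmax model."""
    u0, u1 = (sum(ai * xi for ai, xi in zip(row, x)) + ci for row, ci in zip(A, c))
    m = max(u0, u1)  # subtract the max logit before exponentiating
    e0, e1 = math.exp(u0 - m), math.exp(u1 - m)
    return e1 / (e0 + e1)


if __name__ == "__main__":
    for a in (17.0, 18.0, 80.0, 120.0):
        print(a, dsigmoid_one_minus(a), dsigmoid_symmetric(a))
```

In this emulation the `1 - sigma` form returns exactly zero from about a = 17.33 upward, because `sigmoid32(a)` has already rounded to 1. The `sigma(a) * sigma(-a)` form stays positive until the product itself underflows. That happens near |a| = 104 with subnormals, versus roughly |a| = 88 under the question's rule that values below $2^{-126}$ round to 0 or $2^{-126}$. For part c), adding the same row vector to both rows of `A` (and the same constant to both entries of `c`) leaves `multiclass_to_binary` unchanged, which is one way to see that the map from $w, b$ to $A, c$ is not unique.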

