
Question

Could you help me with explanations of this problem? Thank you.
Derive the total loss function in (184) using the square loss in (185) by using the maximum likelihood framework in (189). You may assume that the true output $y_s \in \mathbb{R}^{\kappa}$ is generated by a $\kappa$-dimensional multivariate Normal distribution $N(\hat{y}(x_s; W), \sigma^2 I)$, where $I$ is the $\kappa \times \kappa$ identity matrix. (See Example 3.2.2.)

The standard choices are:

Regression (square loss):
$$\ell(y_s, \hat{y}(x_s; W)) = \tfrac{1}{2}\,\lVert y_s - \hat{y}(x_s; W)\rVert^2. \tag{185}$$

Classification (cross-entropy loss):
$$\ell(y_s, \hat{y}(x_s; W)) = -\sum_{i=1}^{\kappa} \mathbf{1}(y_s = \text{class } i)\, \log \hat{y}_i(x_s; W). \tag{186}$$

The square loss for the regression case is natural, and we have seen it for polynomial regression models in Section 2. For classification problems, on the other hand, recall that we usually model the predictive probabilities given features, so that $\hat{y}(x_s; W) \in \mathbb{R}^{\kappa}$ can be viewed as a predictive PMF over the class of $x_s$. One can think of the 'cross-entropy' as a similarity measure between two probability distributions. Namely, if we have two PMFs $p = [p_1, \dots, p_{\kappa}]$ and $q = [q_1, \dots, q_{\kappa}]$, then the cross-entropy of $q$ relative to $p$ is defined by
$$H(p, q) = -\sum_{i=1}^{\kappa} p_i \log q_i. \tag{187}$$

Note that the class-indicator vector
$$[\mathbf{1}(y_s = \text{class } 1), \dots, \mathbf{1}(y_s = \text{class } \kappa)] \tag{188}$$
is a PMF on the $\kappa$ classes. Hence the cross-entropy loss in (186) is in fact the cross-entropy between the class-indicator vector above and the predictive PMF $\hat{y}(x_s; W) \in \mathbb{R}^{\kappa}$.

One recurring theme in all of the regression and classification models we have studied so far is to take the loss function to be the negative log-likelihood. Namely,
$$\mathcal{L}_{\text{total}}(W) := -\log L(y_1, \dots, y_N; W). \tag{189}$$

In fact, both the square loss and the cross-entropy loss above can be derived from this maximum likelihood formulation. For instance, consider the case of $\kappa$-class classification. Recall that $\hat{y}(x_s; W)$ is the predictive PMF, so its $i$th coordinate, $\hat{y}_i(x_s; W)$, is the probability that the class of $x_s$ is $i$ according to our model. Hence the joint likelihood is
$$L(y_1, \dots, y_N; W) = P(Y_1 = y_1, \dots, Y_N = y_N; W) = \prod_{s=1}^{N} \prod_{i=1}^{\kappa} \left(\hat{y}_i(x_s; W)\right)^{\mathbf{1}(y_s = \text{class } i)}, \tag{190}$$
and therefore
$$-\log L(y_1, \dots, y_N; W) = -\sum_{s=1}^{N} \sum_{i=1}^{\kappa} \mathbf{1}(y_s = \text{class } i)\, \log \hat{y}_i(x_s; W), \tag{191}$$
which is exactly the total loss function in (184) using the cross-entropy loss in (186). (Compare this computation with the one for multiclass logistic regression in Section 3.3.)

Example 3.2.2 (Multivariate Gaussian distribution). The $p$-dimensional multivariate Gaussian distribution is a function $\mathbb{R}^p \to \mathbb{R}$ given by
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2}(x - \mu)^{T} \Sigma^{-1} (x - \mu) \right), \tag{192}$$
where $\mu \in \mathbb{R}^p$, $\Sigma \in \mathbb{R}^{p \times p}$ is a symmetric positive semi-definite matrix (a matrix $A \in \mathbb{R}^{p \times p}$ is symmetric if $A = A^{T}$ and positive semi-definite if $x^{T} A x \ge 0$ for all $x \in \mathbb{R}^p$; it is positive definite if in addition $x^{T} A x > 0$ whenever $x \ne 0$), and $|\Sigma|$ denotes the determinant of $\Sigma$. If a random vector $X \in \mathbb{R}^p$ has the distribution given by (192), then we write $X \sim N(\mu, \Sigma)$, and it has the properties
$$\mu = \mathbb{E}[X], \qquad \Sigma = \operatorname{Cov}(X) = \mathbb{E}\left[ (X - \mu)(X - \mu)^{T} \right]. \tag{193}$$
For these reasons, we call $\mu$ and $\Sigma$ the mean and the covariance matrix of $X$, respectively.
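The exercise asks for the regression analogue of the computation in (190)–(191). Equation (184) is not reproduced in the excerpt; from the way (191) is described, it is presumably the total loss $\mathcal{L}_{\text{total}}(W) = \sum_{s=1}^{N} \ell(y_s, \hat{y}(x_s; W))$ (this form is an assumption based on context). Here is a sketch of the derivation, assuming the outputs $y_1, \dots, y_N$ are independent given the inputs.

Under the stated model, $y_s \sim N(\hat{y}(x_s; W), \sigma^2 I)$ with $I$ the $\kappa \times \kappa$ identity. Plugging $\mu = \hat{y}(x_s; W)$ and $\Sigma = \sigma^2 I$ into (192), and using $|\sigma^2 I| = \sigma^{2\kappa}$ and $(\sigma^2 I)^{-1} = \sigma^{-2} I$, the joint likelihood is
$$L(y_1, \dots, y_N; W) = \prod_{s=1}^{N} \frac{1}{(2\pi\sigma^2)^{\kappa/2}} \exp\left( -\frac{\lVert y_s - \hat{y}(x_s; W) \rVert^2}{2\sigma^2} \right).$$
Taking the negative logarithm as in (189),
$$-\log L(y_1, \dots, y_N; W) = \frac{N\kappa}{2} \log(2\pi\sigma^2) + \frac{1}{\sigma^2} \sum_{s=1}^{N} \frac{1}{2}\, \lVert y_s - \hat{y}(x_s; W) \rVert^2.$$
The first term is a constant independent of $W$, and the factor $1/\sigma^2 > 0$ only rescales the objective, so minimizing the negative log-likelihood over $W$ is equivalent to minimizing $\sum_{s=1}^{N} \tfrac{1}{2}\lVert y_s - \hat{y}(x_s; W)\rVert^2$, i.e., the total loss in (184) with the square loss (185). With $\sigma^2 = 1$ the two objectives agree exactly up to the additive constant $\tfrac{N\kappa}{2}\log(2\pi)$.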

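As a quick numerical sanity check of the identity above, the snippet below (a minimal sketch: the arrays `y` and `y_hat` are hypothetical stand-ins for the observed outputs $y_s$ and the model predictions $\hat{y}(x_s; W)$) verifies that with $\sigma^2 = 1$ the Gaussian negative log-likelihood equals the total square loss plus the constant $\tfrac{N\kappa}{2}\log(2\pi)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N samples with kappa-dimensional outputs.
N, kappa, sigma2 = 5, 3, 1.0
y = rng.normal(size=(N, kappa))      # observed outputs y_s
y_hat = rng.normal(size=(N, kappa))  # stand-in predictions y_hat(x_s; W)

# Squared residual ||y_s - y_hat(x_s; W)||^2 for each sample.
resid2 = np.sum((y - y_hat) ** 2, axis=1)

# Negative log-likelihood of y under N(y_hat, sigma^2 I), summed over samples.
nll = np.sum(0.5 * kappa * np.log(2 * np.pi * sigma2) + resid2 / (2 * sigma2))

# Total square loss: sum over samples of (1/2) ||y_s - y_hat(x_s; W)||^2.
total_square_loss = np.sum(0.5 * resid2)

# With sigma^2 = 1 the two differ by the W-independent constant N*kappa/2*log(2*pi).
const = 0.5 * N * kappa * np.log(2 * np.pi)
assert np.isclose(nll, total_square_loss + const)
print(nll, total_square_loss + const)
```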
