
Loss functions. Consider the following two loss functions: (1) mean-squared error, Loss(T, O) = (1/2)(T − O)^2, and (2) cross-entropy, Loss(T, O) = −T log O − (1 − T) log(1 − O), for binary classification. Assume the activation function is sigmoid.

a. Show the derivation of the error δ for the output unit in the backpropagation process, and compare the two loss functions (e.g., potential problems they might produce).
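One standard way to work part (a) (a sketch, not the site's hidden solution), writing O = σ(net) for the sigmoid output, using σ′(net) = O(1 − O), and defining δ = −∂Loss/∂net:

```latex
% (1) Mean-squared error: Loss = (1/2)(T - O)^2
\frac{\partial \mathrm{Loss}}{\partial O} = -(T - O)
\quad\Rightarrow\quad
\delta = -\frac{\partial \mathrm{Loss}}{\partial O}\,\sigma'(net)
       = (T - O)\,O(1 - O).

% (2) Binary cross-entropy: Loss = -T\log O - (1-T)\log(1-O)
\frac{\partial \mathrm{Loss}}{\partial O}
  = -\frac{T}{O} + \frac{1 - T}{1 - O}
  = \frac{O - T}{O(1 - O)}
\quad\Rightarrow\quad
\delta = -\frac{O - T}{O(1 - O)}\,O(1 - O) = T - O.
```

The comparison follows directly: the MSE error term carries an extra O(1 − O) factor, which vanishes when the sigmoid saturates (O near 0 or 1), so learning can stall even when the error T − O is large. Cross-entropy cancels that factor, leaving a gradient directly proportional to the error.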

b. Now, we wish to generalize the cross-entropy loss to the scenario of multiclass classification. The target output is a one-hot vector of length C (i.e., the number of total classes), and the index of the nonzero element (i.e., 1) represents the class label. The output is also a vector of the same length, O = [O_1, O_2, …, O_C]. Show the derivation of the categorical cross-entropy loss and the error δ of the output unit. (Hint: there are two key steps, including (1) normalizing the output values by scaling between 0 and 1, and (2) deriving the cross-entropy loss following the definition for the binary case, where the loss can be represented as Loss(T, O) = −∑_{i=1}^{2} T_i log(O_i).)
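Following the hint, the usual choice for step (1) is the softmax normalization. A sketch of the derivation (where z_j denotes the raw net input to output unit j, and δ_{ij} below is the Kronecker delta, distinct from the backpropagation error δ_j):

```latex
% Step (1): softmax normalization; step (2): categorical cross-entropy.
O_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}},
\qquad
\mathrm{Loss}(T, O) = -\sum_{i=1}^{C} T_i \log O_i.

% Using \partial O_i/\partial z_j = O_i(\delta_{ij} - O_j)
% and \sum_i T_i = 1 (one-hot target):
\delta_j = -\frac{\partial \mathrm{Loss}}{\partial z_j}
         = \sum_{i=1}^{C} \frac{T_i}{O_i}\, O_i(\delta_{ij} - O_j)
         = \sum_{i=1}^{C} T_i(\delta_{ij} - O_j)
         = T_j - O_j.
```

As in the binary case, the softmax Jacobian exactly cancels the 1/O_i factor from the log, so the output-unit error reduces to the simple difference T_j − O_j.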

Step by Step Solution

a. For mean-squared loss, the error δ for the output unit c…
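Independent of the truncated answer above, the key identity for part (b) — that the gradient of softmax cross-entropy with respect to the raw output z is simply O − T — can be checked numerically. A minimal sketch using NumPy (the helper names `softmax` and `cross_entropy` are illustrative, not from the question):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(z, t):
    # Categorical cross-entropy of softmax(z) against a one-hot target t.
    return -np.sum(t * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=5)        # raw output-unit activations for C = 5 classes
t = np.zeros(5)
t[2] = 1.0                    # one-hot target: class index 2

# Analytic gradient from the derivation: dLoss/dz_i = O_i - T_i
analytic = softmax(z) - t

# Central finite differences for comparison
eps = 1e-6
numeric = np.zeros(5)
for i in range(5):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(zp, t) - cross_entropy(zm, t)) / (2 * eps)

# Maximum discrepancy; should be far below 1e-6
print(np.max(np.abs(analytic - numeric)))
```

The two gradients agree to within finite-difference precision, confirming δ_j = T_j − O_j (up to the sign convention for δ).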

