
Question

Ex 5.4: Activation and weight scaling. Consider the two hidden unit network shown in Figure 5.62, which uses ReLU activation functions and has no additive bias parameters. Your task is to find a set of weights that will fit the function y = |x1 + 1.1 x2|.

1. Can you guess a set of weights that will fit this function?
2. Starting with the weights shown in column (b), compute the activations for the hidden and final units, as well as the regression loss, for the nine input values (x1, x2) in {-1, 0, 1} × {-1, 0, 1}.
3. Now compute the gradients of the squared loss with respect to all six weights using the backpropagation chain rule equations (5.65-5.68) and sum them across the training samples to get a final gradient.
4. What step size should you take in the gradient direction, and what would your updated squared loss become?
5. Repeat this exercise for the initial weights in column (c) of Figure 5.62. Given this new set of weights, how much worse is your error decrease, and how many iterations would you expect it to take to achieve a reasonable solution?
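For the first question, note that ReLU(z) + ReLU(-z) = |z| for any z, so one natural guess is hidden units computing z1 = x1 + 1.1 x2 and z2 = -x1 - 1.1 x2, with both output weights equal to 1. The sketch below checks this guess over the nine training inputs (the specific weight values are this guess, not necessarily the figure's column (a)):

```python
import itertools

def relu(z):
    return max(z, 0.0)

# Guessed weights: h1 = ReLU(x1 + 1.1*x2), h2 = ReLU(-x1 - 1.1*x2), y = h1 + h2.
# Since ReLU(z) + ReLU(-z) = |z|, this reproduces y = |x1 + 1.1*x2| exactly.
W_hidden = [(1.0, 1.1), (-1.0, -1.1)]
w_out = (1.0, 1.0)

def forward(x1, x2):
    h = [relu(w1 * x1 + w2 * x2) for (w1, w2) in W_hidden]
    return w_out[0] * h[0] + w_out[1] * h[1]

# Sum the squared regression loss over the nine inputs in {-1,0,1} x {-1,0,1}.
total_sq_loss = 0.0
for x1, x2 in itertools.product([-1, 0, 1], repeat=2):
    y_true = abs(x1 + 1.1 * x2)
    y_pred = forward(x1, x2)
    total_sq_loss += (y_pred - y_true) ** 2

print(total_sq_loss)  # the guess fits the function exactly, so the loss is 0
```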
Figure 5.62 Simple two hidden unit network with a ReLU activation function and no bias parameters for regressing the function y = |x1 + 1.1 x2|: (a) can you guess a set of weights that would fit this function?; (b) a reasonable set of starting weights; (c) a poorly scaled set of weights.

Figure 5.63 Function optimization: (a) the contour plot of f(x, y) = x^2 + 20y^2, with the function being minimized at (0, 0); (b) ideal gradient descent optimization that quickly converges towards the minimum at x = 0, y = 0.
Lipton et al. (2021) contain myriad graded exercises with code samples to develop your understanding of deep neural networks. If you have the time, try to work through most of these.
Would batch normalization help in this case?
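The starting weights in Figure 5.62's columns (b) and (c) are not reproduced here, so the sketch below uses placeholder starting values (a hypothetical choice, not the book's) purely to illustrate steps 2-4: the forward activations, the backprop chain-rule gradient sums (the per-weight expressions in the comments are what equations 5.65-5.68 reduce to for this network), and one gradient-descent update:

```python
import itertools

def relu(z):
    return max(z, 0.0)

# Placeholder starting weights -- NOT the values from Figure 5.62(b),
# which are not reproduced here; chosen only to illustrate the procedure.
W = [[1.0, 0.0], [0.0, 1.0]]   # hidden weights W[j][k]: unit j, input k
v = [0.5, 0.5]                 # output weights

samples = [(x1, x2, abs(x1 + 1.1 * x2))
           for x1, x2 in itertools.product([-1, 0, 1], repeat=2)]

def loss(W, v):
    total = 0.0
    for x1, x2, y in samples:
        h = [relu(W[j][0] * x1 + W[j][1] * x2) for j in range(2)]
        total += (v[0] * h[0] + v[1] * h[1] - y) ** 2
    return total

# Backprop: with error e = y_pred - y, the chain rule gives
#   dL/dv_j    = 2 e h_j
#   dL/dW_jk   = 2 e v_j [z_j > 0] x_k   (ReLU passes gradient only when z_j > 0)
# summed over the nine training samples.
gW = [[0.0, 0.0], [0.0, 0.0]]
gv = [0.0, 0.0]
for x1, x2, y in samples:
    x = (x1, x2)
    z = [W[j][0] * x1 + W[j][1] * x2 for j in range(2)]
    h = [relu(zj) for zj in z]
    e = v[0] * h[0] + v[1] * h[1] - y
    for j in range(2):
        gv[j] += 2 * e * h[j]
        if z[j] > 0:
            for k in range(2):
                gW[j][k] += 2 * e * v[j] * x[k]

# One gradient-descent step; eta is small enough here that the loss decreases.
eta = 0.05
W2 = [[W[j][k] - eta * gW[j][k] for k in range(2)] for j in range(2)]
v2 = [v[j] - eta * gv[j] for j in range(2)]
print(loss(W, v), "->", loss(W2, v2))
```

With poorly scaled starting weights, as in column (c), the gradient magnitudes for different weights differ by large factors, forcing a much smaller step size and many more iterations; batch normalization helps by rescaling the hidden activations so the effective gradient scales are better matched.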

