
Question


Ex 5.5: Function optimization. Consider the function f(x, y) = x^2 + 20y^2 shown in Figure 5.63a. Begin by solving for the following:
Calculate ∇f, i.e., the gradient of f.
Evaluate the gradient at x = -20, y = 5.
Implement some of the common gradient descent optimizers, which should take you from the starting point x = -20, y = 5 to near the minimum at x = 0, y = 0. Try each of the following optimizers:
Standard gradient descent.
Gradient descent with momentum, starting with the momentum term as 0.99.
Adam, starting with decay rates of β1 = 0.9 and β2 = 0.999.
Play around with the learning rate. For each experiment, plot how x and y change over time, as shown in Figure 5.63b.
How do the optimizers behave differently? Is there a single learning rate that makes all the optimizers converge towards x = 0, y = 0 in under 200 steps? Does each optimizer monotonically trend towards x = 0, y = 0?
Figure 5.63 Function optimization: (a) the contour plot of f(x, y) = x^2 + 20y^2 with the function being minimized at (0, 0); (b) ideal gradient descent optimization that quickly converges towards the minimum at x = 0, y = 0.
Would batch normalization help in this case?
Note: the following exercises were suggested by Matt Deitke.
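As a worked check on the first two parts (a derivation added here, not part of the quoted exercise): the partial derivatives are ∂f/∂x = 2x and ∂f/∂y = 40y, so ∇f(x, y) = (2x, 40y). At the starting point x = -20, y = 5 this evaluates to ∇f(-20, 5) = (-40, 200). The y-component is five times larger in magnitude than the x-component, which is what makes a single learning rate hard to choose for this elongated bowl.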
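Below is a minimal sketch, not the book's reference solution, of the three optimizers applied to f(x, y) = x^2 + 20y^2 from the starting point (-20, 5). The function and parameter names (grad, sgd, momentum, adam, lr, beta, beta1, beta2, eps) are illustrative choices, and the learning rates are only starting points for the experimentation the exercise asks for.

import numpy as np

def grad(p):
    # Gradient of f(x, y) = x^2 + 20 y^2 at p = (x, y): (df/dx, df/dy) = (2x, 40y).
    x, y = p
    return np.array([2.0 * x, 40.0 * y])

def sgd(p, lr=0.01, steps=200):
    # Standard gradient descent: step directly down the gradient.
    path = [p.copy()]
    for _ in range(steps):
        p = p - lr * grad(p)
        path.append(p.copy())
    return np.array(path)

def momentum(p, lr=0.01, beta=0.99, steps=200):
    # Gradient descent with momentum: the velocity v accumulates past gradients.
    v = np.zeros_like(p)
    path = [p.copy()]
    for _ in range(steps):
        v = beta * v + grad(p)
        p = p - lr * v
        path.append(p.copy())
    return np.array(path)

def adam(p, lr=0.5, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    # Adam: bias-corrected first (m) and second (s) moment estimates of the gradient.
    m = np.zeros_like(p)
    s = np.zeros_like(p)
    path = [p.copy()]
    for t in range(1, steps + 1):
        g = grad(p)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        p = p - lr * m_hat / (np.sqrt(s_hat) + eps)
        path.append(p.copy())
    return np.array(path)

start = np.array([-20.0, 5.0])
for name, path in [("gradient descent", sgd(start)),
                   ("momentum", momentum(start)),
                   ("Adam", adam(start))]:
    print(name, "final point:", path[-1])

Each optimizer returns the full trajectory, so plotting path[:, 0] and path[:, 1] against the step index (for example with matplotlib) gives the x- and y-versus-iteration curves asked for in Figure 5.63b and makes it easy to compare how the optimizers behave and whether they approach (0, 0) monotonically.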

