Question
Please provide the answer for part (c). The other two parts are included because they are referenced in the question.
In the lectures, we introduced gradient descent, an optimization method for finding the minimum value of a function. In this problem we solve a fairly simple optimization problem:

$$\min_{x \in \mathbb{R}} f(x) = x^2,$$

that is, finding the minimum value of $x^2$ over the real line. Of course you know the minimum is at $x = 0$, but this time we find it with gradient descent. Recall that to perform gradient descent, you start at an arbitrary initial point $x_0$ and update

$$x_{t+1} = x_t - \eta \nabla_x f(x_t),$$

where $\eta$ is the learning rate. Hopefully, after $T$ iterations, $x_T$ will be close to the minimum point.

(a) (10 pts) Assume $x_0 = 1$ and we choose the learning rate $\eta = 1$. Now suppose a sequence $x_1, \ldots, x_T$ is obtained through the gradient descent algorithm. Prove that for arbitrary $T > 0$, $f(x_T) = 1$. Hence gradient descent fails completely. Can you provide an intuitive explanation as to why?

(b) (10 pts) Assume $x_0 = 1$ and $\eta = 2$. Prove that $f(x_{t+1}) > f(x_t)$ is always true. Gradient descent even increases the function value!

(c) (10 pts) What is the reason gradient descent fails to work in the above two cases, even for such a simple optimization problem? What can be done to make gradient descent work? (You don't need a perfect solution here. In fact, a lot of research, even today, has been devoted to improving the stability and efficiency of (stochastic) gradient descent algorithms.)
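The minimal Python sketch below assumes plain (full-batch) gradient descent on $f(x) = x^2$, whose gradient is $2x$. Under that assumption the update reduces to $x_{t+1} = (1 - 2\eta)\,x_t$, so $x_T = (1 - 2\eta)^T x_0$. The sketch reproduces parts (a) and (b). With $\eta = 1$ the iterates flip between $1$ and $-1$. With $\eta = 2$ they grow as $(-3)^t$. Any fixed $0 < \eta < 1$ gives $|1 - 2\eta| < 1$ and hence convergence. The helper names (`gradient_descent`, `gd_backtracking`) and the Armijo backtracking line search with parameters `beta` and `c` are illustrative choices that the question does not specify. Backtracking is one standard remedy; a smaller fixed step or a decaying step-size schedule are others.

```python
def f(x: float) -> float:
    """Objective f(x) = x^2."""
    return x * x


def grad_f(x: float) -> float:
    """Gradient of f: d/dx x^2 = 2x."""
    return 2.0 * x


def gradient_descent(x0: float, lr: float, steps: int) -> list[float]:
    """Plain gradient descent with a fixed learning rate; returns x_0, ..., x_T."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        # x_{t+1} = x_t - lr * f'(x_t), which for f(x) = x^2 equals (1 - 2*lr) * x_t
        x = x - lr * grad_f(x)
        xs.append(x)
    return xs


def gd_backtracking(x0: float, lr0: float, steps: int,
                    beta: float = 0.5, c: float = 1e-4) -> list[float]:
    """Gradient descent with an Armijo backtracking line search (one possible fix).

    Hypothetical helper: at each step, start from lr0 and multiply the step by
    beta until the sufficient-decrease condition
    f(x - lr*g) <= f(x) - c*lr*g^2 holds.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        g = grad_f(x)
        lr = lr0
        while f(x - lr * g) > f(x) - c * lr * g * g:
            lr *= beta  # insufficient decrease: the step is too large, so shrink it
        x = x - lr * g
        xs.append(x)
    return xs


if __name__ == "__main__":
    for lr in (1.0, 2.0, 0.1):
        xs = gradient_descent(1.0, lr, 5)
        print(f"eta={lr}: f(x_t) = {[f(x) for x in xs]}")
    print("backtracking from eta0=2:", gd_backtracking(1.0, 2.0, 3))
```

Running the script prints $f(x_t)$ for $\eta \in \{1, 2, 0.1\}$. The backtracking variant starts from the overly large step $\eta_0 = 2$. On the first iteration it shrinks that step to $0.5$, which for this particular function lands exactly on the minimizer $x = 0$.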