Question

Posted on Jun 11, 2024

You may use a calculator for the following question. In this question, we will elaborate on the gradient descent method. The definition of the gradient descent method, as well as a simple example of its usage, is explained on pages 3-4.

Question 1. Consider the function

$$f(x, y) = \ln\left(x^4 + (y-1)^2 + 1\right) + y^2 e^{-(x-1)^2}.$$

We would like to find its global minimum. The graph of $f(x, y)$ looks like the following.

[Figure: graph of $f(x, y)$]

From the graph, it seems that the global minimum is quite close to $(0, 0)$. So let's start with $(0, 0)$ and try to adjust the multiplier $\alpha$.

a. Compute the gradient $\nabla f(x, y)$.

b. Set $\alpha = 1$. Compute the first few iterations of the gradient descent method. Does it lead you to the global minimum? If it does, how many steps do you need to guess what the global minimum is (up to the hundredths place)? If not, explain why it does not work.

c. Set $\alpha = 1/4$. Compute the first few iterations of the gradient descent method. Does it lead you to the global minimum? If it does, how many steps do you need to guess what the global minimum is (up to the hundredths place)? If not, explain why it does not work.

d. What do you think is the best value for the multiplier $\alpha$?

The gradient descent method is a numerical method to find the global minimum of a function $f(x, y)$. The method is described as follows.

Step 1. You pick a point $(a, b)$ that seems quite close to the global minimum. You also set an appropriate multiplier, which is just a number $\alpha > 0$.

Step 2. At each iteration, you replace $(a, b)$ by the point

$$(a, b) \mapsto \left(a - \alpha f_x(a, b),\ b - \alpha f_y(a, b)\right).$$

Namely, you add $-\alpha \nabla f(a, b)$ to $(a, b)$, which means that you move in the direction of the negative gradient, scaled by the factor $\alpha$.

Step 3. Repeat Step 2 multiple times while you are praying that these points will stabilize.

Let us see how this is used in a simple example.

Example. Let's try to find the global minimum of $f(x, y) = x^2 + y^2$. Of course, it is quite easy to see that the minimum is achieved at $(0, 0)$, but let's suppose we don't know this.
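Let's suppose you thought $(1, 1)$ is close to the global minimum. You also pick $\alpha = \frac{1}{2}$ as your multiplier.

First iteration. You replace $(1, 1)$ by $\left(1 - \frac{1}{2} f_x(1, 1),\ 1 - \frac{1}{2} f_y(1, 1)\right)$. We know $f_x(x, y) = 2x$ and $f_y(x, y) = 2y$, so $f_x(1, 1) = 2$ and $f_y(1, 1) = 2$. Thus, we replace $(1, 1)$ by $\left(1 - \frac{2}{2},\ 1 - \frac{2}{2}\right) = (0, 0)$.

Second iteration. You see that $f_x(0, 0) = 0$ and $f_y(0, 0) = 0$. Voila! You've reached the global minimum.

Obviously some luck was involved, because we miraculously chose the right multiplier $\alpha = \frac{1}{2}$ to land exactly at the global minimum. Suppose instead you chose $\alpha = 1$, and let's see how the gradient descent goes.

First iteration. You replace $(1, 1)$ by $\left(1 - f_x(1, 1),\ 1 - f_y(1, 1)\right)$. Since $f_x(1, 1) = 2$ and $f_y(1, 1) = 2$, the new point is $(-1, -1)$.

Second iteration. You replace $(-1, -1)$ by $\left(-1 - f_x(-1, -1),\ -1 - f_y(-1, -1)\right)$. Since $f_x(-1, -1) = -2$ and $f_y(-1, -1) = -2$, the new point is $(-1 + 2,\ -1 + 2) = (1, 1)$.

Third iteration. You notice that we are now back where we started! You are trapped in an infinite loop of agony.

So it could happen that, if $\alpha$ is too large, the gradient descent doesn't lead you anywhere.

(Footnote: The gradient descent method works for an arbitrary number of variables, but for simplicity we focus only on two-variable functions in this assignment.)

Let's now suppose you instead chose $\alpha = \frac{1}{4}$.

First iteration. You replace $(1, 1)$ by $\left(1 - \frac{1}{4} f_x(1, 1),\ 1 - \frac{1}{4} f_y(1, 1)\right)$. Since $f_x(1, 1) = 2$ and $f_y(1, 1) = 2$, the new point you get is $\left(1 - \frac{1}{2},\ 1 - \frac{1}{2}\right) = \left(\frac{1}{2}, \frac{1}{2}\right)$.

Second iteration. You replace $\left(\frac{1}{2}, \frac{1}{2}\right)$ by $\left(\frac{1}{2} - \frac{1}{4} f_x\left(\frac{1}{2}, \frac{1}{2}\right),\ \frac{1}{2} - \frac{1}{4} f_y\left(\frac{1}{2}, \frac{1}{2}\right)\right)$. Since $f_x\left(\frac{1}{2}, \frac{1}{2}\right) = 1$ and $f_y\left(\frac{1}{2}, \frac{1}{2}\right) = 1$, the new point you get is $\left(\frac{1}{2} - \frac{1}{4},\ \frac{1}{2} - \frac{1}{4}\right) = \left(\frac{1}{4}, \frac{1}{4}\right)$.

Third iteration. You replace $\left(\frac{1}{4}, \frac{1}{4}\right)$ by $\left(\frac{1}{4} - \frac{1}{4} f_x\left(\frac{1}{4}, \frac{1}{4}\right),\ \frac{1}{4} - \frac{1}{4} f_y\left(\frac{1}{4}, \frac{1}{4}\right)\right)$. Since $f_x\left(\frac{1}{4}, \frac{1}{4}\right) = \frac{1}{2}$ and $f_y\left(\frac{1}{4}, \frac{1}{4}\right) = \frac{1}{2}$, the new point you get is $\left(\frac{1}{4} - \frac{1}{8},\ \frac{1}{4} - \frac{1}{8}\right) = \left(\frac{1}{8}, \frac{1}{8}\right)$.

Fourth iteration. You replace $\left(\frac{1}{8}, \frac{1}{8}\right)$ by $\left(\frac{1}{8} - \frac{1}{4} f_x\left(\frac{1}{8}, \frac{1}{8}\right),\ \frac{1}{8} - \frac{1}{4} f_y\left(\frac{1}{8}, \frac{1}{8}\right)\right)$. Since $f_x\left(\frac{1}{8}, \frac{1}{8}\right) = \frac{1}{4}$ and $f_y\left(\frac{1}{8}, \frac{1}{8}\right) = \frac{1}{4}$, the new point you get is $\left(\frac{1}{8} - \frac{1}{16},\ \frac{1}{8} - \frac{1}{16}\right) = \left(\frac{1}{16}, \frac{1}{16}\right)$.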
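The hand computations above can be double-checked with a few lines of Python. This is only a minimal sketch of Steps 1 to 3 applied to the example function x^2 + y^2; the names `gradient_descent` and `grad_f` are made up for illustration and do not come from the assignment.

```python
def gradient_descent(grad, start, alpha, steps):
    """Apply Step 2 `steps` times and return every point visited.

    grad  : function (x, y) -> (f_x, f_y)
    start : initial guess (a, b) from Step 1
    alpha : the multiplier, a number > 0
    """
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Move against the gradient, scaled by alpha.
        x, y = x - alpha * gx, y - alpha * gy
        path.append((x, y))
    return path


def grad_f(x, y):
    # Gradient of the example f(x, y) = x^2 + y^2, i.e. (2x, 2y).
    return 2 * x, 2 * y


if __name__ == "__main__":
    # The three multipliers tried in the example, all starting at (1, 1).
    for alpha in (0.5, 1.0, 0.25):
        print(alpha, gradient_descent(grad_f, (1.0, 1.0), alpha, 4))
```

Running it lists the points visited for each multiplier, so the three behaviours (landing on the minimum, bouncing between two points, and halving at every step) can be compared side by side.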
Now hopefully you see the pattern: the point gets closer and closer to $(0, 0)$, but never reaches it. This is usually the case, and it is at least leading you in the right direction. Keep in mind that, if $\alpha$ is too small, the sequence may move too slowly
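For this particular example, the pattern can also be written in closed form. The short derivation below is an added illustration, not part of the handout; the iteration index $n$ and the notation $(x_n, y_n)$ for the point after $n$ iterations are introduced here as assumptions, with $(x_0, y_0) = (1, 1)$.

```latex
\begin{align*}
(x_{n+1}, y_{n+1}) &= \left(x_n - \alpha \cdot 2x_n,\ y_n - \alpha \cdot 2y_n\right)
                    = (1 - 2\alpha)\,(x_n, y_n), \\
(x_n, y_n) &= (1 - 2\alpha)^n\,(1, 1).
\end{align*}
```

So each step simply multiplies the point by $1 - 2\alpha$. With $\alpha = \frac{1}{2}$ the factor is $0$ and the point lands on $(0, 0)$ at once. With $\alpha = 1$ the factor is $-1$ and the point flips between $(1, 1)$ and $(-1, -1)$ forever. With $\alpha = \frac{1}{4}$ the factor is $\frac{1}{2}$, so the point after $n$ steps is $\left(\frac{1}{2^n}, \frac{1}{2^n}\right)$, which shrinks toward $(0, 0)$ without ever equalling it. A very small multiplier such as $\alpha = 0.01$ gives a factor of $0.98$, and it then takes more than 200 iterations for both coordinates to drop below $0.01$. This tidy formula is specific to $x^2 + y^2$; for other functions the behaviour usually has to be checked numerically.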