Question


Problem 1 (Stochastic gradient descent). Let's go back to finding a function that approximates a whole data set using the least-squares method. In particular, we're going to try to find a linear function $at + b$ that approximates data points $(t_i, y_i)_{i=1}^{N}$.

(a) To this end, we need a data set. We could go out to the internet for one, but for the purpose of testing algorithms, we can also just generate data. First, randomly choose $N = 1000$ points $t_i$ in the interval $[0, 100]$. For each of these $t_i$, generate a $y_i$ by drawing from the normal distribution $N(50 + \tfrac{1}{2} t_i, 10)$; in other words, one could think of the exact values as lying on the line $y = \tfrac{1}{2} t + 50$, but the actual measurement points are normally distributed around this line with a standard deviation of 10. Show a plot of all 1000 of these data points.
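
For illustration, part (a) could be implemented along the following lines. This is a minimal sketch assuming Python with NumPy and Matplotlib; the seed and the variable names (rng, t, y) are my own choices, not part of the problem statement.

```python
# Sketch of part (a): generate N = 1000 noisy points around y = t/2 + 50.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)       # fixed seed so the plot is reproducible

N = 1000
t = rng.uniform(0.0, 100.0, size=N)   # t_i drawn uniformly from [0, 100]
y = rng.normal(50.0 + 0.5 * t, 10.0)  # y_i ~ N(50 + t_i/2, sigma = 10)

plt.scatter(t, y, s=5, alpha=0.5)
plt.xlabel("t")
plt.ylabel("y")
plt.title("Synthetic data around y = t/2 + 50")
plt.show()
```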

(b) Next, we need to find the approximating line to this data set. The least squares approach would require us to find $x = (a, b)$ so that $f(a, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (a t_i + b)\bigr)^2$ is minimized. Use your method of choice to find this optimum $x^* = (a^*, b^*)$ and show a plot of both the original data and the approximating linear function. (Because the objective function is quadratic, your method of choice should in fact be Newton's method, because then you are done in exactly one step.)
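
Since the objective is quadratic, its Hessian is constant, so a single Newton step from any starting point lands exactly on the minimizer. A sketch continuing in the same session as the part (a) code; the helper grad and the matrix H are names I introduce here, not given in the problem.

```python
# Sketch for part (b): one Newton step on the quadratic objective.
# Reuses t, y, N from the part (a) sketch.
def grad(x):
    a, b = x
    r = y - (a * t + b)                       # residuals y_i - (a t_i + b)
    return np.array([-2.0 / N * np.sum(r * t),
                     -2.0 / N * np.sum(r)])

# The Hessian of f is constant because f is quadratic in (a, b).
H = 2.0 / N * np.array([[np.sum(t * t), np.sum(t)],
                        [np.sum(t),     float(N)]])

x0 = np.zeros(2)
x_star = x0 - np.linalg.solve(H, grad(x0))    # exact minimizer in one step
a_star, b_star = x_star

plt.scatter(t, y, s=5, alpha=0.5)
ts = np.linspace(0.0, 100.0, 2)
plt.plot(ts, a_star * ts + b_star, "r",
         label=f"fit: y = {a_star:.3f} t + {b_star:.2f}")
plt.legend()
plt.show()
```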

(c) Let us use the Steepest Descent method for this problem and track how fast it converges. To this end, implement a method in which you compute $x_{k+1} = (a_{k+1}, b_{k+1})$ using the iteration $x_{k+1} = x_k + \alpha_k p_k$, where $p_k$ is computed as the steepest descent direction using all data points: $p_k = -\nabla \left[ \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (a t_i + b)\bigr)^2 \right]$. Start at $x_0 = (a_0, b_0) = (0, 75)$ (i.e., corresponding to a horizontal line with vertical offset equal to 75). Choose $\alpha_k = \frac{5}{(k+1)\,\|p_k\|}$ for the step length. Since you know $x^*$ from the previous problem, track $\|x^* - x_k\|$ and show it as a function of $k$ in the form of a graph.
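
A sketch of the iteration in part (c), again continuing from the previous snippets (it reuses grad and x_star); the iteration count K is an arbitrary choice of this sketch.

```python
# Sketch for part (c): steepest descent with the prescribed step length.
K = 1000                                         # number of iterations; my choice
x = np.array([0.0, 75.0])                        # x_0 = (a_0, b_0) = (0, 75)
err = []
for k in range(K):
    p = -grad(x)                                 # steepest descent, all data points
    alpha = 5.0 / ((k + 1) * np.linalg.norm(p))  # alpha_k = 5 / ((k+1) ||p_k||)
    x = x + alpha * p
    err.append(np.linalg.norm(x_star - x))

plt.semilogy(err)
plt.xlabel("k")
plt.ylabel("||x* - x_k||")
plt.show()
```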

(d) Let's also try the Stochastic Gradient Descent method. To this end, implement a method in which you compute $x_{k+1} = (a_{k+1}, b_{k+1})$ using the iteration $x_{k+1} = x_k + \alpha_k p_k$, where $p_k$ is computed as the steepest descent direction using only a subset $S_k$ of all data points: $p_k = -\nabla \left[ \frac{1}{M} \sum_{i \in S_k} \bigl(y_i - (a t_i + b)\bigr)^2 \right]$. Here, $S_k$ is a randomly chosen subset of $\{1, \ldots, N\}$ in each iteration $k$ with $M = 10$ elements. In other words, $p_k$ only uses 1% of all of the data in each step. Using the same step length strategy as above, generate again the sequence $x_k$ that results from the method. As before, plot $\|x^* - x_k\|$ as a function of $k$. Evaluate whether this method is competitive with the original steepest descent method, and state how you evaluate "competitive".
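
A corresponding sketch for part (d), chained onto the earlier snippets (it reuses rng, t, y, N, x_star, and the error history err from part (c)); grad_batch and the comparison plot are assumptions of this sketch, not prescribed by the problem.

```python
# Sketch for part (d): same iteration, but the gradient is estimated from a
# random size-M subset S_k in each step.
def grad_batch(x, idx):
    a, b = x
    r = y[idx] - (a * t[idx] + b)             # residuals on the subset only
    m = len(idx)
    return np.array([-2.0 / m * np.sum(r * t[idx]),
                     -2.0 / m * np.sum(r)])

K, M = 1000, 10
x = np.array([0.0, 75.0])                     # same start x_0 = (0, 75)
err_sgd = []
for k in range(K):
    S = rng.choice(N, size=M, replace=False)  # random subset S_k with |S_k| = 10
    p = -grad_batch(x, S)                     # stochastic descent direction
    alpha = 5.0 / ((k + 1) * np.linalg.norm(p))
    x = x + alpha * p
    err_sgd.append(np.linalg.norm(x_star - x))

plt.semilogy(err, label="steepest descent")   # err from the part (c) sketch
plt.semilogy(err_sgd, label="SGD, M = 10")
plt.xlabel("k")
plt.ylabel("||x* - x_k||")
plt.legend()
plt.show()
```

One reasonable yardstick for "competitive" in such a comparison is the error reached per data point touched rather than per iteration, since each SGD step reads only $M = 10$ of the $N = 1000$ points.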

