Question

1 Approved Answer

Posted on Sep 24, 2024

8. (12 pts) Consider the following code segment: / learning rate // number of input features // input training data (independent variable) // training output

image text in transcribed

8. (12 pts) Consider the following code segment: / learning rate // number of input features // input training data (independent variable) // training output value (dependent variable) // model parameters // gradient of the objective function 1 mu-0.01; 3 model_input x[m] 4 model_output y 5 model 6 gradient 7iterator W[m] g[m] i [0:m] I/iterator value ranging from 0 to m-1 9 10 h sum [1] (w[i] * x[i]); // m parallel element-wise multiplications / followed by an adder tree 11 d=h-y; scalar subtraction 13 // m parallel multiplications // 15 w[i] w[1] -g[i]; m parallel subtractions This is a mathematical domain-specific language designed to easily express statistical machine learning algorithms. The above code segment is an implementation of stochastic gradient update rule for the linear regression model. Below, wewl briefly explain what the code means. You do not need any prior knowledge in machine learning to solve this problem Lines 3 to 6 declare the data types used in the gradient of the objective function. The model_input data type refers to an input training vector while the model_output refers to the corresponding output vector. The model data type refers to the model parameter for this linear regression model. Finally, the gradient type refers to the gradient of the objective function. For simplicity, you can think of these like an array of size m The iterator keyword identifies operations on n-dimensional arrays that can be performed in parallel. For example, the following expression iterator io: n; an element-by-element multiplication between the arrays A and B, both of size n, is performed. In other words, Q[0 ] = A [ 0] * B [ 0 ], Q [ 1 ] = A [ 1 ] * B [ 1 ], and so on. The sum operator (line 9) iterates over the given array to sum all the elements. In fact, lne 9 is equivalent to the following mathematical expression: a (2 pt) Draw a DFG for the above code segment. Write the operations on the nodes and registers i.e. namespace) on the edges for each DFG b) (2 pt) Suppose we want to execute the DFG you generated in the previous question on a virtual dataflow machine. How many cycles does it take to execute the DFG? Note that each operation takes one cycle c) (2 pt) For the DFG, write an instruction sequence in terms of the registers and operations you used in the previous question d) (2 pt) Convert each instruction sequence to static single assignment (SSA) form. Let us use the alpha notation (a) to name the registers, as we did in class. Assume you have infinite number of registers. e) (2 pt) Suppose we are generating instructions for a target architecture. Let us assume our target architecture has only two units that can perform arithmetic operations. In this case, how many cycles does it take to execute the DFG? Assume that each operation takes one cycle f) (2 pt) Now suppose we want to generate instructions for stack machine instead of dataflow machine. Write the instruction sequence for a stack machine such as Java Virtual Machine that represents the expression given in the problem 8. (12 pts) Consider the following code segment: / learning rate // number of input features // input training data (independent variable) // training output value (dependent variable) // model parameters // gradient of the objective function 1 mu-0.01; 3 model_input x[m] 4 model_output y 5 model 6 gradient 7iterator W[m] g[m] i [0:m] I/iterator value ranging from 0 to m-1 9 10 h sum [1] (w[i] * x[i]); // m parallel element-wise multiplications / followed by an adder tree 11 d=h-y; scalar subtraction 13 // m parallel multiplications // 15 w[i] w[1] -g[i]; m parallel subtractions This is a mathematical domain-specific language designed to easily express statistical machine learning algorithms. The above code segment is an implementation of stochastic gradient update rule for the linear regression model. Below, wewl briefly explain what the code means. You do not need any prior knowledge in machine learning to solve this problem Lines 3 to 6 declare the data types used in the gradient of the objective function. The model_input data type refers to an input training vector while the model_output refers to the corresponding output vector. The model data type refers to the model parameter for this linear regression model. Finally, the gradient type refers to the gradient of the objective function. For simplicity, you can think of these like an array of size m The iterator keyword identifies operations on n-dimensional arrays that can be performed in parallel. For example, the following expression iterator io: n; an element-by-element multiplication between the arrays A and B, both of size n, is performed. In other words, Q[0 ] = A [ 0] * B [ 0 ], Q [ 1 ] = A [ 1 ] * B [ 1 ], and so on. The sum operator (line 9) iterates over the given array to sum all the elements. In fact, lne 9 is equivalent to the following mathematical expression: a (2 pt) Draw a DFG for the above code segment. Write the operations on the nodes and registers i.e. namespace) on the edges for each DFG b) (2 pt) Suppose we want to execute the DFG you generated in the previous question on a virtual dataflow machine. How many cycles does it take to execute the DFG? Note that each operation takes one cycle c) (2 pt) For the DFG, write an instruction sequence in terms of the registers and operations you used in the previous question d) (2 pt) Convert each instruction sequence to static single assignment (SSA) form. Let us use the alpha notation (a) to name the registers, as we did in class. Assume you have infinite number of registers. e) (2 pt) Suppose we are generating instructions for a target architecture. Let us assume our target architecture has only two units that can perform arithmetic operations. In this case, how many cycles does it take to execute the DFG? Assume that each operation takes one cycle f) (2 pt) Now suppose we want to generate instructions for stack machine instead of dataflow machine. Write the instruction sequence for a stack machine such as Java Virtual Machine that represents the expression given in the