Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

8. (12 pts) Consider the following code segment: / learning rate // number of input features // input training data (independent variable) // training output

image text in transcribed

8. (12 pts) Consider the following code segment: / learning rate // number of input features // input training data (independent variable) // training output value (dependent variable) // model parameters // gradient of the objective function 1 mu-0.01; 3 model_input x[m] 4 model_output y 5 model 6 gradient 7iterator W[m] g[m] i [0:m] I/iterator value ranging from 0 to m-1 9 10 h sum [1] (w[i] * x[i]); // m parallel element-wise multiplications / followed by an adder tree 11 d=h-y; scalar subtraction 13 // m parallel multiplications // 15 w[i] w[1] -g[i]; m parallel subtractions This is a mathematical domain-specific language designed to easily express statistical machine learning algorithms. The above code segment is an implementation of stochastic gradient update rule for the linear regression model. Below, wewl briefly explain what the code means. You do not need any prior knowledge in machine learning to solve this problem Lines 3 to 6 declare the data types used in the gradient of the objective function. The model_input data type refers to an input training vector while the model_output refers to the corresponding output vector. The model data type refers to the model parameter for this linear regression model. Finally, the gradient type refers to the gradient of the objective function. For simplicity, you can think of these like an array of size m The iterator keyword identifies operations on n-dimensional arrays that can be performed in parallel. For example, the following expression iterator io: n; an element-by-element multiplication between the arrays A and B, both of size n, is performed. In other words, Q[0 ] = A [ 0] * B [ 0 ], Q [ 1 ] = A [ 1 ] * B [ 1 ], and so on. The sum operator (line 9) iterates over the given array to sum all the elements. In fact, lne 9 is equivalent to the following mathematical expression: a (2 pt) Draw a DFG for the above code segment. Write the operations on the nodes and registers i.e. namespace) on the edges for each DFG b) (2 pt) Suppose we want to execute the DFG you generated in the previous question on a virtual dataflow machine. How many cycles does it take to execute the DFG? Note that each operation takes one cycle c) (2 pt) For the DFG, write an instruction sequence in terms of the registers and operations you used in the previous question d) (2 pt) Convert each instruction sequence to static single assignment (SSA) form. Let us use the alpha notation (a) to name the registers, as we did in class. Assume you have infinite number of registers. e) (2 pt) Suppose we are generating instructions for a target architecture. Let us assume our target architecture has only two units that can perform arithmetic operations. In this case, how many cycles does it take to execute the DFG? Assume that each operation takes one cycle f) (2 pt) Now suppose we want to generate instructions for stack machine instead of dataflow machine. Write the instruction sequence for a stack machine such as Java Virtual Machine that represents the expression given in the problem 8. (12 pts) Consider the following code segment: / learning rate // number of input features // input training data (independent variable) // training output value (dependent variable) // model parameters // gradient of the objective function 1 mu-0.01; 3 model_input x[m] 4 model_output y 5 model 6 gradient 7iterator W[m] g[m] i [0:m] I/iterator value ranging from 0 to m-1 9 10 h sum [1] (w[i] * x[i]); // m parallel element-wise multiplications / followed by an adder tree 11 d=h-y; scalar subtraction 13 // m parallel multiplications // 15 w[i] w[1] -g[i]; m parallel subtractions This is a mathematical domain-specific language designed to easily express statistical machine learning algorithms. The above code segment is an implementation of stochastic gradient update rule for the linear regression model. Below, wewl briefly explain what the code means. You do not need any prior knowledge in machine learning to solve this problem Lines 3 to 6 declare the data types used in the gradient of the objective function. The model_input data type refers to an input training vector while the model_output refers to the corresponding output vector. The model data type refers to the model parameter for this linear regression model. Finally, the gradient type refers to the gradient of the objective function. For simplicity, you can think of these like an array of size m The iterator keyword identifies operations on n-dimensional arrays that can be performed in parallel. For example, the following expression iterator io: n; an element-by-element multiplication between the arrays A and B, both of size n, is performed. In other words, Q[0 ] = A [ 0] * B [ 0 ], Q [ 1 ] = A [ 1 ] * B [ 1 ], and so on. The sum operator (line 9) iterates over the given array to sum all the elements. In fact, lne 9 is equivalent to the following mathematical expression: a (2 pt) Draw a DFG for the above code segment. Write the operations on the nodes and registers i.e. namespace) on the edges for each DFG b) (2 pt) Suppose we want to execute the DFG you generated in the previous question on a virtual dataflow machine. How many cycles does it take to execute the DFG? Note that each operation takes one cycle c) (2 pt) For the DFG, write an instruction sequence in terms of the registers and operations you used in the previous question d) (2 pt) Convert each instruction sequence to static single assignment (SSA) form. Let us use the alpha notation (a) to name the registers, as we did in class. Assume you have infinite number of registers. e) (2 pt) Suppose we are generating instructions for a target architecture. Let us assume our target architecture has only two units that can perform arithmetic operations. In this case, how many cycles does it take to execute the DFG? Assume that each operation takes one cycle f) (2 pt) Now suppose we want to generate instructions for stack machine instead of dataflow machine. Write the instruction sequence for a stack machine such as Java Virtual Machine that represents the expression given in the

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Modern Database Management

Authors: Jeff Hoffer, Ramesh Venkataraman, Heikki Topi

12th edition

133544613, 978-0133544619

More Books

Students also viewed these Databases questions