Question
a. (10) Which operation(s) in the loop can NOT be parallelized? Hint: these will be the operation(s) that depend on the result of that operation
a. (10) Which operation(s) in the loop can NOT be parallelized? Hint: these will be the operation(s) that depend on the result of that operation from the previous loop iteration. Write your answers in your solutions document.
b. (10) Given your answer from part a, what is the best-case CPE for the loop as currently written? Assume that float addition has a latency of 3 cycles, float multiplication has a latency of 5 cycles, and all integer operations have a latency of 1 cycle. Hint: the best-case CPE will be latency of the slowest of the operation(s) you identified in part a. Write your answers in your solutions document.
2. [40] Suppose we've got a procedure that computes the inner product of two arrays u and v. Consider the following C code: void inner (float *u, float *v, int length, float *dest) { int i; float sum = 0.0f; for (i = 0; iStep by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started