Question
When counting the number of cycles for a loop, find out when (in which cycle) the first instruction in the loop is fetched and when it is fetched again (assuming the loop repeats), and then calculate the difference of cycle numbers. The difference is the number of cycles an iteration takes. For example, if the first instruction in a loop is fetched in cycle 1 and fetched again in cycle 11, each iteration takes 10 cycles.
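The counting rule above is a single subtraction. A minimal sketch in Python follows; the function name `cycles_per_iteration` and its parameter names are made up for illustration and do not come from the assignment.

```python
def cycles_per_iteration(first_fetch_cycle: int, next_fetch_cycle: int) -> int:
    """Return the cycles one loop iteration takes.

    first_fetch_cycle: cycle in which the loop's first instruction is fetched.
    next_fetch_cycle:  cycle in which that same instruction is fetched again
                       (the start of the following iteration).
    """
    if next_fetch_cycle <= first_fetch_cycle:
        raise ValueError("the second fetch must happen after the first")
    return next_fetch_cycle - first_fetch_cycle


# The example from the text: fetched in cycle 1, fetched again in cycle 11.
print(cycles_per_iteration(1, 11))  # 10
```

The two fetch cycles are read off the pipeline diagram; the sketch only does the arithmetic once those numbers are known.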
4. (30 points) In this exercise, we will look at how a common vector loop runs on a statically scheduled version of the MIPS pipeline. (We will consider dynamic scheduling in later assignments.) The loop is the so-called DAXPY loop, the central operation in Gaussian elimination. The loop implements the vector operation Y = a * X + Y. Here is the MIPS code for the loop:

    foo:  L.D     F2, 0(R1)      ; load X[i]
          MULT.D  F4, F2, F0     ; multiply a * X[i]
          L.D     F6, 0(R2)      ; load Y[i]
          ADD.D   F6, F4, F6     ; add a * X[i] + Y[i]
          S.D     F6, 0(R2)      ; store Y[i]
          DADDI   R1, R1, 8      ; increment X pointer
          DADDI   R2, R2, 8      ; increment Y pointer
          DADDI   R3, R3, 1      ; increment the counter
          BNE     R3, R9, foo    ; loop if not done

Instructions with '.D' are double-precision floating-point operations. R3 is the loop counter. It is incremented every iteration, and the loop exits when R3 reaches R9.

For this problem, use the MIPS pipeline shown in Figure C.35 (in Section C.5). The pipeline latencies are listed in Figure C.34 (the numbers are also shown on the slide "Latencies and initiation intervals for functional units"). Assume results from instructions are fully forwarded to the beginning of each FU (but not to the MEM stages). The MEM stage of load instructions completes in 1 clock cycle. The branch instruction is resolved in the EXE stage and it is not delayed. Branches are predicted not-taken.

a) Show a pipeline diagram for this instruction sequence, starting from the first cycle and ending at the cycle where the first L.D enters the ID stage in the second iteration. How many clock cycles does each loop iteration take?

b) Construct a table that shows when (in which cycle) each instruction enters the execution stage, and the number of cycles the instruction has to wait in the ID stage. An example is shown below.

    Instructions              EXE Start   Stalls
    L.D     F2, 0(R1)         3
    MULT.D  F4, F2, F0        5           1
    L.D     F6, 0(R2)
    ADD.D   F6, F4, F6
    S.D     F6, 0(R2)
    DADDI   R1, R1, 8
    DADDI   R2, R2, 8
    DADDI   R3, R3, 1
    BNE     R3, R9, foo
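For reference, the computation the loop performs can be sketched in Python. This is an illustrative model of the arithmetic only, not of the pipeline timing. The function name `daxpy` and the use of Python lists for the vectors are assumptions; the register mapping in the comments follows the MIPS listing in the question.

```python
def daxpy(a: float, x: list[float], y: list[float]) -> list[float]:
    """Update y in place with y[i] = a * x[i] + y[i] and return it."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):       # R3 counts iterations until it reaches R9
        xi = x[i]                 # L.D    F2, 0(R1)    load X[i]
        product = a * xi          # MULT.D F4, F2, F0   a is held in F0
        yi = y[i]                 # L.D    F6, 0(R2)    load Y[i]
        y[i] = product + yi       # ADD.D, then S.D     store Y[i]
    return y


print(daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

One difference to keep in mind: the MIPS loop tests its exit condition at the bottom, so its body always runs at least once, whereas this sketch does nothing for empty vectors.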