Question
we will use the following loop: for (k=n; k>=0; k--) x[k] = y[k+1] 2.0 * y[k]; If we assume: R1 contains the address of the
we will use the following loop: for (k=n; k>=0; k--) x[k] = y[k+1] 2.0 * y[k];
If we assume:
R1 contains the address of the nth element of y
R2 contains the address of the nth element of x
F0 contains 2.0
The above code could be written as the following MIPS assembly code:
Loop: L.D F2, 8(R1)
L.D F4, 0(R1)
MULT.D F6, F4, F0
SUB.D F8, F2, F6
S.D F8, 0(R2)
SUBI R1, R1, #8 #same as DADDUI R1, R1, #-8
SUBI R2, R2, #8
BNEZ R2, Loop
Using the following table for instruction latencies:
Instruction/Operation Type | Latency in Clock Cycles |
Double Load | 1 |
Double Store | 0 |
Integer Op | 0 |
FP Multiply | 4 |
FP Add/Subtract | 2 |
(1) Show the (cycle) schedule, including stalls of the unmodified loop on a fully pipelined machine. Cycle Instruction/stall
(2) Unroll the loop 3 times (for a total of 3 iterations) but do not reschedule the instructions. Ignore the branch delay slot. Do not delete any instructions other than loop overhead instructions. Assume n is a multiple of 3. Cycle Instruction/stall
(3) Unroll the loop 3 times (for a total of 3 iterations) and reschedule the instructions to reduce the number of stalls. Ignore the branch delay slot. Do not delete any instructions other than loop overhead instructions. Assume n is a multiple of 3. Cycle Instruction/stall
(4) What is the speedup of the unrolled loop in (2) from the unmodified case in (1)? What is the speedup of the unrolled and scheduled loop in (3) from the unmodified case in (1). Please show your calculation.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started