Question
Consider the following code and answer the questions below. Note that register F2 holds a scalar constant that must not be changed during the computation (see the MUL.D instruction).
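        .data
        .text
main:
        DADDI R3,R0,8       ; R3 = 8 (stride: one double-precision word)
        DADDI R1,R0,1024    ; R1 = 1024 (address of the element loaded first)
        DADDI R2,R0,1024    ; R2 = 1024 (address of the element loaded and stored)
Loop:   L.D   F0,0(R1)      ; F0 = Mem[R1]
        MUL.D F0,F0,F2      ; F0 = F0 * F2 (F2 is the scalar constant)
        L.D   F4,0(R2)      ; F4 = Mem[R2]
        ADD.D F0,F0,F4      ; F0 = F0 + F4
        S.D   F0,0(R2)      ; Mem[R2] = F0
        DSUB  R1,R1,R3      ; R1 = R1 - 8
        DSUB  R2,R2,R3      ; R2 = R2 - 8
        BNEZ  R1,Loop       ; repeat until R1 reaches 0
        HALT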
(a) Enable forwarding (check the option under the Configure tab) and run the code. How many stalls do you see? Can you identify the pair of instructions that causes each stall? Hint: run in single-cycle mode using F7. What is the CPI?
(b) Execute the code with Enable Branch Target Buffer checked (under the Configure tab). How many stalls do you see, and what exactly does the Enable Branch Target Buffer option do? What is the CPI, and what is the speedup compared to (a)?
(c) Execute the code with Enable Delay Slot checked (under the Configure tab). You will need to place one instruction in the delay slot after the branch; otherwise the HALT instruction will stop the code from executing. What is the CPI, and what is the speedup compared to (a)? Which scheme is better, the branch target buffer or the delay slot?
(d) Rearrange the loop without unrolling it. You may move individual instructions, but the output of this dummy loop must remain exactly the same, i.e. adjust the offsets of the memory instructions (load/store) where necessary. Can you reduce the stalls for this code? What is the new CPI, and what is the speedup compared to (a)?
(e) Now transform the loop by unrolling it, reschedule the instructions, and enable either the delay slot or the branch target buffer to minimize the stalls. What is the CPI, and what is the speedup compared to (a)?
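The CPI and speedup figures requested in parts (a) through (e) rest on the usual definitions, written out here as a sketch rather than as part of the original question. The symbols C (total clock cycles reported by the simulator), I (instructions executed), and N_iter (loop iterations) are introduced for illustration only. The instruction count assumes the listing runs unmodified, the loop exits when R1 reaches 0, and HALT is counted as an executed instruction.

```latex
% Definitions (assumed notation, not from the original question)
\[
  \mathrm{CPI} = \frac{C}{I},
  \qquad
  \mathrm{Speedup}_{x} = \frac{C_{(a)}}{C_{x}}
\]
% Dynamic instruction count of the unmodified listing:
% 3 setup instructions, 8 instructions per iteration, 1 HALT
\[
  N_{\mathrm{iter}} = \frac{1024}{8} = 128,
  \qquad
  I = 3 + 8 \cdot 128 + 1 = 1028
\]
```

Parts (c) and (e) can change the number of instructions executed, so comparing total cycle counts (at the same clock rate) is safer than comparing CPI values alone.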