For the following problem, assume a 5-stage pipelined processor with a branch delay slot and branch resolution in the Execute stage. Also assume the pipeline has full forwarding and hardware interlocking. Consider the code below:
lw $t2, 0($t1)
label1:
beq $t2, $t0, label2 #not taken once, then taken
lw $t3, 0($t2)
beq $t3, $t0, label1 #taken
add $t1, $t3, $t1
label2:
sw $t1, 0($t2)
(a) Draw the pipeline execution diagram for the above code when an assume-not-taken branching scheme is used. Assume the code above has already been arranged to fill the branch delay slots.
(b) How many clock cycles are required to execute the code above when an assume-not-taken branching scheme is used?
(c) If the branch decision were moved to the Decode stage, how many clock cycles would be required? Draw the pipeline execution diagram.
(d) Redraw the pipeline execution diagram assuming a 100% correct branch predictor.
(e) What speedup does the branch predictor provide over the assume-not-taken scheme?
(f) Assuming label1 is at address 0x20000010, provide the machine code for both branches from the assembly code above.
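For part (f), the encoding can be checked with a short script. This is a minimal sketch, not an official solution. It assumes the standard MIPS32 I-type layout for beq (opcode 0x04, then rs, rt, and a 16-bit signed word offset measured from the instruction after the branch), 4-byte instructions, labels that occupy no space, and no assembler-inserted nops. The helper name encode_beq and the address variables are illustrative and do not appear in the problem.

```python
# Sketch: encode the two beq instructions from the problem as 32-bit words.
# Assumed I-type layout: opcode(6) | rs(5) | rt(5) | offset(16).

# Conventional MIPS register numbers for the registers used in the code.
REGS = {"$t0": 8, "$t1": 9, "$t2": 10, "$t3": 11}
BEQ_OPCODE = 0x04


def encode_beq(rs, rt, branch_addr, target_addr):
    """Return the 32-bit machine word for `beq rs, rt, target` (hypothetical helper)."""
    # The offset counts words from the instruction following the branch (PC + 4).
    offset = (target_addr - (branch_addr + 4)) // 4
    # Mask to 16 bits so negative offsets become two's complement.
    return (BEQ_OPCODE << 26) | (REGS[rs] << 21) | (REGS[rt] << 16) | (offset & 0xFFFF)


# Addresses assumed from "label1 is at 0x20000010" and 4-byte instructions.
LABEL1 = 0x20000010
first_beq_addr = LABEL1        # beq $t2, $t0, label2
second_beq_addr = LABEL1 + 8   # beq $t3, $t0, label1
label2_addr = LABEL1 + 16      # sw  $t1, 0($t2)

first = encode_beq("$t2", "$t0", first_beq_addr, label2_addr)
second = encode_beq("$t3", "$t0", second_beq_addr, LABEL1)

print(f"{first:#010x}")   # 0x11480003
print(f"{second:#010x}")  # 0x1168fffd
```

Under these assumptions the first branch has a forward offset of +3 words and the second a backward offset of -3 words (0xFFFD in 16-bit two's complement). If your course uses a different addressing convention, adjust the offset calculation accordingly.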