Question
We want to study several instruction-level parallelism techniques. We are given the following benchmark program, assuming R1 is initialized to 0 and R6, R7, R8, R9 and F10 contain constant non-zero values:
Loop: LD F12, 0(R6)
DIVD F14, F12, F10
LD F16, 0(R7)
ADDD F16, F14, F16
LD F17, 0(R8)
MULTD F18, F17, F16
SD 0(R9), F18
ADDI R6, R6, #4
ADDI R7, R7, #4
ADDI R8, R8, #4
ADDI R9, R9, #4
ADDI R1, R1, #1
SUBI R2, R1, #1000
BNEQZ R2, Loop
Assuming a single scalar architecture, the available hardware resources and their respective latencies are given below:
FU TYPE | #FUs | #EX cycles |
integer | 2 | 1 |
branch | 1 | 1 |
load | 3 | 2 |
store | 2 | 2 |
FP adder | 2 | 7 |
FP multiplier | 1 | 5 |
FP divider | 1 | 24 |
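To make the data dependences in the loop body easier to see, here is a minimal Python sketch that transcribes the listing and the latency table above and walks the read-after-write (RAW) chain of one iteration. The tuple encoding and the function names `raw_dependencies` and `critical_path` are illustrative assumptions, not part of the exercise. The bound it computes only adds up EX latencies along the dependence chain; it ignores issue, write-back and structural hazards, so it is not an execution time for any of the schemes the question asks about.

```python
# Loop body from the listing: (opcode, destination register, source registers).
LOOP = [
    ("LD",    "F12", ["R6"]),
    ("DIVD",  "F14", ["F12", "F10"]),
    ("LD",    "F16", ["R7"]),
    ("ADDD",  "F16", ["F14", "F16"]),
    ("LD",    "F17", ["R8"]),
    ("MULTD", "F18", ["F17", "F16"]),
    ("SD",    None,  ["R9", "F18"]),
    ("ADDI",  "R6",  ["R6"]),
    ("ADDI",  "R7",  ["R7"]),
    ("ADDI",  "R8",  ["R8"]),
    ("ADDI",  "R9",  ["R9"]),
    ("ADDI",  "R1",  ["R1"]),
    ("SUBI",  "R2",  ["R1"]),
    ("BNEQZ", None,  ["R2"]),
]

# EX cycles per opcode, taken from the table (integer ops and branch: 1 cycle).
LATENCY = {"LD": 2, "SD": 2, "ADDD": 7, "MULTD": 5, "DIVD": 24,
           "ADDI": 1, "SUBI": 1, "BNEQZ": 1}


def raw_dependencies(program):
    """List (producer index, consumer index, register) for each RAW
    dependence inside a single iteration."""
    last_writer = {}
    deps = []
    for i, (_, dest, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


def critical_path(program, latency):
    """Sum of EX latencies along the longest RAW chain of one iteration.
    Assumption: unlimited issue width and functional units."""
    finish = [0] * len(program)
    last_writer = {}
    for i, (op, dest, srcs) in enumerate(program):
        ready = max((finish[last_writer[r]] for r in srcs if r in last_writer),
                    default=0)
        finish[i] = ready + latency[op]
        if dest is not None:
            last_writer[dest] = i
    return max(finish)


for producer, consumer, reg in raw_dependencies(LOOP):
    print(f"{LOOP[producer][0]} -> {LOOP[consumer][0]} via {reg}")
print("EX cycles on the longest chain:", critical_path(LOOP, LATENCY))
```

The longest chain is LD, DIVD, ADDD, MULTD, SD (2 + 24 + 7 + 5 + 2 cycles), so the single FP divider dominates each iteration, while the integer updates and the branch form a short independent chain.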
a) Draw the hardware organization to implement dynamic scheduling with the Tomasulo algorithm. Do you expect an improved execution time compared to T1, T2 and T3? (Hint: do not perform any computations, answer from the theoretical point of view)
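As a starting point for the drawing, the main blocks of a Tomasulo organization (reservation stations in front of the functional units, a register status table holding producer tags, and a common data bus that broadcasts results) can be sketched in Python as below. This is a simplified illustration and not a full answer: the class names, the station names `Div1` and `Add1`, the register values, and the number of stations are all assumptions, since the question gives only functional unit counts. Load and store buffers, the instruction queue and cycle timing are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # operand values, once available
    vk: Optional[float] = None
    qj: Optional[str] = None    # tags of the stations that will produce them
    qk: Optional[str] = None


class Tomasulo:
    def __init__(self, station_names, registers):
        self.stations = {n: ReservationStation(n) for n in station_names}
        self.regs = dict(registers)  # register file values
        self.reg_status = {}         # register -> tag of the producing station

    def _read(self, reg):
        """Return (value, None) if the register is ready, else (None, tag)."""
        tag = self.reg_status.get(reg)
        return (None, tag) if tag else (self.regs[reg], None)

    def issue(self, station, op, dest, src1, src2):
        rs = self.stations[station]
        assert not rs.busy, "structural hazard: station occupied"
        rs.busy, rs.op = True, op
        rs.vj, rs.qj = self._read(src1)
        rs.vk, rs.qk = self._read(src2)
        self.reg_status[dest] = station  # rename dest to the station tag

    def broadcast(self, station, value):
        """Common data bus: deliver a result to all waiting stations and
        to any register still tagged with this station, then free it."""
        for rs in self.stations.values():
            if rs.qj == station:
                rs.vj, rs.qj = value, None
            if rs.qk == station:
                rs.vk, rs.qk = value, None
        for reg, tag in list(self.reg_status.items()):
            if tag == station:
                self.regs[reg] = value
                del self.reg_status[reg]
        self.stations[station] = ReservationStation(station)


# Hypothetical register values, chosen only for the demonstration.
cpu = Tomasulo(["Div1", "Add1"],
               {"F10": 2.0, "F12": 8.0, "F14": 0.0, "F16": 1.0})
cpu.issue("Div1", "DIVD", "F14", "F12", "F10")
cpu.issue("Add1", "ADDD", "F16", "F14", "F16")  # waits on Div1 for F14
cpu.broadcast("Div1", 8.0 / 2.0)                # Add1 captures F14 from the bus
print(cpu.stations["Add1"])
```

In the sketch, ADDD is issued while DIVD is still pending and simply records the tag `Div1` instead of stalling. This tag-based renaming is the mechanism that lets independent instructions proceed around the long-latency divide.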