Question
Ex 3.14

        DADDIU  R4,R1,#800   ; R4 = upper bound for X
foo:    L.D     F2,0(R1)     ; (F2) = X(i)
        MUL.D   F4,F2,F0     ; (F4) = a*X(i)
        L.D     F6,0(R2)     ; (F6) = Y(i)
        ADD.D   F6,F4,F6     ; (F6) = a*X(i) + Y(i)
        S.D     F6,0(R2)     ; Y(i) = a*X(i) + Y(i)
        DADDIU  R1,R1,#8     ; increment X index
        DADDIU  R2,R2,#8     ; increment Y index
        DSLTU   R3,R1,R4     ; test: continue loop?
        BNEZ    R3,foo       ; loop if needed
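As a reading aid, the loop's comments describe the update Y(i) = a*X(i) + Y(i). The sketch below restates that in Python. The function name `daxpy`, the sample values, and the use of Python lists are illustrative assumptions; the element count follows from the 800-byte bound and the 8-byte index increment in the listing.

```python
def daxpy(a, x, y):
    """Update y in place so that y[i] = a * x[i] + y[i], as the loop comments describe."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


# The loop covers 800 bytes in 8-byte steps, so it runs over 100 elements.
N = 800 // 8

# Sample data (hypothetical; the exercise does not give values for a, X, or Y).
x = [float(i) for i in range(N)]
y = [1.0] * N
daxpy(2.0, x, y)
```

Each pass of the Python loop corresponds to one iteration of `foo`: two loads, one multiply, one add, one store, plus the index updates and the branch test.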
In this exercise, we will look at how variations on Tomasulo's algorithm perform when running the loop from Exercise 3.14. The functional units (FUs) are described in the table below.

FU Type          Cycles in EX   Number of FUs   Number of reservation stations
Integer          1              1               5
FP adder         10             1               3
FP multiplier    15             1               2

Assume the following:

- Functional units are not pipelined.
- There is no forwarding between functional units; results are communicated by the common data bus (CDB).
- The execution stage (EX) does both the effective address calculation and the memory access for loads and stores. Thus, the pipeline is IF/ID/IS/EX/WB.
- Loads require one clock cycle.
- The issue (IS) and write-back (WB) result stages each require one clock cycle.
- There are five load buffer slots and five store buffer slots.
- Assume that the Branch on Not Equal to Zero (BNEZ) instruction requires one clock cycle.

a. [20] <3.4-3.5> For this problem use the single-issue Tomasulo MIPS pipeline of Figure 3.6 with the pipeline latencies from the table above. Show the number of stall cycles for each instruction and what clock cycle each instruction begins execution (i.e., enters its first EX cycle) for three iterations of the loop. How many cycles does each loop iteration take? Report your answer in the form of a table with the following column headers:

- Iteration (loop iteration number)
- Instruction
- Issues (cycle when instruction issues)
- Executes (cycle when instruction executes)
- Memory access (cycle when memory is accessed)
- Write CDB (cycle when result is written to the CDB)
- Comment (description of any event on which the instruction is waiting)

Show three iterations of the loop in your table. You may ignore the first instruction.

b. [20] <3.7, 3.8> Repeat part (a) but this time assume a two-issue Tomasulo algorithm and a fully pipelined floating-point unit (FPU).
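The latency rules above can be sketched as a small helper that turns an issue cycle into EX and CDB cycles for a single instruction. This is a minimal sketch, not a solution to the exercise: the names `EX_CYCLES` and `schedule` are hypothetical, and it assumes that EX starts no earlier than the cycle after issue and the cycle after the last source operand appears on the CDB, and that the result is written in the cycle after EX finishes. It deliberately ignores structural hazards (busy non-pipelined FUs, full reservation stations, CDB contention), which the requested table must account for.

```python
# Cycles spent in EX, taken from the FU table and the "loads require one
# clock cycle" assumption in the question.
EX_CYCLES = {"integer": 1, "load": 1, "fp_adder": 10, "fp_multiplier": 15}


def schedule(issue_cycle, fu_type, operands_ready_cycle=0):
    """Return EX start, EX end, and CDB write cycles for one instruction.

    operands_ready_cycle is the cycle in which the last source operand is
    written on the CDB (0 if all operands are already available).
    Assumption: structural hazards are ignored.
    """
    # No forwarding: a dependent instruction starts EX the cycle after the
    # producer writes the CDB, and never earlier than the cycle after issue.
    ex_start = max(issue_cycle + 1, operands_ready_cycle + 1)
    ex_end = ex_start + EX_CYCLES[fu_type] - 1
    write_cdb = ex_end + 1  # WB takes one cycle, right after EX completes
    return {"issue": issue_cycle, "ex_start": ex_start,
            "ex_end": ex_end, "write_cdb": write_cdb}


# Hypothetical usage: a load issued in cycle 1, then a multiply issued in
# cycle 2 that waits for the loaded value.
load = schedule(1, "load")
mul = schedule(2, "fp_multiplier", operands_ready_cycle=load["write_cdb"])
```

Under these assumptions the multiply cannot start EX until the load's result has been broadcast, which is the kind of wait the Comment column is meant to record. A full answer would also need to track when each non-pipelined FU becomes free.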