Question
The following code operates on two arrays, C and D, each of length M.
Each array holds 32-bit floating-point numbers.
for (i = 0; i < M; i++)
    C[i] = C[i] * (D[i] + 3.0);
After compilation, the following instructions are generated:
;; f1 := 3.0
;; z1 := &C[0] and z2 := &D[0]
;; z3 := &C[M]
L1: loop1: l.s f0, 0(z2) ;; Load D[i]
L2: l.s f2, 0(z1) ;; Load C[i]
L3: fadd f3, f0, f1
L4: addi z1, z1, 4
L5: fmul f4, f2, f3
L6: addi z2, z2, 4
L7: s.s f4, -4(z1) ;; Store C[i]
L8: bne z1, z3, loop1
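To make the register usage easier to follow, here is a rough C-level sketch of what the compiled loop does. The function name scale_arrays and the M == 0 guard are assumptions added for illustration; the assembly itself tests the exit condition only at the bottom of the loop, so it implicitly assumes M >= 1. Each local variable is named after the register it stands for.

```c
#include <stddef.h>

/* Pointer-walking form of C[i] = C[i] * (D[i] + 3.0).
   scale_arrays is a hypothetical name, not part of the original question. */
void scale_arrays(float *C, const float *D, size_t M)
{
    const float f1 = 3.0f;   /* f1 := 3.0   */
    float *z1 = C;           /* z1 := &C[0] */
    const float *z2 = D;     /* z2 := &D[0] */
    float *z3 = C + M;       /* z3 := &C[M] */

    if (M == 0)              /* assumption: guard added, not in the assembly */
        return;

    do {
        float f0 = *z2;      /* l.s  f0, 0(z2)  ; load D[i]            */
        float f2 = *z1;      /* l.s  f2, 0(z1)  ; load C[i]            */
        float f3 = f0 + f1;  /* fadd f3, f0, f1                        */
        z1 += 1;             /* addi z1, z1, 4  ; one float = 4 bytes  */
        float f4 = f2 * f3;  /* fmul f4, f2, f3                        */
        z2 += 1;             /* addi z2, z2, 4                         */
        z1[-1] = f4;         /* s.s  f4, -4(z1) ; store C[i]           */
    } while (z1 != z3);      /* bne  z1, z3, loop1                     */
}
```

Note that the store uses offset -4 because z1 has already been advanced to the next element by the time the product is ready.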
This code is executed sequentially (in order) on a pipelined machine with perfect branch prediction. The instruction latencies are as follows:
- Each ALU operation causes 1 cycle of delay (consecutive ALU instructions execute with no stalls thanks to bypassing)
- Each load instruction causes 1 cycle of delay
- Each floating-point instruction causes 3 cycles of delay
- Each branch instruction causes 1 cycle of delay
a) How many stall cycles does the processor incur in each iteration of the loop? Explain briefly.
b) In steady state, how many floating-point operations does the processor execute per cycle on average?
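One way to approach both parts is to step through the loop with a small in-order issue model. The sketch below is not an official solution; it rests on several assumptions that the question does not state explicitly: the machine issues one instruction per cycle, "N cycles of delay" is read as a result latency of N cycles (so a latency of 1 lets a dependent instruction follow immediately, as the ALU bullet describes), stalls come only from register dependences, and the branch costs nothing extra because prediction is perfect. All type and function names are hypothetical.

```c
/* Register ids for the loop; NONE marks an unused operand slot. */
enum { F0, F1, F2, F3, F4, Z1, Z2, Z3, NREGS, NONE = -1 };
enum { BODY_LEN = 8 };

typedef struct {
    int dst;      /* register written, or NONE                      */
    int src[2];   /* registers read, NONE if unused                 */
    int latency;  /* cycles from issue until the result is usable   */
} Instr;

/* Fill body[] with the eight loop instructions for the given latencies. */
void build_loop(Instr body[BODY_LEN], int alu, int load, int fp)
{
    const Instr loop[BODY_LEN] = {
        { F0,   { Z2, NONE }, load },  /* l.s  f0, 0(z2)      */
        { F2,   { Z1, NONE }, load },  /* l.s  f2, 0(z1)      */
        { F3,   { F0, F1   }, fp   },  /* fadd f3, f0, f1     */
        { Z1,   { Z1, NONE }, alu  },  /* addi z1, z1, 4      */
        { F4,   { F2, F3   }, fp   },  /* fmul f4, f2, f3     */
        { Z2,   { Z2, NONE }, alu  },  /* addi z2, z2, 4      */
        { NONE, { F4, Z1   }, alu  },  /* s.s  f4, -4(z1)     */
        { NONE, { Z1, Z3   }, alu  },  /* bne  z1, z3, loop1  */
    };
    for (int i = 0; i < BODY_LEN; i++)
        body[i] = loop[i];
}

/* Run the loop body `iterations` times on a single-issue in-order model
   and return the number of stall cycles in the last iteration. */
int stalls_per_iteration(const Instr *body, int n, int iterations)
{
    int ready[NREGS] = {0};  /* cycle at which each register is usable   */
    int cycle = 0;           /* earliest issue slot for the next instr   */
    int stalls = 0;

    for (int it = 0; it < iterations; it++) {
        stalls = 0;
        for (int i = 0; i < n; i++) {
            int issue = cycle;
            for (int s = 0; s < 2; s++) {
                int r = body[i].src[s];
                if (r >= 0 && ready[r] > issue)
                    issue = ready[r];      /* wait for the operand */
            }
            stalls += issue - cycle;
            if (body[i].dst >= 0)
                ready[body[i].dst] = issue + body[i].latency;
            cycle = issue + 1;             /* in-order, one issue per cycle */
        }
    }
    return stalls;
}

/* Floating-point operations per cycle for one steady-state iteration. */
double flops_per_cycle(int flops, int instrs, int stalls)
{
    return (double)flops / (double)(instrs + stalls);
}
```

Under these assumptions, build_loop(body, 1, 1, 3) followed by stalls_per_iteration(body, BODY_LEN, 4) gives 2 stall cycles per iteration: one before fmul (waiting for fadd) and one before s.s (waiting for fmul). Each iteration then takes 8 + 2 = 10 cycles and performs 2 floating-point operations (fadd and fmul), so flops_per_cycle(2, 8, 2) is 0.2. If "delay" is instead read as extra cycles beyond the issue cycle, which corresponds to build_loop(body, 1, 2, 4), the same model gives 4 stalls and 2/12, about 0.17 operations per cycle. Which reading is intended depends on the course's conventions.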