
Question


The following code operates on two arrays, C and D, each of length M. Each array holds 32-bit floating-point numbers.

    for (i = 0; i < M; i++)
        C[i] = C[i] * (D[i] + 3.0);

The compiler generates the following instructions:

    ;; f1 := 3.0
    ;; z1 := &C[0] and z2 := &D[0]
    ;; z3 := &C[M]
    L1: loop1: l.s  f0, 0(z2)      ;; Load D[i]
    L2:        l.s  f2, 0(z1)      ;; Load C[i]
    L3:        fadd f3, f0, f1
    L4:        addi z1, z1, 4
    L5:        fmul f4, f2, f3
    L6:        addi z2, z2, 4
    L7:        s.s  f4, -4(z1)     ;; Store C[i]
    L8:        bne  z1, z3, loop1

This code is executed sequentially on a pipelined machine with perfect branch prediction. The instruction latencies are as follows:

- Each ALU operation causes 1 cycle of delay (back-to-back ALU instructions execute with no stalls thanks to bypassing)
- Each load instruction causes 1 cycle of delay
- Each floating-point instruction causes 3 cycles of delay
- The branch instruction causes 1 cycle of delay

1) How many stall cycles does the processor incur in each iteration of the loop? Explain briefly.
2) On average, how many floating-point operations per cycle does the processor execute in steady state?
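One way to reason about the stall count is to track, for every register, the cycle at which its value becomes available, and to delay each instruction until all of its sources are ready. The sketch below does this for the loop in the question. It rests on assumptions that the question does not state: the pipeline is single-issue and in-order, a latency of N means a dependent instruction can issue no earlier than N cycles after its producer issued, and the branch adds no stall because prediction is perfect. The names `count_stalls`, `flops_per_cycle`, `LOOP_BODY`, and the register numbering are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction classes; each class has its own result latency. */
enum kind { ALU, LOAD, FP, STORE, BRANCH, KIND_COUNT };

/* Register ids for the loop in the question (numbering is arbitrary). */
enum reg { F0, F1, F2, F3, F4, Z1, Z2, Z3, NUM_REGS, NONE = -1 };

struct instr {
    enum kind kind;
    int dst;     /* destination register, or NONE */
    int src[2];  /* source registers, or NONE */
};

/* The eight instructions of one loop iteration, in program order. */
static const struct instr LOOP_BODY[] = {
    { LOAD,   F0,   { Z2, NONE } },  /* l.s  f0, 0(z2)     */
    { LOAD,   F2,   { Z1, NONE } },  /* l.s  f2, 0(z1)     */
    { FP,     F3,   { F0, F1   } },  /* fadd f3, f0, f1    */
    { ALU,    Z1,   { Z1, NONE } },  /* addi z1, z1, 4     */
    { FP,     F4,   { F2, F3   } },  /* fmul f4, f2, f3    */
    { ALU,    Z2,   { Z2, NONE } },  /* addi z2, z2, 4     */
    { STORE,  NONE, { F4, Z1   } },  /* s.s  f4, -4(z1)    */
    { BRANCH, NONE, { Z1, Z3   } },  /* bne  z1, z3, loop1 */
};
#define LOOP_LEN (sizeof LOOP_BODY / sizeof LOOP_BODY[0])
#define FP_OPS_PER_ITER 2  /* one fadd and one fmul */

/* Count stall cycles for one pass over the loop body.
 * ASSUMPTION: single-issue, in-order; a consumer may issue no earlier
 * than latency[kind] cycles after its producer issued; the branch
 * itself costs nothing extra (perfect prediction). */
int count_stalls(const struct instr *body, size_t n,
                 const int latency[KIND_COUNT])
{
    int ready[NUM_REGS] = { 0 };  /* cycle at which each register is usable */
    int cycle = 0;                /* earliest issue slot for the next instruction */
    int stalls = 0;

    for (size_t i = 0; i < n; i++) {
        int issue = cycle;
        for (int s = 0; s < 2; s++) {
            int r = body[i].src[s];
            assert(r < NUM_REGS);
            if (r >= 0 && ready[r] > issue)
                issue = ready[r];  /* wait for the latest source operand */
        }
        stalls += issue - cycle;
        if (body[i].dst >= 0)
            ready[body[i].dst] = issue + latency[body[i].kind];
        cycle = issue + 1;         /* in-order: next instruction issues later */
    }
    return stalls;
}

/* Steady-state floating-point operations per cycle for the loop,
 * given the number of stall cycles per iteration. */
double flops_per_cycle(int stalls)
{
    return (double)FP_OPS_PER_ITER / ((double)LOOP_LEN + stalls);
}
```

With the latency table `{1, 1, 3, 1, 1}` (ALU, load, FP, store, branch), calling `count_stalls(LOOP_BODY, LOOP_LEN, lat)` reports 2 stall cycles, one before `fmul` and one before `s.s`, and `flops_per_cycle(2)` gives 2 / (8 + 2) = 0.2. If "cycles of delay" is instead read as extra cycles on top of a one-cycle base, the table becomes `{1, 2, 4, 1, 1}` and the same function reports 4 stall cycles. The wording of the question does not settle which reading is intended, so both results should be treated as conditional on the stated assumption.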

