Exercise 4.29 In this exercise, we consider the execution of a loop in a statically scheduled superscalar processor. To simplify the exercise, assume that any combination of instruction types can execute in the same cycle, e.g., in a 3-issue superscalar, the three instructions can be three ALU operations, three branches, three load/store instructions, or any combination of these instructions. Note that this only removes a resource constraint, but data and control dependences must still be handled correctly. Problems in this exercise refer to the following loop:

Loop

a.  Loop: lw   $1,40($6)
          add  $5,$5,$1
          sw   $1,20($5)
          addi $6,$6,4
          addi $5,$5,-4
          beq  $5,$0,Loop

b.  Loop: add  $1,$2,$3
          sw   $0,0($1)
          addi $2,$2,4
          beq  $2,$0,Loop

4.29.1 [10] <4.10> If many (e.g., 1,000,000) iterations of this loop are executed, determine the fraction of all register reads that are useful in a 2-issue static superscalar processor?

4.29.2 [10] <4.10> If many (e.g., 1,000,000) iterations of this loop are executed, determine the fraction of all register reads that are useful in a 3-issue static superscalar processor? Compare this to your result for a 2-issue processor from Exercise 4.29.1.

4.29.3 [10] <4.10> If many (e.g., 1,000,000) iterations of this loop are executed, determine the fraction of cycles in which two or three register write ports are used in a 3-issue static superscalar processor?

4.29.4 [20] <4.10> Unroll this loop once and schedule it for a 2-issue static superscalar processor. Assume that the loop always executes an even number of iterations. You can use registers $10 through $20 when changing the code to eliminate dependences.

4.29.5 [20] <4.10> What is the speed-up of using your code from Exercise 4.29.4 instead of the original code with a 2-issue static superscalar processor? Assume that the loop has many (e.g., 1,000,000) iterations.

4.29.6 [10] <4.10> What is the speed-up of using your code from Exercise 4.29.4 instead of the original code with a pipelined (1-issue) processor? Assume that the loop has many (e.g., 1,000,000) iterations.
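
The issue rule stated at the top of the exercise (any mix of instruction types may share a cycle, but dependences must still be respected) can be illustrated with a small Python sketch. This is not the textbook's solution and it does not produce cycle counts. It only groups one iteration of each loop into issue packets under assumptions that are not part of the exercise text: instructions issue in program order, an instruction may not share a packet with an earlier instruction whose result it reads or whose destination it also writes, a branch closes its packet, and load-use and branch stalls are ignored. The names `Instr` and `pack` are hypothetical helpers introduced for this illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """One instruction, reduced to what the packing rule needs (hypothetical helper)."""
    text: str
    dest: Optional[str]        # register written, or None for sw/beq
    srcs: Tuple[str, ...]      # registers read
    is_branch: bool = False


def pack(instrs: List[Instr], width: int) -> List[List[Instr]]:
    """Greedy in-order grouping into issue packets of at most `width` instructions.

    Assumptions (not from the exercise): no intra-packet RAW or WAW
    dependences, a branch ends its packet, and pipeline stalls are ignored.
    """
    packets: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in instrs:
        written = {i.dest for i in current if i.dest is not None}
        raw = any(src in written for src in ins.srcs)
        waw = ins.dest is not None and ins.dest in written
        after_branch = bool(current) and current[-1].is_branch
        if len(current) == width or raw or waw or after_branch:
            packets.append(current)   # close the packet and start a new one
            current = []
        current.append(ins)
    if current:
        packets.append(current)
    return packets


# One iteration of loop a from the exercise.
LOOP_A = [
    Instr("lw $1,40($6)", "$1", ("$6",)),
    Instr("add $5,$5,$1", "$5", ("$5", "$1")),
    Instr("sw $1,20($5)", None, ("$1", "$5")),
    Instr("addi $6,$6,4", "$6", ("$6",)),
    Instr("addi $5,$5,-4", "$5", ("$5",)),
    Instr("beq $5,$0,Loop", None, ("$5", "$0"), True),
]

# One iteration of loop b from the exercise.
LOOP_B = [
    Instr("add $1,$2,$3", "$1", ("$2", "$3")),
    Instr("sw $0,0($1)", None, ("$0", "$1")),
    Instr("addi $2,$2,4", "$2", ("$2",)),
    Instr("beq $2,$0,Loop", None, ("$2", "$0"), True),
]

if __name__ == "__main__":
    for name, loop in (("a", LOOP_A), ("b", LOOP_B)):
        for width in (2, 3):
            groups = pack(loop, width)
            print(f"loop {name}, {width}-issue: {len(groups)} packets")
            for group in groups:
                print("   ", " | ".join(i.text for i in group))
```

Under these assumptions, one iteration of loop a forms 5 packets at width 2 and 4 packets at width 3, while loop b forms 3 packets at either width. Dependences, not issue width, limit most packets here, which is the effect the register-read and write-port questions ask you to quantify. A real answer still needs a pipeline diagram that accounts for stalls and forwarding.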
