Exercise 4.31 Problems in this exercise refer to the following loop, which is given as x86 code and also as a MIPS translation of that code. You can assume that this loop executes many iterations before it exits. When determining performance, this means that you only need to determine what the performance would be in the “steady state”, not for the first few and the last few iterations of the loop. Also, you can assume full forwarding support and perfect branch prediction without delay slots, so the only hazards you have to worry about are resource hazards and data hazards. Note that most x86 instructions in this problem have two operands each. The last (usually second) operand of the instruction indicates both the first source data value and the destination. If the operation needs a second source data value, it is indicated by the other operand of the instruction. For example, “sub (edx),eax” reads the memory location pointed to by register edx, subtracts that value from register eax, and puts the result back in register eax.
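The two-operand convention described above can be illustrated with a minimal sketch. The register values, the memory contents, and the helper name `sub_mem_reg` are all hypothetical and chosen only to make the “sub (edx),eax” example concrete; they are not part of the exercise.

```python
# Hypothetical register file and memory, used only for illustration.
regs = {"eax": 10, "edx": 0x100}
mem = {0x100: 3}


def sub_mem_reg(regs, mem, addr_reg, dst_reg):
    """Model `sub (addr_reg), dst_reg`.

    The last operand (dst_reg) is both the first source and the destination;
    the other operand names the memory location holding the second source.
    """
    regs[dst_reg] = regs[dst_reg] - mem[regs[addr_reg]]


sub_mem_reg(regs, mem, "edx", "eax")
print(regs["eax"])  # 10 - 3 = 7
```

Note that edx is only read here: it supplies the address, and eax is the only register that changes.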

x86 Instructions            MIPS-like translation

a.
Label: mov -4(esp), eax     Label: lw $2,-4($sp)
       add (edx), eax              lw $3,0($4)
       mov eax, -4(esp)            add $2,$2,$3
       add 1, ecx                  sw $2,-4($sp)
       add 4, edx                  addi $6,$6,1
       cmp esi, ecx                addi $4,$4,4
       jl Label                    slt $1,$6,$5
                                   bne $1,$0,Label

b.
Label: add eax, (edx)       Label: lw $2,0($4)
       mov eax, edx                add $2,$2,$5
       add 1, eax                  sw $2,0($4)
       jl Label                    add $4,$5,$0
                                   addi $5,$5,1
                                   slt $1,$5,$0
                                   bne $1,$0,Label

4.31.1 [20] <4.11> What CPI would be achieved if the MIPS version of this loop is executed on a 1-issue processor with static scheduling and a five-stage pipeline?

4.31.2 [20] <4.11> What CPI would be achieved if the x86 version of this loop is executed on a 1-issue processor with static scheduling and a 7-stage pipeline? The stages of the pipeline are IF, ID, ARD, MRD, EXE, and WB. Stages IF and ID are similar to those in the five-stage MIPS pipeline. ARD computes the address of the memory location to be read, MRD performs the memory read, EXE executes the operation, and WB writes the result to register or memory. The data memory has a read port (for instructions in the MRD stage) and a separate write port (for instructions in the WB stage).
4.31.3 [20] <4.11> What CPI would be achieved if the x86 version of this loop is executed on a processor that internally translates these instructions into MIPS-like micro-operations, then executes these micro-operations on a 1-issue five-stage pipeline with static scheduling? Note that the instruction count used in CPI computation for this processor is the x86 instruction count.
4.31.4 [20] <4.11> What CPI would be achieved if the MIPS version of this loop is executed on a 1-issue processor with dynamic scheduling? Assume that our processor is not doing register renaming, so you can only reorder instructions that have no data dependences.
4.31.5 [30] <4.10, 4.11> Assuming that there are many free registers available, rename the MIPS version of this loop to eliminate as many data dependences as possible between instructions in the same iteration of the loop. Now repeat Exercise 4.31.4, using your new renamed code.
4.31.6 [20] <4.10, 4.11> Repeat Exercise 4.31.4, but this time assume that the processor assigns a new name to the result of each instruction as that instruction is decoded, and then renames registers used by subsequent instructions to use correct register values.
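
As a starting point for reasoning about steady-state CPI on the five-stage pipeline, the sketch below counts load-use stalls in one loop iteration. It is a simplified model under stated assumptions, not the textbook's solution: every instruction takes one cycle, full forwarding hides all hazards except a one-cycle stall when an instruction reads the register written by the load immediately before it, and the perfectly predicted branch costs nothing extra. The `Instr` type and the `steady_state_cpi` function are hypothetical names introduced for this illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dst: Optional[str]      # register written, or None (sw, bne)
    srcs: Tuple[str, ...]   # registers read


def steady_state_cpi(loop):
    """Return (instructions + stalls) / instructions for one iteration.

    Assumed model: one stall per load-use pair of adjacent instructions.
    The loop is treated as circular, so the first instruction is checked
    against the last one of the previous iteration.
    """
    stalls = 0
    for i, instr in enumerate(loop):
        prev = loop[i - 1]  # index -1 wraps around to the last instruction
        if prev.op == "lw" and prev.dst in instr.srcs:
            stalls += 1
    return (len(loop) + stalls) / len(loop)


# MIPS-like translation of loop a.
LOOP_A = [
    Instr("lw", "$2", ("$sp",)),
    Instr("lw", "$3", ("$4",)),
    Instr("add", "$2", ("$2", "$3")),
    Instr("sw", None, ("$2", "$sp")),
    Instr("addi", "$6", ("$6",)),
    Instr("addi", "$4", ("$4",)),
    Instr("slt", "$1", ("$6", "$5")),
    Instr("bne", None, ("$1", "$0")),
]

# MIPS-like translation of loop b.
LOOP_B = [
    Instr("lw", "$2", ("$4",)),
    Instr("add", "$2", ("$2", "$5")),
    Instr("sw", None, ("$2", "$4")),
    Instr("add", "$4", ("$5", "$0")),
    Instr("addi", "$5", ("$5",)),
    Instr("slt", "$1", ("$5", "$0")),
    Instr("bne", None, ("$1", "$0")),
]

print(steady_state_cpi(LOOP_A))  # 9 cycles / 8 instructions
print(steady_state_cpi(LOOP_B))  # 8 cycles / 7 instructions
```

Under this model each loop has exactly one load-use pair (lw $3 followed by add in loop a, lw $2 followed by add in loop b). A different assumption about where the branch reads its operand, or about store-data forwarding, would change the stall count.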
