Exercise 4.31

Problems in this exercise refer to the following loop, which is given as x86 code and also as a MIPS translation of that code. You can assume that this loop executes many iterations before it exits. When determining performance, this means that you only need to determine what the performance would be in the steady state, not for the first few and the last few iterations of the loop. Also, you can assume full forwarding support and perfect branch prediction without delay slots, so the only hazards you have to worry about are resource hazards and data hazards.

Note that most x86 instructions in this problem have two operands each. The last (usually second) operand of the instruction indicates both the first source data value and the destination. If the operation needs a second source data value, it is indicated by the other operand of the instruction. For example, sub (edx),eax reads the memory location pointed by register edx, subtracts that value from register eax, and puts the result back in register eax.

a.
    x86 Instructions              MIPS-like translation
    Label: mov 4(esp), eax        Label: lw   $2,4($sp)
           add (edx), eax                lw   $3,0($4)
           mov eax, 4(esp)               add  $2,$2,$3
           add 1, ecx                    sw   $2,4($sp)
           add 4, edx                    addi $6,$6,1
           cmp esi, ecx                  addi $4,$4,4
           jl Label                      slt  $1,$6,$5
                                         bne  $1,$0,Label

b.
    x86 Instructions              MIPS-like translation
    Label: add eax, (edx)         Label: lw   $2,0($4)
           mov eax, edx                  add  $2,$2,$5
           add 1, eax                    sw   $2,0($4)
           jl Label                      add  $4,$5,$0
                                         addi $5,$5,1
                                         slt  $1,$5,$0
                                         bne  $1,$0,Label

4.31.1 [20] <4.11> What CPI would be achieved if the MIPS version of this loop is executed on a 1-issue processor with static scheduling and a five-stage pipeline?

4.31.2 [20] <4.11> What CPI would be achieved if the x86 version of this loop is executed on a 1-issue processor with static scheduling and a 7-stage pipeline? The stages of the pipeline are IF, ID, ARD, MRD, EXE, and WB. Stages IF and ID are similar to those in the five-stage MIPS pipeline. ARD computes the address of the memory location to be read, MRD performs the memory read, EXE executes the operation, and WB writes the result to register or memory. The data memory has a read port (for instructions in the MRD stage) and a separate write port (for instructions in the WB stage).

4.31.3 [20] <4.11> What CPI would be achieved if the x86 version of this loop is executed on a processor that internally translates these instructions into MIPS-like micro-operations, then executes these micro-operations on a 1-issue five-stage pipeline with static scheduling? Note that the instruction count used in CPI computation for this processor is the x86 instruction count.

4.31.4 [20] <4.11> What CPI would be achieved if the MIPS version of this loop is executed on a 1-issue processor with dynamic scheduling? Assume that our processor is not doing register renaming, so you can only reorder instructions that have no data dependences.

4.31.5 [30] <4.10, 4.11> Assuming that there are many free registers available, rename the MIPS version of this loop to eliminate as many data dependences as possible between instructions in the same iteration of the loop. Now repeat Exercise 4.31.4, using your new renamed code.

4.31.6 [20] <4.10, 4.11> Repeat Exercise 4.31.4, but this time assume that the processor assigns a new name to the result of each instruction as that instruction is decoded, and then renames registers used by subsequent instructions to use correct register values.
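One way to approach 4.31.1 is to count stall cycles per iteration and divide the total cycles by the instruction count. The Python sketch below is an illustration only, not the accepted answer. It assumes a classic five-stage pipeline with full forwarding, where the only stall is a one-cycle load-use stall (a lw immediately followed by an instruction that reads the loaded register), and it assumes the branch costs no extra cycles because prediction is perfect. The helper names parse, load_use_stalls, and steady_state_cpi are hypothetical and handle only the small instruction subset that appears in the two loops.

```python
import re

# MIPS-like loops copied from the exercise text.
LOOP_A = [
    "lw $2,4($sp)", "lw $3,0($4)", "add $2,$2,$3", "sw $2,4($sp)",
    "addi $6,$6,1", "addi $4,$4,4", "slt $1,$6,$5", "bne $1,$0,Label",
]
LOOP_B = [
    "lw $2,0($4)", "add $2,$2,$5", "sw $2,0($4)", "add $4,$5,$0",
    "addi $5,$5,1", "slt $1,$5,$0", "bne $1,$0,Label",
]

def parse(line):
    """Return (opcode, destination register or None, list of source registers)."""
    op, _, rest = line.strip().partition(" ")
    regs = [tok for tok in re.split(r"[,\s()]+", rest) if tok.startswith("$")]
    if op in ("sw", "bne", "beq"):
        # Stores and branches write no register; every register operand is a source.
        return op, None, regs
    return op, regs[0], regs[1:]

def load_use_stalls(loop):
    """Count one-cycle load-use stalls per iteration (assumed hazard model)."""
    stalls = 0
    n = len(loop)
    for i in range(n):
        op, dest, _ = parse(loop[i])
        # Wrap around: in steady state the taken branch is followed by the first instruction.
        _, _, next_srcs = parse(loop[(i + 1) % n])
        if op == "lw" and dest != "$0" and dest in next_srcs:
            stalls += 1
    return stalls

def steady_state_cpi(loop):
    """Cycles per iteration (instructions plus stalls) divided by instructions."""
    n = len(loop)
    return (n + load_use_stalls(loop)) / n

print("loop a:", load_use_stalls(LOOP_A), "stall(s), CPI =", steady_state_cpi(LOOP_A))
print("loop b:", load_use_stalls(LOOP_B), "stall(s), CPI =", steady_state_cpi(LOOP_B))
```

The sketch conservatively treats a store that immediately follows the load of its data register as a stall, which does not occur in either loop. It does not model the x86 pipeline of 4.31.2 or the dynamic scheduling of 4.31.4 through 4.31.6, which need a different hazard model.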


SolutionInn · Accepted Answer


Step by Step Solution