Question:
Q1. Use the following code fragment:

Loop: lw   x7, 0(x1)
      lw   x8, 0(x2)
      add  x5, x7, x8
      sw   x5, 0(x2)
      addi x1, x1, 4
      addi x2, x2, 4
      sub  x6, x3, x2
      bnez x6, Loop

Assume that the initial value of x3 is x2 + 256. Assume early evaluation of the branch instruction, i.e., the branch outcome (whether the condition is true or false, and where the next instruction is) is known after the Decode stage, but the branch instruction still goes through all five pipeline stages. For both Q1(a) and Q1(b), assume that the branch is handled by predicting it as not taken, i.e., the next instruction in program sequence is fetched; this is the wrong instruction except in the last iteration, and it is flushed once the branch outcome is known.

(a) [3 pts] Show the timing of this instruction sequence for the 5-stage RISC pipeline without any forwarding or bypassing hardware, but assuming that a register read and a write in the same clock cycle forwards through the register file, as between the add and or shown in Figure C.5. Use a pipeline timing chart like that in Figure C.8. If all memory references take 1 cycle, how many cycles does this loop take to execute?

(b) [3 pts] Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.8. If all memory references take 1 cycle, how many cycles does this loop take to execute?

Q2. We begin with a computer implemented in a single-cycle implementation. When the stages are split by functionality, the stages do not require exactly the same amount of time. The original machine had a clock cycle time of 8 ns. After the stages were split, the measured times were IF, 1 ns; ID, 2 ns; EX, 1 ns; MEM, 2 ns; and WB, 2 ns. The pipeline register delay is 0.05 ns.

(a) [1 pt] What is the clock cycle time of the 5-stage pipelined machine?

(b) [1 pt] If there is a stall every four instructions, what is the CPI of the new machine?

(c) [1 pt] What is the speedup of the pipelined machine over the single-cycle machine?

(d) [1 pt] If the pipelined machine had an infinite number of stages, what would its speedup be over the single-cycle machine?
Step by Step Solution
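For Q1, the cycle counts can be cross-checked with a small scheduling sketch. It is not the textbook's method, only a minimal model under stated assumptions: instructions issue in order, one per cycle; without forwarding a consumer may be in ID in the same cycle as its producer's WB; with forwarding, ALU results are available after EX and load results after MEM; the branch reads its operand in ID; each taken branch costs one flushed fetch slot; and the loop runs 256 / 4 = 64 iterations. The names `LOOP`, `earliest_id`, and `total_cycles` are hypothetical helpers introduced here.

```python
# Each entry: (text, destination register, source registers, kind).
LOOP = [
    ("lw x7, 0(x1)",   "x7", ("x1",),      "load"),
    ("lw x8, 0(x2)",   "x8", ("x2",),      "load"),
    ("add x5, x7, x8", "x5", ("x7", "x8"), "alu"),
    ("sw x5, 0(x2)",   None, ("x5", "x2"), "store"),
    ("addi x1, x1, 4", "x1", ("x1",),      "alu"),
    ("addi x2, x2, 4", "x2", ("x2",),      "alu"),
    ("sub x6, x3, x2", "x6", ("x3", "x2"), "alu"),
    ("bnez x6, Loop",  None, ("x6",),      "branch"),
]


def earliest_id(prod_id, prod_kind, cons_kind, forwarding):
    """Earliest ID cycle of a consumer, given its producer's ID cycle."""
    if not forwarding:
        # Stages are ID, EX, MEM, WB: the producer's WB is at prod_id + 3,
        # and the register file lets the consumer read in that same cycle.
        return prod_id + 3
    # Cycle at whose end the value exists: EX for ALU ops, MEM for loads.
    available = prod_id + (2 if prod_kind == "load" else 1)
    if cons_kind == "branch":
        return available + 1  # assumed: branch needs the value in ID
    # Other consumers need it in EX (ID + 1). Stores are treated the same
    # way, a simplification that does not matter for this loop.
    return available


def total_cycles(iterations, forwarding):
    """Cycle in which the last instruction of the last iteration is in WB."""
    producers = {}  # register -> (ID cycle, kind) of its latest writer
    prev_id = 1     # first instruction: IF in cycle 1, ID in cycle 2
    id_cycle = prev_id
    for it in range(iterations):
        for _text, dest, sources, kind in LOOP:
            id_cycle = prev_id + 1  # in-order, one instruction per stage
            for reg in sources:
                if reg in producers:
                    p_id, p_kind = producers[reg]
                    id_cycle = max(
                        id_cycle,
                        earliest_id(p_id, p_kind, kind, forwarding),
                    )
            if dest is not None:
                producers[dest] = (id_cycle, kind)
            prev_id = id_cycle
            if kind == "branch" and it < iterations - 1:
                prev_id += 1  # taken branch: the fall-through fetch is flushed
    return id_cycle + 3


if __name__ == "__main__":
    iterations = 256 // 4
    print("no forwarding:", total_cycles(iterations, forwarding=False))
    print("forwarding:   ", total_cycles(iterations, forwarding=True))
```

Under these assumptions the model gives 17 cycles per iteration without forwarding and 11 with forwarding, for totals of 63 x 17 + 20 = 1091 and 63 x 11 + 14 = 707 cycles. A different stall or flush convention (for example, a branch that does not need a forwarded operand in ID) would change these numbers, so the timing chart remains the authoritative check.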
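The arithmetic in Q2 can be laid out as a short sketch. It assumes the usual textbook reading: the pipelined clock is set by the slowest stage plus the pipeline register delay, "a stall every four instructions" means five cycles per four instructions, the single-cycle machine has a CPI of 1, and with infinitely many stages the cycle time approaches the register delay alone with stalls ignored. The function names are hypothetical.

```python
# Figures taken from the Q2 problem statement (all times in ns).
STAGE_NS = {"IF": 1.0, "ID": 2.0, "EX": 1.0, "MEM": 2.0, "WB": 2.0}
SINGLE_CYCLE_NS = 8.0
REGISTER_DELAY_NS = 0.05


def pipelined_cycle_ns():
    """(a) Slowest stage plus pipeline register delay."""
    return max(STAGE_NS.values()) + REGISTER_DELAY_NS


def cpi_with_stalls(instructions_per_stall=4):
    """(b) One stall cycle for every `instructions_per_stall` instructions."""
    return (instructions_per_stall + 1) / instructions_per_stall


def speedup():
    """(c) Time per instruction, single-cycle (CPI = 1) over pipelined."""
    return SINGLE_CYCLE_NS / (pipelined_cycle_ns() * cpi_with_stalls())


def limiting_speedup():
    """(d) Assumed limit: cycle time shrinks to the register delay, CPI = 1."""
    return SINGLE_CYCLE_NS / REGISTER_DELAY_NS


if __name__ == "__main__":
    print(f"(a) cycle time: {pipelined_cycle_ns():.2f} ns")
    print(f"(b) CPI:        {cpi_with_stalls():.2f}")
    print(f"(c) speedup:    {speedup():.2f}")
    print(f"(d) limit:      {limiting_speedup():.0f}")
```

With these assumptions the values come out to about 2.05 ns, 1.25, 3.12, and 160. If the stall rate from (b) is instead kept in the limiting case, (d) becomes 8 / (0.05 x 1.25) = 128.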
