Question
Question 2. Branch delay on a heavily pipelined architecture
Sometimes, architectures are very heavily pipelined to get a fast clock cycle time. I might not mind a higher CPI if the clock cycle time is very fast!

Suppose that in a certain mix of code:
16% of instructions are conditional branches.
60% of conditional branches are taken.
1% of instructions are jumps (unconditional branches, always taken).
The average CPI of non-branch instructions is 1.2.

A heavily pipelined architecture with fourteen pipeline stages calculates branch addresses (for both unconditional and conditional branches) in the third stage, storing the address in the pipeline stage register between the third and fourth stages before it can be used. It calculates the condition for conditional branches in the tenth stage, storing the condition bit in the pipeline stage register between the ninth and tenth stages before it can be used.

(Throughout, be sure you clearly state how many cycles of stall are needed for conditional and unconditional branches, respectively. Draw a pipeline stage diagram, and think about when exactly you can update the PC in each situation.)

a) If the architecture uses a freeze-the-pipeline strategy, what is the CPI?
b) If the architecture uses a predict-not-taken strategy, what is the CPI, and what is the speed-up relative to the freeze-the-pipeline architecture?
c) If the architecture uses a predict-taken strategy, what is the CPI, and what is the speed-up relative to the freeze-the-pipeline architecture? (Note that you still need the branch address before you can branch, even if you predict-taken.)
d) Branch predictors are pieces of hardware that predict the condition bit of a branch based on run-time conditions at the time the branch starts executing. What is the (theoretical) best CPI that could be achieved if the machine were able to use a branch predictor to perfectly predict whether a branch is taken? What is the corresponding speed-up relative to the freeze-the-pipeline architecture?
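One way to organize the arithmetic for parts a) through d) is as a weighted average over the instruction mix. The sketch below is not the page's solution; it rests on assumptions the question leaves to the reader. It assumes a value produced in stage k can first be used for a fetch in cycle k+1, which gives 2 stall cycles when only the branch address is needed and 9 when the condition is needed. It also assumes a branch or jump costs 1 cycle before stalls. The names `ADDR_STALL`, `COND_STALL`, `BRANCH_BASE_CPI`, and `average_cpi` are hypothetical. If your pipeline diagram gives different stall counts, change the constants.

```python
# Instruction mix taken from the question.
COND_FRAC = 0.16       # conditional branches
TAKEN_FRAC = 0.60      # fraction of conditional branches that are taken
JUMP_FRAC = 0.01       # unconditional jumps
NON_BRANCH_CPI = 1.2   # average CPI of all other instructions

# Assumptions, NOT stated in the question (adjust after drawing the diagram):
ADDR_STALL = 2         # stalls when only the target address is needed (stage 3)
COND_STALL = 9         # stalls when the condition is needed (stage 10)
BRANCH_BASE_CPI = 1.0  # cycles for a branch or jump before any stalls


def average_cpi(taken_stall, not_taken_stall, jump_stall=ADDR_STALL):
    """Weighted-average CPI for given stall counts per branch outcome."""
    non_branch = (1 - COND_FRAC - JUMP_FRAC) * NON_BRANCH_CPI
    conditional = COND_FRAC * (
        BRANCH_BASE_CPI
        + TAKEN_FRAC * taken_stall
        + (1 - TAKEN_FRAC) * not_taken_stall
    )
    jumps = JUMP_FRAC * (BRANCH_BASE_CPI + jump_stall)
    return non_branch + conditional + jumps


strategies = {
    # Freeze: every conditional branch waits for the condition.
    "freeze": average_cpi(COND_STALL, COND_STALL),
    # Predict-not-taken: only taken branches pay the condition delay.
    "predict-not-taken": average_cpi(COND_STALL, 0),
    # Predict-taken: taken branches wait for the address; untaken ones are flushed.
    "predict-taken": average_cpi(ADDR_STALL, COND_STALL),
    # Perfect predictor: taken branches still wait for the address.
    "perfect predictor": average_cpi(ADDR_STALL, 0),
}

for name, cpi in strategies.items():
    # Same clock and instruction count, so speed-up is a ratio of CPIs.
    speedup = strategies["freeze"] / cpi
    print(f"{name:18s} CPI = {cpi:.3f}  speed-up vs freeze = {speedup:.3f}")
```

Because the clock cycle time and instruction count are the same for every strategy, the speed-up relative to freeze-the-pipeline reduces to the ratio of the two CPIs, which is what the loop prints.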