[Solved] Question 2. Branch-prediction buffers

Posted on Sep 21, 2024

Question 2. Branch-prediction buffers

[Note: This question builds on Question 2 from HW #4. It uses the same code mix and architecture.]

Suppose that in a certain mix of code:

16% of instructions are conditional branches.
60% of conditional branches are taken.
1% of instructions are jumps (unconditional branches, always taken).
The average CPI of non-branch instructions is 1.2.

A heavily pipelined architecture with fourteen pipeline stages calculates branch addresses (for both unconditional and conditional branches) in the third stage, storing the address in the pipeline stage register between the third and fourth stages before it can be used. It calculates the condition for conditional branches in the tenth stage, storing the condition bit in the pipeline stage register between the tenth and eleventh stages before it can be used.

Suppose that I add a branch-prediction buffer that uses one-bit predictors. The buffer is accessed during the first stage using the bottom 8 bits of the PC as an index and provides a prediction of the condition bit for conditional branches. (The buffer predicts that the branch behaves the same way it did last time it was run.) The buffer gives a correct prediction 74% of the time.

a. Draw a diagram showing the basic structure of the branch-prediction buffer. How many bits of memory total are needed to implement the buffer? (Count flip-flops in state machines as bits of memory.)

b. What is the CPI of the code when the branch-prediction buffer is used, and what is the speed-up relative to the predict-not-taken strategy from HW #4, question 2(b)? Please note that you cannot use a predicted condition until you have the target address.

Suppose that I upgrade to a (2,3) correlating branch-prediction buffer. The buffer is accessed during the first stage using the bottom 8 bits of the PC as an index and provides a prediction of the condition bit for conditional branches. The buffer gives a correct prediction 90% of the time.

c. Draw a diagram showing the basic structure of the branch-prediction buffer. How many bits of memory total are needed to implement the buffer? (Count flip-flops in state machines as bits of memory.)

d. What is the CPI of the code when the branch-prediction buffer is used, and what is the speed-up relative to the predict-not-taken strategy from HW #4, question 2(b)? Please note that you cannot use a predicted condition until you have the target address.

Question 2 (HW #4). Branch delay on a heavily pipelined architecture

Sometimes, architectures are very heavily pipelined, to get a fast clock cycle time. I might not mind a higher CPI, if the clock cycle time is very fast!

Suppose that in a certain mix of code:

16% of instructions are conditional branches.
60% of conditional branches are taken.
1% of instructions are jumps (unconditional branches, always taken).
The average CPI of non-branch instructions is 1.2.

A heavily pipelined architecture with fourteen pipeline stages calculates branch addresses (for both unconditional and conditional branches) in the third stage, storing the address in the pipeline stage register between the third and fourth stages before it can be used. It calculates the condition for conditional branches in the tenth stage, storing the condition bit in the pipeline stage register between the ninth and tenth stages before it can be used.

(Throughout, be sure you clearly state how many cycles of stall are needed for conditional and unconditional branches, respectively. Draw a pipeline stage diagram, and think about when exactly you can update the PC in each situation.)

a) If the architecture uses a freeze-the-pipeline strategy, what is the CPI?

b) If the architecture uses a predict-not-taken strategy, what is the CPI, and what is the speed-up relative to the freeze-the-pipeline architecture?

c) If the architecture uses a predict-taken strategy, what is the CPI, and what is the speed-up relative to the freeze-the-pipeline architecture?
(Note that you still need the branch address before you can branch, even if you predict-taken.)

d) Branch predictors are pieces of hardware that predict the condition bit of a branch based on run-time conditions at the time the branch starts executing. What is the (theoretical) best CPI that could be achieved if the machine were able to use a branch predictor to perfectly predict whether a branch is taken? What is the corresponding speed-up relative to the freeze-the-pipeline architecture?
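
The buffer-size and CPI parts of the branch-prediction question come down to a little arithmetic, which the Python sketch below works through. It is not an official solution. The code mix and the 8-bit index come from the problem statement; everything else is an assumption made for illustration: a 3-cycle stall while waiting for the target address, a 10-cycle stall while waiting for the condition bit, a base cost of 1 cycle for branches and jumps, prediction accuracy that does not depend on branch direction, and reading the "(2.3)" predictor as an (m, n) = (2, 3) correlating predictor with an m-bit global history register. All function and constant names are hypothetical.

```python
# Code mix and index width, taken from the problem statement.
COND_FRAC = 0.16       # fraction of instructions that are conditional branches
TAKEN_FRAC = 0.60      # fraction of conditional branches that are taken
JUMP_FRAC = 0.01       # fraction of instructions that are jumps
NON_BRANCH_CPI = 1.2   # average CPI of non-branch instructions
INDEX_BITS = 8         # bottom 8 bits of the PC index the buffer

# ASSUMPTIONS (not stated in the problem; change them to match your reading).
ADDR_STALL = 3         # target address usable once the branch reaches stage 4
COND_STALL = 10        # condition bit usable once the branch reaches stage 11
BRANCH_BASE_CPI = 1.0  # cost of a branch or jump before any stall cycles


def one_bit_buffer_bits(index_bits=INDEX_BITS):
    """One prediction bit per entry, 2**index_bits entries."""
    return 2 ** index_bits


def correlating_buffer_bits(m, n, index_bits=INDEX_BITS):
    """(m, n) predictor: 2**m n-bit predictors per entry, plus an m-bit
    global history shift register (assumed to count as flip-flops)."""
    return (2 ** m) * n * (2 ** index_bits) + m


def cpi(cond_stall_avg, jump_stall=ADDR_STALL):
    """Average CPI given the mean stall per conditional branch."""
    non_branch_frac = 1.0 - COND_FRAC - JUMP_FRAC
    return (non_branch_frac * NON_BRANCH_CPI
            + COND_FRAC * (BRANCH_BASE_CPI + cond_stall_avg)
            + JUMP_FRAC * (BRANCH_BASE_CPI + jump_stall))


def predict_not_taken_stall():
    """Not-taken branches cost nothing; taken ones wait for the condition."""
    return TAKEN_FRAC * COND_STALL


def buffer_stall(accuracy):
    """Mean stall with a prediction buffer.

    ASSUMPTION: accuracy is the same for taken and not-taken branches.
    Correctly predicted taken branches still wait for the target address;
    correctly predicted not-taken branches do not stall; every
    misprediction waits for the real condition bit.
    """
    correct_taken = accuracy * TAKEN_FRAC
    mispredicted = 1.0 - accuracy
    return correct_taken * ADDR_STALL + mispredicted * COND_STALL


if __name__ == "__main__":
    baseline = cpi(predict_not_taken_stall())
    print("one-bit buffer bits:", one_bit_buffer_bits())
    print("(2,3) buffer bits:  ", correlating_buffer_bits(2, 3))
    print("predict-not-taken CPI:", round(baseline, 4))
    for label, acc in (("one-bit", 0.74), ("(2,3)", 0.90)):
        c = cpi(buffer_stall(acc))
        print(label, "CPI:", round(c, 4), "speed-up:", round(baseline / c, 4))
```

Under these assumptions the sketch gives 256 bits for the one-bit buffer and 3074 bits for the (2,3) buffer, a predict-not-taken CPI of about 2.156, and buffer CPIs of about 1.825 (74% accuracy) and 1.615 (90% accuracy). A different reading of the stall counts, or of the base cost of a branch, changes these numbers, so treat the constants at the top as the place to encode your own pipeline diagram.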