Questions and Answers of Computer Organization Design

Perhaps the most likely case of adding many numbers at once in a computer would be when trying to multiply more quickly by using many adders to add many numbers in a single clock cycle. Compared to
Assign state numbers to the states in the traffic light example of Exercise B.41 and use the tables of Exercise B.42 to write a set of logic equations for each of the outputs, including the
Assume we want to execute the DAXPY loop shown on page 511 in MIPS assembly on the NVIDIA 8800 GTX GPU described in this chapter. In this problem, we will assume that all math operations are performed
Section A.5 described how memory is partitioned on most MIPS systems. Propose another way of dividing memory that meets the same goals.
Rewrite the code for fact to use fewer instructions.
Section A.7 contains code for a very simple exception handler. One serious problem with this handler is that it disables interrupts for a long time. This means that interrupts from a fast I/O device
Using SPIM, write and test an adding machine program that repeatedly reads in integers and adds them into a running sum. The program should stop when it gets an input that is 0, printing out the sum
Using SPIM, write and test a program that reads in three integers and prints out the sum of the largest two of the three. Use the SPIM system calls described on pages A-43 and A-45. You can break
Using SPIM, write and test a recursive program for solving the classic mathematical recreation, the Towers of Hanoi puzzle. (This will require the use of stack frames to support recursion.) The puzzle
Now calculate the relative performance of adders. Assume that hardware corresponding to any equation containing only OR or AND terms, such as the equations for pi and gi on page B-40, takes one time
This exercise is similar to Exercise B.28, but this time calculate the relative speeds of a 16-bit adder using ripple carry only, ripple carry of 4-bit groups that use carry lookahead, and the
This exercise is similar to Exercises B.28 and B.29, but this time calculate the relative speeds of a 64-bit adder using ripple carry only, ripple carry of 4-bit groups that use carry lookahead,
We wish to add a yellow light to our traffic light example on page B-68. We will do this by changing the clock to run at 0.25 Hz (a 4-second clock cycle time), which is the duration of a yellow
We want to emulate vectored exception handling on a machine that has only one fixed handler address. Write the code that should be at that fixed address. The remaining three problems in this exercise
For each of these signals, identify the pipeline stage in which it is generated and the stage in which it is used. The remaining problems in this exercise refer to the following signals from Figure
What is the performance (in instructions per second)? Problems in this exercise assume that, during an execution of the program, processor cycles are spent in the following way. A cycle is "spent" on
Indicate dependences and their type. In this exercise, we examine how data dependences affect execution in the basic 5-stage pipeline described in Section 4.5. Problems in this exercise refer to the
Repeat 4.8.1 for a stuck-at-1 fault. Can you use a single test for both stuck-at-0 and stuck-at-1? If yes, explain how; if no, explain why not. Problem 4.8.1: Let us assume that processor testing is
Which new functional blocks (if any) do we need for this instruction? The basic single-cycle MIPS implementation in Figure 4.2 can only implement some instructions. New instructions can be added to an
What is the worst-case MIPS instruction in terms of energy consumption, and what is the energy spent to execute it? This exercise explores energy efficiency and its relationship with performance.
Repeat 4.21.1 but now use NOPs only when a hazard cannot be avoided by changing or rearranging these instructions. You can assume register R7 can be used to hold temporary values in your modified
If many (e.g., 1,000,000) iterations of this loop are executed, determine the fraction of all register reads that are useful in a 3-issue static superscalar processor. Compare this to your result for
What is the register number supplied to the register file’s “Read register 1” input? Is this register actually read? How about “Read register 2”? In this exercise we examine the operation of
Show how this block can be implemented. Use only AND, OR, NOT, and D Flip-Flops. Problems in this exercise refer to the following logic block: Logic Block a. Small Multiplexor (Mux) with four 8-bit
Assume there is no forwarding in this pipelined processor. Indicate hazards and add NOP instructions to eliminate them. In this exercise, we examine how data dependences affect execution in the basic
What is the power dissipated in watts (joules per second)? Problems in this exercise assume that, during an execution of the program, processor cycles are spent in the following way. A cycle is
Repeat 4.22.1, but assume that delay slots are used. In the given code, the instruction that follows the branch is now the delay slot instruction for that branch. Exercise 4.22.1: Draw the pipeline
Assuming that all gates have equal latencies, what is the length (in gates) of the critical path in your circuit from 4.4.1? Problem 4.4.1: Implement the logic for the Control signal 1. Your
Which control signal in Figure 4.24 has the most slack and how much time does the control unit have to generate it if it wants to avoid being on the critical path? In this exercise we examine how the
In a 2-issue static superscalar whose predictor can only handle one branch per cycle, what speedup is achieved by adding the ability to predict two branches per cycle? Assume a stall-on-branch policy
For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register
Repeat 4.23.1 for the “always-not-taken” predictor. Exercise 4.23.1: Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the
What CPI would be achieved if the X86 version of this loop is executed on a 1-issue processor with static scheduling and a 7-stage pipeline? The stages of the pipeline are IF, ID, ARD, MRD, EXE, and
What is the accuracy of the two-bit predictor for the first 4 branches in this pattern, assuming that the predictor starts off in the bottom left state from Figure 4.63 (predict not taken)? This
Repeat 4.5.1, but now design a circuit that accomplishes this operation 2 bits at a time. Problem 4.5.1: Design a circuit with 1-bit data inputs and a 1-bit data output that accomplishes this operation
Which new control signals must be added to your pipeline from 4.15.1? Problem 4.15.1: What must be changed in the pipelined datapath to add this instruction to the MIPS ISA? In this exercise, we examine
If there are no branch mispredictions and no data dependences, what is the expected performance improvement over a 1-issue processor with the classical 5-stage pipeline? Assume that the clock cycle
Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction? Different instructions utilize different
We can make the EX stage faster if we check for exceptions in the stage after the one in which the exceptional condition occurs. Using this instruction as an example, describe the main disadvantage
If we want to add this instruction to the MIPS ISA, discuss the changes to the pipeline (which stages, which structures in which stage) that are needed to directly (without micro-ops) support this
Repeat 4.6.2, but this time we need to support only conditional PC-relative branches. Problem 4.6.2: Consider a datapath similar to the one in Figure 4.11, but for a processor that only has one type of
What is the value of the PCSrc signal for this instruction? This signal is generated early in the MEM stage (only a single AND gate). What would be a reason in favor of doing this in the EX stage?
What is the value in the EPC if the branch is taken but the delay slot causes an exception? What happens after the execution of the exception handler is completed? This exercise examines how exception
Rearrange your code from 4.28.1 to achieve better performance on a 2-issue statically scheduled processor from Figure 4.69. Exercise 4.28.1: Translate this C code into MIPS instructions. Your
What is the new PC address after this instruction is executed? Highlight the path through which this value is determined. In this exercise we examine in detail how an instruction is executed in a
Assume that we already have a single-cycle design. How many bits in total do we need for pipeline registers to implement the pipelined design? This exercise explores some of the tradeoffs involved in
What is the clock cycle time if we must support ADD, BEQ, LW, and SW instructions? In this exercise we examine how latencies of individual components of the datapath affect the clock cycle time of the
Let us assume that we cannot afford to have three-input Muxes that are needed for full forwarding. We have to decide if it is better to forward only from the EX/MEM pipeline register (next-cycle
If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the
If we know that the processor has a stuck-at-1 fault on this signal, is the processor still usable? To be usable, we must be able to convert any program that executes on a normal MIPS processor into
What new signals do we need (if any) from the control unit to support this instruction? The basic single-cycle MIPS implementation in Figure 4.2 can only implement some instructions. New instructions
To reduce clock cycle time, we are considering a split of the MEM stage into two stages. Repeat 4.20.2 for this 6-stage pipeline. Problem 4.20.2: Find all hazards in this instruction sequence for a
If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy spent by an LW instruction after this change? This exercise explores energy
If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when this code executes? This exercise is intended to help you understand the relationship between
What is the register number supplied to the register file’s “Write register” input? Is this register actually written? In this exercise we examine the operation of the single-cycle datapath for
Repeat Problem 4.3.2, but the AND and OR gates you use must all be 2-input gates. Problem 4.3.2: Show how this block can be implemented. Use only AND, OR, NOT, and D Flip-Flops. Problems in this exercise
Which pipeline stages can you slow down and by how much, without affecting the clock cycle time? Problems in this exercise assume that, during an execution of the program, processor cycles are spent
Assume there is full forwarding. Indicate hazards and add NOP instructions to eliminate them. In this exercise, we examine how data dependences affect execution in the basic 5-stage pipeline described
One way to move the branch resolution one stage earlier is to not need an ALU operation in conditional branches. The branch instructions would be “BEZ Rd,Label” and “BNEZ Rd,Label”, and it
When multiple logic expressions are implemented, it is possible to reduce implementation cost by using the same signals in more than one expression. Repeat 4.4.1, but implement both Control signal 1
Which control signal in Figure 4.24 is the most critical to generate quickly and how much time does the control unit have to generate it if it wants to avoid being on the critical path? In this
In a 2-issue static superscalar processor that only has one register write port, what speedup is achieved by adding a second register write port? In this exercise, we make several assumptions. First,
Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in
Repeat 4.23.1 for the 2-bit predictor. Exercise 4.23.1: Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor?
What CPI would be achieved if the X86 version of this loop is executed on a processor that internally translates these instructions into MIPS-like micro-operations, then executes these
What is the accuracy of the two-bit predictor if this pattern is repeated forever? This exercise examines the accuracy of various branch predictors for the following repeating pattern (e.g., in a
What is the cycle time for the circuit you designed in 4.5.1? How long does it take to perform the 32-bit operation? Problem 4.5.1: Design a circuit with 1-bit data inputs and a 1-bit data output that
Does support for this instruction introduce any new hazards? Are stalls due to existing hazards made worse? In this exercise, we examine how the ISA affects pipeline design. Problems in this exercise
How many instructions are fetched from the wrong path for each branch misprediction in a 4-issue processor? The remaining problems in this exercise assume the following pipeline depth and that the
Repeat 4.33.2, but this time every executed instruction has a RAW data dependence to the instruction that executes right after it. You can assume that no stall cycles are needed, i.e., forwarding
What does this instruction do in the EX and MEM stages? The first three problems in this exercise refer to the following MIPS instruction: a. SW R16,-100(R6) b. OR R2, R1, R0 Instruction
If the second instruction from this table is fetched right after the instruction from the first table, describe what happens in the pipeline when the first instruction causes the first exception you
What needs to be done to support undefined instruction exceptions in your datapath from 4.34.1? Note that the undefined instruction exception should be triggered whenever the processor encounters any
Repeat 4.35.2, but now assume that 10% of executed branches have all four delay slots filled with useful instructions, 20% have only three useful instructions in delay slots (the fourth delay slot is
What is the critical path for an MIPS AND instruction? Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 4.2 there are seven
Assuming there are no stalls, how often (percentage of all cycles) do we use the data memory? Problems in this exercise assume that instructions executed by a pipelined processor are broken down as
If an overflow exception occurs once for every 100,000 instructions executed, what is the overall speedup if we move overflow checking into the MEM stage? Assume that this change reduces EX latency
How often do you expect this instruction can be used? Do you think that we would be justified if we added this instruction to the MIPS ISA? This exercise is intended to help you better understand the
Which kinds of instructions require this resource? The remaining three problems in this exercise refer to the following logic block (resource) in the datapath: Shift-left-2 b. Registers a. Resource
What happens if the branch is taken, the instruction at “Label” is an invalid instruction, the first instruction of the exception handler is the SW instruction given above, and this store
Repeat 4.28.2, but this time use your MIPS code from 4.28.3. Exercise 4.28.3: Rearrange your code from 4.28.1 to achieve better performance on a 2-issue statically scheduled processor from Figure
For each Mux, show the values of its data output during the execution of this instruction and these register values. The remaining problems in this exercise assume that data memory is all zeros and
Given these latencies for individual elements of the datapath, compare clock cycle times of the single-cycle and the 5-stage pipelined datapath. The remaining three problems in this exercise assume
In what fraction of all cycles is the data memory used? For the remaining problems in this exercise, assume that there are no pipeline stalls and that the breakdown of executed instructions is as
For the given hazard probabilities and pipeline stage latencies, what is the speedup achieved by adding full forwarding to a pipeline that had no forwarding? The remaining three problems in this
Assuming there are no stalls or hazards, what is the utilization of the data memory? The remaining problems in this exercise assume that instructions executed by the processor are broken down as
What is the clock cycle time with and without this improvement? When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance
Repeat 4.8.1, but now the fault to test for is whether the “MemRead” control signal has this fault. Problem 4.8.1: Let us assume that processor testing is done by filling the PC, registers, and data
Which value is the first one to be forwarded and what is the value it overrides? The remaining three problems in this exercise assume that, before any of the above is executed, all values in data
If there is forwarding, for the first five cycles during the execution of this code, specify which signals are asserted in each cycle by hazard detection and forwarding units in Figure 4.60. This
What is the performance impact of your changes from 4.38.3? Exercise 4.38.3: If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy
Unroll this loop once and schedule it for a 2-issue static superscalar processor. Assume that the loop always executes an even number of iterations. You can use registers R10 through R20 when
What is the value of these two signals for this instruction? Different instructions require different control signals to be asserted in the datapath. The remaining problems in this exercise refer to
What is the latency of your implementation from 4.3.2? Problem 4.3.2: Show how this block can be implemented. Use only AND, OR, NOT, and D Flip-Flops. Cost and latency of digital logic depends on the
It is often possible to sacrifice some speed in a circuit in order to reduce its energy consumption. Assume that we can reduce energy consumption by a factor of X (new energy is 1/X times the old
Using the first branch instruction in the given code as an example, describe the hazard detection logic needed to support branch execution in the ID stage as in Figure 4.62. Which type of hazard is
What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speedup achieved by adding full forwarding to a pipeline that had no
What is the length of the critical path in your circuit from 4.4.3? Problem 4.4.3: When multiple logic expressions are implemented, it is possible to reduce implementation cost by using the same signals
If you can speed up the generation of control signals, but the cost of the entire processor increases by $1 for each 5ps improvement of a single control signal, which control signals would you speed
For a 2-issue static superscalar processor with a classic 5-stage pipeline, what speedup is achieved by making the branch prediction perfect? In this exercise, we make several assumptions. First, we
