Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Architicture class - Tomasulo algorithm: Here is table 3.4 And this is p.98 A. Fill in the cycles an instruction spends in each stage (per

Architicture class - Tomasulo algorithm:

image text in transcribed

Here is table 3.4

image text in transcribed

And this is p.98

image text in transcribed

A. Fill in the cycles an instruction spends in each stage (per Table 3.4 and details on p. 98), showing the start and end cycles in each column. Initially, all registers are logical and the ROB is empty. Show the assigned reservation station, ROB entry, reasons for delays, if any, in the "Explanations" column. Instruction flow and resources involved in an out-of-order processor Tomasulos algorithm quite closely. Other mechanisms are described in Chapter 4. We set the map of the result register to the tag (the name of the tail entry of the ROB) and enter the tag in the reservation station. Note that this step must be done after the source operands have been looked at, because a register can be both a source and a result. c. We enter the instruction at the tail of the ROB with the result flag off. Issue. When both flags in a reservation station are set to ready (are on), the instruction can be issued to the functional unit to start execution unless all stages of the latter are stalling because the unit is waiting for the CDB to broad-cast its result. If more than one reservation station in the same set is ready to issue in the same cycle, some scheduling algorithm will resolve the contention. Execute. Execution proceeds normally until the last execution cycle. At that point, the unit asks for ownership of the common data bus (CDB). If there is more than one unit asking for the CDB in the same cycle, some (hardwired) priority scheme will resolve the contention. Once the unit has ownership of the CDB: a. It broadcasts the result of its operation and the tag associated with the instruction. b. The result is stored in the ROB entry identified by the tag, and the flag indicating a result in the ROB entry is set. c. The result is stored as well in all reservation stations that have the tag as an operand. In such reservation stations, the result replaces the tag, and the corresponding flag (ready bit) is set. Commit. At each cycle, the bit indicating a result in the entry at the head of the ROB is checked. If it is on, the result is stored in the logical register indicated by the ROB entry, and the entry is deleted. To illustrate Tomasulos algorithm (cf. Table 3.5), we consider the same example as for the CDC 6600 scoreboard, namely, the following sequence of instructions: i1: R4 leftarrow R0 * R2 # will use reservation station 1 of multiplier i2: R6 leftarrow R4 * R8 # will use reservation station 2 of multiplier;RAW with i1 i3: R8 leftarrow R2 + R12 # will use reservation station 1 of adder;WAR with i2 i4: R4 leftarrow R14 + R16 # will use reservation station 2 of adder;WAW with i1 The adder and multiplier are pipelined with latencies of 1 and 4 cycles, respectively. If there were no dependencies, a multiplication (respectively, addition) being decoded at time t_0 would be dispatched at time t_0 + 1, be issued at time t_0 + 2, start executing at time t_0 + 2, finish executing at time t_0 + 5 (respectively, at time t_0 + 2), broadcast at time t_0 + 6 (respectively, at time t_0 + 3), and commit if it were at the head of the ROB at time t_0 + 7 (respectively, at time t_0 + 4). A. Fill in the cycles an instruction spends in each stage (per Table 3.4 and details on p. 98), showing the start and end cycles in each column. Initially, all registers are logical and the ROB is empty. Show the assigned reservation station, ROB entry, reasons for delays, if any, in the "Explanations" column. Instruction flow and resources involved in an out-of-order processor Tomasulos algorithm quite closely. Other mechanisms are described in Chapter 4. We set the map of the result register to the tag (the name of the tail entry of the ROB) and enter the tag in the reservation station. Note that this step must be done after the source operands have been looked at, because a register can be both a source and a result. c. We enter the instruction at the tail of the ROB with the result flag off. Issue. When both flags in a reservation station are set to ready (are on), the instruction can be issued to the functional unit to start execution unless all stages of the latter are stalling because the unit is waiting for the CDB to broad-cast its result. If more than one reservation station in the same set is ready to issue in the same cycle, some scheduling algorithm will resolve the contention. Execute. Execution proceeds normally until the last execution cycle. At that point, the unit asks for ownership of the common data bus (CDB). If there is more than one unit asking for the CDB in the same cycle, some (hardwired) priority scheme will resolve the contention. Once the unit has ownership of the CDB: a. It broadcasts the result of its operation and the tag associated with the instruction. b. The result is stored in the ROB entry identified by the tag, and the flag indicating a result in the ROB entry is set. c. The result is stored as well in all reservation stations that have the tag as an operand. In such reservation stations, the result replaces the tag, and the corresponding flag (ready bit) is set. Commit. At each cycle, the bit indicating a result in the entry at the head of the ROB is checked. If it is on, the result is stored in the logical register indicated by the ROB entry, and the entry is deleted. To illustrate Tomasulos algorithm (cf. Table 3.5), we consider the same example as for the CDC 6600 scoreboard, namely, the following sequence of instructions: i1: R4 leftarrow R0 * R2 # will use reservation station 1 of multiplier i2: R6 leftarrow R4 * R8 # will use reservation station 2 of multiplier;RAW with i1 i3: R8 leftarrow R2 + R12 # will use reservation station 1 of adder;WAR with i2 i4: R4 leftarrow R14 + R16 # will use reservation station 2 of adder;WAW with i1 The adder and multiplier are pipelined with latencies of 1 and 4 cycles, respectively. If there were no dependencies, a multiplication (respectively, addition) being decoded at time t_0 would be dispatched at time t_0 + 1, be issued at time t_0 + 2, start executing at time t_0 + 2, finish executing at time t_0 + 5 (respectively, at time t_0 + 2), broadcast at time t_0 + 6 (respectively, at time t_0 + 3), and commit if it were at the head of the ROB at time t_0 + 7 (respectively, at time t_0 + 4)

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

More Books

Students also viewed these Databases questions

Question

Evaluate three pros and three cons of e-prescribing

Answered: 1 week ago

Question

10-9 How have social technologies changed e-commerce?

Answered: 1 week ago