This part of our case study will focus on the amount of instruction-level parallelism available to the
Question:
For the purposes of this case study, assume that each line of code in Figure 3.14 takes one execution cycle (its dependence height is 1) and, for the purposes of computing ILP, counts as one instruction. These (unrealistic) assumptions are made to greatly simplify bookkeeping in solving the following exercises. Note that the for and while statements execute on each iteration of their respective loops, to test whether the loop should continue. In this ideal case, most of the dependences in the code sequence are relaxed and a high degree of ILP is therefore readily available. We will later examine a general case, in which the realistic dependences in the code segment reduce the amount of parallelism available.
Further suppose that the code is executed on an "ideal" processor with infinite issue width, unlimited renaming, "omniscient" knowledge of memory access disambiguation, branch prediction, and so on, so that the execution of instructions is limited only by data dependence. Consider the following in this context:
a. Describe the data (true, anti, and output) and control dependences that govern the parallelism of this code segment, as seen by a run-time hardware scheduler. Indicate only the actual dependences (i.e., ignore dependences between stores and loads that access different addresses, even if a compiler or processor would not realistically determine this). Draw the dynamic dependence graph for six consecutive iterations of the outer loop (for insertion of six elements), under the ideal case. In this dynamic dependence graph, we are identifying data dependences between dynamic instances of instructions: each static instruction in the original program has multiple dynamic instances due to loop execution. The following definitions may help you find the dependences related to each instruction:
• Data true dependence: On the results of which previous instructions does each instruction immediately depend?
• Data anti dependence: Which instructions subsequently write locations read by the instruction?
• Data output dependence: Which instructions subsequently write locations written by the instruction?
• Control dependence: On what previous decisions does the execution of a particular instruction depend (in what case will it be reached)?
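The three data dependence definitions above can be sketched as set intersections over the locations each instruction reads and writes. This is a minimal illustration, not the exercise's solution: the `Instr` class, the `classify` function, and the two example instructions are hypothetical and are not taken from Figure 3.14. The sketch compares one pair at a time and ignores intervening writers, so for non-adjacent pairs it may report dependences that a later write has already broken.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instr:
    """One dynamic instruction instance and the locations it touches."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def classify(earlier: Instr, later: Instr) -> set:
    """Return the data dependence kinds that `later` has on `earlier`."""
    kinds = set()
    if earlier.writes & later.reads:
        kinds.add("true")      # later reads what earlier wrote
    if earlier.reads & later.writes:
        kinds.add("anti")      # later overwrites what earlier read
    if earlier.writes & later.writes:
        kinds.add("output")    # both write the same location
    return kinds


# Hypothetical instructions for illustration only (not from Figure 3.14).
a = Instr("a", reads=frozenset({"x"}), writes=frozenset({"y"}))  # y = f(x)
b = Instr("b", reads=frozenset({"y"}), writes=frozenset({"x"}))  # x = g(y)
c = Instr("c", writes=frozenset({"y"}))                          # y = const

print(sorted(classify(a, b)))  # ['anti', 'true']
print(sorted(classify(a, c)))  # ['output']
```

With unlimited renaming, as assumed for the ideal processor, only the "true" edges would remain as constraints in the dynamic dependence graph.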
b. Assuming the ideal case just described, and using the dynamic dependence graph you just constructed, how many instructions are executed, and in how many cycles?
c. What is the average level of ILP available during the execution of the for loop?
d. In part (c) we considered the maximum parallelism achievable by a run-time hardware scheduler using the code as written. How could a compiler increase the available parallelism, assuming that the compiler knows it is dealing with the ideal case? Think about the primary constraint that prevents executing more iterations at once in the ideal case. How can the loop be restructured to relax that constraint?
e. For simplicity, assume that only variables i, hash_index, ptrCurr, and ptrUpdate need to occupy registers. Assuming general renaming, how many registers are necessary to achieve the maximum achievable parallelism in part (b)?
f. Assume that in your answer to part (a) there are 7 instructions in each iteration. Now, assuming a consistent steady-state schedule of the instructions in the example and an issue rate of 3 instructions per cycle, how is execution time affected?
g. Finally, calculate the minimal instruction window size needed to achieve the maximal level of parallelism.
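The bookkeeping behind parts (b), (c), and (f) can be sketched in a few lines: under the one-cycle-per-instruction assumption, the cycle count is the length of the schedule, and average ILP is instructions executed divided by cycles taken. The sketch below is an illustration under stated assumptions, not the solution to the exercise. The function names `cycles_needed` and `average_ilp` and the `toy` graph are hypothetical and are not derived from Figure 3.14. The scheduler is greedy in program order, which is not guaranteed to be optimal for every graph when the issue width is limited.

```python
def cycles_needed(deps, width=None):
    """Cycles to execute `deps` with unit latency.

    `deps` maps each instruction name to the names it truly depends on,
    listed in program order. `width` caps the instructions issued per
    cycle; None means unlimited issue, as on the ideal processor.
    """
    if width is not None and width < 1:
        raise ValueError("width must be at least 1")
    done = {}                  # instruction -> cycle in which it executed
    remaining = list(deps)
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for name in remaining:
            if width is not None and len(issued) == width:
                break
            # `done` is updated only after this loop, so membership means
            # the predecessor executed in an earlier cycle.
            if all(p in done for p in deps[name]):
                issued.append(name)
        if not issued:
            raise ValueError("no instruction is ready: cyclic or missing dependence")
        for name in issued:
            done[name] = cycle
        remaining = [n for n in remaining if n not in done]
    return cycle


def average_ilp(deps, width=None):
    """Instructions executed divided by cycles taken."""
    return len(deps) / cycles_needed(deps, width) if deps else 0.0


# Hypothetical toy graph for illustration only (not Figure 3.14).
toy = {
    "i1": [],
    "i2": ["i1"],
    "i3": ["i1"],
    "i4": ["i2", "i3"],
    "i5": [],
    "i6": ["i5"],
}

print(cycles_needed(toy))           # 3
print(average_ilp(toy))             # 2.0
print(cycles_needed(toy, width=1))  # 6
```

With a finite issue width, the cycle count can never drop below either the longest dependence chain or the instruction count divided by the width, rounded up. In the toy graph a width of 2 still reaches 3 cycles, while a width of 1 is limited by the resource bound of 6.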
Related Book For
Computer Architecture A Quantitative Approach
ISBN: 978-0123704900
4th edition
Authors: John L. Hennessy, David A. Patterson