Question
Y=2
Z=2
Question 4

(a) Superscalar processors implement instruction-level parallelism by using multiple pipelines. The processor shown in Figure Q4(a) can issue two instructions per cycle if there is no resource conflict and no data dependence problem. The processor has two pipelines, each with four processing stages (fetch, decode, execute, and store). Each pipeline has its own fetch, decode, and store units; however, the four functional units (multiplier, adder, logic unit, and load unit) available in the execute stage are shared by the two pipelines on a dynamic basis as single units. The two store units can be used dynamically by the two pipelines, depending on availability at a particular cycle. There is also a lookahead window with its own fetch and decoding logic that can be used for instruction lookahead for out-of-order instruction issue.

Consider the following program to be executed on this processor:

I1: Load R1, A    ; R1 ← Memory(A)
I2: Add R2, R1    ; R2 ← (R2) + (R1)
I3: Mul R3, R4    ; R3 ← (R3) * (R4)
I4: Add R4, R5    ; R4 ← (R4) + (R5)
I5: Comp R6       ; R6 ← complement of (R6)
I6: Mul R6, R7    ; R6 ← (R6) * (R7)

(i) Categorize any dependencies that exist in the program into true data dependency (RAW), anti-dependency (WAR), or output dependency (WAW). (10 marks)

(ii) Using a diagram, illustrate the pipeline activity if the processor is implemented using out-of-order issue with out-of-order completion policies. Assume that dependencies are not violated. (6 marks)

[Figure Q4(a): two-pipeline superscalar processor. Labels: fetch stage; decode stage (d1, d2); execute stage with multiplier (m1, m2, m3), adder (a1, a2), logic (e1), and load units; store (write-back) stage; lookahead window.]

(b) A benchmark program is executed on a 9-computer cluster. The benchmark program takes a total time T to run on this cluster. It was found that during 28.2% of time T, the benchmark program was running simultaneously on all nine computers, whereas in the remaining time the benchmark program was running only on a single computer.
(i) Evaluate the improvement in performance by determining the effective speedup of the 9-computer cluster compared with a single computer. (4 marks)

(ii) Parallel processing is only effective if the code of an application is optimized for it. Conclude whether the parallelization of the code in the benchmark program is optimum by determining the percentage of code that has been parallelized. (5 marks)

(Total: 25 marks)
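One way to approach the dependency categorization in part (a) is to write down, for each instruction, the registers it writes and the registers it reads, and then compare every earlier/later pair. The sketch below does this mechanically. It is illustrative only, not an official answer: the register names follow one reading of the listing (I1 writes R1, I4 reads R5), the `PROGRAM` table and the `classify` helper are hypothetical names introduced here, and the memory operand A is ignored.

```python
# Each entry: (label, registers written, registers read).
# Assumption: two-operand instructions read both operands and write the first.
PROGRAM = [
    ("I1", {"R1"}, set()),           # Load R1, A  (memory operand A ignored)
    ("I2", {"R2"}, {"R2", "R1"}),    # Add  R2, R1
    ("I3", {"R3"}, {"R3", "R4"}),    # Mul  R3, R4
    ("I4", {"R4"}, {"R4", "R5"}),    # Add  R4, R5
    ("I5", {"R6"}, {"R6"}),          # Comp R6
    ("I6", {"R6"}, {"R6", "R7"}),    # Mul  R6, R7
]


def classify(program):
    """Return (kind, earlier, later, register) for every pairwise hazard.

    Simplification: every earlier/later pair is compared directly, so a
    hazard hidden by an intervening write would still be reported. No such
    case arises in this six-instruction program.
    """
    deps = []
    for i, (early, w_early, r_early) in enumerate(program):
        for late, w_late, r_late in program[i + 1:]:
            for reg in sorted(w_early & r_late):   # write, then read
                deps.append(("RAW", early, late, reg))
            for reg in sorted(r_early & w_late):   # read, then write
                deps.append(("WAR", early, late, reg))
            for reg in sorted(w_early & w_late):   # write, then write
                deps.append(("WAW", early, late, reg))
    return deps


for kind, early, late, reg in classify(PROGRAM):
    print(f"{kind}: {early} -> {late} on {reg}")
```

Under these assumptions the pairs flagged are I1 and I2 (RAW on R1), I3 and I4 (WAR on R4), and I5 and I6 (RAW and WAW on R6). The mechanical check also reports a WAR on R6 between I5 and I6, because I5 reads R6 before I6 overwrites it; textbook answers often omit it since the RAW on the same register already forces the ordering.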
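The arithmetic behind part (b) can be sketched as follows. The idea is that work done in the 28.2% of T when all nine computers are busy would take nine times as long on a single computer, while the remaining 71.8% of T would take the same time. The function names are hypothetical, and "percentage of code parallelized" is interpreted here as the fraction of single-computer execution time that runs in parallel (the f in Amdahl's law); that interpretation is an assumption, not something the question states.

```python
def single_computer_time(parallel_share, n):
    """Run time on one computer, in units of the cluster run time T.

    parallel_share: fraction of T during which all n computers are busy.
    """
    return parallel_share * n + (1.0 - parallel_share)


def parallel_fraction(parallel_share, n):
    """Fraction of the single-computer run time that is parallelizable."""
    return parallel_share * n / single_computer_time(parallel_share, n)


N = 9            # computers in the cluster
SHARE = 0.282    # fraction of T spent running on all N computers

# The cluster takes 1 unit of time (T), so the speedup equals the
# single-computer time expressed in units of T.
speedup = single_computer_time(SHARE, N)
f = parallel_fraction(SHARE, N)

# Cross-check against Amdahl's law: S = 1 / ((1 - f) + f / N)
amdahl = 1.0 / ((1.0 - f) + f / N)

print(f"speedup = {speedup:.3f}, parallel fraction = {f:.1%}")
```

With these figures the speedup comes to about 3.26 and the parallel fraction to about 78%, so roughly 22% of the single-computer run time remains serial. That is well short of the ideal speedup of 9, which suggests the parallelization is not optimum.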