In this exercise, you will explore performance trade-offs between three processors that each employ different types of
Question:
In this exercise, you will explore performance trade-offs between three processors that each employ different types of multithreading (MT). Each of these processors is superscalar, uses in-order pipelines, requires a fixed three-cycle stall following all loads and branches, and has identical L1 caches. Instructions from the same thread issued in the same cycle are read in program order and must not contain any data or control dependences.
■ Processor A is a superscalar simultaneous MT architecture, capable of issuing up to two instructions per cycle from two threads.
■ Processor B is a fine-grained MT architecture, capable of issuing up to four instructions per cycle from a single thread and switches threads on any pipeline stall.
■ Processor C is a coarse-grained MT architecture, capable of issuing up to eight instructions per cycle from a single thread and switches threads on an L1 cache miss.
Our application is a list searcher, which scans a region of memory for a specific value stored in R9 between the address range specified in R16 and R17. It is parallelized by evenly dividing the search space into four equal-sized contiguous blocks and assigning one search thread to each block (yielding four threads). Most of each thread’s runtime is spent in the following unrolled loop body:
Assume the following:
■ A barrier is used to ensure that all threads begin simultaneously.
■ The first L1 cache miss occurs after two iterations of the loop.
■ None of the BEQAL branches is taken.
■ The BLT is always taken.
■ All three processors schedule threads in a round-robin fashion.
Determine how many cycles are required for each processor to complete the first two iterations of the loop.
Step by Step Answer:
Computer Architecture A Quantitative Approach
ISBN: 9780128119051
6th Edition
Authors: John L. Hennessy, David A. Patterson