Assume you are redesigning a hardware prefetcher for the unblocked matrix transposition code as in Exercise 5.7.
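The code from Exercise 5.7 is not reproduced on this page. As a rough reminder of what an unblocked transposition looks like, here is a minimal sketch. It assumes row-major arrays of doubles; the function name `transpose_unblocked` and the parameter `n` are hypothetical and not taken from the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of an unblocked (naive) out-of-place transpose.
 * Assumes both matrices are n x n, stored row-major as flat arrays. */
void transpose_unblocked(const double *input, double *output, size_t n)
{
    /* Out-of-place only: the two buffers must be distinct. */
    assert(input != output);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            /* input is read with unit stride as j advances;
             * output is written with a stride of n elements. */
            output[j * n + i] = input[i * n + j];
        }
    }
}
```

In the inner loop the reads walk sequentially through one row of the input, while consecutive writes land n elements apart in the output. Those two access streams are what a sequential prefetcher would be tracking.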
Question:
a. In the steady state of the inner loop, what is the performance (in cycles per iteration) when using a simple two-stream sequential prefetcher assuming performance is limited by prefetching?
b. What percentage of prefetches are useful given the level 2 cache parameters?
Related Book For
Computer Architecture A Quantitative Approach
ISBN: 978-0123704900
4th edition
Authors: John L. Hennessy, David A. Patterson