For the simple implementation given above, this execution order would be nonideal for the input matrix. However,
Question:
a. What block size should be used to completely fill the data cache with one input and output block?
b. How does the relative number of misses of the blocked and unblocked versions compare if the level 1 cache is direct mapped?
c. Write code to perform a transpose with a block size parameter B that uses B × B blocks.
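As a starting point for parts (a) and (c), here is a minimal sketch in C. It is not the book's reference solution. It assumes a square n x n matrix of doubles stored in row-major order, with separate input and output arrays. The names max_block_size and transpose_blocked are hypothetical, chosen for illustration. The sizing helper only checks capacity (two B x B blocks must fit in the cache) and ignores conflict misses, so it does not model the direct-mapped case in part (b).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for part (a): largest B such that one input block
   and one output block, each B x B elements of elem_bytes bytes, fit
   together in cache_bytes bytes. Capacity only; conflicts are ignored. */
size_t max_block_size(size_t cache_bytes, size_t elem_bytes)
{
    size_t B = 0;
    while (2 * (B + 1) * (B + 1) * elem_bytes <= cache_bytes)
        B++;
    return B;
}

/* Sketch for part (c): transpose the n x n row-major matrix `in` into
   `out`, working on one B x B block at a time. `in` and `out` must not
   overlap. Edge blocks are clipped, so n need not be a multiple of B. */
void transpose_blocked(size_t n, size_t B, const double *in, double *out)
{
    assert(B > 0);
    for (size_t ii = 0; ii < n; ii += B) {
        for (size_t jj = 0; jj < n; jj += B) {
            size_t i_end = (ii + B < n) ? ii + B : n;
            size_t j_end = (jj + B < n) ? jj + B : n;
            /* Within a block, rows of `in` are read sequentially and
               the matching columns of `out` are written. */
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    out[j * n + i] = in[i * n + j];
        }
    }
}
```

The cache size is not stated in this excerpt, so any concrete value passed to max_block_size is an assumption. For example, with a hypothetical 16 KB cache and 8-byte doubles, the condition 2 * B * B * 8 <= 16384 gives B = 32.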
Related Book For
Computer Architecture A Quantitative Approach
ISBN: 978-0123704900
4th edition
Authors: John L. Hennessy, David A. Patterson