
Question


Chapter Two: Memory Hierarchy Design

The transpose of a matrix interchanges its rows and columns; this is illustrated below:

A11 A12 A13 A14        A11 A21 A31 A41
A21 A22 A23 A24   =>   A12 A22 A32 A42
A31 A32 A33 A34        A13 A23 A33 A43
A41 A42 A43 A44        A14 A24 A34 A44

Here is a simple C loop to show the transpose:

    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            output[j][i] = input[i][j];
        }
    }

Assume that both the input and output matrices are stored in the row major order (row major order means that the row index changes fastest). Assume that you are executing a 256 x 256 double-precision transpose on a processor with a 16 KB fully associative (don't worry about cache conflicts) least recently used (LRU) replacement L1 data cache with 64 byte blocks. Assume that the L1 cache misses or prefetches require 16 cycles and always hit in the L2 cache, and that the L2 cache can process a request every two processor cycles. Assume that each iteration of the inner loop above requires four cycles if the data are present in the L1 cache. Assume that the cache has a write-allocate fetch-on-write policy for write misses. Unrealistically, assume that writing back dirty cache blocks requires 0 cycles.

2.1 [10/15/15/12/20] For the simple implementation given above, this execution order would be nonideal for the input matrix; however, applying a loop interchange optimization would create a nonideal order for the output matrix. Because loop interchange is not sufficient to improve its performance, it must be blocked instead.

a. [10] What should be the minimum size of the cache to take advantage of blocked execution?

b. [15] <2.2> How do the relative number of misses in the blocked and unblocked versions compare in the minimum sized cache above?

c. [15] Write code to perform a transpose with a block size parameter B which uses B x B blocks.

d. [12] <2.2> What is the minimum associativity required of the L1 cache for consistent performance independent of both arrays' position in memory?
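As a starting point for part (c), here is a minimal sketch of a blocked transpose in C. It is one possible shape, not the textbook's solution. The function name transpose_blocked, the flat pointer layout (element [i][j] stored at index i * n + j), and the handling of a partial tile at the matrix edge are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Transpose an n x n matrix of doubles in B x B tiles.
 * Assumption: both matrices are flat row-major arrays, so element
 * [i][j] lives at index i * n + j. The name and signature are
 * hypothetical, chosen only for this sketch.
 */
void transpose_blocked(size_t n, size_t B, const double *in, double *out)
{
    assert(B > 0); /* a zero block size would never advance the tile loops */

    for (size_t ii = 0; ii < n; ii += B) {          /* tile row origin */
        for (size_t jj = 0; jj < n; jj += B) {      /* tile column origin */
            /* Clamp the tile so a trailing partial tile stays in bounds. */
            size_t i_end = (ii + B < n) ? ii + B : n;
            size_t j_end = (jj + B < n) ? jj + B : n;

            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}
```

For the 256 x 256 case in the exercise, a call might look like transpose_blocked(256, 8, input, output). The choice B = 8 is an assumption based on the stated parameters: a 64-byte cache block holds 64 / 8 = 8 double-precision values, so an 8 x 8 tile touches 8 blocks of the input and 8 blocks of the output.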
