Question
Recall from your mathematics classes that the transpose operation on a matrix exchanges its rows and columns as illustrated below (on a simple 4 x 4 matrix):
Here is a simple C loop that performs the transpose (N is the matrix dimension, 4 in the illustration):
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        output[j][i] = input[i][j];
    }
}
Assume that both the input and output matrices are stored in row-major order (i.e., as a single array, with each row laid out left to right and the rows following one another top to bottom). Assume that we are executing a 256 x 256 double-precision transpose on a processor with a 16 KB fully associative (so don't worry about cache conflicts) least recently used (LRU) L1 data cache with 64-byte blocks. Assume that L1 cache misses require 16 clock cycles and always hit in the L2 cache. Assume that each iteration of the inner loop requires 4 clock cycles if the data are present in the L1 cache.
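To make the row-major assumption concrete, here is a minimal sketch of the same naive loop written against flat arrays. The function name transpose_naive and the parameter n are illustrative choices, not part of the original question.

```c
#include <assert.h>
#include <stddef.h>

/* Naive out-of-place transpose of an n x n matrix stored row-major in a
   flat array: element (i, j) lives at index i * n + j.
   The name transpose_naive is hypothetical. */
void transpose_naive(const double *input, double *output, size_t n)
{
    assert(input != output);  /* this sketch does not handle in-place use */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            /* Reads walk along a row (stride 1); writes walk down a
               column (stride n elements). */
            output[j * n + i] = input[i * n + j];
        }
    }
}
```

The stride-n writes are what make the non-blocked version hard on the cache: consecutive writes land n * 8 bytes apart, so each one touches a different 64-byte block.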
a. What is the minimum cache size needed to take advantage of blocked execution? Hint: for blocked execution to work correctly, each row of a matrix block should fit in a single cache block. First find the largest such matrix block size; then work out how much total memory is needed to store a complete block of both the input and output matrices.
b. Assume we use the matrix block size calculated in part a. How many cache misses will occur with a naive non-blocked implementation of the transpose, and how many with a properly blocked implementation?
c. Write code (in the programming language of your choice, but using only basic array/list instructions) that performs a transpose with a block size parameter B that uses B x B blocks.
d. What is the minimum associativity required of the L1 cache for consistent performance independent of both arrays' positions in memory?
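For part c, one possible shape of a blocked transpose is sketched below in C, using only basic array indexing. The name transpose_blocked and the parameters n and b are illustrative, and the edge clamping for sizes that are not a multiple of b is an addition the question does not ask for.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of an n x n row-major matrix, processed in
   b x b blocks. The name transpose_blocked is hypothetical. */
void transpose_blocked(const double *input, double *output,
                       size_t n, size_t b)
{
    assert(b > 0 && input != output);
    for (size_t ii = 0; ii < n; ii += b) {          /* block row start */
        for (size_t jj = 0; jj < n; jj += b) {      /* block column start */
            /* Clamp so a partial block at the edge stays in bounds. */
            size_t i_end = (ii + b < n) ? ii + b : n;
            size_t j_end = (jj + b < n) ? jj + b : n;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    output[j * n + i] = input[i * n + j];
                }
            }
        }
    }
}
```

With 8-byte doubles and 64-byte cache blocks, b = 8 makes each block row exactly one cache block wide, provided the arrays start on a 64-byte boundary.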
Illustration (a 4 x 4 matrix and its transpose):
A11 A12 A13 A14        A11 A21 A31 A41
A21 A22 A23 A24   =>   A12 A22 A32 A42
A31 A32 A33 A34        A13 A23 A33 A43
A41 A42 A43 A44        A14 A24 A34 A44
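One way to explore part b is to count misses with a small simulation of the cache described in the question: 16 KB / 64 B = 256 blocks, fully associative, LRU, with 8 doubles per block. The sketch below is a model under stated assumptions, not a definitive answer. It assumes both arrays are 64-byte aligned and stored back to back, and that a write miss allocates a block just as a read miss does. All names (Cache, cache_access, count_misses) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { N = 256, ELEMS_PER_LINE = 8, NUM_LINES = 256 }; /* 16 KB / 64 B */

typedef struct {
    long tag[NUM_LINES];            /* block number held, -1 if empty */
    unsigned long stamp[NUM_LINES]; /* time of last use, for LRU */
    unsigned long clock;
    unsigned long misses;
} Cache;

static void cache_init(Cache *c)
{
    memset(c, 0, sizeof *c);
    for (int k = 0; k < NUM_LINES; k++) c->tag[k] = -1;
}

/* Touch the block holding element index elem; count a miss if absent. */
static void cache_access(Cache *c, long elem)
{
    long line = elem / ELEMS_PER_LINE;
    int victim = 0;
    c->clock++;
    for (int k = 0; k < NUM_LINES; k++) {
        if (c->tag[k] == line) { c->stamp[k] = c->clock; return; } /* hit */
        if (c->stamp[k] < c->stamp[victim]) victim = k; /* oldest so far */
    }
    c->misses++;                    /* miss: replace least recently used */
    c->tag[victim] = line;
    c->stamp[victim] = c->clock;
}

/* Misses for an N x N transpose done in b x b blocks.
   Passing b = N reproduces the naive (non-blocked) loop order. */
unsigned long count_misses(int b)
{
    Cache c;
    assert(b > 0 && N % b == 0);
    cache_init(&c);
    for (int ii = 0; ii < N; ii += b)
        for (int jj = 0; jj < N; jj += b)
            for (int i = ii; i < ii + b; i++)
                for (int j = jj; j < jj + b; j++) {
                    /* read input[i][j] */
                    cache_access(&c, (long)i * N + j);
                    /* write output[j][i]; output starts after input */
                    cache_access(&c, (long)N * N + (long)j * N + i);
                }
    return c.misses;
}
```

Under these assumptions, count_misses(256) should report 73,728 misses (8,192 for the input plus 65,536 for the output, since a full output column spans 256 blocks and is evicted before it is reused), while count_misses(8) should report 16,384 (each block of each array is fetched once).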