Question

Recall from your mathematics classes that the transpose operation on a matrix exchanges its rows and columns as illustrated below (on a simple 4 x 4 matrix):

Here is a simple C loop that performs the transpose (N denotes the matrix dimension, 4 in the illustration):

for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        output[j][i] = input[i][j];
    }
}

Assume that both the input and output matrices are stored in row-major order (i.e., as a single array, row by row from left to right, top to bottom). Assume that we are executing a 256 x 256 double-precision transpose on a processor with a 16 KB fully associative (don't worry about cache conflicts) least recently used (LRU) L1 data cache with 64-byte blocks. Assume that L1 cache misses require 16 clock cycles and always hit in the L2 cache. Assume that each iteration of the inner loop requires 4 clock cycles if the data are present in the L1 cache.
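The quantities implied by these parameters can be derived mechanically. The following is a minimal sketch for illustration only, not part of the original exercise: the constant and function names are hypothetical, and it assumes an 8-byte double, as on most current platforms.

```c
#include <stddef.h>

/* Parameters taken from the problem statement. */
enum {
    N           = 256,        /* matrix dimension (N x N) */
    CACHE_BYTES = 16 * 1024,  /* L1 data cache capacity */
    LINE_BYTES  = 64          /* cache block (line) size */
};

/* Number of double-precision elements that share one cache block. */
static size_t doubles_per_line(void)
{
    return LINE_BYTES / sizeof(double);
}

/* Number of cache blocks the L1 cache can hold at once. */
static size_t lines_in_cache(void)
{
    return CACHE_BYTES / LINE_BYTES;
}

/* Storage needed for one N x N matrix of doubles, in bytes. */
static size_t matrix_bytes(void)
{
    return (size_t)N * N * sizeof(double);
}

/* Row-major byte offset of element (i, j) from the start of its matrix. */
static size_t element_offset(size_t i, size_t j)
{
    return (i * N + j) * sizeof(double);
}
```

With 8-byte doubles these evaluate to 8 elements per cache block, 256 blocks in the cache, and 512 KB per matrix, so neither matrix fits in the L1 cache. Consecutive elements of a row are 8 bytes apart, while consecutive elements of a column are 2048 bytes apart and therefore always fall in different cache blocks.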

a. What should be the minimum size of the cache to take advantage of blocked execution? Hint: For blocked execution to work correctly, each row of a matrix block should fit in a single cache block. First, find the largest such matrix block size; then figure out how much total memory is needed to store a complete block for both the input and output matrices.

b. Assume we use a matrix block size equal to the one you calculated in part a. How many cache misses will occur using a naive non-blocked implementation of the transpose, and how many cache misses will occur using a properly blocked implementation of the transpose?

c. Write code (in the programming language of your choice, but using only basic array/list instructions) that performs a transpose with a block size parameter B that uses B x B blocks.

d. What is the minimum associativity required of the L1 cache for consistent performance independent of both arrays' positions in memory?
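For part c, one possible shape of a blocked transpose is sketched below in C. It is an illustration under stated assumptions, not the reference solution: the function name transpose_blocked and its signature are hypothetical, the matrices are passed as flat row-major arrays, and the two arrays are assumed not to overlap.

```c
#include <stddef.h>

/*
 * Transpose an n x n row-major matrix of doubles using B x B blocks.
 * `in` and `out` must not overlap, and B must be at least 1.
 * n need not be a multiple of B: the inner bounds are clamped at the
 * matrix edge, so partial blocks on the right and bottom are handled.
 */
static void transpose_blocked(const double *in, double *out,
                              size_t n, size_t B)
{
    for (size_t ii = 0; ii < n; ii += B) {          /* block row */
        for (size_t jj = 0; jj < n; jj += B) {      /* block column */
            size_t i_end = (ii + B < n) ? ii + B : n;
            size_t j_end = (jj + B < n) ? jj + B : n;
            /* Transpose one block: element (i, j) goes to (j, i). */
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}
```

Calling transpose_blocked(input, output, 256, B) with B equal to the matrix dimension degenerates to the naive loop order, which makes it easy to compare the two variants with the same code.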

Illustration of the transpose on a 4 x 4 matrix:

A11 A12 A13 A14        A11 A21 A31 A41
A21 A22 A23 A24   =>   A12 A22 A32 A42
A31 A32 A33 A34        A13 A23 A33 A43
A41 A42 A43 A44        A14 A24 A34 A44
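The miss counts asked for in part b can also be explored empirically. The sketch below models the cache described in the problem (256 blocks of 64 bytes, fully associative, LRU) and counts misses for a given block size. It is a simplified model under assumptions that are not in the original text: 8-byte doubles, both matrices aligned to a cache-block boundary and stored back to back, a write miss allocating a block just like a read miss, and B dividing 256 evenly. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { N = 256, LINE_BYTES = 64, NUM_LINES = 256 };  /* 16 KB / 64 B */

typedef struct {
    uint64_t tag[NUM_LINES];       /* block address held in each slot */
    uint64_t last_use[NUM_LINES];  /* time of last access; 0 = empty slot */
    uint64_t clock;                /* incremented on every access */
    uint64_t misses;
} LruCache;

static void cache_reset(LruCache *c)
{
    for (size_t s = 0; s < NUM_LINES; s++) {
        c->tag[s] = 0;
        c->last_use[s] = 0;
    }
    c->clock = 0;
    c->misses = 0;
}

/* Access one byte address; on a miss, replace the least recently used slot. */
static void cache_access(LruCache *c, uint64_t byte_addr)
{
    uint64_t block = byte_addr / LINE_BYTES;
    size_t victim = 0;

    c->clock++;
    for (size_t s = 0; s < NUM_LINES; s++) {
        if (c->last_use[s] != 0 && c->tag[s] == block) {
            c->last_use[s] = c->clock;  /* hit: refresh recency */
            return;
        }
        if (c->last_use[s] < c->last_use[victim]) {
            victim = s;                 /* oldest (or empty) slot so far */
        }
    }
    c->misses++;
    c->tag[victim] = block;
    c->last_use[victim] = c->clock;
}

/* Count misses for a transpose with B x B blocks; B = N is the naive order. */
static uint64_t count_misses(size_t B)
{
    static LruCache c;
    const uint64_t in_base = 0;
    const uint64_t out_base = (uint64_t)N * N * sizeof(double);

    cache_reset(&c);
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t jj = 0; jj < N; jj += B)
            for (size_t i = ii; i < ii + B; i++)
                for (size_t j = jj; j < jj + B; j++) {
                    cache_access(&c, in_base + (uint64_t)(i * N + j) * sizeof(double));
                    cache_access(&c, out_base + (uint64_t)(j * N + i) * sizeof(double));
                }
    return c.misses;
}
```

Comparing count_misses(256) with count_misses(8) gives a quick check on a hand-derived answer. The model ignores the L2 cache and timing entirely, so the cycle counts from the problem statement would still have to be applied by hand.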
