Question:

Matrix multiplication is a key operation supported in hardware by the TPU. Before going into details of the TPU hardware, it’s worth analyzing the matrix multiplication calculation itself. One common way to depict matrix multiplication is with the following triply nested loop:

[Code listing not captured in this copy: the triply nested matrix-multiplication loop]
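
The listing itself did not survive in this copy. A minimal C sketch of the loop the question appears to describe follows. It assumes A is M×K, B is K×N, and C is M×N (shapes inferred from parts (c) and (d)), and that results are accumulated into C. The function name `matmul_naive` and the sample dimension values are hypothetical.

```c
/* Dimensions are compile-time constants; sample values taken from part (b). */
enum { M = 3, N = 4, K = 5 };

/* Assumed shapes: A is M x K, B is K x N, C is M x N, all row-major.
   The caller must zero-initialize C, because the loop accumulates into it. */
void matmul_naive(float A[M][K], float B[K][N], float C[M][N]) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                C[i][j] += A[i][k] * B[k][j];
}
```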

a. Suppose that M, N, and K are all equal. What is the asymptotic complexity in time of this algorithm? What is the asymptotic complexity in space of the arguments? What does this mean for the operational intensity of matrix multiplication as M, N, and K grow large?
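
As a rough worked count for part (a), assume M = N = K = n, one multiply and one add per innermost iteration, and that each argument element is counted once (an idealization that ignores cache misses and repeated memory traffic):

```latex
\begin{align*}
\text{operations} &= 2n^3 = O(n^3) \\
\text{argument storage} &= 3n^2 \text{ elements} = O(n^2) \\
\text{operational intensity} &\approx \frac{2n^3}{3n^2} = \frac{2n}{3}
  \text{ operations per element} = O(n)
\end{align*}
```

Under these assumptions the operational intensity grows linearly with n, so large matrix multiplications tend toward being compute-bound rather than memory-bound.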

b. Suppose that M=3, N=4, and K=5, so that the dimensions are pairwise relatively prime. Write out the order of accesses to memory locations in each of the three matrices A, B, and C (you might start with two-dimensional indices, then translate those to memory addresses or offsets from the start of each matrix). For which matrices are the elements accessed sequentially? Which are not? Assume row-major (C-language) memory ordering.
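
One way to explore part (b) is to record the element offsets instead of computing products. The sketch below assumes the i-j-k loop order and the shapes A[M][K], B[K][N], C[M][N] described above. The function name `trace_offsets` is hypothetical.

```c
enum { M = 3, N = 4, K = 5 };

/* Records, for every innermost iteration of the assumed i-j-k loop, the
   row-major element offset touched in each matrix. Each output array
   must hold M * N * K entries. */
void trace_offsets(int a_off[], int b_off[], int c_off[]) {
    int step = 0;
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k, ++step) {
                a_off[step] = i * K + k;  /* A[i][k]: stride 1 as k varies */
                b_off[step] = k * N + j;  /* B[k][j]: stride N as k varies */
                c_off[step] = i * N + j;  /* C[i][j]: fixed while k varies */
            }
}
```

For the first five iterations (i = 0, j = 0, k = 0..4) this records A offsets 0, 1, 2, 3, 4, B offsets 0, 4, 8, 12, 16, and C offset 0 each time. Under these assumptions A is read with unit stride within a row, B is read with a stride of N, and C advances one element each time the k loop finishes.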

c. Suppose that you transpose matrix B, swapping its indices so that they are B[N][K] instead. So, now the innermost statement of the loop looks like:

[Code listing not captured in this copy: the innermost statement using the transposed B]

Now, for which matrices are the elements accessed sequentially?
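
A minimal sketch of the transposed variant in part (c), assuming the only change is that B is stored as N×K and indexed as `Bt[j][k]`. The names `matmul_bt` and `Bt` are hypothetical.

```c
enum { M = 3, N = 4, K = 5 };

/* Bt holds the transpose of B, so Bt[j][k] equals the original B[k][j].
   In the innermost loop both A[i][k] and Bt[j][k] are now read with unit
   stride. The caller must zero-initialize C. */
void matmul_bt(float A[M][K], float Bt[N][K], float C[M][N]) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                C[i][j] += A[i][k] * Bt[j][k];
}
```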

d. The innermost (k-indexed) loop of our original routine performs a dot-product operation. Suppose that you are given a hardware unit that can perform an 8-element dot-product more efficiently than the raw C code, behaving effectively like this C function:

[Code listing not captured in this copy: C function modeling the 8-element dot-product unit]

How would you rewrite the routine with the transposed B matrix from part (c) to use this function?
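
One possible shape for part (d) is sketched below. Because the original function listing is missing here, `dot8` is a hypothetical software stand-in whose name and signature are assumptions: it accumulates an 8-element dot product into `*acc`. The sketch also assumes K is a multiple of 8. Otherwise the remainder would need a scalar loop or zero padding.

```c
enum { M = 3, N = 4, K = 16 };  /* K is assumed to be a multiple of 8 */

/* Hypothetical stand-in for the hardware unit: adds the dot product of
   two 8-element slices to *acc. */
void dot8(float *acc, const float *a, const float *b) {
    float total = 0.0f;
    for (int k = 0; k < 8; ++k)
        total += a[k] * b[k];
    *acc += total;
}

/* Uses the transposed layout from part (c) so that both slices passed to
   dot8 are contiguous in memory. The caller must zero-initialize C. */
void matmul_dot8(float A[M][K], float Bt[N][K], float C[M][N]) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; k += 8)
                dot8(&C[i][j], &A[i][k], &Bt[j][k]);
}
```

The k loop now advances in steps of 8, and each call replaces eight iterations of the scalar inner loop.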

e. Suppose that instead, you are given a hardware unit that performs an 8-element “saxpy” operation, which behaves like this C function:

[Code listing not captured in this copy: C function modeling the 8-element saxpy unit]

Write another routine that uses the saxpy primitive to deliver equivalent results to the original loop, without the transposed memory ordering for the B matrix.
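
A sketch of one approach to part (e) follows. The function `saxpy8` is a hypothetical stand-in, since the original listing is missing here. It is assumed to compute `acc[n] += a * x[n]` for eight consecutive elements. The sketch also assumes N is a multiple of 8 and uses the original, untransposed B[K][N] layout.

```c
enum { M = 3, N = 8, K = 5 };  /* N is assumed to be a multiple of 8 */

/* Hypothetical stand-in for the hardware unit: scales an 8-element slice
   of x by the scalar a and adds it to the 8-element slice acc. */
void saxpy8(float *acc, float a, const float *x) {
    for (int n = 0; n < 8; ++n)
        acc[n] += a * x[n];
}

/* Loop order i-k-j: each call updates eight adjacent elements of row i of
   C using one scalar A[i][k] and eight adjacent elements of row k of B.
   The caller must zero-initialize C. */
void matmul_saxpy8(float A[M][K], float B[K][N], float C[M][N]) {
    for (int i = 0; i < M; ++i)
        for (int k = 0; k < K; ++k)
            for (int j = 0; j < N; j += 8)
                saxpy8(&C[i][j], A[i][k], &B[k][j]);
}
```

Moving j to the innermost position means B and C are both walked along rows, so no transposition of B is needed.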

Related Book: Computer Architecture: A Quantitative Approach, 6th Edition, by John L. Hennessy and David A. Patterson (ISBN: 9780128119051)
