Question

1 Approved Answer

Posted on Sep 24, 2024

12 pts. Matrix Multiplication Cache Performance

    R10  running sum
    R12  A element value
    R3   A element address
    R4   A row base (fixed in inner loop)
    R5   A column offset (variable in inner loop)
    R16  B element value
    R7   B element address
    R8   B row base (variable in inner loop)
    R9   B column offset (fixed in inner loop)
    R0 = 0; A_base, B_base, C_base = base addresses of three matrices

            addi  R4, R0, 400       ; set to last column, R4 = R0 + 400
    Outer2: addi  R4, R4, -40       ; decrement A's row base
            addi  R9, R0, 40        ; set to last element of row
    Outer1: addi  R9, R9, -4        ; decrement B's column offset
            sub   R10, R10, R10     ; clear sum
            addi  R5, R0, 40        ; set to last element of row 1
            addi  R8, R0, 400       ; set to last column
    Inner:  addi  R5, R5, -4        ; decrement A's column offset
            addi  R8, R8, -40       ; decrement B's row base
            add   R3, R4, R5        ; form A's element address, R3 = R4 + R5
            add   R7, R8, R9        ; form B's element address
            lw    R12, A_base(R3)   ; load A's element into R12, R3 is offset
            lw    R16, B_base(R7)   ; load B's element into R16, R7 is offset
            mul   R12, R12, R16     ; compute product
            add   R10, R10, R12     ; sum products
            bnez  R5, Inner         ; loop across all elements
            add   R3, R4, R9        ; compute result address
            sw    C_base(R3), R10   ; store result (R10) to C matrix, R3 offset
            bnez  R9, Outer1        ; loop across all A's columns
            bnez  R4, Outer2        ; loop across all A's rows

Consider the 10 by 10 matrix multiplication algorithm used in the MIPS code above. Two 10 by 10 matrices A and B are multiplied, leaving the result in C. The system running the application employs a data cache which is initially empty. The data cache has the following properties:

    cache organization ..........
    number of sets ..............
    number of lines/set ......... 1024
    number of words/line ........ 4
    T_cache ..................... 20 ns
    T_main ...................... 100 ns per line
    write update policy ......... copy-back
    write allocation policy ..... insert in cache
    replacement policy .......... LRU

(a) For this application, what is the expected number of misses of each type?
    compulsory:
    capacity:
    conflict:

(b) What is the hit rate of the cache? (show work)

(c) What is the average data access time (T_effective) in ns? (show work)

(d) If the cycle time of a pipelined system is 20 ns, how many stalls are introduced for each miss? (show work)
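
One way to sanity-check the counting needed for parts (a) and (b) is to replay the loop nest's data references in software. The Python sketch below is not part of the original problem. It mirrors the three loops of the MIPS code (same register offsets, same decrement-then-test order) and reports how many loads and stores occur and how many distinct 4-word cache lines they touch. The base addresses 0, 400, and 800 are hypothetical: they assume the three matrices are stored back to back and that A_base is line-aligned, neither of which the problem states. The names trace and summarize are illustrative only.

```python
WORD_BYTES = 4                  # one MIPS word, as moved by lw/sw
WORDS_PER_LINE = 4              # "number of words/line" from the problem
LINE_BYTES = WORD_BYTES * WORDS_PER_LINE


def trace(a_base, b_base, c_base):
    """Yield (kind, byte_address) for every data access of the loop nest."""
    r4 = 400                                    # addi R4, R0, 400
    while True:                                 # Outer2
        r4 -= 40                                # A's row base
        r9 = 40
        while True:                             # Outer1
            r9 -= 4                             # B's column offset
            r5, r8 = 40, 400
            while True:                         # Inner
                r5 -= 4                         # A's column offset
                r8 -= 40                        # B's row base
                yield ("load", a_base + r4 + r5)    # lw R12, A_base(R3)
                yield ("load", b_base + r8 + r9)    # lw R16, B_base(R7)
                if r5 == 0:                     # bnez R5, Inner
                    break
            yield ("store", c_base + r4 + r9)   # sw C_base(R3), R10
            if r9 == 0:                         # bnez R9, Outer1
                break
        if r4 == 0:                             # bnez R4, Outer2
            break


def summarize(a_base, b_base, c_base):
    """Return (loads, stores, distinct cache lines touched)."""
    accesses = list(trace(a_base, b_base, c_base))
    loads = sum(1 for kind, _ in accesses if kind == "load")
    stores = len(accesses) - loads
    lines = {addr // LINE_BYTES for _, addr in accesses}
    return loads, stores, len(lines)


if __name__ == "__main__":
    # Hypothetical layout: A, B, C contiguous, A_base line-aligned.
    print(summarize(0, 400, 800))
```

With these assumed base addresses the call returns (2000, 100, 75): 2,000 loads, 100 stores, and 75 distinct lines. The number of distinct lines is the number of first-reference misses, provided every referenced line is brought into the cache on a miss, which the "insert in cache" write allocation policy suggests. Whether any further misses occur depends on the cache organization and the number of sets, which are not legible in the problem statement, and a different alignment of the base addresses would change the line count.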