In this problem, we will be examining the performance of the instruction cache on the MIPS assembly program shown on the next page The first column shows the instruction address for each instruction Note that these addresses are byte addresses The value of r1 is initially 64, meaning that there are 64 iterations in the loop In this problem, we will be considering the execution of this loop with a direct mapped instruction cache microarchitecture with eight cache lines, and each cacheline is 16B This means each cache line can hold four instructions and the bottom four bits of an instruction address are the block offset Hint The first instruction in the code segment (i e , addiu r1, r1, 1), is in the middle of a cache line with starting address 0x100 For this problem, the instruction cache hit time is two cycles, but it is fully pipelined Tag check occurs in the first cycle, and if it is a hit, the instruction is read in the second cycle Essentially, this creates a six stage pipelined processor with the following stages instruction cache tag check (F0), instruction cache data access (F1), decode (D), execute (X), memory (M), and write back (W) This also implies the data cache hit time is one cycle Assume that jumps are resolved in the decode stage and that branches are re solved in the execute stage Assume the miss penalty is three cycles so on a cache miss the pipeline will stay in F0 for a total of three cycles, go into F1 for one cycle, and then continue as normal You should assume that in every other way, the processor pipeline follows the classic fully bypassed five stage pipeline Assume that the processor does not include a branch delay slot Assume the processor speculatively predicts all jumps and branches are not taken 1 Draw a pipeline diagram illustrating the first iteration of the loop assuming there are no instruction cache misses Remember that there are two fetch stages (F0 and F1) Show stalls by simply repeating the pipeline stage character (e g , D) for multiple consecutive cycles Use a dash ( ) to indicate pipeline bubbles caused by killing instructions (pipeline flushes) You should show all instructions in the first iteration of the loop and the first instruction of the second iteration that you can properly draw the control dependency for the backwards branch 2 First Iteration of the Loop Fill in the table above In the appropriate column, write compulsory, conflict, or capacity next to each instruction which misses in the instruction cache to indicate the type of instruction cache misses that occur in the first iteration of the loop Assume that the instruction cache is initially completely empty Now draw a pipeline diagram illustrating the first iteration of the loop including instruction cache misses Clearly indicate the number of cycles it takes to execute the first iteration 3 Second Iteration of the Loop and overall CPI Continue to fill in the table above Write compulsory, conflict, or capacity next to each instruction which misses in the instruction cache to indicate the type of instruction caches misses that occur in the second iteration of the loop Now draw a pipeline diagram illustrating the second iteration of the loop Clearly indicate the number of cycles it takes to execute the second iteration Calculate the CPI for this processor executing all 64 iteration of the loop (Note The CPI calculation should not include instructions that are fetched but then later squashed)

The Answer is in the image, click to view ...

Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 26, 2024

In this problem, we will be examining the performance of the instruction cache on the MIPS assembly program shown on the next page. The first

In this problem, we will be examining the performance of the instruction cache on the MIPS

assembly program shown on the next page. The first column shows the instruction address for

each instruction. Note that these addresses are byte addresses. The value of r1 is initially 64,

meaning that there are 64 iterations in the loop. In this problem, we will be considering the

execution of this loop with a direct-mapped instruction cache microarchitecture with eight

cache lines, and each cacheline is 16B. This means each cache line can hold four instructions

and the bottom four bits of an instruction address are the block offset. Hint: The first

instruction in the code segment (i.e., addiu r1, r1, -1), is in the middle of a cache line

with starting address 0x100.

For this problem, the instruction cache hit time is two cycles, but it is fully pipelined. Tag

check occurs in the first cycle, and if it is a hit, the instruction is read in the second cycle.

Essentially, this creates a six-stage pipelined processor with the following stages: instruction

cache tag check (F0), instruction cache data access (F1), decode (D), execute (X), memory

(M), and write-back (W). This also implies the data cache hit time is one cycle.

Assume that jumps are resolved in the decode stage and that branches are re- solved in the execute stage. Assume the miss penalty is three cycles so on a

cache miss the pipeline will stay in F0 for a total of three cycles, go into F1 for

one cycle, and then continue as normal. You should assume that in every other

way, the processor pipeline follows the classic fully-bypassed five-stage pipeline.

Assume that the processor does not include a branch delay slot. Assume the

processor speculatively predicts all jumps and branches are not taken.

Draw a pipeline diagram illustrating the first iteration of the loop assuming there are no

instruction cache misses. Remember that there are two fetch stages (F0 and F1). Show stalls

by simply repeating the pipeline stage character (e.g., D) for multiple consecutive cycles.

Use a dash (-) to indicate pipeline bubbles caused by killing instructions (pipeline flushes).

You should show all instructions in the first iteration of the loop and the first instruction of

the second iteration that you can properly draw the control dependency for the backwards

branch.

image text in transcribed

2. First Iteration of the Loop

Fill in the table above. In the appropriate column, write compulsory, conflict, or capacity next

to each instruction which misses in the instruction cache to indicate the type of instruction

cache misses that occur in the first iteration of the loop. Assume that the instruction

cache is initially completely empty. Now draw a pipeline diagram illustrating the first

iteration of the loop including instruction cache misses. Clearly indicate the number of cycles

it takes to execute the first iteration.

3. Second Iteration of the Loop and overall CPI

Continue to fill in the table above. Write compulsory, conflict, or capacity next to each

instruction which misses in the instruction cache to indicate the type of instruction caches

misses that occur in the second iteration of the loop. Now draw a pipeline diagram illustrating

the second iteration of the loop. Clearly indicate the number of cycles it takes to execute the

second iteration. Calculate the CPI for this processor executing all 64 iteration of the loop

(Note: The CPI calculation should not include instructions that are fetched but

then later squashed).

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Processing

Authors: David J. Auer David M. Kroenke

13th Edition

★★★★★

What are OLAP Cubes?

Answered: 1 week ago

Previous Question Next Question