Question: Chapter1: 1. In Example 1 of Section 1.2.1, we assumed that the cache miss penalty was 20 cycles. With modern processors running at a frequency
Chapter1:
1. In Example 1 of Section 1.2.1, we assumed that the cache miss penalty was 20 cycles. With modern processors running at a frequency of 1 to 3 GHz, the cache miss penalty can reach several hundred cycles (we will see how this can be somewhat mitigated by a cache hierarchy). Keeping all other parameters, the same as in the example, plot CPI vs. cache miss penalty cost when the latter varies between 20 and 500 cycles (choose appropriate intervals). Do your computations argue for the threat of a memory wall whereby loading instructions and data could potentially dominate the execution time?
Chapter2 Assume that you want to use only one ALU for the basic five-stage pipeline of Section
2.1.2. A problem arises when you implement branches in that the ALU needs to be used twice: once for branch target computation and once for making the register comparison. Design the data path and control unit with this constraint. What are the performance implications?
Assume that it takes 1 cycle to access and return the information on a data-TLB hit and 1 cycle to access and return the information on a data-cache hit. A data-TLB miss takes 300 cycles to resolve, and a data-cache miss takes 100 cycles to resolve. The data-TLB hit rate is 0.99, and the data-cache hit rate is 0.95. What is the average memory access time for a load data reference? Now the data cache is an L1 D-cache and is backed up by an L2 unified cache. Every data reference that misses in L1 has a 60% chance of hitting in L2. A miss in L1 followed by a hit in L2 has a latency of 10 cycles. What is the average memory access time for a load data reference in this new configuration?
As a designer you are asked to evaluate three possible options for an on-chip write-through data cache. Some of the design options (associativity) and performance consequences (miss rate, miss penalty) are described in the table below:
Data cache options Miss rate Miss penalty Cache A: Direct mapped 0.08 4 cycles Cache B: Two-way set-associative 0.04 6 cycles Cache C: Four-way set-associative 0.02 8 cycles
(a) Assume that load instructions have a CPI of 1.2 if they hit in the cache and a CPI of 1.2 + (miss penalty) otherwise. All other instructions have a CPI of 1.2. The instruction mix is such that 20% of instructions are loads. What is the CPI for each configuration (youll need to keep three decimal digits, i.e., compute the CPI as x.xxx)? Which cache would you choose if CPI is the determining factor?
(b) Assume now that if the direct-mapped cache is used, the cycle time is 20 ns. If the two-way set-associative cache is used, the cycle time is 22 ns. If the four-way is used, the cycle time is 24 ns. What is the average time per instruction? Which cache would you choose if average time per instruction is the determining factor?
(c) In the case of the two-way set-associative cache, the replacement algorithm is LRU. In the case of the four-way set-associative cache, the replacement algorithm is such that the most recently used (MRU) line is not replaced; the choice of which of the other three is replaced is random and not part of the logic associated with each line in the cache. Indicate what bits are needed to implement the replacement algorithms for each line.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
