
Question


Posted on May 31, 2022

Optimizing Cache Performance via Advanced Techniques

Concepts illustrated by this case study

■ Non-blocking Caches

■ Compiler Optimizations for Caches

■ Software and Hardware Prefetching

■ Calculating Impact of Cache Performance on More Complex Processors

The transpose of a matrix interchanges its rows and columns; this is illustrated below:

Here is a simple C loop to show the transpose:

Assume that both the input and output matrices are stored in the row major order (row major order means that the row index changes fastest). Assume that you are executing a 256 × 256 double-precision transpose on a processor with a 16 KB fully associative (don’t worry about cache conflicts) least recently used (LRU) replacement L1 data cache with 64 byte blocks. Assume that the L1 cache misses or prefetches require 16 cycles and always hit in the L2 cache, and that the L2 cache can process a request every two processor cycles. Assume that each iteration of the inner loop above requires four cycles if the data are present in the L1 cache. Assume that the cache has a write-allocate fetch-on-write policy for write misses. Unrealistically, assume that writing back dirty cache blocks requires 0 cycles.

[10/15/15/12/20] <2.2> For the simple implementation given above, this execution order would be nonideal for the input matrix; however, applying a loop interchange optimization would create a nonideal order for the output matrix. Because loop interchange is not sufficient to improve its performance, it must be blocked instead.

a. [10] <2.2> What should be the minimum size of the cache to take advantage of blocked execution?

b. [15] <2.2> How do the relative number of misses in the blocked and unblocked versions compare in the minimum sized cache above?

c. [15] <2.2> Write code to perform a transpose with a block size parameter B which uses B × B blocks.

d. [12] <2.2> What is the minimum associativity required of the L1 cache for consistent performance independent of both arrays’ position in memory?

e. [20] <2.2> Try out blocked and nonblocked 256 × 256 matrix transpositions on a computer. How closely do the results match your expectations based on what you know about the computer’s memory system? Explain any discrepancies if possible.

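$$
\begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} \\
A_{21} & A_{22} & A_{23} & A_{24} \\
A_{31} & A_{32} & A_{33} & A_{34} \\
A_{41} & A_{42} & A_{43} & A_{44}
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
A_{11} & A_{21} & A_{31} & A_{41} \\
A_{12} & A_{22} & A_{32} & A_{42} \\
A_{13} & A_{23} & A_{33} & A_{43} \\
A_{14} & A_{24} & A_{34} & A_{44}
\end{bmatrix}
$$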

Step by Step Solution


Step: 1

a. The minimum size of the cache to take advantage of blocked execution should be 64 bytes. This is because the block size of the L1 cache needs to be e...

