Question

1) a. Parallelism at multiple levels is now the driving force of computer design across all four classes of computers, with energy and cost being the primary constraints. There are basically two kinds of parallelism in applications: Data-Level Parallelism and Task-Level Parallelism. Computer hardware in turn can exploit these two kinds of application parallelism in four major ways. Briefly discuss the four ways.

b. To plan for the evolution of a computer, the designer must be aware of rapid changes in implementation technology. Five implementation technologies, which change at a dramatic pace, are critical to modern implementations. Explain these technologies, citing the growth trend of each over the years.

c. Some microprocessors today are designed to have adjustable voltage, so a 15% reduction in voltage may result in a 15% reduction in frequency. What would be the impact on dynamic energy and on dynamic power?
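A quick numeric check, assuming the standard first-order model in which dynamic energy scales with C·V² and dynamic power with C·V²·f, with the capacitive load C unchanged by the voltage change:

```python
# Sketch: relative dynamic energy and power after scaling voltage and frequency.
# Assumes energy ~ C * V**2 and power ~ C * V**2 * f with capacitive load C fixed.

def scaled_dynamic_energy(voltage_scale: float) -> float:
    """Dynamic energy relative to the original; frequency does not appear."""
    return voltage_scale ** 2

def scaled_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to the original."""
    return voltage_scale ** 2 * frequency_scale

energy = scaled_dynamic_energy(0.85)
power = scaled_dynamic_power(0.85, 0.85)
print(f"energy: {energy:.4f} of original, power: {power:.4f} of original")
```

Under this model, energy drops to about 72% of the original (0.85²) and power to about 61% (0.85³).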

d. Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side and for a die that is 1.0 cm on a side.
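A sketch using the common textbook approximation (assumed here, since the question does not state a formula), which divides wafer area by die area and then subtracts the partial dies lost along the circular edge: dies = π(d/2)²/A − πd/√(2A).

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_side_cm: float) -> int:
    """Approximate whole dies per wafer for a square die.

    First term: wafer area / die area.
    Second term: estimate of partial dies lost around the wafer edge.
    """
    die_area = die_side_cm ** 2
    gross = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area)
    return math.floor(gross - edge_loss)  # only whole dies count

for side in (1.5, 1.0):
    print(f"{side} cm die: {dies_per_wafer(30, side)} dies")
```

Rounding down gives 269 and 640 dies; some texts round the 1.5 cm result (about 269.7) to the nearest integer instead.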

e. One way to improve the performance of a benchmark has been with benchmark-specific flags; these flags often caused transformations that would be illegal on many programs or would slow down performance on others. To restrict this process and increase the significance of the results, benchmark developers often require the vendor to use one compiler and one set of flags for all the programs in the same language (C++ or C). In addition to the question of compiler flags, another question is whether source code modifications are allowed. There are three different approaches to addressing this question. Explain how the three approaches work.

f. Speedup tells us how much faster a task will run using the computer with the enhancement as opposed to the original computer. Amdahl's law gives a quick way to find the speedup from some enhancement, which depends on two factors. Explain the factors.

g. Suppose that we want to enhance the processor used for Web serving. The new processor is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
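Amdahl's law combines the two factors from part f: the fraction of the original execution time that can use the enhancement, and the speedup of that fraction. A minimal sketch applying it to this Web-serving case:

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only fraction_enhanced of the original time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 40% of the time is computation, which becomes 10x faster;
# the 60% spent waiting for I/O is unchanged.
overall = amdahl_speedup(0.4, 10)
print(f"overall speedup: {overall:.4f}")  # about 1.56
```

Even a 10x faster processor yields only 1 / 0.64 ≈ 1.56 overall, because the I/O wait is untouched.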

h. There are two largely separable approaches to exploiting ILP. Highlight them. (2 Marks)

i. There are three different types of dependences in ILP. Discuss them in detail.

j. Data hazards may be classified as one of three types, depending on the order of read and write accesses in the instructions. By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline. Consider two instructions i and j, with i preceding j in program order. Explain the possible hazards.
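The three hazard types can be read directly off the register sets each instruction reads and writes. A small sketch, where the `Instr` type and the register names are hypothetical and chosen only for illustration:

```python
from typing import NamedTuple

class Instr(NamedTuple):
    """Hypothetical instruction model: the registers it reads and writes."""
    text: str
    reads: frozenset
    writes: frozenset

def hazards(i: Instr, j: Instr) -> set:
    """Classify data hazards between instruction i and a later instruction j."""
    found = set()
    if i.writes & j.reads:
        found.add("RAW")  # j reads a value that i produces
    if i.writes & j.writes:
        found.add("WAW")  # j writes the same register that i writes
    if i.reads & j.writes:
        found.add("WAR")  # j overwrites a source that i must read first
    return found

i1 = Instr("r1 = r2 + r3", frozenset({"r2", "r3"}), frozenset({"r1"}))
j_raw = Instr("r4 = r1 - r5", frozenset({"r1", "r5"}), frozenset({"r4"}))
j_war = Instr("r2 = r6 + r7", frozenset({"r6", "r7"}), frozenset({"r2"}))
j_waw = Instr("r1 = r8 + r9", frozenset({"r8", "r9"}), frozenset({"r1"}))
print(hazards(i1, j_raw), hazards(i1, j_war), hazards(i1, j_waw))
```

Two reads of the same register (read after read) create no hazard, which is why the function has no fourth case.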

2) a. Caching data that is only read is easy, since the copy in the cache and memory will be identical. Caching writes is more difficult. There are two main strategies to achieve this. Discuss them.
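To make the difference between the two write strategies concrete, here is a toy count of how many writes reach main memory under each. The cache model (direct-mapped, one block per set, write-allocate, a stream of stores only, dirty blocks flushed at the end) is an assumption chosen for illustration, and `memory_writes` is a hypothetical helper:

```python
def memory_writes(store_blocks: list[int], num_sets: int, policy: str) -> int:
    """Count writes reaching main memory for a stream of stores (block numbers).

    Toy model: direct-mapped cache, one block per set, write-allocate.
    """
    if policy == "write-through":
        return len(store_blocks)  # every store is also sent to memory
    if policy != "write-back":
        raise ValueError(f"unknown policy: {policy}")
    resident: dict[int, int] = {}  # set index -> block currently cached
    writes = 0
    for block in store_blocks:
        index = block % num_sets
        if index in resident and resident[index] != block:
            writes += 1  # evicting a dirty block (every cached block here was written)
        resident[index] = block
    return writes + len(resident)  # flush the blocks still dirty at the end

stores = [0, 0, 0, 1, 1, 4]  # block 4 maps to the same set as block 0 when num_sets = 4
print(memory_writes(stores, 4, "write-through"))  # 6
print(memory_writes(stores, 4, "write-back"))     # 3
```

Repeated stores to the same block cost one memory write each under write-through but are coalesced under write-back, at the price of tracking a dirty bit and writing the block back on eviction.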

b. One measure of the benefits of different cache organizations is miss rate. Define miss rate and discuss the three categories used to sort misses.

c. There are ten advanced cache optimizations, which can be classified into five categories. Highlight the five metrics that are used in this classification.

d. An extended form of way prediction can also be used to reduce power consumption by using the way prediction bits to decide which cache block to actually access (the way prediction bits are essentially extra address bits). This approach, which might be called way selection, saves power when the way prediction is correct but adds significant time on a way misprediction, since the access, not just the tag match and selection, must be repeated. Such an optimization is likely to make sense only in low-power processors. Inoue, Ishihara, and Murakami [1999] estimated that using the way selection approach with a four-way set associative cache increases the average access time by a factor of 1.04 for the I-cache and 1.13 for the D-cache on the SPEC95 benchmarks, but it yields an average cache power consumption, relative to a normal four-way set associative cache, of 0.28 for the I-cache and 0.35 for the D-cache. One significant drawback of way selection is that it makes it difficult to pipeline the cache access. Assume that there are half as many D-cache accesses as I-cache accesses, and that the I-cache and D-cache are responsible for 25% and 15% of the processor's power consumption in a normal four-way set associative implementation. Determine whether way selection improves performance per watt based on the estimates from the study.
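A sketch of the arithmetic, under two labelled assumptions that are not in the question: execution time scales in proportion to the weighted average cache access time (a pessimistic bound), and the 60% of processor power outside the two caches is unchanged.

```python
# Assumptions (not from the question): execution time scales with the weighted
# average cache access time, and non-cache power (60%) is unchanged.
I_TIME, D_TIME = 1.04, 1.13      # relative access time with way selection
I_POWER, D_POWER = 0.28, 0.35    # relative cache power with way selection
I_SHARE, D_SHARE = 0.25, 0.15    # share of processor power in the baseline

# Half as many D-cache accesses as I-cache accesses -> weights 2/3 and 1/3.
relative_time = (2 * I_TIME + 1 * D_TIME) / 3
relative_power = (1 - I_SHARE - D_SHARE) + I_SHARE * I_POWER + D_SHARE * D_POWER
perf_per_watt = (1 / relative_time) / relative_power

print(f"time x{relative_time:.3f}, power x{relative_power:.4f}, "
      f"perf/watt x{perf_per_watt:.2f}")
```

Access time grows to about 1.07x while power falls to about 0.72x, so performance per watt improves by roughly 1.29x under these assumptions.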

3) a. Which is more important for floating-point programs: two-way set associativity or hit under one miss for the primary data caches? What about integer programs? Assume the following average miss rates for 32 KB data caches: 5.2% for floating-point programs with a direct-mapped cache, 4.9% for these programs with a two-way set associative cache, 3.5% for integer programs with a direct-mapped cache, and 3.2% for integer programs with a two-way set associative cache. Assume the miss penalty to L2 is 10 cycles, and the L2 misses and penalties are the same.
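The associativity half of the comparison follows from the given miss rates alone: stall cycles per access are miss rate times the 10-cycle L2 penalty. A sketch:

```python
MISS_PENALTY = 10  # cycles to reach L2, from the question

miss_rates = {
    "floating-point": {"direct-mapped": 0.052, "two-way": 0.049},
    "integer": {"direct-mapped": 0.035, "two-way": 0.032},
}

ratios = {}
for program, rates in miss_rates.items():
    dm_stall = rates["direct-mapped"] * MISS_PENALTY  # stall cycles per access
    tw_stall = rates["two-way"] * MISS_PENALTY
    ratios[program] = tw_stall / dm_stall
    print(f"{program}: DM {dm_stall:.2f}, two-way {tw_stall:.2f} stall cycles/access; "
          f"two-way keeps {ratios[program]:.1%} of the DM stall time")
```

Two-way associativity removes about 6% of the stall time for floating-point programs and about 9% for integer programs. Hit under one miss wins for a program class if it removes a larger fraction than that; that fraction must come from measured data, which the question does not state.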

b. In general, out-of-order processors are capable of hiding much of the miss penalty of an L1 data cache miss that hits in the L2 cache, but are not capable of hiding a significant fraction of a lower-level cache miss. Deciding how many outstanding misses to support depends on several factors. State the factors.

c. Assume a main memory access time of 36 ns and a memory system capable of a sustained transfer rate of 16 GB/sec. If the block size is 64 bytes, what is the maximum number of outstanding misses we need to support, assuming that we can maintain the peak bandwidth given the request stream and that accesses never conflict? If the probability of a reference colliding with one of the previous four is 50%, and we assume that the access has to wait until the earlier access completes, estimate the maximum number of outstanding references. For simplicity, ignore the time between misses.
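A sketch of the estimate. It takes 1 GB = 10⁹ bytes (an assumption), applies Little's law (requests in flight = request rate × latency), and uses one rough way to model collisions: if half the references must wait on an earlier one, only half are independent, so about twice as many must be in flight to keep the memory saturated.

```python
import math

LATENCY_NS = 36          # main memory access time
BANDWIDTH_B_PER_NS = 16  # 16 GB/s, taking 1 GB = 1e9 bytes -> 16 bytes per ns
BLOCK_BYTES = 64

# At peak bandwidth a new 64-byte block can be delivered every 64 / 16 = 4 ns.
ns_per_block = BLOCK_BYTES / BANDWIDTH_B_PER_NS
# Little's law: outstanding requests = request rate x latency = 36 / 4.
outstanding = math.ceil(LATENCY_NS / ns_per_block)

# Rough collision adjustment (assumption): only (1 - p) of references are
# independent, so scale the required number of outstanding references up.
collision_prob = 0.5
outstanding_with_collisions = math.ceil(outstanding / (1 - collision_prob))
print(outstanding, outstanding_with_collisions)  # 9 18
```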

d. An alternative to hardware prefetching is for the compiler to insert prefetch instructions to request data before the processor needs it. There are two flavors of prefetch. Briefly describe them.

e. GDRAMs or GSDRAMs (Graphics or Graphics Synchronous DRAMs) are a special class of DRAMs based on SDRAM designs but tailored for handling the higher bandwidth demands of graphics processing units. Since graphics processing units require more bandwidth per DRAM chip than CPUs, GDDRs have several important differences. Discuss two of them.

4) a. Multiprogramming, where several programs running concurrently share a computer, led to demands for protection and sharing among programs and to the concept of a process. At any instant, it must be possible to switch from one process to another. This exchange is called a process switch or context switch. The operating system and architecture join forces to allow processes to share the hardware yet not interfere with each other. To do this, the architecture must limit what a process can access when running a user process yet allow an operating system process to access more. Discuss four capabilities the architecture must provide to achieve this.

b. Explain the benefits of virtual machines in managing software and managing hardware.

c. Control dependences generally impose two constraints. Discuss them, with an example.

d. Three different effects limit the gains from loop unrolling. Highlight them.

e. Calculate the number of bits in a (0,2) branch predictor with 4K entries. How many entries does a (2,2) predictor with the same number of bits have?
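A sketch assuming the usual convention that an (m,n) predictor uses m bits of global branch history to select one of 2^m n-bit counters in each entry, so its size is 2^m × n × entries (the m-bit history register itself is ignored), and that "4K" means 4096:

```python
def predictor_bits(m: int, n: int, entries: int) -> int:
    """Bits in an (m, n) predictor: 2**m n-bit counters per entry."""
    return 2 ** m * n * entries

def entries_for_bits(m: int, n: int, total_bits: int) -> int:
    """Entries an (m, n) predictor can have within a fixed bit budget."""
    return total_bits // (2 ** m * n)

bits_02 = predictor_bits(0, 2, 4 * 1024)      # (0,2) predictor, 4K entries
entries_22 = entries_for_bits(2, 2, bits_02)  # (2,2) predictor, same bit budget
print(bits_02, entries_22)  # 8192 1024
```

That is 8K bits for the (0,2) predictor, which buys 1K entries in a (2,2) predictor.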
