Questions and Answers for Computer Organization and Design

Assume that $s0 is initialized to 11 and $s1 is initialized to 22. Suppose you executed the code below on a version of the pipeline that does not handle data hazards (i.e., the programmer is
lw is the instruction with the longest latency on the CPU from Section 4.4. If we modified lw and sw so that there was no offset (i.e., the address to be loaded from/stored to must be calculated and
When do you use primitives like load linked and store conditional? 1. When cooperating threads of a parallel program need to synchronize to get proper behavior for reading and writing shared data 2.
True or false: The main drawback with conventional approaches to benchmarks for parallel computers is that the rules that ensure fairness also slow software innovation.
Two options for networking are using interrupts or polling, and using DMA or using the processor via load and store instructions. 1. If we want the lowest latency for small packets, which combination
True or false: For a ring with P nodes, the ratio of the total network bandwidth to the bisection bandwidth is P/2.
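The ring claim above can be checked by counting links. The following is a minimal Python sketch, assuming every link has the same unit bandwidth, P is even, and the ring is cut into two contiguous halves (the split that crosses the fewest links). The function names are hypothetical and chosen only for illustration.

```python
def ring_links(p):
    """Return the P links of a ring as (node, next_node) pairs."""
    return [(i, (i + 1) % p) for i in range(p)]


def bisection_links(p):
    """Return the links crossing a cut between nodes 0..P/2-1 and P/2..P-1."""
    half = p // 2
    return [(a, b) for a, b in ring_links(p) if (a < half) != (b < half)]


def bandwidth_ratio(p):
    """Total network bandwidth divided by bisection bandwidth, in link units."""
    return len(ring_links(p)) / len(bisection_links(p))


for p in (4, 8, 16):
    # Compare the counted ratio with P/2 for a few ring sizes.
    print(p, bandwidth_ratio(p), p / 2)
```

Under these assumptions the ring has P links in total and exactly 2 links cross the cut, so the counted ratio matches P/2.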
1. True or false: Like SMPs, message-passing computers rely on locks for synchronization. 2. True or false: Clusters have separate memories and thus need many copies of the operating system.
True or false: DSAs are more effective than CPUs or GPUs in their domains primarily because you can justify using a much larger die for a domain.
True or false: GPUs rely on graphics DRAM chips to reduce memory latency and thereby increase performance on graphics applications.
True or false: Shared memory multiprocessors cannot take advantage of task-level parallelism.
1. True or false: Both multithreading and multicore rely on parallelism to get more efficiency from a chip. 2. True or false: Simultaneous multithreading (SMT) uses threads to improve resource
Consider the following piece of C code: for (j = 2; j < …; j++) D[j] = D[j − 1] + D[j − 2]; The MIPS code corresponding to the above fragment is: Instructions have the following associated latencies (in
True or false: As exemplified in the x86, multimedia extensions can be thought of as a vector architecture with short vectors that supports only contiguous vector data transfers.
True or false: Strong scaling is not bound by Amdahl’s Law.
True or false: To benefit from a multiprocessor, an application must be concurrent.
For the problems below, use data from “Cache Performance for SPEC CPU2000 Benchmarks”. 1. For 64 KiB data caches with varying set associativities, what are the miss rates broken down by miss types
Chip multiprocessors (CMPs) have multiple cores and their caches on a single chip. CMP on-chip L2 cache design has interesting trade-offs. The following table shows the miss rates and hit latencies
One of the biggest impediments to widespread use of virtual machines is the performance overhead incurred by running a virtual machine. Listed below are various performance parameters and application
In this exercise, we will examine how replacement policies impact miss rate. Assume a 2-way set associative cache with 4 one-word blocks. Consider the following word address sequence: 0, 1, 2, 3, 4,
There are several parameters that impact the overall size of the page table. Listed below are key page table parameters. 1. Given the parameters shown above, calculate the total page table size for a
This exercise examines the effect of different cache designs, specifically comparing associative caches to the direct-mapped caches from Section 5.4. For these exercises, refer to the sequence of
Examine the difficulty of adding a proposed lwi.d rd, rs1, rs2 (“Load With Increment”) instruction to MIPS. Interpretation: Reg[rd] = Mem[Reg[rs1] + Reg[rs2]] 1. Which new functional blocks (if any)
Cache block size (B) can affect both miss rate and miss latency. Assuming a 1-CPI machine with an average of 1.35 references (both instruction and data) per instruction, help find the optimal block
Media applications that play audio or video files are part of a class of workloads called “streaming” workloads (i.e., they bring in large amounts of data but do not reuse much of it). Consider a
Which of the following are true about RAID levels 1, 3, 4, 5, and 6? 1. RAID systems rely on redundancy to achieve high availability. 2. RAID 1 (mirroring) has the highest check disk overhead. 3. For
Which of the following statements (if any) are generally true? 1. There is no way to reduce compulsory misses. 2. Fully associative caches have no conflict misses. 3. In reducing misses, associativity
Which of the following is generally true about a design with multiple levels of caches? 1. First-level caches are more concerned about hit time, and second-level caches are more concerned about miss
By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
The speed of the memory system affects the designer’s decision on the size of the cache block. Which of the following cache designer guidelines are generally valid? 1. The shorter the memory
Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 32-bit memory address references, given as word addresses. 0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58,
Which of the following statements are generally true? 1. Memory hierarchies take advantage of temporal locality. 2. On a read, the value returned depends on which blocks are in the cache. 3. Most of the
In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a
Problems in this exercise refer to the following sequence of instructions, and assume that it is executed on a five-stage pipelined datapath: add $s3, $s1, $s0; lw $s2, 4($s3); lw $s1, 0($s4); or $s2, $s3,
This exercise is intended to help you understand the cost/complexity/performance trade-offs of forwarding in a pipelined processor. Problems in this exercise refer to pipelined datapaths from Figure
Consider the following loop. LOOP: ld $s0, 0($s3); ld $s1, 8($s3); add $s2, $s0, $s1; addi $s3, $s3, -16; bnez $s2, LOOP. Assume that perfect branch prediction is used (no stalls due to control hazards),
Which of the two pipeline diagrams below better describes the operation of the pipeline's hazard detection unit? Why? Choice 1: ld x11, 0(x12): IF ID EX ME WB; add x13, x11, x14: IF ID EX..ME WB; or x15,
If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. (See Exercise 4.15.) As a result, the MEM and EX stages can
Consider the fragment of MIPS assembly below: sd $s5, 12($s3); ld $s5, 8($s3); sub $s4, $s2, $s1; beqz $s4, label; add $s2, $s0, $s1; sub $s2, $s6, $s1. Suppose we modify the pipeline so that it has only one
Consider a version of the pipeline from Section 4.6 that does not handle data hazards (i.e., the programmer is responsible for addressing data hazards by inserting NOP instructions where necessary).
Add NOP instructions to the code below so that it will run correctly on a pipeline that does not handle data hazards. addi $s0, $s1, 5; add $s2, $s0, $s1; addi $s3, $s0, 15; add $s4, $s2, $s1
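One way to reason about NOP placement is to track, for each register, where its most recent producer sits in the instruction stream. The following is a minimal Python sketch, not a definitive solution. It assumes a five-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so a consumer must issue at least three slots after its producer (two instructions in between). It also assumes every instruction has the form `op dest, src, src-or-immediate`. The function name `insert_nops` and the `min_distance` parameter are hypothetical.

```python
import re


def insert_nops(program, min_distance=3):
    """Pad with NOPs so each consumer issues at least min_distance slots after its producer."""
    scheduled = []
    last_write = {}  # register name -> slot index of its most recent producer
    for instr in program:
        regs = re.findall(r"\$\w+", instr)
        dest, sources = regs[0], regs[1:]
        # Earliest slot that satisfies every read-after-write dependence.
        earliest = len(scheduled)
        for src in sources:
            if src in last_write:
                earliest = max(earliest, last_write[src] + min_distance)
        scheduled.extend(["nop"] * (earliest - len(scheduled)))
        last_write[dest] = len(scheduled)
        scheduled.append(instr)
    return scheduled


code = [
    "addi $s0, $s1, 5",
    "add $s2, $s0, $s1",
    "addi $s3, $s0, 15",
    "add $s4, $s2, $s1",
]
for line in insert_nops(code):
    print(line)
```

Under these assumptions the sketch places two NOPs after the first addi and one NOP before the final add. A pipeline with different register file timing would need a different `min_distance`.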
Assume that $s0 is initialized to 11 and $s1 is initialized to 22. Suppose you executed the code below on a version of the pipeline from Section 4.6 that does not handle data hazards (i.e., the
For which instructions (if any) is the Imm Gen block on the critical path?
Someone has asked about the possibility of data hazards occurring through memory, as opposed to through a register. Which of the following statements about such hazards are true? 1. Since memory
Examine the difficulty of adding a proposed ss rt, rs, imm (Store Sum) instruction to MIPS. Interpretation: Mem[Reg[rt]] = Reg[rs] + immediate 1. Which new functional blocks (if any) do we need for this
Someone has proposed moving the write for a result from an ALU instruction from the WB to the MEM stage, pointing out that this would reduce the maximum length of forwards from an ALU instruction by
Examine the difficulty of adding a proposed swap rs, rt instruction to MIPS. Interpretation: Reg[rt] = Reg[rs]; Reg[rs] = Reg[rt] 1. Which new functional blocks (if any) do we need for this
Are the following statements true or false? 1. The Intel Core i7 uses a multiple-issue pipeline to directly execute x86 instructions. 2. Both the A53 and the Core i7 use dynamic multiple issue. 3. The
When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are
State whether the following techniques or components are associated primarily with a software- or hardware-based approach to exploiting ILP. In some cases, the answer may be both. 1. Branch
Which exception should be recognized first in this sequence? 1. add $1, $2, $1 # arithmetic overflow 2. XXX $1, $2, $1 # undefined instruction 3. sub $1, $2, $1 # hardware error
Consider the addition of a multiplier to the CPU shown in Figure 4.21. This addition will add 300 ps to the latency of the ALU, but will reduce the number of instructions by 5% (because there will no
Suppose you could build a CPU where the clock cycle time was different for each instruction. What would the speedup of this new CPU be over the CPU presented in Figure 4.21 given the instruction mix
Consider three branch prediction schemes: predict not taken, predict taken, and dynamic prediction. Assume that they all have zero penalty when they predict correctly and two cycles when they are
Problems in this exercise assume that the logic blocks used to implement a processor’s datapath have the following latencies: “Register read” is the time needed after the rising clock edge for
A group of students were debating the efficiency of the five-stage pipeline when one student pointed out that not all instructions are active in every stage of the pipeline. After deciding to ignore
Does not discuss I-type instructions like addi or andi. 1. What additional logic blocks, if any, are needed to add I-type instructions to the CPU shown in Figure 4.21? Add any necessary logic blocks
Explain each of the “don’t cares” in Figure 4.18. Figure 4.18 (columns: R-format, lw, sw, beq): RegDst 1 0 X X; ALUSrc 0 1 1 0; MemtoReg 0 1 X X; RegWrite 1 1 0 0; MemRead
For each code sequence below, state whether it must stall, can avoid stalls using only forwarding, or can execute without stalling or forwarding. Sequence 1: lw $t0, 0($t0); add $t1, $t0, $t0 Sequence
1. True or false: Since the jump instruction does not depend on the register values or on computing the branch target address, it can be completed during the second state, rather than waiting until
Look at the control signals in Figure 4.22. Can you combine any together? Can any control signal output in the figure be replaced by the inverse of another? If so, can you use one signal for the
When silicon chips are fabricated, defects in materials (e.g., silicon) and manufacturing errors can result in defective circuits. A very common defect is for one signal wire to get “broken” and
I. Which of the following is correct for a load instruction? Refer to Figure 4.10. a. MemtoReg should be set to cause the data from memory to be sent to the register file. b. MemtoReg should be set
Consider the following instruction mix: 1. What fraction of all instructions use data memory? 2. What fraction of all instructions use instruction memory? 3. What fraction of all instructions use the
True or false: Because the register file is both read and written on the same clock cycle, any MIPS datapath using edge-triggered writes must have more than one copy of the register file.
How many of the five classic components of a computer do Figures 4.1 and 4.2 include? [Figures 4.1 and 4.2: block diagrams whose labels include PC, Add, Instruction memory, and Register]
What is the minimum number of cycles needed to completely execute n instructions on a CPU with a k stage pipeline? Justify your formula.
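One way to justify a formula here is to track the cycle in which each instruction leaves the pipeline. The following is a minimal Python sketch, assuming an ideal single-issue pipeline with no stalls or flushes in which one instruction enters per cycle. The function names are illustrative only.

```python
def min_cycles(n, k):
    """Closed form: k cycles for the first instruction, then one more per remaining instruction."""
    return 0 if n == 0 else k + (n - 1)


def simulate(n, k):
    """Instruction i (0-based) enters in cycle i + 1 and completes in cycle i + k."""
    finish_cycles = [i + k for i in range(n)]
    return max(finish_cycles, default=0)


# Both approaches should agree for any n and k.
for n, k in [(1, 5), (5, 5), (100, 5), (10, 8)]:
    print(n, k, min_cycles(n, k), simulate(n, k))
```

Under these assumptions the first instruction needs k cycles to drain through the pipeline and each later instruction completes one cycle after its predecessor, which gives n + k − 1 cycles for n ≥ 1.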
Which of the advantages of an interpreter over a translator do you think was most important for the designers of Java? 1. Ease of writing an interpreter 2. Better error messages 3. Smaller object code 4.
For the following C statement, write a minimal sequence of MIPS assembly instructions that does the identical operation. Assume $t0 = A and $s0 is the base address of C. A = C[0]
Suppose the program counter (PC) is set to 0x20000000. 1. What range of addresses can be reached using the MIPS jump-and-link (jal) instruction? (In other words, what is the set of possible values for
Consider a proposed new instruction named rpt. This instruction combines a loop’s condition check and counter decrement into a single instruction. For example, rpt $s0, loop would do the
Consider the following instruction: Instruction: and rd, rs1, rs2 Interpretation: Reg[rd] = Reg[rs1] AND Reg[rs2] 1. What are the values of control signals generated by the control in Figure 4.10 for
The revised IEEE 754-2008 standard added a 16-bit floating-point format with five exponent bits. What do you think is the likely range of numbers it could represent? 1. 1.0000 00 × 2⁰ to 1.1111
Some programming languages allow two’s complement integer arithmetic on variables declared byte and half, whereas MIPS only has integer arithmetic operations on full words. As we recall from
Write the MIPS assembly code to implement the following C code as an atomic “set max” operation using the ll/sc instructions. Here, the argument shvar contains the address of a shared variable,
For the following code: lbu $t0, 0($t1); sw $t0, 0($t2). Assume that the register $t1 contains the address 0x10000000 and the data at that address is 0x11223344. 1. What value is stored in 0x10000004 on a
I. What is the range of addresses for conditional branches in MIPS (K = 1024)? 1. Addresses between 0 and 64K − 1 2. Addresses between 0 and 256K − 1 3. Addresses up to about 32K before the branch
I. Which of the following statements about characters and strings in C and Java are true? 1. A string in C takes about half the memory as the same string in Java. 2. Strings are just an informal name
Which of the following statements about C and Java are generally true? 1. C programmers manage data explicitly, while it’s automatic in Java. 2. C leads to more pointer bugs and memory leak bugs than
What MIPS instruction does this represent? Choose from one of the four options below. op = 0, rs = 8, rt = 9, rd = 10, shamt = 0, funct = 34. 1. sub $t0, $t1, $t2 2. add $t2, $t0, $t1 3. sub $t2, $t1, $t0 4. sub $t2,
What is the decimal value of this 64-bit two’s complement number? 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1000 2) -8 3) -16 4) 18,446,744,073,709,551,608. What is the
Given the importance of registers, what is the rate of increase in the number of registers in a chip over time? 1. Very fast: They increased as fast as Moore’s law, which predicted doubling the
Write a single C statement that corresponds to the two MIPS assembly instructions below. add f, g, h; add f, i, f
For the following C statement, what is the corresponding MIPS assembly code? Assume that the C variables f, g, and h, have already been placed in registers $s0, $s1, and $s2, respectively. Use a
For a given function, which programming language likely takes the most lines of code? Put the three representations below in order. 1. Java 2. C 3. MIPS assembly language
Consider the table given next, which tracks several performance indicators for Intel desktop processors since 2010. The “Tech” column shows the minimum feature size of each processor’s
Consider the following performance measurements for a program. Computer A: instruction count 10 billion, clock rate 4 GHz, CPI 1.0. Computer B: instruction count 8 billion, clock rate 4 GHz, CPI 1.1. a. Which computer has the
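The measurements listed above can be combined with the classic CPU performance equation, CPU time = instruction count × CPI / clock rate. The following is a minimal Python sketch of that arithmetic using the values from the question. The function name `cpu_time_seconds` is hypothetical, and the sketch covers only execution time, since the sub-questions are cut off in this listing.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count * cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Values taken from the question: instruction count, CPI, clock rate.
time_a = cpu_time_seconds(10e9, 1.0, 4e9)
time_b = cpu_time_seconds(8e9, 1.1, 4e9)

print(f"Computer A: {time_a:.2f} s")  # prints "Computer A: 2.50 s"
print(f"Computer B: {time_b:.2f} s")  # prints "Computer B: 2.20 s"
```

Note that Computer B finishes sooner despite its higher CPI, because it executes fewer instructions at the same clock rate.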
Suppose we know that an application that uses both personal mobile devices and the Cloud is limited by network performance. For the following changes, state whether only the throughput improves, both
A key factor in determining the cost of an integrated circuit is volume. Which of the following are reasons why a chip made in high volume should cost less? 1. With high volumes, the manufacturing
The seven great ideas in computer architecture are similar to ideas from other fields. Match the seven ideas from computer architecture, “Use Abstraction to Simplify Design”, “Make the Common
I. C has many statements for decisions and loops, while MIPS has few. Which of the following do or do not explain this imbalance? Why? 1. More decision statements make code easier to read and
Which operations can isolate a field in a word? 1. AND 2. A shift left followed by a shift right
A given application written in Java runs 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it
Semiconductor DRAM memory, flash memory, and disk storage differ significantly. For each technology, list its volatility, approximate relative access time, and approximate relative cost compared to
List and describe three types of computers.