
Question


1. [5] Consider the addition of a multiplier to the CPU shown in Figure 4.23 (page 282). This addition will add 300 ps to the latency of the ALU, but will reduce the number of instructions by 5% (because there will no longer be a need to emulate the multiply instruction).

What is the clock cycle time with and without this improvement?

Without multiplier: The maximum clock cycle time is set by the load instruction lw. Clock cycle = I-Mem + Regs + Mux + ALU + D-Mem + Mux = 400 + 200 + 30 + 120 + 350 + 30 = 1130 ps.

i.e., without modifications: Cycle Time = 400 + 200 + 30 + 120 + 350 + 30 = 1130 ps

With multiplier: The maximum clock cycle time is again set by the load instruction lw. Clock cycle = I-Mem + Regs + Mux + ALU + Mul + D-Mem + Mux = 400 + 200 + 30 + 120 + 300 + 350 + 30 = 1430 ps.

i.e., with modifications: Cycle Time = 1130 + 300 = 1430 ps
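The two sums can be checked with a short Python sketch. The latencies are the six values summed in the answer above; the dictionary, the path list, and the function name `cycle_time` are illustrative names, not taken from the textbook.

```python
# Component latencies in picoseconds, as quoted in the answer above.
LATENCY_PS = {"I-Mem": 400, "Regs": 200, "Mux": 30, "ALU": 120, "D-Mem": 350}

# Critical path of the load instruction (assumed, following the answer above).
LOAD_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]


def cycle_time(path, extra_alu_ps=0):
    """Sum latencies along the path, adding extra_alu_ps to every ALU stage."""
    return sum(LATENCY_PS[c] + (extra_alu_ps if c == "ALU" else 0) for c in path)


without_mul = cycle_time(LOAD_PATH)                  # baseline datapath
with_mul = cycle_time(LOAD_PATH, extra_alu_ps=300)   # multiplier adds 300 ps
print(without_mul, with_mul)  # 1130 1430
```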

2. [5] Consider the addition of a multiplier to the CPU shown in Figure 4.23 (page 282). This addition will add 300 ps to the latency of the ALU, but will reduce the number of instructions by 5% (because there will no longer be a need to emulate the multiply instruction).

What is the speedup achieved by adding this improvement?

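Speedup achieved by adding this improvement:

Speedup = old execution time/new execution time

= (1130 * 100)/(95 * 1430)

= 0.83

Because the speedup is below 1, the change slows the CPU down: the 5% reduction in instruction count does not offset the longer clock cycle.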
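Speedup is conventionally the old execution time divided by the new one, and on a single-cycle design the execution time is the instruction count times the cycle time. A minimal sketch using the numbers above (the function and parameter names are illustrative):

```python
def speedup(old_cycle_ps, new_cycle_ps, instr_ratio):
    """Old execution time / new execution time for a single-cycle CPU (CPI = 1).

    instr_ratio is new instruction count / old instruction count.
    """
    return old_cycle_ps / (new_cycle_ps * instr_ratio)


s = speedup(old_cycle_ps=1130, new_cycle_ps=1430, instr_ratio=0.95)
print(round(s, 2))  # 0.83
```

A value below 1 indicates a slowdown rather than a speedup.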

3. [5] Consider the addition of a multiplier to the CPU shown in Figure 4.23 (page 282). This addition will add 300 ps to the latency of the ALU, but will reduce the number of instructions by 5% (because there will no longer be a need to emulate the multiply instruction).

What is the slowest the new ALU can be and still result in improved performance?

With multiplier = (1000 + 10 + 10 + 200 + 10 + 100 + 500 + 30 + 2000 + 600 + 30)/1430

= 3.14

Without multiplier = (1000 + 200 + 10 + 2000 + 100 + 30 + 10 + 10 + 500 + 30)/1130

= 3.44

Difference of cost (per unit) = without multiplier - with multiplier

= 3.44 - 3.14

= 0.3

Ratio of performance = cost without improvement/cost with improvement

= 3.44/3.14

= 1.10

Performance = Ratio/Speed up

= 1.10/0.83

= 1.32

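The value 1.32 is a dimensionless ratio, not a latency in ps. For performance to improve, the new execution time must be lower than the old one: 0.95 × (new cycle time) < 1130 ps, so the new cycle time must stay below 1130/0.95 ≈ 1189.5 ps. The multiplier can therefore add at most about 59.5 ps to the ALU, i.e., the slowest the new ALU can be is about 120 + 59.5 = 179.5 ps.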
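The break-even latency can be worked out directly from the cycle times: the new design is faster only while 0.95 × (new cycle time) is below the old 1130 ps. The sketch below assumes that the ALU stays on the critical path and that its original latency is the 120 ps used in the cycle-time sums above; the variable names are illustrative.

```python
OLD_CYCLE_PS = 1130   # cycle time without the multiplier
ALU_PS = 120          # original ALU latency (from the cycle-time sum above)
INSTR_RATIO = 0.95    # instruction count after the 5% reduction

# Break-even: INSTR_RATIO * new_cycle == OLD_CYCLE_PS
max_cycle_ps = OLD_CYCLE_PS / INSTR_RATIO
max_added_ps = max_cycle_ps - OLD_CYCLE_PS   # extra latency the multiplier may add
max_alu_ps = ALU_PS + max_added_ps           # slowest ALU that still breaks even

print(round(max_cycle_ps, 1), round(max_added_ps, 1), round(max_alu_ps, 1))
# 1189.5 59.5 179.5
```

Any added latency above roughly 59.5 ps, including the 300 ps in the question, makes the modified CPU slower overall.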

4. [5] LDUR is the instruction with the longest latency on the CPU from Section 4.4 (pages 271-283). If we modified LDUR and STUR so that there was no offset (i.e., the address to be loaded from/stored to must be calculated and placed in Rd before calling LDUR/STUR), then no instruction would use both the ALU and Data memory. This would allow us to reduce the clock cycle time. However, it would also increase the number of instructions, because many LDUR and STUR instructions would need to be replaced with LDUR/ADD or STUR/ADD combinations.

What would the new clock cycle time be?

5. [5] LDUR is the instruction with the longest latency on the CPU from Section 4.4 (pages 271-283). If we modified LDUR and STUR so that there was no offset (i.e., the address to be loaded from/stored to must be calculated and placed in Rd before calling LDUR/STUR), then no instruction would use both the ALU and Data memory. This would allow us to reduce the clock cycle time. However, it would also increase the number of instructions, because many LDUR and STUR instructions would need to be replaced with LDUR/ADD or STUR/ADD combinations.

What is the primary factor that influences whether a program will run faster or slower on the new CPU?

Now that we have completed this simple datapath, we can add the control unit. The control unit must be able to take inputs and generate a write signal for each state element, the selector control for each multiplexor, and the ALU control. The ALU control is different in a number of ways, and it will be useful to design it first before we design the rest of the control unit.

Elaboration: The sign extension logic must choose between sign extending a 9-bit field in instruction bits 20:12 for data transfer instructions or a 19-bit field (bits 23:5) for the conditional branch. Since the input is all 32 bits of the instruction, it can use the opcode bits of the instruction to select the proper field. LEGv8 opcode bit 26 happens to be 0 for data transfer instructions and 1 for conditional branch. Thus, bit 26 can control a 2:1 multiplexor inside the sign extension logic that selects the 9-bit field if it is 0 or the 19-bit field if it is 1.

4.4 A Simple Implementation Scheme

In this section, we look at what might be thought of as a simple implementation of our LEGv8 subset. We build this simple implementation using the datapath of the last section and adding a simple control function. This simple implementation covers load register (LDUR), store register (STUR), compare and branch zero (CBZ), and the arithmetic-logical instructions ADD, SUB, AND, and ORR. We will later enhance the design to include an unconditional branch instruction (B).

The ALU Control

The LEGv8 ALU in Appendix A defines the six following combinations of four control inputs:

ALU control lines    Function
0000                 AND
0001                 OR
0010                 add
0110                 subtract
0111                 pass input b
1100                 NOR

Depending on the instruction class, the ALU will need to perform one of these first five functions. (NOR can be used for other parts of the LEGv8 instruction set not found in the subset we are implementing.) For load register and store register instructions, we use the ALU to compute the memory address by addition. For the R-type instructions, the ALU needs to perform one of the four actions (AND, OR, subtract, or add), depending on the value of the 11-bit opcode field in the
