Question:

We saw that our measurements of the prefix-sum function psum1 (Figure 5.1) yield a CPE of 9.00 on a machine where the basic operation to be performed, floating-point addition, has a latency of just 3 clock cycles. Let us try to understand why our function performs so poorly.

The assembly code for the inner loop of the function appears in the listing below, following the figure references.

Perform an analysis similar to those shown for combine3 (Figure 5.14) and for write_read (Figure 5.36) to diagram the data dependencies created by this loop, and hence the critical path that forms as the computation proceeds. Explain why the CPE is so high.

Figure 5.1

Figure 5.14

[Data-flow diagram for the inner loop of combine3. Panel (a): load, mul, add, cmp, and jne operations over %rax, %rdx, %xmm0, and data[i]. Panel (b): reduced graph with load, mul, and add.]

Figure 5.36

Inner loop of psum1
a in %rdi, i in %rax, cnt in %rdx

    .L5:                                      loop:
      vmovss  -4(%rsi,%rax,4), %xmm0            Get p[i-1]
      vaddss  (%rdi,%rax,4), %xmm0, %xmm0       Add a[i]
      vmovss  %xmm0, (%rsi,%rax,4)              Store at p[i]
      addq    $1, %rax                          Increment i
      cmpq    %rdx, %rax                        Compare i:cnt
      jne     .L5                               If !=, goto loop
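For orientation, here is a minimal C sketch of a prefix-sum loop that matches the annotations on the assembly (get p[i-1], add a[i], store at p[i]). It is reconstructed from those annotations, not copied from Figure 5.1, so the exact signature is an assumption, and the guard for n <= 0 is added here for safety.

```c
/* Prefix sum: p[i] = a[0] + a[1] + ... + a[i].
 * Sketch reconstructed from the loop annotations; the signature and the
 * n <= 0 guard are assumptions, not taken from Figure 5.1. */
void psum1(const float a[], float p[], long n)
{
    long i;

    if (n <= 0)
        return;

    p[0] = a[0];
    for (i = 1; i < n; i++) {
        /* Reads p[i-1] from memory, adds a[i], writes p[i] to memory.
         * The value stored here is the value loaded on the next iteration. */
        p[i] = p[i - 1] + a[i];
    }
}
```

Each iteration reads back the element that the previous iteration just stored, which is the pattern the problem asks you to trace in the data-flow diagram.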

Step by Step Solution


We can see that this function has a write/read dependency ...
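One way to make the write/read dependency concrete is to compare the loop with a variant that carries the running sum in a local variable instead of reading p[i-1] back from memory. The sketch below is illustrative only: the name psum_acc is hypothetical, and whether the local stays in a register depends on the compiler. With this structure the only value carried from one iteration to the next is the result of the floating-point addition, so one would expect the loop to be limited by the addition latency and not by a store followed by a load. No measurement is claimed here.

```c
/* Hypothetical variant of the prefix-sum loop (the name psum_acc is an
 * assumption). The running sum lives in the local variable `last`, so the
 * loop never loads a value that the previous iteration stored. */
void psum_acc(const float a[], float p[], long n)
{
    long i;
    float last;

    if (n <= 0)
        return;

    last = a[0];
    p[0] = last;
    for (i = 1; i < n; i++) {
        float val = last + a[i];  /* depends only on the previous sum */
        p[i] = val;               /* store is off the loop-carried chain */
        last = val;
    }
}
```

The results are the same as for the memory-based loop; only the path by which each iteration obtains the previous sum changes.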
