Question: When we use gcc to compile combine3 with command-line option -02, we get code with substantially better CPE performance than with -01: We achieve performance

When we use gcc to compile combine3 with command-line option -02, we get code with substantially better CPE performance than with -01:

Function Page Method combine3 549 combine3. 549 combine4 551 Compiled -01 Compiled -02 Accumulate in

We achieve performance comparable to that for combine4, except for the case of integer sum, but even it improves significantly. On examining the assembly code generated by the compiler, we find an interesting variant for the inner loop:

1 2 3 4 5 6 1 vmulsd (%rdx), %xmm0, %xmm0 addq $8, %rdx cmpq %rax, %rdx Compare to data+length Store product

We see that, besides some reordering of instructions, the only difference is that the more optimized version does not contain the vmovsd implementing the read from the location designated by dest (line 2).

A. How does the role of register %xmm0 differ in these two loops?

B. Will the more optimized version faithfully implement the C code of combine3, including when there is memory aliasing between dest and the vector data?

C. Either explain why this optimization preserves the desired behavior, or give an example where it would produce different results than the less optimized code.

Function Page Method combine3 549 combine3. 549 combine4 551 Compiled -01 Compiled -02 Accumulate in temporary Integer + 7.17 1.60 1.27 9.02 3.01 3.01 Floating point 9.02 3.01 3.01 11.03 5.01 5.01

Step by Step Solution

★★★★★

3.34 Rating (160 Votes )

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock

This assembly code demonstrates a clever optimization opportunity detected by gcc It is worth studyi... View full answer

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Computer Systems A Programmers Perspective Questions!

Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...

The fixed budget for 21,300 units of production shows sales of $468,600; variable costs of $63,900; and fixed costs of $143,000. The company's actual sales were 26,300 units at $531,600. Actual...

What is the value of the trigonometric function? Enter your answer in the box in simplest radical form. tan (-57) = sec (1/3) = COS 31 ( 317 ) = 1 2

Hi, I am having trouble configuring the cost of debt from the attached 10-k for O'Reilly. These are steps I must follow: a.Cost of Debt i.Determine the market value of the firm's debt issues. Be sure...

Using disclosures made in the 2015 Woolworths Annual Report, 2015 Woolworths CSR Report or current website for Woolworths (http://www.woolworthslimited.com.au) identify and provide 4 examples where...

a brief summary what was understood on the aspect of the causes or effects of segregation in city ( any kind of city within US). 1000word. Thank you REFERENCE: CHAPTER 5 Residential Segregation and...

Hello, I need some assistance with this project. I have everything completed except the third and fourth question (located in the instruction project document. 3.)You are a security analyst...

Find attached Ingredients: Water, MCC, Salt, Nicotine, pH regulator, sweeteners and flavours I am wondering why the decision to go with LYFT. I understand migrating to LYFT would be easier from sells...

Please find the attachment and see my questions. These questions are related to finance. A global team - winning together Annual Report 2009 Unternehmen The Company Henkel at a glance Global supplier...

1. Synthesize the article 2. Do you agree/disagree with the authors perspectives? Why? 3. Provide a current example supporting or disagreeing with the perspective. 4. How could the authors enhance...

Security IDs Company purchased equipment on July 1, 2010, for $135,000. The equipment was expected to have a useful life of three years, or 12,000 operating hours, and a residual value of $6,000. The...

Consider again Problem 10. Use equation (9.1) to determine whether your solution to Problem 10 is unique. If your solution is not unique, use equation (9.1) iteratively to find all alternative...

The dollar - weighted return on a portfolio is equivalent to

9.15 A steel sphere of 0.25 in. diameter has a velocity of 200 ft s at an altitude of 30,000 ft in the U.S. Standard Atmosphere. Calculate the drag force on this sphere.

Consider a variant of Exercise C-7.29, in which an array of capacity N, is resized to capacity precisely that of the number of elements, any time the number of elements in the array goes strictly...

In Section 7.5.3, we demonstrated how the Collections.shuffle method can be adapted to shuffle a reference-type array. Give a direct implementation of a shuffle method for an array of int values. You...

Give an implementation of the deque ADT using an array list for storage.

QUESTION TWO [30] Richy Limited purchased a plant that cost R500 000 on 1 January 2016. Depreciation is provided over its useful life of 5 years using the straight line method to a nil residual...

Measures of liquidity, Solvency, and Profitability The comparative financial statements of Marshall Inc. are as follows. The market price of Marshall common stock was $ 57 on December 31, 20Y2....

___ 1. Which if the following is not an example of a liability rule: (a) Eminent domain. (b) Specific performance. (c) Expectation damages for breach of contract. (d) Strict liability for a tort.