Question

Posted on Sep 26, 2024

SSE/AVX/AVX512 programming: write assembly statements for overlapping two images of equal size to produce a new image.

See the attached figure, which shows the four steps that overlap two images (B and E) to produce G. Given the two images B and E, you are asked to produce image G with no blue background. Assume each image is 1K x 1K pixels (1 MB), where each pixel is one byte encoding one of 256 colors. For a 1024 * 1024 (1 MB) image, a naive loop that processes one pixel per iteration requires about 1 million iterations (1K * 1K). For large images this per-pixel loop is prohibitively expensive. Besides GPGPU programming with CUDA or OpenCL, one solution is the SSE/AVX/AVX512 instructions, which let a single instruction operate on 16, 32, or 64 bytes at once. With SSE/AVX/AVX512, one 1024-byte row of the image can be processed in 64, 32, or 16 iterations, respectively, so the entire image requires only 64K, 32K, or 16K iterations instead of 1 million.
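The iteration counts quoted above follow from dividing the image size in bytes by the register width in bytes. A minimal sketch of that arithmetic in C; the function name `iterations_needed` is hypothetical and not part of the question:

```c
#include <assert.h>
#include <stddef.h>

/* Number of loop iterations needed when each iteration handles
   `bytes_per_iter` one-byte pixels. Assumes the image size is an
   exact multiple of the register width, as it is for 1 MB images. */
size_t iterations_needed(size_t image_bytes, size_t bytes_per_iter)
{
    assert(bytes_per_iter != 0 && image_bytes % bytes_per_iter == 0);
    return image_bytes / bytes_per_iter;
}
```

For a 1024 * 1024 image, `iterations_needed(1024 * 1024, 16)` gives 65536 (64K) for SSE, a width of 32 gives 32768 (32K) for AVX, and a width of 64 gives 16384 (16K) for AVX512, compared with 1048576 for the per-pixel loop.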

The assembly instructions required for this problem are variations of PCMPEQ, ANDP, ANDNP, and POR, which are documented in the Intel manual (https://software.intel.com/en-us/articles/intel-sdm#combined). For example, the AVX512 instructions that overlap the two images are variations of the following:

 VPCMPEQD zmm1, zmm2, zmm3/m256                  -> cmp
 VANDPS   zmm1, zmm2, zmm3/m256                  -> and
 VANDNPS  zmm1, zmm2, zmm3/m256                  -> !and
 VPORQ    zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst  -> or
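To make the four operations concrete, here is a scalar model in C of what each one does to a single byte-sized pixel. The function names are hypothetical. Note that the doubleword compare (PCMPEQD/VPCMPEQD) matches 32-bit elements, so for one-byte pixels the byte variant (PCMPEQB) is the one this sketch models.

```c
#include <stdint.h>

/* cmp: all ones on a match, all zeros otherwise (PCMPEQB, per byte). */
uint8_t cmp_eq_byte(uint8_t a, uint8_t b) { return (a == b) ? 0xFF : 0x00; }

/* and: bitwise AND of the two operands. */
uint8_t and_byte(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }

/* !and: the FIRST operand is inverted, then ANDed with the second. */
uint8_t andn_byte(uint8_t a, uint8_t b) { return (uint8_t)(~a & b); }

/* or: bitwise OR of the two operands. */
uint8_t or_byte(uint8_t a, uint8_t b) { return (uint8_t)(a | b); }
```

With a mask `m = cmp_eq_byte(pixel, 0x03)`, `and_byte(m, x)` keeps `x` only where the pixel is blue, and `andn_byte(m, y)` keeps `y` only where it is not, so OR-ing the two results selects one or the other per pixel.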

Assuming

image A is loaded in xmm/ymm/zmm1

image B is loaded in xmm/ymm/zmm3

image E is loaded in xmm/ymm/zmm4

and the 8-bit, 256-color map below:

red is 0xE0,

green is 0x1C,

blue is 0x03,

yellow is 0xFC,

white is 0xFF,

black is 0x00,

write four assembly instructions for each of the following:

128-bit SSE using xmm registers

256-bit AVX using ymm registers

512-bit AVX512 using zmm registers

These four assembly instructions form the body of a loop that iterates 64K, 32K, or 16K times, depending on whether SSE, AVX, or AVX512 is used. You may use register xmm/ymm/zmm2 to hold a temporary value.
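The figure is not reproduced here, so the following is only a sketch of one plausible reading of the loop body, not the expected answer. It assumes that image A is a solid-blue image (every byte 0x03) used as the comparison operand, and that G takes E's pixel wherever B is blue and B's pixel everywhere else. It is written in C with SSE2 intrinsics, each of which corresponds to one of the four instructions. The function name `overlay_sse` and the scalar fallback for non-SSE2 builds are additions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define BLUE 0x03 /* background color from the question's color map */

/* Assumed semantics: g[i] = (b[i] == BLUE) ? e[i] : b[i].
   n is the image size in bytes and must be a multiple of 16. */
void overlay_sse(const uint8_t *b, const uint8_t *e, uint8_t *g, size_t n)
{
    assert(n % 16 == 0);
#if defined(__SSE2__)
    /* Assumption: "image A" is all blue, so one constant vector suffices. */
    const __m128i blue = _mm_set1_epi8((char)BLUE);
    for (size_t i = 0; i < n; i += 16) {
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i ve = _mm_loadu_si128((const __m128i *)(e + i));
        __m128i mask   = _mm_cmpeq_epi8(vb, blue);    /* PCMPEQB: 0xFF where B is blue */
        __m128i from_e = _mm_and_si128(mask, ve);     /* PAND:    E where B is blue    */
        __m128i from_b = _mm_andnot_si128(mask, vb);  /* PANDN:   B where B is not blue */
        __m128i vg     = _mm_or_si128(from_e, from_b);/* POR:     merge the two halves */
        _mm_storeu_si128((__m128i *)(g + i), vg);
    }
#else
    /* Scalar fallback with the same mask logic, one pixel at a time. */
    for (size_t i = 0; i < n; i++) {
        uint8_t mask = (b[i] == BLUE) ? 0xFF : 0x00;
        g[i] = (uint8_t)((mask & e[i]) | (~mask & b[i]));
    }
#endif
}
```

The 256-bit version follows the same pattern with `__m256i` and the `_mm256_*` integer intrinsics, which require AVX2. In AVX512 the byte compare writes its result to a mask register rather than a vector register and requires AVX-512BW, so the AND/ANDN/OR steps would need to be adapted accordingly.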
