Question

Assume the following loop is to be executed on a 4-unit VLIW processor that can execute an instruction on any execution unit. Show how a compiler would schedule the original loop and a version unrolled 4 times. Assume the processor has as many architectural registers as required, a latency of 3 cycles for LD operations, and a latency of 2 cycles for DIVs and ADDs. Assume the branch delay of the processor is long enough that all operations in one iteration complete before the next iteration starts. As in other VLIW problems, assume that the compiler examines all possible operation orderings to find one that fits into the fewest number of instructions. Compare how much faster the unrolled loop is than the original loop:

loop:
    LD r1, (r2)
    LD r3, (r4)
    LD r5, (r6)
    ADD r1, r1, r3
    ADD r1, r1, r5
    DIV r1, r1, r7
    ST (r0), r1
    ADD r2, #4, r2
    ADD r4, #4, r4
    ADD r6, #4, r6
    ADD r0, #4, r0
    BR loop
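
One way to reason about the original loop is to write down its dependences and let a small scheduler place each operation. The Python sketch below is only an illustration under stated assumptions, not the reference solution: it uses a greedy in-order list scheduler rather than the exhaustive search the problem describes, it assumes 1-cycle latencies for ST and BR (the problem does not give them), it assumes a pointer update may share an instruction with the load or store that reads the old pointer value, and it pins the branch to the final instruction. The names `schedule`, `length`, `speedup`, and `ORIGINAL_LOOP` are made up for this example.

```python
# LD, ADD, DIV latencies come from the problem statement.
# ST and BR latencies are assumptions (not given in the problem).
LATENCY = {"LD": 3, "ADD": 2, "DIV": 2, "ST": 1, "BR": 1}


def schedule(ops, width=4):
    """Greedy in-order list scheduling onto `width` identical units.

    ops: list of (name, opcode, reads, after) in program order.
      reads: earlier ops whose results this op consumes
             (it must wait for the producer's full latency).
      after: earlier ops this op must not precede
             (anti-dependences; issuing in the same cycle is allowed).
    Returns a dict mapping op name to its issue cycle (0-based).
    """
    opcode_of = {name: opcode for name, opcode, _, _ in ops}
    issue = {}  # op name -> issue cycle
    used = {}   # cycle -> number of units already occupied
    for name, _opcode, reads, after in ops:
        earliest = 0
        for producer in reads:
            ready = issue[producer] + LATENCY[opcode_of[producer]]
            earliest = max(earliest, ready)
        for earlier in after:
            earliest = max(earliest, issue[earlier])
        cycle = earliest
        while used.get(cycle, 0) >= width:  # find a free unit
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        issue[name] = cycle
    return issue


def length(issue):
    """Number of long instructions, counting empty ones in between."""
    return max(issue.values()) + 1


def speedup(original_len, unrolled_len, factor=4):
    """Per-iteration speedup when the unrolled body covers `factor` iterations."""
    return factor * original_len / unrolled_len


# The original loop body, one entry per operation in program order.
ORIGINAL_LOOP = [
    ("ld_r1",     "LD",  [],                     []),
    ("ld_r3",     "LD",  [],                     []),
    ("ld_r5",     "LD",  [],                     []),
    ("add_r1_r3", "ADD", ["ld_r1", "ld_r3"],     []),
    ("add_r1_r5", "ADD", ["add_r1_r3", "ld_r5"], []),
    ("div",       "DIV", ["add_r1_r5"],          []),
    ("st",        "ST",  ["div"],                []),
    ("inc_r2",    "ADD", [],                     ["ld_r1"]),
    ("inc_r4",    "ADD", [],                     ["ld_r3"]),
    ("inc_r6",    "ADD", [],                     ["ld_r5"]),
    ("inc_r0",    "ADD", [],                     ["st"]),
    ("br",        "BR",  [],                     ["st"]),  # assumed: branch goes last
]

sched = schedule(ORIGINAL_LOOP)
for name, cycle in sorted(sched.items(), key=lambda item: item[1]):
    print(f"cycle {cycle}: {name}")
print("long instructions:", length(sched))
```

Under these assumptions the store cannot issue before cycle 9, because the chain LD, ADD, ADD, DIV contributes 3 + 2 + 2 + 2 cycles of latency, so the sketch reports 10 long instructions for the original loop, several of them empty. No ordering can do better in this model, since the dependence chain alone fixes that length. To compare against the unrolled version, the same `schedule` function could be given a hand-built list for four renamed copies of the body, and `speedup` then relates the two lengths per iteration; how the pointer updates are merged in the unrolled body is a modelling choice the problem leaves open.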
