Question
Suppose you are programming a processor with an add latency of 3 clock cycles and a multiply latency of 5 cycles. It is also given that this processor can complete one add and one multiply instruction every clock cycle, when instructions are fully pipelined. Consider the following loop:
for (i=0; i<...; i++) {
    A[i] = B[i] * C[i] + D[i] + i;
}

1a) Assuming the program is executed as-is (i.e. no pipelining), what is the lower bound on execution time (in clock cycles) based on the math performed?
1b) How can you exploit more instruction-level parallelism in this program? What changes do you propose?
1c) Assuming you can pipeline the adds and multiplies, what would be the lower bound on execution time in clock cycles for the arithmetic?
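
The arithmetic behind the three parts can be sketched as follows. This is one possible reading, not the official solution. It assumes the loop runs n times (n is a hypothetical name, since the loop bound is not given above), counts only the one multiply and two adds in the loop body, and ignores loads, stores, and loop overhead. For 1b it assumes the expression may be reassociated as (B[i] * C[i]) + (D[i] + i), so that D[i] + i no longer waits for the multiply. The function names are illustrative only.

```c
/* Latencies and per-iteration operation counts taken from the question. */
enum { ADD_LATENCY = 3, MUL_LATENCY = 5 };
enum { ADDS_PER_ITER = 2, MULS_PER_ITER = 1 };

/* 1a: no pipelining, so each operation waits for the previous one.
 * One iteration costs 5 + 3 + 3 = 11 cycles. n is an assumed trip count. */
long serial_cycles(long n) {
    return n * (MULS_PER_ITER * MUL_LATENCY + ADDS_PER_ITER * ADD_LATENCY);
}

/* 1b: dependent-chain length of one iteration.
 * As written, B*C + D + i is a chain of multiply, add, add.
 * Reassociated as (B*C) + (D + i), the add D + i overlaps the multiply,
 * so the chain is max(multiply, add) followed by one add. */
long critical_path_cycles(int reassociated) {
    if (!reassociated) {
        return MUL_LATENCY + ADD_LATENCY + ADD_LATENCY;
    }
    long first_stage = MUL_LATENCY > ADD_LATENCY ? MUL_LATENCY : ADD_LATENCY;
    return first_stage + ADD_LATENCY;
}

/* 1c: fully pipelined, one add and one multiply can complete per cycle.
 * The busier unit sets the issue-limited bound: here the adder, with
 * two adds per iteration. Pipeline fill and drain add a small constant
 * on top of this, which is not modeled here. */
long throughput_cycles(long n) {
    long adds = n * ADDS_PER_ITER;
    long muls = n * MULS_PER_ITER;
    return adds > muls ? adds : muls;
}
```

Under these assumptions, 1a comes to 11 cycles per iteration, the reassociation in 1b shortens the per-iteration dependent chain from 11 to 8 cycles, and 1c is limited by the adder to about 2 cycles per iteration plus a few cycles of pipeline fill.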