course notes).

A. Diagram how this instruction sequence would be decoded into operations, and show how the data dependencies between them would create a critical path of operations, in the style of Figures 5.13 and 5.14 (Figure: opt/dpb-flow and Figure: opt/dpb-flow-abstract). (25 points.)
B. For data type double, what lower bound on the CPE is determined by the critical path? Give a numerical value and an explanation. (6 points.)
C. Assuming similar instruction sequences for the integer code as well, what lower bound on the CPE is determined by the critical path for integer data? Give a numerical value and an explanation. (6 points.)
D. Explain how the floating-point version can have a CPE of 3.00 even though the multiplication operation requires 5 cycles. (6 points.)

HW6-2 (31 points)

Write a version of the inner product procedure described in the previous problem that uses six-way loop unrolling (6 x 1; no parallelism). (15 points.)

For x86-64, our measurements of the unrolled version give a CPE of 1.07 for integer data but still 3.01 for floating-point data.

A. Explain why any version of any inner product procedure (even with parallelism) cannot achieve a CPE less than 1.00. (8 points.)
B. Explain why the performance for floating-point data did not improve with loop unrolling. (8 points.)

HW6-3 (15 points)

Write a version of the inner product procedure described above that uses 6 x 1a loop unrolling to enable greater parallelism (six-way unrolling, one accumulator, and a reassociation transformation). (Measurements for this function give a CPE of 1.10 for integer data and 1.05 for floating-point data.)
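
The "k x 1" and "k x 1a" notation used in HW6-2 and HW6-3 can be illustrated on a simpler loop. The sketch below is not the inner product procedure and is not a solution to either problem. It assumes two hypothetical functions, prod_2x1 and prod_2x1a (names not from the assignment), that multiply together the elements of an array using two-way unrolling with a single accumulator. The only difference between them is where the parentheses fall in the combining statement.

```c
#include <stddef.h>

/* Hypothetical example, 2 x 1 unrolling: two elements per iteration,
 * one accumulator. Each multiplication uses the result of the one
 * before it, so the two multiplies per iteration form one chain. */
double prod_2x1(const double *a, size_t n)
{
    double acc = 1.0;
    size_t i;
    size_t limit = n - (n % 2);   /* largest even count <= n */

    for (i = 0; i < limit; i += 2)
        acc = (acc * a[i]) * a[i + 1];

    /* Finish any remaining element. */
    for (; i < n; i++)
        acc = acc * a[i];

    return acc;
}

/* Hypothetical example, 2 x 1a unrolling: same loop structure and
 * still one accumulator, but the combining statement is reassociated.
 * The product a[i] * a[i + 1] does not depend on acc, so only one
 * multiplication per iteration uses the previous value of acc. */
double prod_2x1a(const double *a, size_t n)
{
    double acc = 1.0;
    size_t i;
    size_t limit = n - (n % 2);

    for (i = 0; i < limit; i += 2)
        acc = acc * (a[i] * a[i + 1]);

    for (; i < n; i++)
        acc = acc * a[i];

    return acc;
}
```

Both functions compute the same mathematical product, and an empty array yields 1.0 in each. Because floating-point multiplication is not associative, the reassociated version can round differently from the original for some inputs.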