Question
Write a version of the inner product procedure described in Problem 5.13 in the textbook that uses 6 x 6 loop unrolling. Our measurements for this function with x86-64 give a CPE of 1.06 for integer data and 1.01 for floating-point data. What factor limits the performance to a CPE of 1.00?

Fill in the missing parts of the code below.

    /* Inner Product. 6 x 6 unrolling */
    void inner_u6x6(vec_ptr u, vec_ptr v, data_t *dest)
    {
        long i;
        long length = ...;
        long limit = ...;
        data_t *udata = get_vec_start(u);
        data_t *vdata = get_vec_start(v);
        data_t sum0 = (data_t) 0;
        data_t sum1 = (data_t) 0;
        data_t sum2 = (data_t) 0;
        data_t sum3 = (data_t) 0;
        data_t sum4 = (data_t) 0;
        data_t sum5 = (data_t) 0;

        /* Do 6 elements at a time */
        for (...) {
            ...
            sum1 = ...;
            sum2 = ...;
            sum3 = ...;
            sum4 = ...;
            ...
        }

        /* Finish off any remaining elements */
        for (...) {
            ...
        }
        *dest = ...;
    }
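One possible way to fill in the skeleton is sketched below. It is not the textbook's official solution. The `vec_rec` struct, the `long` choice for `data_t`, and the `vec_length` helper are hypothetical stand-ins added so the example compiles on its own; the real definitions come from the textbook's support code. The sketch keeps six independent accumulators so that the multiply-add chains do not depend on one another, then combines them at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the textbook's vector type (assumptions,
   not the textbook's actual definitions). */
typedef long data_t;

typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

static long vec_length(vec_ptr v) { return v->len; }
static data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Inner product, 6 x 6 unrolling: six elements per iteration,
   accumulated into six independent sums. */
void inner_u6x6(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(u);
    long limit = length - 5;   /* last index at which a full group of 6 fits */
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum0 = (data_t) 0;
    data_t sum1 = (data_t) 0;
    data_t sum2 = (data_t) 0;
    data_t sum3 = (data_t) 0;
    data_t sum4 = (data_t) 0;
    data_t sum5 = (data_t) 0;

    /* Precondition assumed by this sketch: both vectors have equal length. */
    assert(vec_length(v) == length);

    /* Do 6 elements at a time */
    for (i = 0; i < limit; i += 6) {
        sum0 = sum0 + udata[i]     * vdata[i];
        sum1 = sum1 + udata[i + 1] * vdata[i + 1];
        sum2 = sum2 + udata[i + 2] * vdata[i + 2];
        sum3 = sum3 + udata[i + 3] * vdata[i + 3];
        sum4 = sum4 + udata[i + 4] * vdata[i + 4];
        sum5 = sum5 + udata[i + 5] * vdata[i + 5];
    }

    /* Finish off any remaining elements */
    for (; i < length; i++) {
        sum0 = sum0 + udata[i] * vdata[i];
    }

    *dest = sum0 + sum1 + sum2 + sum3 + sum4 + sum5;
}
```

As for the limiting factor, a likely candidate is load throughput, assuming a machine with two load units that can each start one load per cycle: every element needs two loads (one from each vector), so the loop cannot process more than one element per cycle regardless of how many accumulators are used.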