Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Exercise 5. [20 Marks] The Intel Core i7-8750H processor (more details here) has the following characteristics, taken from /proc/cpuinfo: ( begin{array}{ll}text { vendor_id } &
Exercise 5. [20 Marks] The Intel Core i7-8750H processor (more details here) has the following characteristics, taken from /proc/cpuinfo: \( \begin{array}{ll}\text { vendor_id } & : \text { GenuineIntel } \\ \text { cpu family } & : 6 \\ \text { model } & : 158 \\ \text { model name } & : \text { Intel(R) Core(TM) i7-8750H CPU @ } 2.20 \mathrm{GHz} \\ \text { stepping } & : 10 \\ \text { microcode } & : 0 \text { 0xea } \\ \text { cpu MHz } & : 2200.000 \\ \text { L1 cache size } & : 32 \mathrm{~KB} \\ \text { L2 cache size } & : 256 \mathrm{~KB} \\ \text { L3 cache size } & : 9216 \mathrm{~KB} \\ \text { physical id } & : 0 \\ \text { siblings } & : 12 \\ \text { core id } & : 0 \\ \text { cpu cores } & : 6 \\ \text { \{... } & \\ \text { clflush size } & : 64 \\ \text { cache line size } & : 64\end{array} \) Consider the following two functions Normalize1 and Normalize2 which take in a positive NxN matrix of doubles and normalizes its entries to the range [0,1]. A program implementing these functions is available on OWL as Normalize.c. Consider the following two functions Normalize1 and Normalize2 which take in a positive NxN matrix of doubles and normalizes its entries to the range [0,1]. A program implementing these functions is available on OWL as Normalize.c. After those two code segmenets, data is presented which was collected using the perf utility. This data shows runtime performance metrics of these functions executing on the Intel Core i7-8750H processor for various data sizes. In this data: - "Normalize.bin 1 ..." executes the function Normalize1; - "Normalize.bin 2 ..." executes the function Normalize2; - the second command-line argument is the size N of the matrix; - LLC-loads means "Last Level Cache loads", the number of accesses to L3; - LLC-load-misses means "Last Level Cache misses", the number of L3 cache misses; - cpu-cycles is the number of CPU cycles elapsed during programing execution. Using the knowledge learned so far in this course, the specification of the i7-8750H processor, the code fragments, and the perf data, answer the following questions. (a) Why is the runtime execution of Normalize1 faster than Normalize2? The values of which performance metrics from the perf data support your claims? (b) Consider the miss rates of Normalize1. The miss rate drastically increases for values of N larger than 512. Explain why the increase occurs at this particular value of N. Give a reason why this increase is not a sharp "jump" but rather has an intermediate effect at N=1024. (c) Consider the miss rates of Normalize2. The miss rate starts quite low but quickly increases for increasing values of N. Disucss why the miss rates for N=256 and N=512 are misleading for describing the actual data locality of Normalize2. Which additional performance metrics would you record to get a more precise understanding of the data locality of Normalize2? Assume perf is capable of reporting any possible hardware event
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started