Question:
Code running on a single core and not sharing any variables with other cores can suffer some performance degradation because of the snooping coherence protocol.
Consider the following two iterative loops. They are NOT functionally equivalent, but they seem similar in complexity. One could be led to conclude that they would take a comparable number of cycles when executed on the same processor core.
Assume that
■ Every cache line can hold exactly one element of A or B;
■ Arrays A and B do not interfere in the cache;
■ All the elements of A or B are in the cache before either loop is executed.
Compare their performance when run on a core whose cache uses the MESI coherence protocol. Use the stall cycles data for Implementation 1 in Figure 5.38.
Now assume that a cache line can hold multiple elements of A or B (A and B still map to separate cache lines). How will this affect the relative performance of Loop1 and Loop2?
Suggest hardware and/or software mechanisms that would improve the performance of Loop1 on a single core.
Step by Step Answer:
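As a starting point for reasoning about the question (not the book's worked solution), the sketch below models textbook MESI transitions for writes issued by the local core and counts how many of them need a bus transaction during a sequential write sweep over an array. Everything here is an assumption made for illustration: the function names, the initial line state, and the rule that an E-to-M upgrade is silent. The bodies of Loop1 and Loop2 and the stall-cycle values of Figure 5.38 are not reproduced on this page, so the sketch counts transactions only and does not convert them to cycles.

```python
import math

def local_write(state):
    """Textbook MESI, local-core view: return (next_state, needs_bus_transaction)."""
    if state == "M":
        return "M", False  # line already owned and dirty
    if state == "E":
        return "M", False  # silent upgrade (assumed; check the figure's protocol)
    if state == "S":
        return "M", True   # must broadcast an invalidate/upgrade
    if state == "I":
        return "M", True   # write miss: read for ownership
    raise ValueError(f"unknown MESI state: {state}")

def sweep_write_bus_transactions(n_elements, elements_per_line, initial_state):
    """Count bus transactions when each element of an array is written once, in order."""
    n_lines = math.ceil(n_elements / elements_per_line)
    states = [initial_state] * n_lines  # hypothetical starting state for every line
    bus = 0
    for i in range(n_elements):
        line = i // elements_per_line
        states[line], needs_bus = local_write(states[line])
        bus += needs_bus
    return bus

if __name__ == "__main__":
    print(sweep_write_bus_transactions(64, 1, "S"))
    print(sweep_write_bus_transactions(64, 8, "S"))
    print(sweep_write_bus_transactions(64, 8, "E"))
```

In this model only the first write to each line can trigger a bus transaction, so packing several elements into one line reduces the count from one per element to one per line. Whether a given loop pays that cost at all depends on the state its lines are in when the loop starts, and on how the protocol in the figure handles the E state.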
Computer Architecture: A Quantitative Approach
ISBN: 9780128119051
6th Edition
Authors: John L. Hennessy, David A. Patterson