The performance of a computer system depends on the performance of all its individual units—which include execution units (integer, branch, and floating point), I/O units, the bus, the caches, and the memory system.
The motivation for a cache and its hierarchy is to bridge the speed gap between the fast processor and the slower main memory, and thereby overcome the memory wall.
Since the cache exists to bridge the speed gap, its performance measurement and metrics are important in designing and choosing various parameters like cache size, associativity, replacement policy, etc.
The cache miss rate is one major metric for cache performance measurement, because the number of misses becomes highly significant and critical as processor speed increases.
Another useful metric for evaluating cache performance is the power law of cache misses.
Similarly, stack distance profiling is used to evaluate cache performance in terms of misses across different associativities.
It has been observed that increasing the block size, up to a certain extent, exploits spatial locality and decreases cold (compulsory) misses.
Conflict misses occur when the required data was in the cache previously but was evicted because another block mapped to the same set, even though the cache was not full.
When the working set, i.e., the data that is currently important to the program, is bigger than the cache, capacity misses occur frequently.
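The three miss types above can be distinguished in simulation. A common approach (sketched below under assumed, illustrative parameters) classifies a miss as compulsory if the block was never referenced before, as a capacity miss if the reference also misses in a fully associative LRU cache of the same total size, and as a conflict miss otherwise:

```python
from collections import OrderedDict

def classify_misses(trace, num_blocks):
    """Classify direct-mapped misses into compulsory, capacity, and conflict.

    trace: iterable of block addresses; num_blocks: cache capacity in blocks.
    A fully associative LRU model of the same size is run alongside the
    direct-mapped model to separate capacity misses from conflict misses.
    """
    seen = set()          # blocks referenced at least once (for compulsory)
    lru = OrderedDict()   # fully associative LRU cache of num_blocks entries
    direct = {}           # direct-mapped cache: set index -> resident block
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in trace:
        # Direct-mapped lookup: index is block address modulo cache size.
        idx = block % num_blocks
        dm_hit = direct.get(idx) == block
        direct[idx] = block

        # Fully associative LRU lookup of the same total size.
        fa_hit = block in lru
        if fa_hit:
            lru.move_to_end(block)
        else:
            if len(lru) >= num_blocks:
                lru.popitem(last=False)  # evict least recently used block
            lru[block] = True

        if not dm_hit:
            if block not in seen:
                counts["compulsory"] += 1   # never seen before
            elif not fa_hit:
                counts["capacity"] += 1     # misses even with full associativity
            else:
                counts["conflict"] += 1     # only the mapping caused the miss
        seen.add(block)

    return counts

# Blocks 0 and 4 map to the same direct-mapped set in a 4-block cache,
# so their reuses become conflict misses.
print(classify_misses([0, 4, 0, 4], num_blocks=4))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```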
Although this method is very intuitive, it leads to a longer access time and an increase in cache area and power consumption.
Even if all the copies of a memory block do not have the same value, a coherence miss does not necessarily occur.
A coherence miss occurs when threads execute loads such that they observe different values of the same memory block.
[4] The coherence problem is complex and affects the scalability of parallel programs.
A global order of all memory accesses to the same location must exist across the system to tackle this problem.
[5] The coverage miss count is the number of memory accesses that miss because a cache line that would otherwise be present in the processor's cache has been invalidated as a consequence of a directory eviction.
When a context switch occurs, the cache state is modified and some of its blocks are replaced.
Other blocks in the cache may not be replaced during the context switch, but their recency (LRU ordering) changes.
When the suspended process resumes execution, these reordered blocks do not lead to context-switch misses unless other misses cause them to be evicted.
System-related misses become significant when context switching occurs regularly.
These cache misses directly correlate with the increase in cycles per instruction (CPI).
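As an illustration (the numbers below are assumed, not measured), the miss contribution to CPI can be estimated as the base CPI plus the misses per instruction times the miss penalty:

```python
# Assumed, illustrative numbers—not measurements of any real machine.
base_cpi = 1.0                 # CPI with a perfect (always-hitting) cache
misses_per_instruction = 0.02  # cache misses incurred per instruction
miss_penalty = 100             # stall cycles paid per miss

# Effective CPI grows linearly with the miss rate and the miss penalty.
effective_cpi = base_cpi + misses_per_instruction * miss_penalty
print(effective_cpi)  # 3.0
```

Even a 2% miss rate triples the CPI here, which is why miss counts dominate performance as the processor-memory gap widens.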
[2] If we ignore both of these effects, then the average memory access time (AMAT) becomes an important metric.
It is the average time a memory access takes, including both hits and misses.
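The standard formula is AMAT = hit time + miss rate × miss penalty. A minimal sketch with assumed example values:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: the hit time paid by every access,
    plus the miss penalty weighted by how often misses occur."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # 6.0
```

For a multi-level hierarchy the same formula nests: the L1 miss penalty is itself the AMAT of the L2 cache.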
This empirical observation led to the mathematical form of the power law, which relates the cache miss rate to the cache size.
This law holds true only over a finite range of cache sizes, up to the point where the miss rate flattens out.
At a sufficiently large cache size the miss rate stagnates, and beyond that point the relation no longer gives correct estimates.
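The law is commonly written as M = M0 · (C/C0)^(−α), where M0 is the miss rate measured at a baseline cache size C0 and α is a workload-dependent exponent, often quoted near 0.5. A sketch with assumed constants:

```python
def power_law_miss_rate(cache_size, m0=0.05, c0=64 * 1024, alpha=0.5):
    """Power law of cache misses: M = M0 * (C / C0) ** (-alpha).

    m0, c0, and alpha are assumed, workload-dependent constants; the law
    only holds over the finite range of sizes before the miss rate flattens.
    """
    return m0 * (cache_size / c0) ** (-alpha)

# With alpha = 0.5, quadrupling the cache size halves the miss rate.
print(power_law_miss_rate(64 * 1024))   # 0.05
print(power_law_miss_rate(256 * 1024))  # 0.025
```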
The power law of cache misses gives only a rough approximation of this relationship.
A stack distance profile captures the temporal reuse behavior of an application in a fully or set-associative cache.
Applications that exhibit more temporal reuse behavior generally access data that is more recently used.
To collect the stack distance profile of a cache with an LRU replacement policy, each access is counted according to the LRU stack position (stack distance) of the block it references, with an additional counter for accesses whose distance exceeds the associativity and therefore miss.
This profiling information has the limitation that it captures only the temporal reuse behavior across different associativities.
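As a sketch (the trace and helper names are illustrative), a stack distance profile can be collected by keeping every referenced block on an LRU stack and, on each access, recording the position at which the block is found; for a cache of associativity A, accesses with distance ≥ A would be misses:

```python
from collections import Counter

def stack_distance_profile(trace):
    """Count, for each access, its LRU stack distance (0 = most recently used).

    First-time references are recorded under the key None (cold accesses).
    For a fully associative LRU cache of A blocks, every access with
    distance >= A (plus every cold access) would be a miss.
    """
    stack = []          # index 0 holds the most recently used block
    profile = Counter()
    for block in trace:
        if block in stack:
            d = stack.index(block)   # stack distance of this reuse
            profile[d] += 1
            stack.pop(d)
        else:
            profile[None] += 1       # cold (first-ever) reference
        stack.insert(0, block)       # block becomes most recently used
    return profile

profile = stack_distance_profile(["A", "B", "A", "B"])
# profile[1] == 2: both reuses hit in any LRU cache with associativity >= 2,
# but would miss with associativity 1—one profile predicts all associativities.
```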