Temporal locality refers to the reuse of specific data and/or resources within a relatively small time duration.
Systems that exhibit strong locality of reference are good candidates for performance optimization through techniques such as caching, memory prefetching, and advanced branch prediction in a processor core.
These reasons are not disjoint; in fact, they range from the most general case to special cases: if, most of the time, a substantial portion of the references aggregate into clusters, and if the shape of this system of clusters can be predicted well, then it can be exploited for performance optimization.
Finally, temporal locality plays a role at the lowest level, since results that are referenced very close together in time can be kept in the machine registers.
Thus, a program achieves greater performance if it uses data while it is cached in the upper levels of the memory hierarchy, and avoids bringing other data into those upper levels if it would displace data that will be needed again shortly.
Typical memory hierarchy (access times and cache sizes are approximations of typical values as of 2013, for the purpose of discussion; actual values and the actual number of levels in the hierarchy vary): modern machines tend to read blocks of lower memory into the next level of the memory hierarchy.
In this case, "large" means, approximately, more than 100,000 elements in each matrix, or enough addressable memory such that the matrices will not fit in L1 and L2 caches.
The large matrices can be divided into evenly sized sub-matrices, so that each smaller block can be referenced (multiplied) several times while it is in fast memory.