Trace cache

The widely acknowledged trace cache paper[1] was presented by Eric Rotenberg, Steve Bennett, and Jim Smith at the 1996 International Symposium on Microarchitecture (MICRO).

An earlier publication is US patent 5381533,[3] by Alex Peleg and Uri Weiser of Intel, "Dynamic flow instruction cache memory organized around trace segments independent of virtual address line", a continuation of an application filed in 1992, later abandoned.

Wider superscalar processors demand multiple instructions to be fetched in a single cycle for higher performance.

Because of branches, the instructions to be fetched do not always sit in contiguous memory locations, so processors need additional logic and hardware support to fetch and align instructions from non-contiguous basic blocks.

Consider these four basic blocks (A, B, C, D) as shown in the figure that correspond to a simple if-else loop.
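The fetch problem for these blocks can be illustrated with a minimal sketch. The roles given to the blocks here are an assumption made for illustration (A tests the condition, B is the "if" body, C is the "else" body, D is the join block that loops back), and the names `STATIC_ORDER`, `dynamic_path`, and `is_contiguous` are hypothetical.

```python
# Assumed roles: A = condition test, B = "if" body, C = "else" body,
# D = join block that branches back to A. Blocks sit in memory as A, B, C, D.
STATIC_ORDER = ["A", "B", "C", "D"]


def dynamic_path(else_taken):
    """Blocks executed in one loop iteration for a given branch outcome."""
    return ["A", "C" if else_taken else "B", "D"]


def is_contiguous(path):
    """True if every block in the path directly follows the previous one in memory."""
    positions = [STATIC_ORDER.index(block) for block in path]
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


for outcome in (False, True):
    path = dynamic_path(outcome)
    print(path, "contiguous in memory:", is_contiguous(path))
```

Under these assumptions neither executed path (A, B, D or A, C, D) is contiguous in the static layout, which is why a conventional fetch unit would need more than one access per iteration.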

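Tagging each trace line with the PC of its first instruction together with a set of branch outcomes gives the trace cache path associativity: different traces that start at the same address but follow different branch outcomes can be stored side by side.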

In the instruction fetch stage of a pipeline, the current PC along with a set of branch predictions is checked in the trace cache for a hit.

If there is a hit, a trace line is supplied to the fetch unit, which then does not have to go to a regular cache or to memory for these instructions.
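The lookup described above can be sketched as a toy model. This is a simplified illustration, not a description of any real design: the class `TraceCache`, the function `fetch`, the callback `icache_fetch`, and the example addresses and instruction names are all hypothetical, and the model requires an exact match on the full set of predictions.

```python
class TraceCache:
    """Toy model: each trace line is tagged by (start PC, branch outcomes)."""

    def __init__(self):
        self.lines = {}  # (pc, outcomes) -> list of instructions in the trace

    def fill(self, pc, outcomes, instructions):
        """Store a trace under its starting PC and the branch outcomes it followed."""
        self.lines[(pc, tuple(outcomes))] = list(instructions)

    def lookup(self, pc, predictions):
        """Return the matching trace line, or None on a miss."""
        return self.lines.get((pc, tuple(predictions)))


def fetch(pc, predictions, trace_cache, icache_fetch):
    """Try the trace cache first; fall back to the regular instruction cache."""
    trace = trace_cache.lookup(pc, predictions)
    if trace is not None:
        return "trace-cache", trace
    return "icache", icache_fetch(pc)


# Two traces share a start PC but differ in branch outcome (path associativity).
tc = TraceCache()
tc.fill(0x100, [False], ["A0", "B0", "D0"])
tc.fill(0x100, [True], ["A0", "C0", "D0"])

# Hypothetical fallback: the regular cache returns a single block per access.
source, instrs = fetch(0x100, [True], tc, lambda pc: ["A0"])
print(source, instrs)
```

Keying the dictionary on both the PC and the prediction tuple is what lets the two traces starting at the same address coexist; a lookup with a PC that has no stored trace falls through to the `icache_fetch` callback.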

[Figure: Working of a trace cache]
[Figure: Basic blocks of a simple if-else loop]