Translation lookaside buffer

The page walk is time-consuming when compared to the processor speed, as it involves reading the contents of multiple memory locations and using them to compute the physical address.

The PowerPC 604, for example, has a two-way set-associative TLB for data loads and stores.
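As a rough illustration of what two-way set-associative means for a TLB, the following Python sketch picks a set from the low bits of the virtual page number and searches only the two entries in that set. The set count, the least-recently-used replacement policy, and the class name TwoWayTLB are assumptions made for the example, not a description of the 604's actual hardware.

```python
NUM_SETS = 64   # assumed set count, for illustration only
WAYS = 2        # two-way: each set holds at most two entries


class TwoWayTLB:
    def __init__(self):
        # Each set is a short list of (vpn, frame) pairs,
        # ordered from least to most recently used.
        self.sets = [[] for _ in range(NUM_SETS)]

    def lookup(self, vpn):
        """Return the cached frame number for vpn, or None on a miss."""
        entries = self.sets[vpn % NUM_SETS]
        for i, (tag, frame) in enumerate(entries):
            if tag == vpn:
                entries.append(entries.pop(i))  # mark as most recently used
                return frame
        return None

    def insert(self, vpn, frame):
        """Cache a translation, evicting the least recently used way if full."""
        entries = self.sets[vpn % NUM_SETS]
        entries[:] = [e for e in entries if e[0] != vpn]
        if len(entries) == WAYS:
            entries.pop(0)
        entries.append((vpn, frame))
```

Page numbers that share the same low bits compete for the same two slots, so a third such page evicts one of the earlier two even when the rest of the TLB is empty.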

Hence, the TLB is used to reduce the time taken to access the memory locations in the page-table method.

The TLB is a cache of the page table, representing only a subset of the page-table contents.

Since the TLB lookup is usually a part of the instruction pipeline, searches are fast and cause essentially no performance penalty.

Upon each virtual memory reference, the hardware checks the TLB to see whether the page number is held therein.
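The check can be sketched in Python as follows. This is a minimal model, assuming 4 KiB pages, a plain dictionary standing in for the TLB, and a second dictionary standing in for the page walk; the function name translate and both dictionaries are hypothetical.

```python
PAGE_SHIFT = 12                       # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(vaddr, tlb, page_table):
    """Return (physical_address, tlb_hit) for a virtual address."""
    vpn = vaddr >> PAGE_SHIFT         # virtual page number
    offset = vaddr & PAGE_MASK        # offset within the page
    if vpn in tlb:                    # TLB hit: no page walk needed
        return (tlb[vpn] << PAGE_SHIFT) | offset, True
    # TLB miss: the dictionary lookup stands in for the page walk.
    # A missing key (KeyError) would correspond to a page fault.
    frame = page_table[vpn]
    tlb[vpn] = frame                  # cache the translation for next time
    return (frame << PAGE_SHIFT) | offset, False
```

For example, with page_table = {0x12: 0x7} and an empty tlb, the first call to translate(0x12345, tlb, page_table) misses and the second call hits; both yield the physical address 0x7345.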

Depending on the CPU, the page table is then checked either automatically in hardware or by raising an interrupt to the operating system.

These are all slow, due to the need to access a slower level of the memory hierarchy, so a well-functioning TLB is important.

Indeed, a TLB miss can be more expensive than an instruction or data cache miss, due to the need for not just a load from main memory, but a page walk, requiring several memory accesses.
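To see why the miss rate matters, a simple cost model helps. The Python sketch below assumes that every level of the page walk costs one full memory access and that nothing is cached along the way; the function name, parameters, and numbers are illustrative assumptions, not measurements of any processor.

```python
def average_translation_cycles(hit_cycles, miss_rate, walk_levels, memory_cycles):
    """Average cycles spent translating one address under a simple model.

    Assumes each of the walk_levels page-table levels costs one
    uncached memory access of memory_cycles cycles.
    """
    miss_penalty = walk_levels * memory_cycles
    return hit_cycles + miss_rate * miss_penalty
```

With a 1-cycle hit, a 1% miss rate, a four-level walk, and 100-cycle memory accesses, the average comes to about 5 cycles per translation, so even rare misses dominate the cost.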

If the present bit is set, then the page is in main memory, and the processor can retrieve the frame number from the page-table entry to form the physical address.
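The step from page-table entry to physical address can be sketched in Python. The entry layout used here (present flag in bit 0, frame number in bits 12 and up, 4 KiB pages) is an assumption chosen for the example, and the PageFault exception is hypothetical; real architectures define their own formats.

```python
PAGE_SHIFT = 12                        # assumed 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
PRESENT_BIT = 1 << 0                   # assumed position of the present bit


class PageFault(Exception):
    """Raised when the page is not in main memory."""


def physical_address(pte, vaddr):
    """Combine the frame number from a page-table entry with the page offset."""
    if not pte & PRESENT_BIT:
        raise PageFault(hex(vaddr))    # page must be brought in first
    frame = pte >> PAGE_SHIFT          # frame number from the entry
    return (frame << PAGE_SHIFT) | (vaddr & OFFSET_MASK)
```

Under these assumptions, an entry of 0x7001 (frame 7, present) maps virtual address 0x12345 to physical address 0x7345.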

[14] The Itanium architecture provides an option of using either software- or hardware-managed TLBs.

As the PALcode for a processor can be processor-specific and operating-system-specific, this allows different versions of PALcode to implement different page-table formats for different operating systems, without requiring that the TLB format, and the instructions to control the TLB, be specified by the architecture.

[16] These are typical performance levels of a TLB:[17] The average effective memory cycle rate is defined as

This means that after a switch, the TLB is empty, and any memory reference will be a miss, so it will be some time before things are running back at full speed.

Newer CPUs use more effective strategies that mark which process each entry belongs to.

[19] Other strategies avoid flushing the TLB on a context switch: (a) A single address space operating system uses the same virtual-to-physical mapping for all processes.

For example, in the Alpha 21264, each TLB entry is tagged with an address space number (ASN), and only TLB entries with an ASN matching the current task are considered valid.
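The effect of such tagging can be shown with a small Python model. It is a sketch only: the class name TaggedTLB, the dictionary keyed by (ASN, page number), and the unlimited capacity are assumptions for illustration and do not reflect the 21264's actual organization.

```python
class TaggedTLB:
    def __init__(self):
        self.entries = {}        # (asn, vpn) -> frame number
        self.current_asn = 0

    def switch_to(self, asn):
        """Context switch: change the current ASN without flushing."""
        self.current_asn = asn

    def insert(self, vpn, frame):
        """Cache a translation tagged with the current ASN."""
        self.entries[(self.current_asn, vpn)] = frame

    def lookup(self, vpn):
        """Only entries tagged with the current ASN can hit."""
        return self.entries.get((self.current_asn, vpn))
```

If task 1 caches page 5 and the system then switches to task 2, a lookup of page 5 misses; switching back to task 1 finds the original entry still in place.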

For another example, in the Intel Pentium Pro, the page global enable (PGE) flag in the register CR4 and the global (G) flag of a page-directory or page-table entry can be used to prevent frequently used pages from being automatically invalidated in the TLBs on a task switch or a load of register CR3.

Since the 2010 Westmere microarchitecture, Intel 64 processors also support 12-bit process-context identifiers (PCIDs), which allow retaining TLB entries for multiple linear-address spaces, with only those that match the current PCID being used for address translation.
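When PCIDs are enabled, the identifier occupies the low 12 bits of the value loaded into CR3, alongside the page-aligned address of the top-level paging structure. The Python sketch below models that packing; the helper names make_cr3 and pcid_of are hypothetical, and the use of bit 63 as a "do not invalidate" hint reflects the author's reading of Intel's manuals and should be checked against them.

```python
PCID_BITS = 12
PCID_MASK = (1 << PCID_BITS) - 1
NOFLUSH = 1 << 63    # assumed: hint to keep this PCID's cached translations


def make_cr3(table_phys_addr, pcid, noflush=False):
    """Pack a page-aligned table address and a PCID into one CR3 value."""
    if table_phys_addr & PCID_MASK:
        raise ValueError("table address must be 4 KiB aligned")
    if not 0 <= pcid <= PCID_MASK:
        raise ValueError("PCID must fit in 12 bits")
    value = table_phys_addr | pcid
    return value | NOFLUSH if noflush else value


def pcid_of(cr3):
    """Extract the PCID from a CR3 value."""
    return cr3 & PCID_MASK
```

For instance, make_cr3(0x1000, 5) yields 0x1005, and pcid_of(0x1005) recovers 5.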

Mitigation strategies such as kernel page-table isolation (KPTI) rely heavily on performance-impacting TLB flushes and benefit greatly from hardware-enabled selective TLB entry management such as PCID.

Hence, every time there is a change in address space, such as a context switch, the entire TLB has to be flushed.

Not all operating systems made full use of these tags immediately, but Linux 4.14 started using them to identify recently used address spaces, since 12-bit PCIDs (4096 different values) are insufficient for all tasks running on a given CPU.

General working of TLB [3]
Flowchart [5] shows the working of a translation lookaside buffer. For simplicity, the page-fault routine is not mentioned.