Hardware performance counter

The number of hardware counters available in a processor is limited, while each CPU model may support many different events that a developer might want to measure.

One of the first processors to implement such counters, along with an associated instruction, RDPMC, to access them, was the Intel Pentium. The counters were not documented until Terje Mathisen wrote an article about reverse engineering them in the July 1994 issue of Byte.[2]

The following table shows some examples of CPUs and the number of available hardware counters:

Compared to software profilers, hardware counters provide low-overhead access to a wealth of detailed performance information related to the CPU's functional units, caches, main memory, and so on.
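As a rough illustration of how raw counts become usable performance information, the sketch below derives two common ratios from counter readings: instructions per cycle (IPC) and the cache miss ratio. The `CounterReadings` class, the `derived_metrics` helper and all values are hypothetical. In practice the counts would come from a profiling tool or OS interface, but here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass
class CounterReadings:
    """Raw event counts for one measured interval (hypothetical values)."""
    cycles: int
    instructions: int
    cache_references: int
    cache_misses: int


def derived_metrics(r: CounterReadings) -> dict[str, float]:
    """Turn raw counts into ratios that can be compared across runs."""
    if r.cycles == 0 or r.cache_references == 0:
        raise ValueError("cycles and cache_references must be non-zero")
    return {
        # Instructions retired per core clock cycle.
        "ipc": r.instructions / r.cycles,
        # Fraction of cache references that missed.
        "cache_miss_ratio": r.cache_misses / r.cache_references,
    }


if __name__ == "__main__":
    sample = CounterReadings(cycles=2_000_000, instructions=3_000_000,
                             cache_references=50_000, cache_misses=5_000)
    print(derived_metrics(sample))  # {'ipc': 1.5, 'cache_miss_ratio': 0.1}
```

Ratios like these are usually more informative than absolute counts, because they do not depend on how long the measured interval was.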

The limited number of counter registers often forces users to conduct multiple measurements to collect all the desired performance metrics.
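To see why several runs are needed, here is a minimal sketch that splits a list of desired events into measurement passes. It assumes that any event can be programmed on any counter. Real CPUs often restrict certain events to specific counters and may also provide fixed-function counters, so this is a simplification. The function name `schedule_passes` and the event names are hypothetical.

```python
import math


def schedule_passes(events: list[str], num_counters: int) -> list[list[str]]:
    """Split events into passes of at most num_counters events each.

    Assumes every event can use every counter (a simplification).
    """
    if num_counters <= 0:
        raise ValueError("num_counters must be positive")
    return [events[i:i + num_counters]
            for i in range(0, len(events), num_counters)]


if __name__ == "__main__":
    wanted = ["cycles", "instructions", "branches", "branch-misses",
              "cache-references", "cache-misses", "stalled-cycles"]
    passes = schedule_passes(wanted, num_counters=4)
    # Number of runs is ceil(events / counters): ceil(7 / 4) = 2.
    assert len(passes) == math.ceil(len(wanted) / 4)
    for n, group in enumerate(passes, start=1):
        print(f"run {n}: {group}")
```

With seven events and four counters, two runs are required. A real scheduler would also have to respect per-event counter constraints, which can increase the number of passes.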

Modern superscalar processors schedule and execute multiple instructions out of order at the same time. These "in-flight" instructions can retire at any time, depending on memory access, cache hits, pipeline stalls, and many other factors.

Output of an IBS profile from CodeAnalyst.