Branch target predictor

In more parallel processor designs, as the instruction cache latency grows longer and the fetch width grows wider, branch target extraction becomes a bottleneck.

When the recurrence from instruction cache fetch, through decode and target address calculation, back to the fetch address takes more than one cycle, the machine loses fetch cycles after every predicted-taken branch. As predicted branches happen every 10 instructions or so, this can force a substantial drop in fetch bandwidth.

Machines with longer instruction cache latencies suffer an even larger loss.

To ameliorate the loss, some machines perform branch target prediction: given the address of a branch, they predict the target of that branch. A refinement of the idea predicts the start of a sequential run of instructions given the address of the start of the previous sequential run of instructions.
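The refinement can be sketched as a small lookup table keyed by the start address of a run. This is a minimal illustrative sketch, not a real machine's design; the class name, table size, and hash are assumptions introduced here:

```python
# Hedged sketch of a next-run predictor: a small direct-mapped table
# that, given the start address of one sequential run of instructions,
# predicts the start address of the next run. All names and sizes are
# illustrative assumptions.

class NextRunPredictor:
    def __init__(self, entries=1024):
        self.entries = entries          # power of two for cheap indexing
        self.table = [None] * entries   # predicted next-run start addresses

    def _index(self, run_start):
        # Trivial hash: low-order bits of the run's start address
        # (a real design might hash more bits to reduce aliasing).
        return (run_start >> 2) % self.entries

    def predict(self, run_start):
        # Returns a predicted next-run start, or None on a table miss;
        # a miss would fall back to sequential fetch in a real frontend.
        return self.table[self._index(run_start)]

    def train(self, run_start, actual_next_start):
        # Once the branch ending the run resolves, record the observed
        # next-run start so the next encounter predicts it directly.
        self.table[self._index(run_start)] = actual_next_start


p = NextRunPredictor()
p.train(0x1000, 0x2040)        # run at 0x1000 was followed by run at 0x2040
print(hex(p.predict(0x1000)))  # -> 0x2040
```

Because the table maps run start to run start, a single lookup replaces the fetch-decode-calculate path for the common, correctly predicted case.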

This predictor reduces the recurrence above to two steps: hash the address of the start of a run, then fetch the prediction for the address of the start of the next run. As the predictor RAM can be 5–10% of the size of the instruction cache, this fetch happens much faster than the instruction cache fetch, and so the recurrence is much faster.
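A back-of-envelope calculation shows how a predictor RAM can land in that 5–10% range. The cache size, entry count, and bits per entry below are assumptions for illustration, not figures from any particular machine:

```python
# Why the predictor RAM is much smaller than the instruction cache.
# All sizes below are illustrative assumptions.

icache_bytes = 64 * 1024       # assumed 64 KiB instruction cache

entries = 2048                 # assumed number of predictor entries
bits_per_entry = 24            # assumed partial next-fetch address per entry
predictor_bytes = entries * bits_per_entry // 8

ratio = predictor_bytes / icache_bytes
print(f"predictor: {predictor_bytes} bytes, {ratio:.1%} of the I-cache")
```

A RAM that much smaller than the instruction cache can be read in less time, which is what makes the reduced recurrence faster than the original fetch loop.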