Instruction pipelining

In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor.

In a pipelined computer, instructions flow through the central processing unit (CPU) in stages.

This allows more CPU throughput than a multicycle computer at a given clock rate, but may increase latency due to the added overhead of the pipelining process itself.
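The throughput claim can be illustrated with a minimal cycle-count sketch. It assumes an idealized pipeline with no stalls and a multicycle design in which each instruction occupies the whole datapath for one cycle per stage; the function names are illustrative only.

```python
def multicycle_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed on an ideal pipeline (assumption: no stalls or flushes).

    The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes per cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def speedup(n_instructions: int, n_stages: int) -> float:
    """Throughput ratio of the ideal pipeline over the multicycle design."""
    return multicycle_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    print(multicycle_cycles(100, 5))  # 500
    print(pipelined_cycles(100, 5))   # 104
    print(round(speedup(100, 5), 2))  # 4.81
```

In this model each individual instruction still spends at least as many cycles in flight as there are stages, so the gain is in throughput, not in per-instruction latency.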

A pipelined design is often the most economical when cost is measured as logic gates per instruction per second.

In contrast, out-of-order computers usually have large amounts of idle logic at any given instant.

In a pipelined computer, the control unit arranges for the flow to start, continue, and stop as a program commands.

However, when a program switches to a different sequence of instructions, the pipeline must sometimes discard the data in process and restart.

One of the early supercomputers was the Cyber series built by Control Data Corporation.

In 1984, Star Technologies added the pipelined divide circuit developed by James Bradley.

In 1976, the Amdahl Corporation's 470 series general-purpose mainframe had a 7-step pipeline and a patented branch prediction circuit.

The model of sequential execution assumes that each instruction completes before the next one begins; this assumption is not true on a pipelined processor.

In some early DSP and RISC processors, the documentation advises programmers to avoid data dependencies between adjacent and nearly adjacent instructions (the positions called delay slots). Alternatively, it declares that the second instruction uses an old value rather than the desired value (for example, a copy that immediately follows an increment of the same register might counter-intuitively copy the unincremented value), or that the value it uses is undefined.
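The stale-value behavior described here can be sketched with a toy model. It assumes one instruction is issued per cycle and that a result becomes visible to later instructions only after a fixed write-back delay; the register names `r5` and `r6`, the opcode names, and the `write_delay` parameter are hypothetical and do not correspond to any particular processor.

```python
def run(program, regs, write_delay):
    """Execute (op, dst, src) tuples, one per cycle, with delayed write-back.

    Assumption: a result computed in cycle c becomes visible to the
    instruction issued in cycle c + 1 + write_delay, and no earlier.
    """
    regs = dict(regs)
    pending = []  # (cycle at which the write becomes visible, register, value)
    for cycle, (op, dst, src) in enumerate(program):
        # Commit every write whose delay has elapsed before reading operands.
        remaining = []
        for ready, reg, value in pending:
            if ready <= cycle:
                regs[reg] = value
            else:
                remaining.append((ready, reg, value))
        pending = remaining

        if op == "nop":
            continue
        if op == "inc":
            result = regs[src] + 1
        elif op == "copy":
            result = regs[src]
        else:
            raise ValueError(f"unknown op: {op}")
        pending.append((cycle + 1 + write_delay, dst, result))

    # Drain writes still in flight once the program ends (oldest first).
    for _, reg, value in pending:
        regs[reg] = value
    return regs


if __name__ == "__main__":
    start = {"r5": 10, "r6": 0}
    hazard = [("inc", "r5", "r5"), ("copy", "r6", "r5")]
    padded = [("inc", "r5", "r5"), ("nop", None, None),
              ("nop", None, None), ("copy", "r6", "r5")]
    print(run(hazard, start, write_delay=0))  # {'r5': 11, 'r6': 11}
    print(run(hazard, start, write_delay=2))  # {'r5': 11, 'r6': 10}
    print(run(padded, start, write_delay=2))  # {'r5': 11, 'r6': 11}
```

With a nonzero delay, the adjacent copy reads the unincremented value; separating the two instructions with unrelated work (here, two no-ops) restores the sequential result, which is the kind of scheduling such documentation asks the programmer to do.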

Unless the processor can give effect to the branch in a single time cycle, the pipeline will continue fetching instructions sequentially.

Such instructions cannot be allowed to take effect because the programmer has diverted control to another part of the program.
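The cost of discarding these sequentially fetched instructions can be estimated with a rough sketch. It assumes no branch prediction, that every taken branch pays the full penalty, and that stages are numbered from 1 with fetch as stage 1; `resolve_stage` and the function names are illustrative assumptions.

```python
def wrong_path_fetches(resolve_stage: int) -> int:
    """Instructions fetched sequentially behind a taken branch before it resolves.

    If the branch takes effect in stage 1 (a single cycle), nothing is wasted;
    each later stage lets one more sequential instruction enter the pipeline.
    """
    if resolve_stage < 1:
        raise ValueError("stages are numbered from 1")
    return resolve_stage - 1


def cycles_with_branches(n_instructions: int, n_stages: int,
                         taken_branches: int, resolve_stage: int) -> int:
    """Total cycles when every taken branch discards its wrong-path fetches.

    Assumes n_instructions >= 1 and no other stalls.
    """
    ideal = n_stages + n_instructions - 1
    return ideal + taken_branches * wrong_path_fetches(resolve_stage)


if __name__ == "__main__":
    print(wrong_path_fetches(1))                # 0
    print(wrong_path_fetches(3))                # 2
    print(cycles_with_branches(100, 4, 10, 3))  # 123
```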

Programs written for a pipelined processor deliberately avoid branching to minimize possible loss of speed.

Because of the bubble (the blue ovals in the illustration), the processor's Decode circuitry is idle during cycle 3.

Generic 4-stage pipeline; the colored boxes represent instructions independent of each other
A bubble in cycle 3 delays execution.