Software pipelining

Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the case of hand written assembly code, by the programmer) instead of the processor.

Software pipelining has been known to assembly language programmers of machines with instruction-level parallelism since such architectures existed.

Effective compiler generation of such code dates to the invention of modulo scheduling by Rau and Glaeser.

[2] Gao et al. formulated optimal software pipelining in integer linear programming, culminating in validation of advanced heuristics in an evaluation paper.

Without software pipelining, the operations execute in the following sequence: Assume that each instruction takes 3 clock cycles to complete (ignore for the moment the cost of the looping control flow).

See the article on loop unrolling for more on solutions to this problem, but note that software pipelining prevents the use of Duff's device.

Furthermore, if bignumber is expected to be moderate in size compared to the number of iterations unrolled (say 10-20), then the execution will spend most of its time in this inefficient prologue code, rendering the software pipelining optimization ineffectual.

While still better than attempting loop unrolling for this example, software pipelining requires a trade-off between speed and memory usage.

In fact, Monica Lam presents an elegant solution to this problem in her thesis, A Systolic Array Optimizing Compiler (1989) (ISBN 0-89838-300-5).

The trick is to replicate the body of the loop after it has been scheduled, allowing different registers to be used for different values of the same variable when they have to be live at the same time.

Nevertheless, for loops with large trip counts on architectures with enough instruction level parallelism, the technique easily performs well enough to be worth any increase in code size.