Queueing theory can be used to determine the number of buffer slots needed, depending on the variability of the processing times and on the desired performance.
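As a rough illustration (assuming, purely for the sketch, that arrivals and service times behave like an M/M/1 queue), the buffer can be sized so that the probability of the queue outgrowing it stays below a chosen threshold; the function name and parameter values below are invented for the example, not taken from any library:

```python
import math

def buffer_slots_needed(arrival_rate, service_rate, overflow_prob):
    """Rough M/M/1 estimate: smallest k with P(queue length >= k) <= overflow_prob,
    using the standard tail result P(N >= k) = rho**k, where rho = arrival_rate / service_rate."""
    rho = arrival_rate / service_rate
    if not 0.0 < rho < 1.0:
        raise ValueError("the stage must keep up on average (rho must be below 1)")
    return math.ceil(math.log(overflow_prob) / math.log(rho))

# Items arrive at 80% of the stage's service rate; keep the chance of overflow below 1%.
print(buffer_slots_needed(arrival_rate=8.0, service_rate=10.0, overflow_prob=0.01))  # 21
```

The qualitative point of the analysis is that as utilisation approaches 1 (or processing times become more variable), the required buffer size grows sharply.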
This concept of a "non-linear" or "dynamic" pipeline is exemplified by shops or banks where two or more cashiers serve clients from a single waiting queue.
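A minimal sketch of that idea, using Python threads purely for illustration (the queue and worker names are hypothetical): two workers drain one shared queue, so a slow item delays only the worker handling it.

```python
import queue, random, threading, time

# Two workers (the "cashiers") take items from one shared queue.
clients = queue.Queue()
served = queue.Queue()

def cashier(name):
    while True:
        client = clients.get()
        if client is None:                        # sentinel: shift is over
            return
        time.sleep(random.uniform(0.01, 0.05))    # variable service time
        served.put((name, client))

workers = [threading.Thread(target=cashier, args=(f"cashier-{i}",)) for i in (1, 2)]
for w in workers:
    w.start()
for c in range(10):
    clients.put(c)
for _ in workers:
    clients.put(None)                             # one sentinel per cashier
for w in workers:
    w.join()

while not served.empty():
    print(served.get())                           # completion order may differ from arrival order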
In order to handle such conflicts correctly (for example, the processing of one item depending on the result of another), the pipeline must be provided with extra circuitry or logic that detects them and takes the appropriate action.
For example, UNIX derivatives can pipeline commands by connecting one process's standard output to the next process's standard input, using the pipes implemented by the operating system.[3]
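For instance, the following sketch chains two ordinary commands through an operating-system pipe from Python, equivalent to the shell pipeline `ls | wc -l` (it assumes a Unix-like system where `ls` and `wc` are available):

```python
import subprocess

# Connect the standard output of `ls` to the standard input of `wc -l`
# through an OS pipe, just as the shell does for `ls | wc -l`.
producer = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
consumer = subprocess.Popen(["wc", "-l"], stdin=producer.stdout,
                            stdout=subprocess.PIPE, text=True)
producer.stdout.close()          # let the producer see a closed pipe if the consumer exits early
output, _ = consumer.communicate()
print(output.strip())            # number of entries listed by `ls`
```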
Other strategies exist that rely on cooperative multitasking and therefore do not need multiple threads of execution or additional CPU cores, such as a round-robin scheduler driving a coroutine-based framework.
In this context, each stage may be instantiated as its own coroutine, yielding control back to the scheduler after completing its share of work for the round.
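A minimal sketch of that arrangement, using plain Python generators as the coroutines and a simple loop as the round-robin scheduler (the stage functions and queues are made up for the example):

```python
from collections import deque

def stage(inbox, outbox, work):
    """One pipeline stage as a coroutine: process at most one item, then yield to the scheduler."""
    while True:
        if inbox:
            outbox.append(work(inbox.popleft()))
        yield

# Hypothetical three-stage pipeline wired together with simple queues.
source, q1, q2, sink = deque(range(5)), deque(), deque(), deque()
stages = [
    stage(source, q1, lambda x: x + 1),   # stage 1
    stage(q1, q2, lambda x: x * 2),       # stage 2
    stage(q2, sink, str),                 # stage 3
]

# Round-robin scheduler: each stage gets one turn per round, with no extra threads or cores.
for _ in range(10):
    for coroutine in stages:
        next(coroutine)

print(list(sink))  # ['2', '4', '6', '8', '10']
```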
A pipelined system typically requires more resources (circuit elements, processing units, computer memory, etc.)
than one that executes one batch at a time, because its stages cannot share those resources, and because buffering and additional synchronization logic may be needed between the elements.
The additional complexity cost of pipelining may be considerable if there are dependencies between the processing of different items, especially if a guess-and-backtrack strategy is used to handle them.
Indeed, the cost of implementing that strategy for complex instruction sets has motivated some radical proposals to simplify computer architecture, such as RISC and VLIW.
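To make the guess-and-backtrack idea concrete, here is a deliberately simplified sketch (not any real processor's mechanism): dependent work on each item is started under a guessed outcome and redone whenever the guess turns out to be wrong, which is exactly the extra cost that speculation introduces.

```python
import random

def condition(item):
    """Slow stage whose outcome later work depends on (e.g. a branch)."""
    return item % 2 == 0

def process(item, taken):
    """Dependent work; which version runs depends on the condition's outcome."""
    return ("even-path", item) if taken else ("odd-path", item)

results, rollbacks = [], 0
for item in range(10):
    guess = random.choice([True, False])   # the "guess" (akin to a branch predictor)
    speculative = process(item, guess)     # work started before the real outcome is known
    actual = condition(item)               # the real outcome arrives later
    if guess != actual:
        rollbacks += 1                     # backtrack: discard the speculative work and redo it
        speculative = process(item, actual)
    results.append(speculative)

print(results, "rollbacks:", rollbacks)
```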
However, with the advent of data analytics engines such as Apache Hadoop and, more recently, Apache Spark, it has become possible to distribute large datasets across many processing nodes, allowing applications to achieve levels of efficiency several hundred times greater than previously thought possible.
As a result, even a mid-range PC using distributed processing in this fashion can handle the building and running of big data pipelines.
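As a small, hedged example of what such a pipeline can look like in practice (assuming the `pyspark` package and a local or cluster Spark runtime are available; the dataset and transformations are invented for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Each transformation below is one pipeline stage; Spark partitions the data
# across the available nodes (or local cores) and applies the stages to each partition.
records = spark.sparkContext.parallelize(range(1_000_000))
result = (records
          .map(lambda x: x * 2)            # stage 1: transform
          .filter(lambda x: x % 3 == 0)    # stage 2: filter
          .count())                        # action: triggers distributed execution

print(result)
spark.stop()
```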