Fork–join model

Parallel sections may fork recursively until a certain task granularity is reached.

[2][3] By nesting fork–join computations recursively, one obtains a parallel version of the divide and conquer paradigm, expressed by the following generic pseudocode:[4] The simple parallel merge sort of CLRS is a fork–join algorithm.

If both recursive calls were set up as subtasks, the main task would not have any additional work to perform before being blocked at the join.

[1] Implementations of the fork–join model will typically fork tasks, fibers or lightweight threads, not operating-system-level "heavyweight" threads or processes, and use a thread pool to execute these tasks: the fork primitive allows the programmer to specify potential parallelism, which the implementation then maps onto actual parallel execution.

[6] It is also supported by the Java concurrency framework,[7] the Task Parallel Library for .NET,[8] and Intel's Threading Building Blocks (TBB).

An illustration of the fork–join paradigm, in which three regions of the program permit parallel execution of the variously colored blocks. Sequential execution is displayed on the top, while its equivalent fork–join execution is on the bottom.