Parallel sections may fork recursively until a certain task granularity is reached.
[2][3] By nesting fork–join computations recursively, one obtains a parallel version of the divide and conquer paradigm, expressed by the following generic pseudocode:[4] The simple parallel merge sort of CLRS is a fork–join algorithm.
If both recursive calls were set up as subtasks, the main task would not have any additional work to perform before being blocked at the join.
[1] Implementations of the fork–join model will typically fork tasks, fibers or lightweight threads, not operating-system-level "heavyweight" threads or processes, and use a thread pool to execute these tasks: the fork primitive allows the programmer to specify potential parallelism, which the implementation then maps onto actual parallel execution.
[6] It is also supported by the Java concurrency framework,[7] the Task Parallel Library for .NET,[8] and Intel's Threading Building Blocks (TBB).