Merge sort

Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output.
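
For example (an illustration, not from the article), Python's built-in sorted() is itself a stable sort; sorting key-value pairs by key keeps pairs with equal keys in their input order:

    records = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
    by_key = sorted(records, key=lambda r: r[0])
    # Equal keys keep their original relative order:
    print(by_key)  # [('a', 2), ('a', 4), ('b', 1), ('b', 3)]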

Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945.

[3] A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

[7] For large n and a randomly ordered input list, merge sort's expected (average) number of comparisons approaches α·n fewer than the worst case, where α = −1 + Σ_{k=0}^{∞} 1/(2^k + 1) ≈ 0.2645.
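
The constant can be checked numerically; a minimal sketch (not part of the article) that sums the series:

    # Evaluate alpha = -1 + sum over k >= 0 of 1/(2^k + 1); the series converges quickly.
    alpha = -1.0 + sum(1.0 / (2**k + 1) for k in range(60))
    print(round(alpha, 4))  # about 0.2645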

[7] Merge sort is more efficient than quicksort for some types of lists if the data to be sorted can only be efficiently accessed sequentially, and is thus popular in languages such as Lisp, where sequentially accessed data structures are very common.

Both monotonic and bitonic (alternating up/down) runs may be exploited, with lists (or equivalently tapes or files) being convenient data structures (used as FIFO queues or LIFO stacks).
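
A compact sketch of a natural merge sort along these lines (an illustration using Python's standard library; it exploits ascending runs only and omits the bitonic variant): maximal runs are detected in one pass, queued FIFO, and repeatedly merged two at a time.

    from collections import deque
    from heapq import merge

    def natural_merge_sort(data):
        # Split the input into maximal ascending runs and queue them (FIFO).
        runs, run = deque(), []
        for x in data:
            if run and x < run[-1]:
                runs.append(run)
                run = []
            run.append(x)
        if run:
            runs.append(run)
        # Repeatedly merge the two runs at the front of the queue until one remains.
        while len(runs) > 1:
            a, b = runs.popleft(), runs.popleft()
            runs.append(list(merge(a, b)))
        return runs[0] if runs else []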

[9] In the bottom-up merge sort, the starting point assumes each run is one item long.

Alternating the roles of the input and output buffers between merge passes omits the copy operation and reduces the total number of moves by half.
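
A minimal bottom-up sketch in Python (an illustration under these assumptions, not a reference implementation): runs start one item long, the run width doubles each pass, and the two buffers swap roles instead of copying back after every pass.

    def bottom_up_merge_sort(a):
        n = len(a)
        src, dst = a, [0] * n          # two buffers whose roles alternate
        width = 1                      # each run starts one item long
        while width < n:
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                i, j, k = lo, mid, lo
                # Merge the adjacent runs src[lo:mid] and src[mid:hi] into dst[lo:hi].
                while i < mid and j < hi:
                    if src[i] <= src[j]:
                        dst[k] = src[i]; i += 1
                    else:
                        dst[k] = src[j]; j += 1
                    k += 1
                dst[k:hi] = src[i:mid] if i < mid else src[j:hi]
            src, dst = dst, src        # swap buffer roles instead of copying back
            width *= 2
        return src                     # the buffer holding the fully merged data

The caller should use the returned list (e.g. data = bottom_up_merge_sort(data)), since the sorted result may end up in either buffer depending on the number of passes.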

[12] One drawback of merge sort, when implemented on arrays, is its O(n) working memory requirement.

In fact, there are techniques that can make the initial runs longer than the available internal memory.

One of them, Knuth's 'snowplow' (based on a binary min-heap), generates runs that are, on average, twice as long as the amount of memory used.
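
A rough sketch of this run-generation idea (replacement selection) using Python's heapq; the function name and the capacity parameter are illustrative assumptions, not taken from Knuth's text. An incoming element smaller than the last one written cannot join the current run and is held back for the next one.

    import heapq
    from itertools import islice

    def snowplow_runs(items, capacity):
        """Yield sorted runs; a min-heap workspace of `capacity` items plays the snowplow."""
        it = iter(items)
        heap = list(islice(it, capacity))
        heapq.heapify(heap)
        run, held_back = [], []
        while heap:
            smallest = heapq.heappop(heap)
            run.append(smallest)
            nxt = next(it, None)
            if nxt is not None:
                if nxt >= smallest:
                    heapq.heappush(heap, nxt)   # still fits into the current run
                else:
                    held_back.append(nxt)       # must wait for the next run
            if not heap:                        # the current run is finished
                yield run
                run, heap, held_back = [], held_back, []
                heapq.heapify(heap)
        if run:
            yield run

On random input this produces runs of expected length about twice the capacity, which is why the initial runs can be longer than the internal workspace.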

On modern computers, locality of reference can be of paramount importance in software optimization, because multilevel memory hierarchies are used.

Cache-aware versions of the merge sort algorithm, whose operations have been specifically chosen to minimize the movement of pages in and out of a machine's memory cache, have been proposed.

This algorithm has demonstrated better performance on machines that benefit from cache optimization (LaMarca & Ladner 1997).
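
A cache-aware "tiled" variant can be sketched as follows (an illustration assuming a block size chosen to fit the cache; it is not the exact algorithm from the cited work): each cache-sized block is first sorted in place with insertion sort, and only then are the blocks merged as in an ordinary bottom-up merge sort.

    def tiled_merge_sort(a, block=64):
        n = len(a)
        # Phase 1: sort each cache-sized block in place with insertion sort.
        for lo in range(0, n, block):
            hi = min(lo + block, n)
            for i in range(lo + 1, hi):
                x, j = a[i], i - 1
                while j >= lo and a[j] > x:
                    a[j + 1] = a[j]
                    j -= 1
                a[j + 1] = x
        # Phase 2: merge the sorted blocks bottom-up, doubling the width each pass.
        buf = [0] * n
        width = block
        while width < n:
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                i, j = lo, mid
                for k in range(lo, hi):
                    if i < mid and (j >= hi or a[i] <= a[j]):
                        buf[k] = a[i]; i += 1
                    else:
                        buf[k] = a[j]; j += 1
            a[:] = buf      # copy back (a production version would alternate buffers instead)
            width *= 2
        return a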

Merge sort parallelizes well due to the use of the divide-and-conquer method.

The divide phase consists of many recursive calls that repeatedly perform the same division process until the subsequences are trivially sorted (containing one or no element).

[19] In one of the sequences (the longer one if their lengths are unequal), the element at the middle index is selected.

For the partial sequences of the smaller and larger elements created in this way, the merge algorithm is again executed in parallel until the base case of the recursion is reached.
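
A sketch of this split step (illustrative only; the function name and the use of plain recursion instead of spawned tasks are assumptions): the middle element of the longer sequence is located in the other sequence by binary search, and the two resulting pairs of partial sequences are merged recursively. In a real parallel implementation the two recursive calls would run as independent tasks.

    from bisect import bisect_left

    def parallel_merge(a, b, out, lo=0):
        """Merge the sorted lists a and b into out[lo:lo + len(a) + len(b)]."""
        if len(a) < len(b):
            a, b = b, a                    # make a the longer sequence
        if not a:
            return
        mid = len(a) // 2
        pivot = a[mid]
        cut = bisect_left(b, pivot)        # rank of the pivot within b
        out[lo + mid + cut] = pivot
        # The two calls below are independent and could be executed in parallel.
        parallel_merge(a[:mid], b[:cut], out, lo)
        parallel_merge(a[mid + 1:], b[cut:], out, lo + mid + cut + 1)

For two sorted lists x and y it is called as out = [0] * (len(x) + len(y)); parallel_merge(x, y, out).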

In order to analyze a recurrence relation for the worst case span, the recursive calls of parallelMergesort have to be incorporated only once due to their parallel execution, obtaining T(n) = T(n/2) + Θ((log n)^2), where T(n) is the span of parallelMergesort on n elements and Θ((log n)^2) is the span of the parallel merge; this recurrence solves to T(n) = Θ((log n)^3).

Hence, each processor performs the p-way merge locally and thus obtains a sorted sequence from its sub-sequences.

Because of the second property, no further p-way merge has to be performed; the results only have to be put together in the order of the processor number.
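
As a sequential stand-in for what each processor does (a sketch under assumed names, not the parallel code itself), Python's heapq.merge can play the role of the local p-way merge, and the final result is simply the concatenation of the locally merged parts in processor order:

    from heapq import merge

    def local_pway_merge(received_parts):
        # received_parts: the p sorted pieces sent to this processor, one per partner.
        return list(merge(*received_parts))

    def gather_in_processor_order(locally_merged):
        # Every element on processor i is at most every element on processor i + 1,
        # so concatenating in processor order already gives the globally sorted sequence.
        result = []
        for part in locally_merged:
            result.extend(part)
        return result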

We assume that there is a barrier synchronization before and after the multisequence selection such that every processor can determine the splitting elements and the sequence partition properly.

The multiway merge sort algorithm is very scalable through its high parallelization capability, which allows the use of many processors.

This makes the algorithm a viable candidate for sorting large amounts of data, such as those processed in computer clusters.

Also, since in such systems memory is usually not a limiting resource, the space-complexity disadvantage of merge sort is negligible.

Sanders et al. have presented in their paper a bulk synchronous parallel algorithm for multilevel multiway mergesort, which divides the p processors into groups and applies the multiway merge recursively within each group.

The hierarchical structure of the underlying real network can be used to define the processor groups (e.g., racks, clusters, etc.).

[24] Other sophisticated parallel sorting algorithms can achieve the same or better time bounds with a lower constant.

For example, in 1991 David Powers described a parallelized quicksort (and a related radix sort) that can operate in O(log n) time on a CRCW parallel random-access machine (PRAM) with n processors by performing partitioning implicitly.

[30] Timsort, a tuned hybrid of merge sort and insertion sort, is used in a variety of software platforms and languages, including the Java and Android platforms,[31] and has been used by Python since version 2.3; since version 3.11, Timsort's merge policy has been updated to Powersort.

A recursive merge sort algorithm used to sort an array of 7 integer values. These are the steps a human would take to emulate merge sort (top-down).
Merge sort type algorithms allowed large data sets to be sorted on early computers that had random access memories small by modern standards. Records were stored on magnetic tape and processed on banks of magnetic tape drives, such as these IBM 729s.
Tiled merge sort applied to an array of random integers. The horizontal axis is the array index and the vertical axis is the integer.
The parallel multiway mergesort process on four processors.