Sorting algorithm

From the beginning of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement.

Among the authors of early sorting algorithms around 1951 was Betty Holberton, who worked on ENIAC and UNIVAC.

Similarly, optimal sorting (by various definitions) on a parallel machine is an open research topic.

Stable sorting algorithms choose one of these possible orderings according to the following rule: if two items compare as equal (like the two 5 cards), then their relative order is preserved, i.e. if one comes before the other in the input, it comes before the other in the output.
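To make the rule concrete, the following sketch (in Python, with made-up card data) sorts rank/suit pairs by rank only; because Python's built-in sort is stable, the two 5s keep their input order.

```python
# Illustrative card data (not from the article): (rank, suit) pairs with two 5s.
cards = [(5, "hearts"), (3, "spades"), (5, "clubs"), (9, "diamonds")]

# sorted() in Python is guaranteed stable: cards that compare equal on the
# key (rank) keep the relative order they had in the input.
by_rank = sorted(cards, key=lambda card: card[0])

print(by_rank)
# [(3, 'spades'), (5, 'hearts'), (5, 'clubs'), (9, 'diamonds')]
# The 5 of hearts still precedes the 5 of clubs, exactly as in the input.
```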

Mathematical analysis demonstrates that a comparison sort cannot perform better than O(n log n) on average.

Algorithms not based on comparisons are not limited to Ω(n log n), provided the keys and operations meet the unit-cost random-access machine model described below.
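As an illustration of how a non-comparison sort can beat that bound under such a model, here is a minimal counting sort sketch; it assumes the keys are small non-negative integers, which is what makes the linear running time possible.

```python
def counting_sort(keys, max_key):
    """Sort non-negative integer keys in O(n + max_key) time without comparing elements."""
    counts = [0] * (max_key + 1)
    for k in keys:                        # tally how often each key occurs
        counts[k] += 1
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)    # emit each key as many times as it was seen
    return result

print(counting_sort([4, 1, 3, 1, 0, 4], max_key=4))  # [0, 1, 1, 3, 4, 4]
```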

These sorts are usually described for educational purposes to demonstrate how the run time of algorithms is estimated.

The following table describes some sorting algorithms that are impractical for real-life use in traditional software contexts due to extremely poor performance or specialized hardware requirements.

Bubble sort and variants are rarely used in practice, but are commonly found in teaching and theoretical discussions.

It has O(n²) complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort.
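For reference, a minimal bubble sort sketch follows; the early-exit flag is a common refinement rather than part of the basic algorithm.

```python
def bubble_sort(items):
    """Repeatedly step through the list, swapping adjacent out-of-order pairs."""
    n = len(items)
    for end in range(n - 1, 0, -1):       # the largest unsorted element bubbles up each pass
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:                   # no swaps means the list is already sorted
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```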

Selection sort is noted for its simplicity and also has performance advantages over more complicated algorithms in certain situations.
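A minimal selection sort sketch, for illustration: it makes O(n²) comparisons but only O(n) swaps, which is the kind of situation (costly writes) where its simplicity can pay off.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining element and swap it into place."""
    n = len(items)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):         # scan the unsorted tail for its minimum
            if items[j] < items[smallest]:
                smallest = j
        if smallest != i:
            items[i], items[smallest] = items[smallest], items[i]  # at most one swap per pass
    return items

print(selection_sort([29, 10, 14, 37, 13]))  # [10, 13, 14, 29, 37]
```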

Of the algorithms described here, merge sort is the first that scales well to very large lists, because its worst-case running time is O(n log n).

However, it has additional O(n) space complexity and involves a large number of copies in simple implementations.
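The merge sort sketch below shows where both costs come from: the merge step builds a new list (the O(n) auxiliary space) by copying elements from the two sorted halves.

```python
def merge_sort(items):
    """Split the list in half, sort each half recursively, then merge the results."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    merged = []                            # extra O(n) space for the merged output
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:            # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])                # copy whatever remains of either half
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```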

Once the data list has been made into a heap, the root node is guaranteed to be the largest (or smallest) element.

When it is removed and placed at the end of the list, the heap is rearranged so the largest element remaining moves to the root.

Using the heap, finding the next largest element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort.

This yields an average time complexity of O(n log n), with low overhead, making it a popular algorithm.
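A short sketch of the heap-based selection described above, using Python's heapq module (a min-heap, so the smallest remaining element is extracted each time); a textbook in-place heapsort would instead sift elements within the array itself.

```python
import heapq

def heapsort(items):
    """Heapify the data, then repeatedly extract the root element."""
    heap = list(items)
    heapq.heapify(heap)                    # O(n): arrange the list as a binary min-heap
    # Each pop removes the root and restores the heap property in O(log n),
    # so n pops give the O(n log n) total running time.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([12, 11, 13, 5, 6, 7]))  # [5, 6, 7, 11, 12, 13]
```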

Together with its modest O(log n) space usage, quicksort is one of the most popular sorting algorithms and is available in many standard programming libraries.

The important caveat about quicksort is that its worst-case performance is O(n²); while this is rare, in naive implementations (choosing the first or last element as pivot) it occurs for already-sorted data, which is a common case.

Such worst-case behaviour can be mitigated by choosing the pivot more carefully (for example, at random), or by capping the recursion depth at about 2 log₂ n, which is approximately twice the maximum recursion depth one would expect on average with a randomly ordered array, and switching to another algorithm when that cap is reached.
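As a hedged illustration of the random-pivot idea, here is a minimal quicksort sketch; unlike the in-place partitioning normally used in practice, it copies sublists for clarity.

```python
import random

def quicksort(items):
    """Partition around a randomly chosen pivot, then recurse on each side."""
    if len(items) <= 1:
        return items
    pivot = random.choice(items)           # a random pivot makes the O(n^2) case unlikely
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 6, 2, 6, 7, 1]))  # [1, 2, 3, 6, 6, 7]
```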

This, combined with the fact that Shellsort is in-place, only needs a relatively small amount of code, and does not require use of the call stack, makes it useful in situations where memory is at a premium, such as in embedded systems and operating system kernels.
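A minimal Shellsort sketch; the gap sequence shown (repeated halving) is Shell's original choice and only one of many possibilities. Note that it sorts in place, uses no recursion, and needs only a few lines of code.

```python
def shellsort(items):
    """Insertion-sort elements that are `gap` apart, shrinking the gap down to 1."""
    n = len(items)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            value = items[i]
            j = i
            # Gapped insertion sort: shift larger elements right in steps of `gap`.
            while j >= gap and items[j - gap] > value:
                items[j] = items[j - gap]
                j -= gap
            items[j] = value
        gap //= 2                          # Shell's original gaps: n/2, n/4, ..., 1
    return items

print(shellsort([23, 12, 1, 8, 34, 54, 2, 3]))  # [1, 2, 3, 8, 12, 23, 34, 54]
```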

Bubble sort's average time and worst-case performance are O(n²), so it is rarely used to sort large, unordered data sets.

Comb sort was later rediscovered and popularized by Stephen Lacey and Richard Box with a Byte Magazine article published in April 1991.

The basic idea is to eliminate turtles, or small values near the end of the list, since in a bubble sort these slow the sorting down tremendously. (Rabbits, large values around the beginning of the list, do not pose a problem in bubble sort.) Comb sort accomplishes this by initially swapping elements that are a certain distance from one another in the array, rather than only swapping elements if they are adjacent to one another, and then shrinking the chosen distance until it is operating as a normal bubble sort.
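A comb sort sketch that makes the shrinking-distance idea concrete; the shrink factor of 1.3 is a commonly quoted value and is an assumption here, not something stated above.

```python
def comb_sort(items):
    """Bubble sort with a comparison gap that starts large and shrinks toward 1."""
    n = len(items)
    gap = n
    swapped = True
    while gap > 1 or swapped:
        gap = max(1, int(gap / 1.3))       # shrink the distance between compared elements
        swapped = False
        for i in range(n - gap):
            if items[i] > items[i + gap]:  # swapping over a gap moves turtles forward quickly
                items[i], items[i + gap] = items[i + gap], items[i]
                swapped = True
    return items

print(comb_sort([8, 4, 1, 56, 3, -44, 23, -6, 28, 0]))  # [-44, -6, 0, 1, 3, 4, 8, 23, 28, 56]
```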

This allows external sorting of data too large to fit into a single computer's memory.

When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical.

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk.

One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array.
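A sketch of that approach: rather than moving whole records, it sorts a list of positions by the (small) key field. The record layout here is hypothetical.

```python
# Hypothetical records: large payloads carrying a small integer key.
records = [
    {"key": 42, "payload": "x" * 10_000},
    {"key": 7,  "payload": "y" * 10_000},
    {"key": 19, "payload": "z" * 10_000},
]

# Sort an index (a list of positions) by key instead of rearranging the records.
index = sorted(range(len(records)), key=lambda i: records[i]["key"])

print(index)                                # [1, 2, 0]
print([records[i]["key"] for i in index])   # keys read back in sorted order: [7, 19, 42]
```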

For sorting very large sets of data that vastly exceed system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.

Conversely, some sorting algorithms can be derived by repeated application of a selection algorithm; quicksort and quickselect can be seen as the same pivoting move, differing only in whether one recurses on both sides (quicksort, divide-and-conquer) or one side (quickselect, decrease-and-conquer).
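The shared pivoting step can be seen in a quickselect sketch: it partitions exactly as the quicksort sketch above does, but then recurses into only the side that contains the k-th smallest element.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-based) using one-sided recursion."""
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k < len(smaller):                   # the answer lies among the smaller elements
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):      # the answer is the pivot value itself
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))

print(quickselect([7, 2, 9, 4, 1], k=2))  # 4, the third-smallest element
```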

An example of stable sort on playing cards. When the cards are sorted by rank with a stable sort, the two 5s must remain in the same order in the sorted output that they were originally in. When they are sorted with a non-stable sort, the 5s may end up in the opposite order in the sorted output.
A Shellsort, different from bubble sort in that it moves elements to numerous swapping positions.
A bubble sort, a sorting algorithm that continuously steps through a list, swapping items until they appear in the correct order.