Median of medians

In computer science, the median of medians is an approximate median selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, most commonly quickselect, that selects the kth smallest element of an initially unsorted array.

Using this approximate median as an improved pivot, the worst-case complexity of quickselect reduces from quadratic to linear, which is also the asymptotically optimal worst-case complexity of any selection algorithm.

In other words, the median of medians is an approximate median-selection algorithm that helps building an asymptotically optimal, exact general selection algorithm (especially in the sense of worst-case complexity), by producing good pivot elements.

[citation needed] Although this approach optimizes the asymptotic worst-case complexity quite well, it is typically outperformed in practice by instead choosing random pivots for its average

This again ensures a worst-case linear performance, in addition to average-case linear performance: introselect starts with quickselect (with random pivot, default), to obtain good average performance, and then falls back to modified quickselect with pivot obtained from median of medians if the progress is too slow.

Even though asymptotically similar, such a hybrid algorithm will have a lower complexity than a straightforward introselect up to a constant factor (both in average-case and worst-case), at any finite length.

The algorithm was published in Blum et al. (1973), and thus is sometimes called BFPRT after the last names of the authors.

Quickselect is linear-time on average, but it can require quadratic time with poor pivot choices.

If the search set decreases exponentially quickly in size (by a fixed proportion), this yields a geometric series times the

However, if the search set decreases slowly in size, such as linearly (by a fixed number of elements, in the worst case only reducing by one element each time), then a linear sum of linear steps yields quadratic overall time (formally, triangular numbers grow quadratically).

If one instead consistently chooses "good" pivots, this is avoided and one always gets linear performance even in the worst case.

A "good" pivot is one for which we can establish that a constant proportion of elements fall both below and above it, as then the search set decreases at least by a constant proportion at each step, hence exponentially quickly, and the overall time remains linear.

The median is a good pivot – the best for sorting, and the best overall choice for selection – decreasing the search set by half at each step.

The median-of-medians algorithm computes an approximate median, namely a point that is guaranteed to be between the 30th and 70th percentiles (in the middle 4 deciles).

The problem is reduced to 70% of the original size, which is a fixed proportion smaller.

Applying the same algorithm on the now smaller set recursively until only one or two elements remain results in a cost of

As stated before, median-of-medians is used as a pivot selection strategy in the quickselect algorithm, which in pseudocode looks as follows.

The following pseudocode assumes that left, right, and the list use one-based numbering and that select is initially called with 1 as the argument to left and the length of the list as the argument to right.

[1] Note that pivot calls select; this is an instance of mutual recursion.

There is a subroutine called partition that can, in linear time, group a list (ranging from indices left to right) into three parts, those less than a certain element, those equal to it, and those greater than the element (a three-way partition).

The grouping into three parts ensures that the median-of-medians maintains linear execution time in a case of many or all coincident elements.

Here is pseudocode that performs a partition about the element list[pivotIndex]: The partition5 subroutine selects the median of a group of at most five elements; an easy way to implement this is insertion sort, as shown below.

Thus the chosen median splits the ordered elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm.

The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians has size

be the time it takes to run a median-of-medians Quickselect algorithm on an array of size

Then we know this time is: where By induction: The key step is reducing the problem to selecting in two lists whose total length is shorter than the original list, plus a linear factor for the reduction step.

Firstly, computing median of an odd list is faster and simpler; while one could use an even list, this requires taking the average of the two middle elements, which is slower than simply selecting the single exact middle element.

With groups of only three elements, the resulting list of medians to search in is length

Finding the median of a larger group takes longer, but is a constant factor (depending only on

In fact, considering the number of comparisons in the worst case, the constant factor is