If the data set has an odd number of observations, the middle one is selected (after arranging in ascending order).
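The selection rule above can be sketched in a few lines of Python. The function name `median` is illustrative, and the handling of an even number of observations (averaging the two middle values) is the usual convention, assumed here because this passage only states the odd case.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle one is selected.
        return ordered[mid]
    # Even number of observations (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `median([3, 1, 2])` selects the middle value of the sorted data `1, 2, 3`.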
The median is a special case of other ways of summarizing the typical values associated with a statistical distribution: it is the 2nd quartile, 5th decile, and 50th percentile.
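As a small check of this equivalence, the sketch below uses Python's standard `statistics` module to compute the median alongside the 2nd quartile, 5th decile, and 50th percentile. The helper name `central_quantiles` is hypothetical, and the comparison relies on the module's default ("exclusive") interpolation method; other quantile conventions may differ slightly on small samples.

```python
from statistics import median, quantiles

def central_quantiles(data):
    """Return the median and the quantile cut points that should coincide with it."""
    return {
        "median": median(data),
        # quantiles(data, n=k) returns the k - 1 cut points dividing data into k groups.
        "2nd quartile": quantiles(data, n=4)[1],
        "5th decile": quantiles(data, n=10)[4],
        "50th percentile": quantiles(data, n=100)[49],
    }
```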
For example, consider the multiset 1, 2, 2, 2, 3, 14. The median is 2 in this case, as is the mode, and it might be seen as a better indication of the center than the arithmetic mean of 4, which is larger than all but one of the values.
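The comparison can be reproduced with Python's standard `statistics` module. The data used here, 1, 2, 2, 2, 3, 14, is one multiset consistent with the figures quoted above (median 2, mode 2, mean 4); treat it as an illustrative assumption.

```python
from statistics import mean, median, mode

# Illustrative multiset (assumed): a single large value pulls the mean upward.
data = [1, 2, 2, 2, 3, 14]

center = {
    "median": median(data),
    "mode": mode(data),
    "mean": mean(data),
}

# Count how many observations lie below the arithmetic mean.
values_below_mean = sum(1 for x in data if x < center["mean"])
```

Here the mean exceeds every value except the largest, while the median and mode both sit among the bulk of the data.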
However, the widely cited empirical relationship that the mean is shifted "further into the tail" of a distribution than the median is not generally true.
[5] As a median is based on the middle data in a set, it is not necessary to know the value of extreme results in order to calculate it.
For practical purposes, different measures of location and dispersion are often compared on the basis of how well the corresponding population values can be estimated from a sample of data.
This bound was proved by Book and Sher in 1979 for discrete samples,[13] and more generally by Page and Murty in 1982.
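Mallows's proof can be generalized to obtain a multivariate version of the inequality[17] simply by replacing the absolute value with a norm: $\|\mu - m\| \le \sqrt{\operatorname{E}\left(\|X - \mu\|^2\right)}$, where m is a spatial median, that is, a minimizer of the function $a \mapsto \operatorname{E}(\|X - a\|)$.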
When the distribution has a monotonically decreasing probability density, the median is less than the mean, as shown in the figure.
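The exponential distribution is a standard example of a monotonically decreasing density: with rate $\lambda$ its mean is $1/\lambda$ and its median is $\ln 2/\lambda$. The sketch below evaluates both; the function name `exponential_mean_and_median` is hypothetical and the choice of distribution is only an illustration.

```python
import math

def exponential_mean_and_median(rate):
    """Return (mean, median) of an exponential distribution with the given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    mean = 1.0 / rate
    median = math.log(2.0) / rate   # solves 1 - exp(-rate * m) = 1/2
    return mean, median
```

Since $\ln 2 \approx 0.693 < 1$, the median is below the mean for every rate.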
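A more robust estimator is Tukey's ninther, which is the median-of-three rule applied with limited recursion:[26] if A is the sample laid out as an array, and $\operatorname{med3}(A) = \operatorname{med}(A[1], A[n/2], A[n])$, then $\operatorname{ninther}(A) = \operatorname{med}\left(\operatorname{med3}(A[1 \ldots \tfrac{1}{3}n]), \operatorname{med3}(A[\tfrac{1}{3}n \ldots \tfrac{2}{3}n]), \operatorname{med3}(A[\tfrac{2}{3}n \ldots n])\right)$.
The remedian is an estimator for the median that requires linear time but sub-linear memory, operating in a single pass over the sample.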
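Tukey's ninther, as described above, can be sketched as follows. This is a minimal illustration, assuming the sample is split into three contiguous parts of roughly equal length and that each part contributes the median of its first, middle, and last elements; the function names and the exact index choices are assumptions, not a canonical implementation.

```python
def median_of_three(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]

def med3(sample):
    """Median of the first, middle, and last elements of a non-empty sequence."""
    n = len(sample)
    return median_of_three(sample[0], sample[n // 2], sample[-1])

def ninther(sample):
    """Estimate the median as the median of three median-of-three estimates."""
    n = len(sample)
    if n < 3:
        raise ValueError("ninther needs at least three observations")
    third = n // 3
    left = sample[:third]
    middle = sample[third:2 * third]
    right = sample[2 * third:]        # absorbs any remainder
    return median_of_three(med3(left), med3(middle), med3(right))
```

The estimate inspects at most nine elements regardless of sample size, so it trades accuracy for speed compared with sorting the full sample.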
Laplace's result is now understood as a special case of the asymptotic distribution of arbitrary quantiles.
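For a continuous variable, the probability of multiple sample values being exactly equal to the median is 0, so one can calculate the density of the sample median at the point $v$ directly from the trinomial distribution. For a sample of size $2n+1$ drawn from a distribution with cumulative distribution function $F$ and density $f$, this gives $\Pr[\operatorname{med} \in dv] = \frac{(2n+1)!}{n!\,n!} F(v)^n (1 - F(v))^n f(v)\,dv$.
Writing the beta function for integer arguments as $\mathrm{B}(\alpha, \beta) = \frac{(\alpha-1)!\,(\beta-1)!}{(\alpha+\beta-1)!}$, noting that $f(v)\,dv = dF(v)$, and setting both $\alpha$ and $\beta$ equal to $n+1$ allows the last expression to be written as $\frac{F(v)^n (1 - F(v))^n}{\mathrm{B}(n+1, n+1)}\,dF(v)$. Hence the density function of the median is a symmetric beta distribution pushed forward by $F$.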
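By the chain rule, the corresponding variance of the sample median is $\frac{1}{4(N+2) f(m)^2}$, where $N$ is the sample size and $f(m)$ is the density at the median $m$. The additional 2 is negligible in the limit.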
Using these preliminaries, it is possible to investigate the effect of sample size on the standard errors of the mean and median.
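One way to carry out such an investigation is by simulation. The sketch below assumes standard normal data and compares simulated standard errors of the mean and median with the large-sample approximations $\sigma/\sqrt{N}$ and $1/(2\sqrt{N}\,f(m))$; the function names, the choice of distribution, the trial count, and the seed are all illustrative assumptions.

```python
import math
import random
import statistics

def simulated_standard_errors(sample_size, trials=2000, seed=0):
    """Return simulated (SE of mean, SE of median) for standard normal samples."""
    rng = random.Random(seed)         # fixed seed so runs are repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)

def asymptotic_standard_errors(sample_size, sigma=1.0):
    """Return large-sample (SE of mean, SE of median) for normal data."""
    se_mean = sigma / math.sqrt(sample_size)
    # Normal density evaluated at its median (which equals its mean).
    density_at_median = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    se_median = 1.0 / (2.0 * math.sqrt(sample_size) * density_at_median)
    return se_mean, se_median
```

For normal data the approximate ratio of the two standard errors is $\sqrt{\pi/2} \approx 1.25$, so the median needs a somewhat larger sample than the mean to reach the same precision.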
[43] When dealing with a discrete variable, it is sometimes useful to regard the observed values as being midpoints of underlying continuous intervals.
[45] The Theil–Sen estimator is a method for robust linear regression based on finding medians of slopes.
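The median-of-slopes idea can be sketched as follows. This is a minimal, quadratic-time illustration: the function name `theil_sen` is hypothetical, and taking the intercept as the median of $y - \text{slope} \cdot x$ is one common choice, assumed here rather than taken from the text.

```python
from itertools import combinations
from statistics import median

def theil_sen(points):
    """Fit a line to (x, y) pairs using the median of pairwise slopes."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1                   # skip pairs with no defined slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    # Assumed convention: intercept is the median residual offset.
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept
```

With points lying on $y = 2x + 1$ plus one gross outlier, such as `[(0, 1), (1, 3), (2, 5), (3, 7), (4, 100)]`, the fitted line is unaffected by the outlier, which illustrates the robustness of the method.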
The idea dates back to Wald in 1940, who suggested dividing a set of bivariate data into two halves depending on the value of the independent parameter.
Nair and Shrivastava in 1942 suggested a similar idea but instead advocated dividing the sample into three equal parts before calculating the means of the subsamples.
A median-unbiased estimator minimizes the risk with respect to the absolute-deviation loss function, as observed by Laplace.
[58] Scientific researchers in the ancient Near East appear not to have used summary statistics at all, instead choosing values that offered maximal consistency with a broader theory that integrated a wide variety of phenomena.
[59] Within the Mediterranean (and, later, European) scholarly community, statistics like the mean are fundamentally a medieval and early modern development.
The idea of the median appeared in the 6th century in the Talmud, in order to fairly analyze divergent appraisals.
Instead, the closest ancestor of the modern median is the mid-range, invented by Al-Biruni.[62]: 31 [63] Transmission of his work to later scholars is unclear.
Whether rediscovered or independently invented, the mid-range is recommended to nautical navigators in Harriot's "Instructions for Raleigh's Voyage to Guiana, 1595".
[65] Wright was reluctant to discard measured values, and may have felt that the median — incorporating a greater proportion of the dataset than the mid-range — was more likely to be correct.
However, Wright did not give examples of his technique's use, making it hard to verify that he described the modern notion of median.
[59][63][b] The median (in the context of probability) certainly appeared in the correspondence of Christiaan Huygens, but as an example of a statistic that was inappropriate for actuarial practice.
[59][66] In 1774, Laplace made this desire explicit: he suggested the median be used as the standard estimator of the value of a posterior PDF.
[68] Antoine Augustin Cournot in 1843 was the first[69] to use the term median (valeur médiane) for the value that divides a probability distribution into two equal halves.