Bessel's correction

Bessel's correction is the use of n − 1 instead of n in the formula for the sample variance (and sample standard deviation), where n is the number of observations in a sample. This method corrects the bias in the estimation of the population variance.

It also partially corrects the bias in the estimation of the population standard deviation.

However, the correction often increases the mean squared error in these estimations.
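These three claims can be checked empirically; the following is a minimal simulation sketch, assuming NumPy and an arbitrary normal population with σ = 3 (the specific numbers are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 3.0                          # assumed population standard deviation
n, trials = 5, 200_000
x = rng.normal(0.0, sigma, size=(trials, n))

var_n  = x.var(axis=1, ddof=0)       # divide by n: no correction
var_n1 = x.var(axis=1, ddof=1)       # divide by n - 1: Bessel's correction
std_n1 = x.std(axis=1, ddof=1)       # square root of the corrected variance

print(var_n.mean())    # ~ 7.2 = (n-1)/n * sigma^2: biased low
print(var_n1.mean())   # ~ 9.0 = sigma^2: unbiased
print(std_n1.mean())   # ~ 2.8 < sigma: the square root is still biased low
print(np.mean((var_n  - sigma**2) ** 2))  # MSE of the uncorrected estimator
print(np.mean((var_n1 - sigma**2) ** 2))  # MSE is larger after the correction
```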

For a more intuitive explanation of the need for Bessel's correction, see § Source of bias.

More generally, Bessel's correction is an approach to reducing the bias that arises from a finite sample size.

Such finite-sample bias correction is also needed for other estimates, such as skewness and kurtosis, but in these cases the remaining inaccuracies are often significantly larger.

To fully remove such bias, a more complex, multi-parameter estimation is necessary.

For instance, the correct correction for the standard deviation depends on the kurtosis (the normalized central fourth moment), but kurtosis itself has a finite-sample bias, and that bias depends on the standard deviation; that is, the two estimations have to be combined.

There are three caveats to consider regarding Bessel's correction: Firstly, while the sample variance (using Bessel's correction) is an unbiased estimator of the population variance, its square root, the sample standard deviation, is a biased estimate of the population standard deviation; because the square root is a concave function, the bias is downward, by Jensen's inequality.

There is no general formula for an unbiased estimator of the population standard deviation, though there are correction factors for particular distributions, such as the normal; see unbiased estimation of standard deviation for details.

An approximation of the exact correction factor for the normal distribution is obtained by using n − 1.5 in the denominator of the formula: with this choice, the bias decays quadratically in n (rather than linearly, as in the uncorrected form and in Bessel's corrected form).
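For the normal distribution the exact correction factor has a known closed form (the quantity usually denoted c4(n)). Below is a small sketch comparing it with the n − 1.5 rule, assuming Python's standard math module (the helper name c4 is ours):

```python
import math

def c4(n: int) -> float:
    # Exact factor for normal samples: E[s] = c4(n) * sigma, where s uses
    # the n - 1 denominator; s / c4(n) is then unbiased for sigma.
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0))

for n in (5, 10, 30, 100):
    # Using n - 1.5 in the denominator amounts to the factor
    # sqrt((n - 1.5)/(n - 1)); its error versus c4(n) shrinks like 1/n^2.
    approx = math.sqrt((n - 1.5) / (n - 1.0))
    print(n, round(c4(n), 6), round(approx, 6))
```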

Secondly, the unbiased estimator does not minimize mean squared error (MSE), and generally has worse MSE than the uncorrected estimator (this varies with excess kurtosis).

The optimal value depends on excess kurtosis, as discussed in Mean squared error § Variance; for the normal distribution, the MSE is minimized by dividing by n + 1 (instead of n − 1 or n).
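A Monte Carlo sketch of this bias/MSE trade-off for a normal population (NumPy assumed; σ² = 4 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, trials = 4.0, 10, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for denom in (n - 1, n, n + 1):
    est = ss / denom
    print(denom, (est - sigma2).mean(), np.mean((est - sigma2) ** 2))
    # n - 1 is unbiased but has the largest MSE; n + 1 minimizes the MSE.
```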

Thirdly, Bessel's correction is necessary only when the population mean is unknown and is estimated by the sample mean; if the population mean is known, the squared deviations from it can simply be averaged by dividing by n. In the unknown-mean case there are n degrees of freedom in a sample of n points, and simultaneous estimation of mean and variance means one degree of freedom goes to the sample mean and the remaining n − 1 degrees of freedom (the residuals) go to the sample variance.
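The degrees-of-freedom count can be seen directly: the residuals about the sample mean always satisfy one linear constraint, so only n − 1 of them are free. A tiny check, assuming NumPy (the five values are the worked sample used later in this article):

```python
import numpy as np

x = np.array([2051.0, 2053.0, 2055.0, 2050.0, 2051.0])
residuals = x - x.mean()          # deviations from the sample mean
print(residuals.sum())            # 0.0 (up to rounding): one constraint,
                                  # so only n - 1 residuals carry information
```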

Most simply, to understand the bias that needs correcting, think of an extreme case.

In the case of n = 1, the variance just cannot be estimated, because there is no variability in the sample.

Suppose the mean of the whole population is 2050, but the statistician does not know that, and must estimate it based on this small sample chosen randomly from the population:

$$2051, \quad 2053, \quad 2055, \quad 2050, \quad 2051$$

One may compute the sample average:

$$\bar{x} = \frac{2051 + 2053 + 2055 + 2050 + 2051}{5} = 2052.$$

This may serve as an observable estimate of the unobservable population average, which is 2050.

Now we face the problem of estimating the population variance.

Because the sample mean is the point that minimizes the sum of squared deviations, a variance calculation using any other average value, including the true population mean 2050, must produce a larger result.
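This minimality is a one-line calculus fact. For fixed observations $x_1, \dots, x_n$, consider the sum of squared deviations about an arbitrary point $c$:

$$f(c) = \sum_{i=1}^{n} (x_i - c)^2, \qquad f'(c) = -2 \sum_{i=1}^{n} (x_i - c) = -2n(\bar{x} - c),$$

so $f'(c) = 0$ exactly at $c = \bar{x}$, and $f''(c) = 2n > 0$, making the sample mean the unique minimizer.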

Now, we apply the algebraic identity $(a + b)^2 = a^2 + 2ab + b^2$ to the squares of deviations from the population mean, writing each deviation as the deviation from the sample mean plus the deviation of the sample mean from the population mean:

$$[x_i - 2050]^2 = [(x_i - 2052) + (2052 - 2050)]^2 = (x_i - 2052)^2 + 2(x_i - 2052)(2052 - 2050) + (2052 - 2050)^2.$$

Now apply this to all five observations and observe certain patterns:

$$\begin{aligned}
(2051 - 2050)^2 &= (2051 - 2052)^2 + 2(2051 - 2052)(2052 - 2050) + (2052 - 2050)^2\\
(2053 - 2050)^2 &= (2053 - 2052)^2 + 2(2053 - 2052)(2052 - 2050) + (2052 - 2050)^2\\
(2055 - 2050)^2 &= (2055 - 2052)^2 + 2(2055 - 2052)(2052 - 2050) + (2052 - 2050)^2\\
(2050 - 2050)^2 &= (2050 - 2052)^2 + 2(2050 - 2052)(2052 - 2050) + (2052 - 2050)^2\\
(2051 - 2050)^2 &= (2051 - 2052)^2 + 2(2051 - 2052)(2052 - 2050) + (2052 - 2050)^2
\end{aligned}$$

The sum of the entries in the middle column must be zero, because the terms $a = x_i - 2052$ sum to zero across all 5 rows.

That is because the sum of the 5 individual samples equals 5 times their sample mean (2052), so the deviations $x_i - 2052$, when added, cancel exactly.

The factor 2 and the term $b = 2052 - 2050$ in the middle column are the same in every row, so they can be factored out of the sum; what remains is $2b \sum_i (x_i - 2052) = 0$, so the entire middle column can be disregarded.

The remaining columns explain the result: the left-hand column sums to the total of the squared deviations from the population mean; the first right-hand column sums to the total of the squared deviations from the sample mean; and the last column contributes $5 \times (2052 - 2050)^2 = 20$, a strictly positive amount. Therefore:

$$\sum_{i=1}^{5} (x_i - 2050)^2 = \sum_{i=1}^{5} (x_i - 2052)^2 + 5\,(2052 - 2050)^2, \qquad 36 = 16 + 20.$$

That is why the sum of squares of the deviations from the sample mean is too small to give an unbiased estimate of the population variance when the average of those squares is found.
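A quick numeric check of this decomposition on the worked sample (NumPy assumed):

```python
import numpy as np

x = np.array([2051.0, 2053.0, 2055.0, 2050.0, 2051.0])
mu, xbar = 2050.0, x.mean()                 # population mean; sample mean = 2052

lhs = ((x - mu) ** 2).sum()                 # squared deviations from mu: 36
rhs = ((x - xbar) ** 2).sum() + len(x) * (xbar - mu) ** 2   # 16 + 20
print(lhs, rhs)                             # 36.0 36.0
```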

In statistical practice, the term "sample variance" may refer to either the biased formulation (dividing by n) or the unbiased one (dividing by n − 1). However, caution is needed: some calculators and software packages may provide both, or only the more unusual formulation.

This article uses the following symbols and definitions:

- $\mu$ is the population mean
- $\bar{x}$ is the sample mean
- $\sigma^2$ is the population variance
- $s_n^2$ is the biased sample variance (i.e. without Bessel's correction):
  $$s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
- $s^2$ is the unbiased sample variance (i.e. with Bessel's correction):
  $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviations will then be the square roots of the respective variances.

Since the square root introduces bias, the terminology "uncorrected" and "corrected" is preferred for the standard deviation estimators:

- $s_n$ is the uncorrected sample standard deviation (i.e. without Bessel's correction)
- $s$ is the corrected sample standard deviation (i.e. with Bessel's correction)

The sample mean is given by

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

If the observations $x_1, \dots, x_n$ are independent and identically distributed random variables with expectation $\mu$ and variance $\sigma^2$, defined over the underlying sample space, we would like to get a good estimate for the variance $\sigma^2$ from the sample. We would like the estimator $s_n^2$ to be unbiased for $\sigma^2$. This means that on average, this formula should produce the right answer. But let us calculate the expected value of this expression:

$$\operatorname{E}\left[s_n^2\right] = \operatorname{E}\left[\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right].$$

Writing each deviation from the sample mean as $x_i - \bar{x} = (x_i - \mu) - (\bar{x} - \mu)$ and expanding, the cross term collects into $-n(\bar{x} - \mu)^2$ because $\sum_i (x_i - \mu) = n(\bar{x} - \mu)$, so

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2.$$

Here we have (by independence, symmetric cancellation of the cross terms, and identical distributions)

$$\operatorname{E}\left[(\bar{x} - \mu)^2\right] = \operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n},$$

and therefore

$$\operatorname{E}\left[s_n^2\right] = \frac{1}{n}\left(n\sigma^2 - n \cdot \frac{\sigma^2}{n}\right) = \frac{n-1}{n}\,\sigma^2.$$

In contrast,

$$\operatorname{E}\left[s^2\right] = \operatorname{E}\left[\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \sigma^2.$$

Therefore, our initial guess was wrong by a factor of $\frac{n-1}{n}$, and multiplying by $\frac{n}{n-1}$, i.e. dividing by n − 1 instead of n, is precisely Bessel's correction.
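The derived factor can be checked numerically; a minimal Monte Carlo sketch, assuming NumPy and arbitrary choices of σ² and n:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, n, trials = 2.5, 8, 500_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

print(x.var(axis=1, ddof=0).mean() / sigma2, (n - 1) / n)  # both ~ 0.875
print(x.var(axis=1, ddof=1).mean() / sigma2)               # ~ 1.0 after correction
```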