Standard error

[1] The standard error is a key ingredient in producing confidence intervals.

In regression analysis, the term "standard error" refers either to the square root of the reduced chi-squared statistic or the standard error for a particular regression coefficient (as used in, say, confidence intervals).

Practically, this tells us that when trying to estimate the value of a population mean, due to the factor $1/\sqrt{n}$, reducing the error on the estimate by a factor of two requires acquiring four times as many observations in the sample; reducing it by a factor of ten requires a hundred times as many observations.
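The square-root scaling can be checked numerically. The sketch below assumes the usual formula for the standard error of the mean of n independent observations, σ/√n; the function name and the value of σ are illustrative only.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean, sigma / sqrt(n), for n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

sigma = 10.0                           # hypothetical population standard deviation
base = standard_error(sigma, 25)       # reference sample size
halved = standard_error(sigma, 100)    # 4 times as many observations
tenth = standard_error(sigma, 2500)    # 100 times as many observations

print(base / halved)   # ratio of errors after quadrupling n
print(base / tenth)    # ratio of errors after a hundredfold increase in n
```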

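As the quantity $s/\sqrt{n}$, computed from the sample standard deviation $s$, is only an estimator for the true "standard error", it is common to see other notations here such as $\hat{\sigma}_{\bar{x}} = s/\sqrt{n}$ or $s_{\bar{x}} = s/\sqrt{n}$.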

Gurland and Tripathi (1971) provide a correction and equation for this effect.

[4] Sokal and Rohlf (1981) give an equation of the correction factor for small samples of n < 20.

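If $x_1, x_2, \ldots, x_n$ are $n$ independent observations from a population with mean $\mu$ and standard deviation $\sigma$ (the mean and standard deviation for the population), then we can define the total $T = x_1 + x_2 + \cdots + x_n$, which has variance $n\sigma^2$ because the variances of independent variables add. The sample mean $\bar{x} = T/n$ therefore has variance $\sigma^2/n$ and standard deviation $\sigma/\sqrt{n}$.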

For correlated random variables, the sample variance needs to be computed according to the Markov chain central limit theorem.

There are cases when a sample is taken without knowing, in advance, how many observations will be acceptable according to some criterion.

In many practical applications, the true value of σ is unknown.

As a result, we need to use a distribution that takes into account the spread of possible values of σ.

The Student t-distribution differs slightly from the Gaussian, and its shape varies with the size of the sample.

Small samples are somewhat more likely to underestimate the population standard deviation and have a mean that differs from the true population mean, and the Student t-distribution accounts for the probability of these events with somewhat heavier tails compared to a Gaussian.

To estimate the standard error of a Student t-distribution it is sufficient to use the sample standard deviation "s" instead of σ, and we could use this value to calculate confidence intervals.
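As a minimal sketch of this substitution, the snippet below estimates the standard error of the mean as s/√n, with s the sample standard deviation computed with the n − 1 denominator. The data values and the function name are made up for illustration.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
se = estimated_standard_error(sample)
print(f"estimated standard error: {se:.4f}")
```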

If these conditions are not met, then using a bootstrap distribution to estimate the standard error is often a good workaround, but it can be computationally intensive.
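One simple form of the bootstrap can be sketched as follows: resample the data with replacement many times, compute the mean of each resample, and take the standard deviation of those means as the standard error estimate. The function name, the number of resamples, the seed, and the data are all assumptions made for this illustration, not a prescribed procedure.

```python
import random
import statistics

def bootstrap_standard_error(sample, n_resamples=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard error of the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the sample with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
    # The spread of the resampled means approximates the standard error.
    return statistics.stdev(means)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
boot_se = bootstrap_standard_error(sample)
print(f"bootstrap standard error: {boot_se:.3f}")
```

The cost grows with the number of resamples times the sample size, which is where the computational expense comes from.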

One use of the standard error is to construct confidence intervals for the unknown population mean.

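The following expressions can be used to calculate the upper and lower 95% confidence limits, where $\bar{x}$ is the sample mean, SE is the standard error of the sample mean, and 1.96 is the approximate value of the 97.5th percentile point of the normal distribution:

Upper 95% limit = $\bar{x} + (\mathrm{SE} \times 1.96)$

Lower 95% limit = $\bar{x} - (\mathrm{SE} \times 1.96)$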
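A short sketch of the 95% limits, assuming the normal approximation with the multiplier 1.96 and the standard error estimated as s/√n. The function name and data are illustrative; for a sample this small, a t-distribution quantile would usually replace 1.96.

```python
import math
import statistics

def normal_ci_95(sample):
    """Return (lower, upper) 95% confidence limits for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    z = 1.96  # approximate 97.5th percentile of the standard normal
    return mean - z * se, mean + z * se

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
lower, upper = normal_ci_95(sample)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```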

Standard errors provide simple measures of uncertainty in a value and are widely used for that reason.

In scientific and technical literature, experimental data are often summarized either using the mean and standard deviation of the sample data or the mean with the standard error.

However, the mean and standard deviation are descriptive statistics, whereas the standard error of the mean is descriptive of the random sampling process.

The standard deviation of the sample data is a description of the variation in measurements, while the standard error of the mean is a probabilistic statement about how the sample size will provide a better bound on estimates of the population mean, in light of the central limit theorem.

[9] If the population standard deviation is finite, the standard error of the mean of the sample will tend to zero with increasing sample size, because the estimate of the population mean will improve, while the standard deviation of the sample will tend to approximate the population standard deviation as the sample size increases.
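The contrast between the two quantities can be illustrated with a small simulation. It assumes normally distributed data with a population standard deviation of 1; the function name, seed, and sample sizes are arbitrary choices for the sketch.

```python
import math
import random
import statistics

def sd_and_se(n, mu=0.0, sigma=1.0, seed=0):
    """Draw n normal observations; return (sample SD, estimated SE of the mean)."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s = statistics.stdev(sample)
    return s, s / math.sqrt(n)

# The sample SD settles near sigma = 1, while the SE keeps shrinking with n.
for n in (10, 100, 1000, 10000):
    s, se = sd_and_se(n)
    print(f"n={n:>6}  sample SD={s:.3f}  SE of mean={se:.4f}")
```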

The formula given above for the standard error assumes that the population is infinite.

Nonetheless, it is often used for finite populations when people are interested in measuring the process that created the existing finite population (this is called an analytic study).

If one is interested in measuring an existing finite population that will not change over time, then it is necessary to adjust for the population size (called an enumerative study).

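When the sampling fraction (often termed f) is large (approximately 5% or more) in an enumerative study, the estimate of the standard error must be corrected by multiplying by a "finite population correction" (a.k.a. FPC):

$\mathrm{FPC} = \sqrt{\frac{N - n}{N - 1}}$,

where N is the population size and n is the sample size. For large N this is approximately $\sqrt{1 - n/N} = \sqrt{1 - f}$. The correction accounts for the added precision gained by sampling a larger percentage of the population.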
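A minimal sketch of the correction, assuming the commonly quoted form FPC = √((N − n)/(N − 1)) with N the population size and n the sample size. The function names and the numbers are hypothetical.

```python
import math

def finite_population_correction(n: int, N: int) -> float:
    """FPC = sqrt((N - n) / (N - 1)) for a sample of n drawn from N units."""
    if not 0 < n <= N:
        raise ValueError("require 0 < n <= N")
    if N == 1:
        return 0.0  # the single unit was sampled, so no uncertainty remains
    return math.sqrt((N - n) / (N - 1))

def corrected_standard_error(se: float, n: int, N: int) -> float:
    """Multiply an uncorrected standard error by the FPC."""
    return se * finite_population_correction(n, N)

# Sampling 50 of 1000 units (f = 5%) shrinks the standard error slightly;
# sampling every unit drives it to zero.
print(corrected_standard_error(2.0, 50, 1000))
print(corrected_standard_error(2.0, 1000, 1000))
```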

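If values of the measured quantity A are not statistically independent but have been obtained from known locations in parameter space x, an unbiased estimate of the true standard error of the mean (actually a correction on the standard deviation part) may be obtained by multiplying the calculated standard error of the sample by the factor f:

$f = \sqrt{\frac{1 + \rho}{1 - \rho}}$,

where the sample bias coefficient ρ is an estimate of the autocorrelation coefficient (a quantity between −1 and +1) for all sample point pairs.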

This approximate formula is for moderate to large sample sizes; the reference gives the exact formulas for any sample size. The correction can be applied to heavily autocorrelated time series such as Wall Street stock quotes.

Moreover, this formula works for positive and negative ρ alike.
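The behaviour for positive and negative ρ can be seen in a short sketch. It assumes the approximate factor takes the form f = √((1 + ρ)/(1 − ρ)) and that an estimate of ρ is already available; the function name and the values of ρ are illustrative.

```python
import math

def autocorrelation_correction(rho: float) -> float:
    """Approximate correction factor f = sqrt((1 + rho) / (1 - rho))."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and +1")
    return math.sqrt((1.0 + rho) / (1.0 - rho))

naive_se = 0.5  # hypothetical standard error computed as if data were independent
for rho in (-0.5, 0.0, 0.5, 0.9):
    f = autocorrelation_correction(rho)
    print(f"rho={rho:+.1f}  f={f:.3f}  corrected SE={naive_se * f:.3f}")
```

Positive ρ inflates the standard error (f > 1), negative ρ shrinks it (f < 1), and ρ = 0 leaves it unchanged.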

Figure: For a value that is sampled with an unbiased, normally distributed error, the proportion of samples that would fall between 0, 1, 2, and 3 standard deviations above and below the actual value.

Figure: Expected error in the mean of A for a sample of n data points with sample bias coefficient ρ. The unbiased standard error plots as the ρ = 0 diagonal line with log-log slope −1/2.