Because the binomial distribution is a discrete probability distribution (i.e., not continuous) and difficult to calculate for large numbers of trials, a variety of approximations are used to calculate this confidence interval, all with their own tradeoffs in accuracy and computational intensity.
A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a coin is flipped ten times.[1]
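For illustration, the probabilities in this example can be tabulated directly. A minimal Python sketch, using SciPy's binomial distribution and assuming a fair coin (p = 0.5):

```python
from scipy.stats import binom

n, p = 10, 0.5  # ten flips of a fair coin

# Probability of observing exactly k heads out of ten flips.
for k in range(n + 1):
    print(f"P(heads = {k:2d}) = {binom.pmf(k, n, p):.4f}")
```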
A commonly used formula for a binomial confidence interval relies on approximating the distribution of error about a binomially-distributed observation, \hat{p}, with a normal distribution:

p \approx \hat{p} \pm z \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}

The equivalent formula in terms of observation counts is

p \approx \frac{n_S}{n} \pm \frac{z}{n} \sqrt{\frac{n_S\, n_F}{n}}

where the data are the results of n trials that yielded n_S successes and n_F failures, and z is the 1 - \alpha/2 quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate \alpha. For a 95% confidence level, \alpha = 0.05 and z = 1.96.
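A minimal Python sketch of this normal-approximation (Wald) interval; the function name is illustrative, and SciPy's norm.ppf supplies the probit:

```python
from scipy.stats import norm

def wald_interval(n_success: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = n_success / n
    z = norm.ppf(1 - alpha / 2)  # probit for the target error rate
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

# Example: 7 successes in 10 trials at 95% confidence.
print(wald_interval(7, 10))
```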
An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test.[5]
Extending the normal approximation and Wald-Laplace interval concepts, Michael Short has shown that inequalities on the approximation error between the binomial distribution and the normal distribution can be used to accurately bracket the estimate of the confidence interval around p. Here p is again the (unknown) proportion of successes in a Bernoulli trial process (as opposed to the estimate \hat{p}) measured with n trials yielding n_S successes, and z is the 1 - \alpha/2 quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate \alpha. For any error rate 0 < \alpha < 1, these inequalities give easily computed one- or two-sided intervals which bracket the exact binomial upper and lower confidence limits corresponding to the error rate \alpha.
When the weights are all equal, these expressions reduce to the familiar formulas, showing that the calculation for weighted data is a direct generalization of them.
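As a sketch of that reduction, assuming the weighted estimator is the weighted mean of observations x_i \in \{0, 1\} with weights normalized to sum to one:

```latex
% Weighted point estimate and its estimated standard error:
\hat{p} = \sum_{i=1}^{n} w_i x_i ,
\qquad
\widehat{\mathrm{SE}} = \sqrt{\hat{p}\,(1 - \hat{p}) \sum_{i=1}^{n} w_i^{2}}
% With equal weights w_i = 1/n, we have \sum_i w_i^2 = 1/n, so these reduce to
% \hat{p} = n_S / n and \widehat{\mathrm{SE}} = \sqrt{\hat{p}(1-\hat{p})/n}.
```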
The Wilson score interval is

p \approx \frac{\hat{p} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \pm \frac{z}{1 + \frac{z^2}{n}} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n} + \frac{z^2}{4n^2}}

In practical tests of the formula's results, this interval has good properties even for a small number of trials and/or extreme probability estimates.
The Wilson interval can also be derived from the single sample z-test or Pearson's chi-squared test with two categories.
The resulting interval has the property of being guaranteed to obtain the same result as the equivalent z-test or chi-squared test.
This property can be visualised by plotting the probability density function for the Wilson score interval (see Wallis).
The following formulae for the lower and upper bounds of the Wilson score interval with continuity correction (w^-, w^+) are:

w^- = \max\left\{0,\; \frac{2n\hat{p} + z^2 - \left[z\sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) + (4\hat{p} - 2)} + 1\right]}{2(n + z^2)}\right\}

w^+ = \min\left\{1,\; \frac{2n\hat{p} + z^2 + \left[z\sqrt{z^2 - \frac{1}{n} + 4n\hat{p}(1 - \hat{p}) - (4\hat{p} - 2)} + 1\right]}{2(n + z^2)}\right\}
Wallis (2021)[9] identifies a simpler method for computing continuity-corrected Wilson intervals that employs a special function based on Wilson's lower-bound formula: in Wallis' notation, the lower bound is obtained by evaluating that function at the observed proportion adjusted downward by the continuity-correction term, and the upper bound follows by a symmetry argument.
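A Python sketch of both variants, following the formulas above (the function name is illustrative):

```python
from math import sqrt
from scipy.stats import norm

def wilson_interval(n_success: int, n: int, alpha: float = 0.05,
                    continuity: bool = False) -> tuple[float, float]:
    """Wilson score interval, optionally with continuity correction."""
    p = n_success / n
    z = norm.ppf(1 - alpha / 2)
    if not continuity:
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return centre - half, centre + half
    # Continuity-corrected bounds (clipped to [0, 1]).
    lo = (2 * n * p + z * z
          - (z * sqrt(z * z - 1 / n + 4 * n * p * (1 - p) + (4 * p - 2)) + 1)) / (2 * (n + z * z))
    hi = (2 * n * p + z * z
          + (z * sqrt(z * z - 1 / n + 4 * n * p * (1 - p) - (4 * p - 2)) + 1)) / (2 * (n + z * z))
    return max(0.0, lo), min(1.0, hi)

# Example: 7 successes in 10 trials.
print(wilson_interval(7, 10))                   # plain Wilson
print(wilson_interval(7, 10, continuity=True))  # continuity-corrected
```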
In contrast, the Wilson interval has a systematic bias such that it is centred too close to p = 0.5.
The beta distribution is, in turn, related to the F-distribution, so a third formulation of the Clopper–Pearson interval can be written using F quantiles:

\left(1 + \frac{n - x + 1}{x\, F\!\left[\frac{\alpha}{2};\, 2x,\; 2(n - x + 1)\right]}\right)^{-1} < p < \left(1 + \frac{n - x}{(x + 1)\, F\!\left[1 - \frac{\alpha}{2};\, 2(x + 1),\; 2(n - x)\right]}\right)^{-1}

where x is the number of successes, n is the number of trials, and F(c; d_1, d_2) is the c quantile from an F-distribution with d_1 and d_2 degrees of freedom.
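In the equivalent beta-quantile form, the interval is B(\alpha/2;\, x,\, n - x + 1) < p < B(1 - \alpha/2;\, x + 1,\, n - x), which is the most convenient form to compute. A sketch using SciPy (the function name is illustrative):

```python
from scipy.stats import beta

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact Clopper-Pearson interval via beta-distribution quantiles."""
    lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper

# Example: 7 successes in 10 trials at 95% confidence.
print(clopper_pearson(7, 10))
```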
In contrast, it is worth noting that other confidence intervals may have coverage levels that are lower than the nominal 1 − α.
For instance, it can also be applied to the case where the samples are drawn without replacement from a population of a known size, instead of repeated draws from a binomial distribution; in this case, the underlying distribution is hypergeometric.
Given n_S successes in n trials, define \tilde{n} = n + z^2 and \tilde{p} = \frac{1}{\tilde{n}}\left(n_S + \frac{z^2}{2}\right); the Agresti–Coull interval is then \tilde{p} \pm z \sqrt{\frac{\tilde{p}\,(1 - \tilde{p})}{\tilde{n}}}. Here z is the 1 − α/2 quantile of a standard normal distribution, as before (for example, a 95% confidence interval requires z ≈ 1.96). For a 95% interval, using z = 2 instead of 1.96 produces the "add 2 successes and 2 failures" interval previously described by Agresti & Coull.
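A Python sketch following these definitions (the function name is illustrative):

```python
from scipy.stats import norm

def agresti_coull(n_success: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Agresti-Coull interval: a Wald-style interval around an adjusted estimate."""
    z = norm.ppf(1 - alpha / 2)
    n_adj = n + z * z                        # n-tilde
    p_adj = (n_success + z * z / 2) / n_adj  # p-tilde
    half = z * (p_adj * (1 - p_adj) / n_adj) ** 0.5
    return p_adj - half, p_adj + half

# Example: 7 successes in 10 trials; with z = 2 this becomes the
# "add 2 successes and 2 failures" interval.
print(agresti_coull(7, 10))
```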
While the arcsine transformation can stabilize the variance (and thus confidence intervals) of proportion data, its use has been criticized in several contexts.[16]
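A sketch of the interval implied by this transformation, assuming the standard variance-stabilized form sin²(arcsin(√p̂) ± z/(2√n)); the function name is illustrative:

```python
from math import asin, pi, sin, sqrt
from scipy.stats import norm

def arcsine_interval(n_success: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Approximate interval on the variance-stabilized arcsine scale,
    transformed back to the proportion scale."""
    p_hat = n_success / n
    z = norm.ppf(1 - alpha / 2)
    centre = asin(sqrt(p_hat))  # angle whose squared sine is p-hat
    half = z / (2 * sqrt(n))    # stabilized standard error is approximately 1/(2*sqrt(n))
    lo = sin(max(0.0, centre - half)) ** 2
    hi = sin(min(pi / 2, centre + half)) ** 2
    return lo, hi

# Example: 7 successes in 10 trials at 95% confidence.
print(arcsine_interval(7, 10))
```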
The rule of three is used to provide a simple way of stating an approximate 95% confidence interval for p in the special case that no successes have been observed (\hat{p} = 0): the interval is (0, 3/n). For example, if no successes are observed in 30 trials, the approximate 95% interval for p is (0, 0.1).
There are several research papers that compare these and other confidence intervals for the binomial proportion.[3][2][20][21]
Both Ross (2003)[22] and Agresti & Coull (1998)[13] point out that exact methods such as the Clopper–Pearson interval may not work as well as some approximations.
The normal approximation interval and its presentation in textbooks have been heavily criticised, with many statisticians advocating that the interval not be used.[3]
Of the approximations listed above, Wilson score interval methods (with or without continuity correction) have been shown to be the most accurate and the most robust,[3][4][2] though some prefer Agresti & Coull's approach for larger sample sizes.[4]
Wilson and Clopper–Pearson methods obtain consistent results with source significance tests,[9] and this property is decisive for many researchers.