Binomial distribution

If sampling is carried out without replacement from a finite population of size N, the draws are not independent and the resulting distribution is a hypergeometric distribution rather than a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used.

Since the trials are independent, with the probability of success p remaining constant between them, any particular sequence of n trials with k successes (and n − k failures) has the same probability of occurring, namely p^k (1 − p)^{n−k}, regardless of the positions of the successes within the sequence.

The binomial coefficient

\binom{n}{k} = \frac{n!}{k!\,(n-k)!}

counts the number of ways to choose the positions of the k successes among the n trials, so the probability mass function is

f(k, n, p) = \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n.
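As an illustration, here is a minimal Python sketch (standard library only) of this probability mass function; the function name binom_pmf and the example numbers are chosen purely for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: exactly 2 successes in 5 trials with p = 0.5
print(binom_pmf(2, 5, 0.5))  # 0.3125
```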

In reference tables for binomial probabilities, usually only values up to n/2 are tabulated. This is because for k > n/2, the probability can be calculated by its complement as

f(k, n, p) = f(n − k, n, 1 − p).

Looking at the expression f(k, n, p) as a function of k, there is a value of k that maximizes it (the mode). When (n + 1)p is not an integer, this maximizing value is ⌊(n + 1)p⌋; when (n + 1)p is an integer, both (n + 1)p and (n + 1)p − 1 maximize the expression.
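A short numerical check of both claims, assuming illustrative parameters n = 10 and p = 0.3 (the helper binom_pmf repeats the mass function above):

```python
from math import comb, floor

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

# Complement identity: f(k, n, p) = f(n - k, n, 1 - p)
assert abs(binom_pmf(7, n, p) - binom_pmf(3, n, 1 - p)) < 1e-12

# The maximizing k (the mode) equals floor((n + 1) * p) here
mode = max(range(n + 1), key=lambda k: binom_pmf(k, n, p))
print(mode, floor((n + 1) * p))  # both are 3
```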

Suppose a biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is

f(4, 6, 0.3) = \binom{6}{4} (0.3)^4 (0.7)^2 \approx 0.0595.

The cumulative distribution function can be expressed as

F(k; n, p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i (1-p)^{n-i},

where ⌊k⌋ is the "floor" under k, i.e. the greatest integer less than or equal to k. It can also be represented in terms of the regularized incomplete beta function, as follows:[3]

F(k; n, p) = I_{1-p}(n - k, k + 1),

which is equivalent to the cumulative distribution functions of the beta distribution and of the F-distribution.[4] Some closed-form bounds for the cumulative distribution function are given below.

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is:[5]

E[X] = np.

This follows from the linearity of the expected value along with the fact that X is the sum of n identical Bernoulli random variables, each with expected value p. In other words, if X_1, …, X_n are identical (and independent) Bernoulli random variables with parameter p, then X = X_1 + ⋯ + X_n and

E[X] = E[X_1] + ⋯ + E[X_n] = p + ⋯ + p = np.
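The identity E[X] = np can also be seen empirically; the simulation below uses arbitrary illustrative parameters and draws X as a sum of Bernoulli variables.

```python
import random

# Empirical check that E[X] = n * p, with illustrative parameters
n, p, reps = 20, 0.25, 100_000

total = 0
for _ in range(reps):
    # X is the sum of n independent Bernoulli(p) variables
    total += sum(1 for _ in range(n) if random.random() < p)

print(total / reps, n * p)  # the empirical mean is close to 5.0
```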

In general, there is no single formula to find the median of a binomial distribution, and it may even be non-unique.

However, several special results have been established; for example, if np is an integer, then the mean, median, and mode coincide and equal np, and any median m must satisfy ⌊np⌋ ≤ m ≤ ⌈np⌉.

For k ≤ np, upper bounds can be derived for the lower tail of the cumulative distribution function F(k; n, p) = Pr(X ≤ k), the probability that there are at most k successes. For instance, Hoeffding's inequality yields the bound

F(k; n, p) \le \exp\!\left(-2n\left(p - \frac{k}{n}\right)^{2}\right).
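As a sketch of such a bound, the snippet below compares the exact lower-tail probability with Hoeffding's inequality for illustrative parameters n = 100, p = 0.5, k = 35.

```python
from math import comb, exp

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 35                   # k <= n*p, so this is the lower tail
exact = binom_cdf(k, n, p)
bound = exp(-2 * n * (p - k / n) ** 2)   # Hoeffding's inequality
print(exact, bound)                      # the exact probability lies below the bound
```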

The maximum likelihood estimator of p, based on x observed successes in n trials, is p̂ = x/n. This estimator is unbiased and has uniformly minimum variance, which can be proven using the Lehmann–Scheffé theorem, since it is based on a minimal sufficient and complete statistic (i.e., x).

This statistic is asymptotically normal, thanks to the central limit theorem, because it is the sample mean of n independent Bernoulli variables.
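A small simulation illustrating both properties of the estimator p̂ = x/n; the sample size, success probability, and repetition count below are arbitrary illustrative choices.

```python
import random
from statistics import mean, stdev

n, p, reps = 50, 0.2, 20_000

# Repeatedly observe x successes in n trials and form p_hat = x / n
estimates = [
    sum(1 for _ in range(n) if random.random() < p) / n
    for _ in range(reps)
]

print(mean(estimates))                              # close to p = 0.2
print(stdev(estimates), (p * (1 - p) / n) ** 0.5)   # close to sqrt(p(1-p)/n)
```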

When a Beta(α, β) distribution is used as a conjugate prior, the posterior mean estimator is

p̂ = (x + α) / (n + α + β).

The Bayes estimator is asymptotically efficient and, as the sample size approaches infinity (n → ∞), it approaches the MLE solution.[18]

The Bayes estimator is biased (how much depends on the prior), admissible, and consistent in probability.

For the special case of using the standard uniform distribution as a non-informative prior, Beta(α = 1, β = 1) = U(0, 1), the posterior mean estimator becomes

p̂ = (x + 1) / (n + 2).

This method is called the rule of succession, which was introduced in the 18th century by Pierre-Simon Laplace.
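A minimal sketch of the posterior mean estimator and the rule of succession; the function name bayes_estimate and the sample counts are illustrative only.

```python
def bayes_estimate(x, n, alpha=1.0, beta=1.0):
    """Posterior mean of p under a Beta(alpha, beta) prior.
    alpha = beta = 1 (the uniform prior) gives Laplace's rule of succession."""
    return (x + alpha) / (n + alpha + beta)

x, n = 0, 10                   # no successes observed in 10 trials
print(x / n)                   # MLE: 0.0
print(bayes_estimate(x, n))    # rule of succession: 1 / 12 ~ 0.083
```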

When the estimate is needed for very rare events and small n (for example, if x = 0), the maximum likelihood estimate x/n is zero, which is often unrealistic; using the posterior mean estimator above instead leads to p̂ = 1/(n + 2). Another method is to use the upper bound of the confidence interval obtained using the rule of three, p̂ = 3/n.

Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.

In the equations for the confidence intervals below, the variables have the following meaning: n is the number of trials, p̂ = x/n is the proportion of successes, and z is the 1 − α/2 quantile of a standard normal distribution (e.g., z ≈ 1.96 for a 95% confidence level). A continuity correction of 0.5/n may be added.
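As a sketch, the normal-approximation (Wald) interval with the optional 0.5/n continuity correction can be written as follows; the helper name and the sample counts are illustrative, and other interval constructions are not shown.

```python
from math import sqrt

def wald_interval(x, n, z=1.96, continuity=True):
    """Normal-approximation (Wald) confidence interval for p,
    optionally widened by the 0.5/n continuity correction."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    if continuity:
        half += 0.5 / n
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

print(wald_interval(35, 100))  # roughly (0.25, 0.45)
print(3 / 100)                 # rule-of-three upper bound when x = 0, n = 100
```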

The Clopper–Pearson interval is an "exact" interval, since it is based directly on the binomial distribution rather than on an approximation to it.[21] (Exact does not mean perfectly accurate; rather, it indicates that the estimates will not be less conservative than the true value.)

Various rules of thumb may be used to decide whether n is large enough for B(n, p) to be well approximated by the normal distribution N(np, np(1 − p)), and whether p is far enough from the extremes of zero or one; a common requirement is that both np and n(1 − p) be at least 5 (some sources use 9 or 10). The quality of the approximation can be made precise using the Berry–Esseen theorem.

For example, assume that both np and n(1 − p) are greater than 9. Since 0 < p < 1, one can check that 3√(p(1 − p)/n) is then smaller than both p and 1 − p, and therefore

0 < p − 3√(p(1 − p)/n) and p + 3√(p(1 − p)/n) < 1.

This is the alternative form of the 3-standard-deviation rule: the interval np ± 3√(np(1 − p)) lies entirely within the range of possible counts 0, …, n.

The following is an example of applying a continuity correction.

Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
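A worked version of this example, assuming (purely for illustration, since the original parameters are not given here) that X ~ B(20, 0.5):

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 20, 0.5                      # illustrative parameters
mu, sigma = n * p, sqrt(n * p * (1 - p))

print(binom_cdf(8, n, p))           # exact Pr(X <= 8)          ~ 0.2517
print(normal_cdf(8.5, mu, sigma))   # corrected approximation   ~ 0.2512
print(normal_cdf(8.0, mu, sigma))   # uncorrected approximation ~ 0.1855
```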

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738.

Nowadays, it can be seen as a consequence of the central limit theorem since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.[35]
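A minimal sketch of such a test statistic; the function name and the survey counts below are illustrative assumptions, not part of the original text.

```python
from math import erf, sqrt

def proportion_z_test(x, n, p0):
    """One-sample proportion z-test of H0: p = p0 against H1: p != p0,
    using the normal approximation to the binomial."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return z, p_value

print(proportion_z_test(x=430, n=1000, p0=0.5))  # z ~ -4.43, p-value ~ 1e-5
```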

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement.

If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = √(p(1 − p)/n).

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np converges to a finite limit. Therefore, the Poisson distribution with parameter λ = np can be used as an approximation to B(n, p) if n is sufficiently large and p is sufficiently small.
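The convergence can be seen numerically; the sketch below compares the two mass functions for an illustrative choice of large n and small p with λ = np = 3.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003                  # large n, small p, lambda = n * p = 3
for k in range(6):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, n * p), 5))
```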

One way to generate random variate samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability Pr(X = k) for all values k from 0 through n. Then, by using a pseudorandom number generator to generate samples uniformly between 0 and 1, one can transform the calculated samples into discrete numbers by using the probabilities calculated in the first step.
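A minimal sketch of this inversion approach (the function name is an illustrative choice; production generators use more efficient methods for large n):

```python
import random
from math import comb

def binomial_inversion(n, p):
    """Draw one sample from B(n, p) by inverting the CDF: walk through
    k = 0, 1, ..., n until the accumulated probability exceeds a uniform u."""
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    u = random.random()
    cumulative = 0.0
    for k, q in enumerate(probs):
        cumulative += q
        if u <= cumulative:
            return k
    return n  # guard against floating-point round-off

samples = [binomial_inversion(10, 0.3) for _ in range(10_000)]
print(sum(samples) / len(samples))  # close to n * p = 3.0
```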

Figure: Binomial distribution for p = 0.5, with n and k as in Pascal's triangle. The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is 70/256.

Figure: Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5.