Poisson binomial distribution

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed.

The concept is named after Siméon Denis Poisson.

In other words, it is the probability distribution of the number of successes in a collection of n independent yes/no experiments with success probabilities

The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is

The probability of having k successful trials out of a total of n can be written as the sum [1] where

is the set of all subsets of k integers that can be selected from

elements, the sum over which is infeasible to compute in practice unless the number of trials n is small (e.g. if n = 30,

As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the recursive formula [2] [3] where The recursive formula is not numerically stable, and should be avoided if

An alternative is to use a divide-and-conquer algorithm: if we assume

More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors

This observation leads to the Direct Convolution (DC) algorithm for computing

DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for

[4] Another possibility is using the discrete Fourier transform.

Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu[6] and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.[4] The cumulative distribution function (CDF) can be expressed as:

is the set of all subsets of size

It can be computed by invoking the DC function above, and then adding elements

of the returned PMF array.

Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the n Bernoulli distributions: There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter and the same mean.

[7] The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities

[8] This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.

This conjecture was also proved by Hillion and Johnson, in 2019.

[10] The probability that a Poisson binomial distribution gets large, can be bounded using its moment generating function as follows (valid when

This is similar to the tail bounds of a binomial distribution.

tend to homogeneity, the larger

[1] Ehm has determined bounds for the total variation distance of

, in effect providing bounds on the error introduced when approximating

be the total variation distance of

Barbour and Hall have shown that

; so a Poisson binomial distribution's variance is bounded above by a Poisson distribution with

The reference [13] discusses techniques of evaluating the probability mass function of the Poisson binomial distribution.