In probability and statistics, a Bernoulli process (named after Jacob Bernoulli) is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1.
The component Bernoulli variables X_i are independent and identically distributed.
A single experiment with only two possible outcomes, often referred to as "success" and "failure" and usually encoded as 1 and 0, can be modeled as a Bernoulli distribution.
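For illustration, the following minimal Python sketch samples a finite stretch of such a process; the function name and parameters are illustrative choices, not notation from the text.

```python
import random

def bernoulli_process(p, n, seed=None):
    """Sample n independent Bernoulli(p) variables: 1 ("success") with probability p, else 0."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

# Example: 20 flips of a biased coin with success probability p = 0.3
print(bernoulli_process(0.3, 20, seed=42))
```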
The sets in this topology are finite sequences of coin flips, that is, finite-length strings of H and T (H stands for heads and T stands for tails), with the rest of the (infinitely long) sequence taken as "don't care".[1]
Given a cylinder set, that is, a specific sequence of coin flip results $[\omega_1, \omega_2, \dotsc, \omega_n]$ at times $1, 2, \dotsc, n$, the probability of observing this particular sequence is $P([\omega_1, \omega_2, \dotsc, \omega_n]) = p^k (1-p)^{n-k}$, where $k$ is the number of times that H appears in the sequence and $n-k$ is the number of times that T appears.[2]
Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because $\lim_{n\to\infty} p^n = 0$ for any $0 \le p < 1$.
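As a small numeric check of the two statements above (a sketch, assuming the usual H/T encoding; the helper name is illustrative):

```python
def cylinder_probability(flips, p):
    """Probability of observing a given finite prefix of coin flips, e.g. "HHTH",
    under a Bernoulli(p) process: p**k * (1 - p)**(n - k) with k heads out of n flips."""
    k = flips.count("H")
    n = len(flips)
    return p**k * (1 - p)**(n - k)

p = 0.5
print(cylinder_probability("HHTH", p))   # 0.0625 = (1/2)**4
# The probability of a fixed prefix shrinks geometrically with its length, so the
# probability of any one specific infinite sequence is zero.
for n in (10, 100, 1000):
    print(n, cylinder_probability("H" * n, p))
```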
An event of probability 1 is, in this sense, compatible with every individual infinite sequence having measure zero.
Nevertheless, one can still say that some classes of infinite sequences of coin flips are far more likely than others; this is given by the asymptotic equipartition property.
To conclude the formal definition, a Bernoulli process is then given by the probability triple $(\Omega, \mathcal{B}, P)$, as defined above.
The law of large numbers states that the sample average of the sequence, $\bar{X}_n := \frac{1}{n}\sum_{i=1}^{n} X_i$, will approach the expected value almost certainly; that is, the events which do not satisfy this limit have zero probability.
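A hedged simulation sketch of this statement: the running sample mean of simulated flips drifts toward the expected value p (the names and checkpoints are illustrative).

```python
import random

def running_mean_demo(p=0.3, n=100_000, seed=1):
    """Simulate n Bernoulli(p) flips and print the sample mean at a few checkpoints."""
    rng = random.Random(seed)
    total = 0
    for i in range(1, n + 1):
        total += 1 if rng.random() < p else 0
        if i in (10, 100, 1_000, 10_000, 100_000):
            print(f"n = {i:>6}: sample mean = {total / i:.4f}")

running_mean_demo()   # the printed means approach p = 0.3 as n grows
```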
In this case, one may make use of Stirling's approximation to the factorial, and write $n! = \sqrt{2\pi n}\, n^n e^{-n} \left(1 + \mathcal{O}\!\left(\tfrac{1}{n}\right)\right)$. Inserting this into the expression for P(k,n), one obtains the normal distribution; this is the content of the central limit theorem, and this is the simplest example thereof.
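The quality of this approximation can be checked numerically; the sketch below compares the exact binomial probability P(k, n) with a Gaussian density of matching mean np and variance np(1 − p). This is a standard comparison offered for illustration, not a formula taken from the text.

```python
import math

def binomial_pmf(k, n, p):
    """Exact P(k, n): probability of observing k heads in n flips."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_approx(k, n, p):
    """Gaussian density with mean n*p and variance n*p*(1-p), evaluated at k."""
    mu, var = n * p, n * p * (1 - p)
    return math.exp(-((k - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 1000, 0.5
for k in (480, 500, 520):
    print(k, binomial_pmf(k, n, p), normal_approx(k, n, p))   # the two columns nearly agree
```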
The law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the asymptotic equipartition property.
Put informally, one notes that, yes, over many coin flips, one will observe H a fraction p of the time, and that this corresponds exactly with the peak of the Gaussian.
The size of this set is also interesting and can be explicitly determined: the logarithm of its size is exactly the entropy of the Bernoulli process.
By using Stirling's approximation, putting it into the expression for P(k,n), solving for the location and width of the peak, and finally taking $n\to\infty$, one finds that the entropy of the Bernoulli process is $H = -p\log_2 p - (1-p)\log_2(1-p)$ bits per coin flip.
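The count of typical sequences can be illustrated with a short sketch computing the binary entropy and the resulting estimate of roughly 2^(nH) typical sequences; the names are illustrative.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) process, in bits per flip."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 100, 0.3
H = binary_entropy(p)
print(f"H({p}) = {H:.4f} bits per flip")
print(f"about 2**{n * H:.1f} typical sequences, out of 2**{n} possible sequences")
```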
The question long defied analysis, but was finally and completely answered with the Ornstein isomorphism theorem.
However, such processes do not consist of independent random variables: indeed, many purely deterministic, non-random systems can be mixing.
One way to create a dynamical system out of the Bernoulli process is as a shift space.
There is a natural translation symmetry on the product space $\Omega = 2^{\mathbb{N}}$ given by the shift operator $T(X_0, X_1, X_2, \dotsc) = (X_1, X_2, \dotsc)$. The Bernoulli measure, defined above, is translation-invariant; that is, given any cylinder set $\sigma \in \mathcal{B}$, one has $P(T^{-1}(\sigma)) = P(\sigma)$.
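A quick empirical illustration of this invariance (a Monte Carlo sketch; the pattern, bias, and names are illustrative): the estimated probability of seeing a fixed pattern starting at position 0 matches that of seeing it starting at position 1.

```python
import random

def pattern_freq(pattern, offset, p=0.3, trials=200_000, seed=7):
    """Estimate the probability that a Bernoulli(p) sequence matches `pattern`
    starting at index `offset`; by translation invariance the result should
    not depend on the offset."""
    rng = random.Random(seed)
    hits = 0
    length = offset + len(pattern)
    for _ in range(trials):
        seq = [1 if rng.random() < p else 0 for _ in range(length)]
        if seq[offset:offset + len(pattern)] == pattern:
            hits += 1
    return hits / trials

print(pattern_freq([1, 0, 1], offset=0))   # roughly p*(1-p)*p = 0.063
print(pattern_freq([1, 0, 1], offset=1))   # roughly the same value
```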
This map is called the dyadic transformation; for the doubly-infinite sequence of bits $\Omega = 2^{\mathbb{Z}}$, the corresponding homomorphism is known as the baker's map.
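A brief sketch of the dyadic transformation and its action as a left shift on the binary expansion (a standard illustration with illustrative names):

```python
def dyadic(x):
    """Dyadic transformation T(x) = 2x mod 1 on the unit interval."""
    return (2 * x) % 1.0

def bits_to_real(bits):
    """Interpret a finite bit string as the binary expansion 0.b0 b1 b2 ..."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

bits = [1, 0, 1, 1, 0]
x = bits_to_real(bits)
# Applying T drops the leading bit: T(0.b0 b1 b2 ...) = 0.b1 b2 b3 ...
print(dyadic(x), bits_to_real(bits[1:]))   # both print 0.375
```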
Indeed, the Bernoulli polynomials obey the identity $\tfrac{1}{2}\left[B_n\!\left(\tfrac{y}{2}\right) + B_n\!\left(\tfrac{y+1}{2}\right)\right] = 2^{-n} B_n(y)$. Note that the sum gives the Cantor function, as conventionally defined.
Suppose a Bernoulli process is formally defined as a single random variable (see the preceding section).[verification needed] From any Bernoulli process one may derive a Bernoulli process with p = 1/2 by the von Neumann extractor, the earliest randomness extractor, which actually extracts uniform randomness.
On average the computation discards the proportion p² + (1 − p)² of the input pairs (the 00 and 11 pairs), which is near one when p is near zero or one, and which is minimized at 1/2 when p = 1/2 for the original process (in which case the output stream is on average 1/4 the length of the input stream).
The main operation of the classical von Neumann extractor is sketched below. This decrease in efficiency, or waste of randomness present in the input stream, can be mitigated by iterating the algorithm over the input data.
This way the output can be made to be "arbitrarily close to the entropy bound".[5]
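A minimal Python sketch of the classical von Neumann main operation described above (the function name is illustrative; the original pseudocode is not reproduced here): non-overlapping pairs are read, a 01 pair emits 0, a 10 pair emits 1, and 00 and 11 pairs are discarded.

```python
def von_neumann_extract(bits):
    """Classical von Neumann extractor: emit the first bit of every discordant
    non-overlapping pair and discard 00 and 11.  Since P(01) = P(10) = p*(1-p),
    the output bits are unbiased even when the input bias p is unknown."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

print(von_neumann_extract([0, 0, 1, 0, 1, 1, 0, 1]))   # pairs 00, 10, 11, 01 -> [1, 0]
```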
The iterated version of the von Neumann algorithm, also known as advanced multi-level strategy (AMLS),[6] was introduced by Yuval Peres in 1992.[5] It works recursively, recycling "wasted randomness" from two sources: the sequence of discard/no-discard decisions, and the values of the discarded pairs (0 for 00, and 1 for 11).
While such generation of additional sequences can be iterated infinitely to extract all available entropy, doing so would require an infinite amount of computational resources; therefore, the number of iterations is typically fixed to a low value, either set in advance or calculated at runtime.
Von Neumann–Peres (iterated) main operation: a sketch is given below. Another tweak was presented in 2016, based on the observation that the Sequence2 channel does not provide much throughput, and that a hardware implementation with a finite number of levels can benefit from discarding it earlier in exchange for processing more levels of Sequence1.
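A hedged Python sketch of the Von Neumann–Peres (iterated) main operation, following the description above: Sequence1 collects the discard/no-discard indicators, Sequence2 collects the values of the discarded pairs, and both are fed back into the extractor up to a fixed recursion depth. Function and parameter names are illustrative.

```python
def peres_extract(bits, depth=3):
    """Iterated von Neumann-Peres extractor (sketch).  depth=1 reduces to the
    classical von Neumann rule; larger depths recycle more wasted randomness."""
    if depth == 0 or len(bits) < 2:
        return []
    out = []
    seq1 = []   # Sequence1: per pair, 1 if the pair was discarded (00 or 11), else 0
    seq2 = []   # Sequence2: value of each discarded pair, 0 for 00 and 1 for 11
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
            seq1.append(0)
        else:
            seq1.append(1)
            seq2.append(a)
    # Recycle the two auxiliary sequences recursively, as described above.
    out += peres_extract(seq1, depth - 1)
    out += peres_extract(seq2, depth - 1)
    return out

raw = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(peres_extract(raw, depth=1))   # classical von Neumann output only
print(peres_extract(raw, depth=3))   # a longer output from the same input
```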