Such a sequence might represent an attempt to construct 'better and better' approximations to a desired measure μ that is difficult to obtain directly.
The meaning of 'better and better' is subject to all the usual caveats for taking limits: for any error tolerance ε > 0, we require that there be an N sufficiently large that, for all n ≥ N, the 'difference' between μn and μ is smaller than ε.
This section attempts to provide a rough intuitive description of three notions of convergence, using terminology developed in calculus courses; this section is necessarily imprecise as well as inexact, and the reader should refer to the formal clarifications in subsequent sections.
In particular, the descriptions here do not address the possibility that the measure of some sets could be infinite, or that the underlying space could exhibit pathological behavior, and additional technical assumptions are needed for some of the statements.
The statements in this section are however all correct if μn is a sequence of probability measures on a Polish space.
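Formalizing the idea that the 'average value' $\int f\,d\mu_n$ of each 'sufficiently nice' function f should converge to $\int f\,d\mu$ requires a careful specification of the set of functions under consideration and of how uniform the convergence should be.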
Setwise convergence requires μn(A) → μ(A) for every measurable set A. Intuitively, when considering integrals of 'nice' functions, this notion provides more uniformity than weak convergence.
As a matter of fact, when considering sequences of measures with uniformly bounded variation on a Polish space, setwise convergence implies the convergence $\int f\,d\mu_n \to \int f\,d\mu$ for every bounded measurable function f.
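Convergence in total variation requires the measures of all measurable sets to converge uniformly: for every ε > 0 there is an N such that |μn(A) − μ(A)| < ε for every n ≥ N and every measurable set A. As with setwise convergence, this implies convergence of integrals against bounded measurable functions, but this time the convergence is uniform over all functions bounded by any fixed constant.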
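Convergence in total variation is the strongest notion of convergence described here. The total variation distance between two measures μ and ν on a measurable space $(X, \mathcal{F})$ is given by
$$\|\mu-\nu\|_{\text{TV}} = \sup_f \left\{ \int_X f\,d\mu - \int_X f\,d\nu \right\},$$
where the supremum is taken over f ranging over the set of all measurable functions from X to [−1, 1].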
This is in contrast, for example, to the Wasserstein metric, where the definition is of the same form, but the supremum is taken over f ranging over the set of measurable functions from X to [−1, 1] which have Lipschitz constant at most 1; and also in contrast to the Radon metric, where the supremum is taken over f ranging over the set of continuous functions from X to [−1, 1].
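The difference between these suprema is easiest to see for two Dirac masses δa and δb on the real line, where every integral reduces to a point evaluation f(b) − f(a). The Python sketch below evaluates both quantities in closed form; the function names are hypothetical, and the Lipschitz case uses an explicit clipped linear witness that attains the upper bound min(|b − a|, 2).

```python
def tv_distance_diracs(a: float, b: float) -> float:
    """||delta_a - delta_b||_TV: sup of f(b) - f(a) over measurable f: R -> [-1, 1].

    Choosing f = 1 at max(a, b) and f = -1 everywhere else attains the
    largest possible value, 2, whenever a != b.
    """
    return 0.0 if a == b else 2.0


def lipschitz_distance_diracs(a: float, b: float) -> float:
    """Sup of f(b) - f(a) over 1-Lipschitz f: R -> [-1, 1] (hypothetical helper).

    The witness f(x) = clip(x - m, -1, 1), with m the midpoint of a and b,
    is 1-Lipschitz and attains the upper bound min(|b - a|, 2).
    """
    lo, hi = min(a, b), max(a, b)
    m = (lo + hi) / 2.0

    def f(x: float) -> float:
        return max(-1.0, min(1.0, x - m))

    return f(hi) - f(lo)


# As the atoms 0 and 1/n merge, the TV distance stays at 2,
# while the Lipschitz-constrained quantity shrinks like 1/n.
for n in (1, 10, 100):
    print(n, tv_distance_diracs(0.0, 1.0 / n), lipschitz_distance_diracs(0.0, 1.0 / n))
```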
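If μ and ν are both probability measures, then the total variation distance is also given by
$$\|\mu-\nu\|_{\text{TV}} = 2\sup_{A} |\mu(A) - \nu(A)|,$$
where the supremum is taken over all measurable sets A ⊆ X. The equivalence between these two definitions can be seen as a particular case of the Monge–Kantorovich duality.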
From the two definitions above, it is clear that the total variation distance between probability measures is always between 0 and 2.
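On a finite space X = {0, …, k − 1}, both definitions can be checked directly. The sketch below assumes probability vectors p and q given as Python lists (names are illustrative): the function-based supremum is attained at f = sign(p − q), and the set-based supremum is computed by brute force over all subsets, which is exponential in k and meant only as an illustration. The agreement relies on Σ(p − q) = 0, which is where the probability-measure assumption enters.

```python
from itertools import combinations


def tv_via_functions(p, q):
    """Sup over f: X -> [-1, 1] of sum_x f(x) * (p(x) - q(x)).

    The supremum is attained at f = sign(p - q), giving sum_x |p(x) - q(x)|.
    """
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def tv_via_sets(p, q):
    """2 * max over subsets A of X of |p(A) - q(A)|, by brute force over all subsets."""
    points = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for subset in combinations(points, size):
            best = max(best, abs(sum(p[i] - q[i] for i in subset)))
    return 2.0 * best


p = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
print(tv_via_functions(p, q), tv_via_sets(p, q))  # both approximately 0.8
```

Measures with disjoint supports, such as [1, 0] and [0, 1], reach the upper end of the range, 2.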
To illustrate the meaning of the total variation distance, consider the following thought experiment.
Assume that we are given two probability measures μ and ν, as well as a random variable X.
Assume that these two measures have prior probabilities 0.5 each of being the true law of X.
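Suppose now that we are given a single sample distributed according to the law of X and are then asked to guess which of the two distributions describes that law. The quantity
$$\frac{2 + \|\mu-\nu\|_{\text{TV}}}{4}$$
then provides a sharp upper bound on the prior probability that our guess will be correct.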
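For distributions on a finite set, the bound is attained by the rule that, after observing x, guesses whichever distribution gives x the larger probability. The sketch below (helper names are hypothetical) compares that rule's success probability with (2 + ‖μ − ν‖TV)/4; they agree because max(a, b) = (a + b + |a − b|)/2.

```python
def optimal_guess_success(p, q):
    """Success probability of the maximum a posteriori guess under equal priors 1/2.

    After observing outcome x, guess "p" if p[x] >= q[x] and "q" otherwise,
    so outcome x contributes 0.5 * max(p[x], q[x]) to the success probability.
    """
    return sum(0.5 * max(pi, qi) for pi, qi in zip(p, q))


def tv_guess_bound(p, q):
    """(2 + ||p - q||_TV) / 4, with ||p - q||_TV = sum_x |p[x] - q[x]|."""
    tv = sum(abs(pi - qi) for pi, qi in zip(p, q))
    return (2.0 + tv) / 4.0


p = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
print(optimal_guess_success(p, q), tv_guess_bound(p, q))  # both approximately 0.7
```

Identical distributions give 1/2 (no better than a coin flip), and mutually singular ones give 1.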
For $(X, \mathcal{F})$ a measurable space, a sequence μn is said to converge setwise to a limit μ if
$$\lim_{n\to\infty} \mu_n(A) = \mu(A)$$
for every set $A \in \mathcal{F}$.
For example, as a consequence of the Riemann–Lebesgue lemma, the sequence μn of measures on the interval [−1, 1] given by μn(dx) = (1 + sin(nx))dx converges setwise to Lebesgue measure, but it does not converge in total variation.
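Both halves of this example can be computed in closed form. The sketch below (function names are hypothetical) evaluates μn([a, b]) = (b − a) + (cos(na) − cos(nb))/n, whose gap to the Lebesgue measure b − a is at most 2/n, and the total variation distance to Lebesgue measure, which equals the integral of |sin(nx)| over [−1, 1] and tends to 4/π rather than 0. Only intervals are checked here; setwise convergence for arbitrary measurable sets is what the Riemann–Lebesgue lemma supplies.

```python
import math


def mu_n_interval(a: float, b: float, n: int) -> float:
    """mu_n([a, b]) for mu_n(dx) = (1 + sin(n x)) dx, via the exact antiderivative."""
    return (b - a) + (math.cos(n * a) - math.cos(n * b)) / n


def tv_to_lebesgue(n: int) -> float:
    """||mu_n - Lebesgue||_TV on [-1, 1], i.e. the integral of |sin(n x)| over [-1, 1].

    By symmetry and the substitution u = n x this is (2 / n) * integral_0^n |sin u| du;
    each complete half-period of |sin| contributes exactly 2.
    """
    k = math.floor(n / math.pi)        # complete half-periods contained in [0, n]
    remainder = n - k * math.pi        # leftover piece, in [0, pi)
    integral = 2 * k + (1 - math.cos(remainder))
    return 2.0 * integral / n


for n in (10, 1_000, 100_000):
    # The interval measure approaches 1.0 = Lebesgue([-0.3, 0.7]); the TV distance does not shrink.
    print(n, mu_n_interval(-0.3, 0.7, n), tv_to_lebesgue(n), 4 / math.pi)
```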
Weak convergence depends on a topology on the underlying space and thus is not a purely measure-theoretic notion.
There are several equivalent definitions of weak convergence of a sequence of measures, some of which are (apparently) more general than others.
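When the underlying space is ℝ with its usual topology, let Fn and F denote the cumulative distribution functions of the probability measures μn and μ, respectively. Then μn converges weakly to μ if and only if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
for every point x ∈ ℝ at which F is continuous.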
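The restriction to continuity points of F matters. A standard example is μn = δ1/n, which converges weakly to δ0 because f(1/n) → f(0) for every bounded continuous f. The sketch below (the helper name is hypothetical) tabulates the distribution functions: Fn(x) → F(x) at every x ≠ 0, but Fn(0) = 0 for every n while F(0) = 1, which is allowed because 0 is a discontinuity point of F. The same example shows that weak convergence does not imply setwise convergence, since μn({0}) = 0 for all n while δ0({0}) = 1.

```python
def dirac_cdf(atom: float, x: float) -> float:
    """Cumulative distribution function of the Dirac measure at `atom`, evaluated at x."""
    return 1.0 if x >= atom else 0.0


# mu_n = delta_{1/n} converges weakly to mu = delta_0.
for x in (-0.5, 0.0, 0.5):
    values = [dirac_cdf(1.0 / n, x) for n in (1, 10, 100, 1000)]
    print(f"x={x}: F_n(x) = {values}, F(x) = {dirac_cdf(0.0, x)}")
```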
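Let X be a metric space and let P(X) denote the space of probability measures on its Borel σ-algebra. The weak topology on P(X) is the topology generated by the open sets
$$U_{\varphi,x,\delta} := \left\{ \mu \in P(X) : \left| \int_X \varphi\,d\mu - x \right| < \delta \right\},$$
where φ: X → ℝ ranges over the bounded continuous functions, x ∈ ℝ and δ > 0. If X is separable, it naturally embeds into P(X) as the (closed) set of Dirac measures, and its convex hull is dense.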
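As a small numerical illustration of that density, take X = [0, 1] with Lebesgue measure as the target and approximate it by equal-weight mixtures of Dirac masses at the midpoints (i + 1/2)/N, which are convex combinations of Dirac measures. For a fixed bounded continuous φ (here, as an arbitrary choice, φ = cos), the mixture integrals approach ∫φ dλ = sin 1. The helper names are hypothetical, and checking a single φ only illustrates, rather than proves, weak convergence.

```python
import math


def dirac_mixture_integral(phi, atoms, weights):
    """Integral of phi against the measure sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * phi(a) for a, w in zip(atoms, weights))


def midpoint_mixture(N):
    """Equal-weight Dirac masses at the midpoints (i + 1/2) / N of [0, 1]."""
    atoms = [(i + 0.5) / N for i in range(N)]
    weights = [1.0 / N] * N
    return atoms, weights


atoms, weights = midpoint_mixture(1000)
approx = dirac_mixture_integral(math.cos, atoms, weights)
exact = math.sin(1.0)  # integral of cos over [0, 1] with respect to Lebesgue measure
print(approx, exact)
```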
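There are many "arrow notations" for this kind of convergence: the most frequently used are $\mu_n \Rightarrow \mu$, $\mu_n \rightharpoonup \mu$, $\mu_n \xrightarrow{w} \mu$ and $\mu_n \xrightarrow{\mathcal{D}} \mu$.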
If Xn: Ω → X is a sequence of random variables then Xn is said to converge weakly (or in distribution or in law) to the random variable X: Ω → X as n → ∞ if the sequence of pushforward measures (Xn)∗(P) converges weakly to X∗(P) in the sense of weak convergence of measures on X, as defined above.
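The following spaces of test functions are commonly used in the convergence of probability measures: $C_c(X)$, the space of continuous functions with compact support; $C_0(X)$, the space of continuous functions vanishing at infinity; and $C_B(X)$, the space of bounded continuous functions. We have $C_c \subset C_0 \subset C_B$. If a sequence of probability measures μn satisfies $\int f\,d\mu_n \to \int f\,d\mu$ for every $f \in C_c(X)$, but the limit μ is not specified to be a probability measure, this is not guaranteed to imply weak convergence.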
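A standard example is μn = δn on ℝ. Any f ∈ Cc(ℝ) vanishes outside some bounded interval, so ∫f dδn = f(n) = 0 for all large n: tested against Cc(ℝ), the sequence converges to the zero measure, which is not a probability measure. Against bounded continuous test functions such as the constant 1 or x/(1 + |x|), the integrals tend to 1 rather than 0, so there is no weak convergence to the zero measure. The function names in the sketch are hypothetical.

```python
def tent(x: float) -> float:
    """Continuous with compact support [-1, 1], so it belongs to C_c(R)."""
    return max(0.0, 1.0 - abs(x))


def squashed(x: float) -> float:
    """Bounded and continuous on R (values in (-1, 1)), but not compactly supported."""
    return x / (1.0 + abs(x))


# For mu_n = delta_n, the integral of f against mu_n is simply f(n);
# against the zero measure every integral is 0.
for n in (1, 10, 100, 1000):
    print(n, tent(float(n)), squashed(float(n)))
```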
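The definitions of weak and weak-* convergence used in functional analysis are as follows. Let V be a topological vector space or a Banach space, with continuous dual V*. A sequence $(x_n)$ in V converges weakly to x if $\varphi(x_n) \to \varphi(x)$ as n → ∞ for all $\varphi \in V^*$, and a sequence $(\varphi_n)$ in V* converges in the weak-* topology to φ if $\varphi_n(x) \to \varphi(x)$ as n → ∞ for all x ∈ V. For illustration, if X is a locally compact Hausdorff space, then by the Riesz representation theorem the space M(X) of Radon measures is isomorphic to a subspace of the space of continuous linear functionals on $C_0(X)$.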