Law of the unconscious statistician

In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function g(X) of a random variable X in terms of g and the probability distribution of X.

The form of the law depends on the type of random variable X in question.

If the distribution of X is discrete and one knows its probability mass function p_X, then the expected value of g(X) is

$$\operatorname{E}[g(X)] = \sum_{x} g(x)\, p_X(x),$$

where the sum is over all possible values x of X.
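As a quick numerical illustration (not part of the formal statement), the discrete form can be checked by comparing the LOTUS sum against the expected value computed directly from the distribution of g(X). The pmf and the function g below are arbitrary choices made purely for the example:

```python
# Minimal check of the discrete case: compare the LOTUS sum
# sum_x g(x) * p_X(x) against E[g(X)] computed from the pmf of g(X).

from collections import defaultdict

# A small discrete distribution: a loaded six-sided die (illustrative values).
pmf = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.2, 5: 0.2, 6: 0.2}

def g(x):
    return x ** 2  # an arbitrary function of the random variable

# LOTUS: no need to derive the distribution of g(X) first.
lotus = sum(g(x) * p for x, p in pmf.items())

# Direct definition: build the pmf of Y = g(X), then sum y * p_Y(y).
pmf_y = defaultdict(float)
for x, p in pmf.items():
    pmf_y[g(x)] += p
direct = sum(y * p for y, p in pmf_y.items())

print(lotus, direct)  # both equal E[g(X)]; they agree (17.7)
```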

If instead the distribution of X is continuous with probability density function f_X, then the expected value of g(X) is

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx.$$
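The continuous form admits a similar numerical sketch. The choice of a standard normal X and g(x) = x², for which E[g(X)] = 1 is known exactly, is an assumption made only for illustration:

```python
# Check of the continuous case: compare the LOTUS integral of g * f_X
# against a Monte Carlo average of g over samples of X ~ N(0, 1).

import numpy as np
from scipy import integrate, stats

g = lambda x: x ** 2
f = stats.norm.pdf          # density of X ~ N(0, 1)

# LOTUS: integrate g(x) * f(x) over the real line.
lotus, _ = integrate.quad(lambda x: g(x) * f(x), -np.inf, np.inf)

# Direct estimate: sample X and average g(X).
rng = np.random.default_rng(0)
mc = g(rng.standard_normal(1_000_000)).mean()

print(lotus, mc)  # both approximately 1.0
```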

Both of these special cases can be expressed in terms of the cumulative probability distribution function F_X of X, with the expected value of g(X) now given by the Lebesgue–Stieltjes integral

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF_X(x).$$
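For concreteness, the two special cases can be read off from this single expression; the specialization is standard, though not spelled out in the statement itself:

$$\int_{-\infty}^{\infty} g(x)\, dF_X(x) =
\begin{cases}
\displaystyle\sum_x g(x)\, p_X(x), & X \text{ discrete, since } F_X \text{ jumps by } p_X(x) \text{ at each atom } x, \\[4pt]
\displaystyle\int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx, & X \text{ with density } f_X, \text{ since } dF_X(x) = f_X(x)\, dx.
\end{cases}$$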

In even greater generality, X could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral.

In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.

This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the aforementioned law as the very definition of the expected value of a function g(X) of a random variable X, rather than (more formally) as a consequence of the true definition of expected value.[1]

The naming is sometimes attributed to Sheldon Ross' textbook Introduction to Probability Models, although he removed the reference in later editions.[2]

Many statistics textbooks do present the result as the definition of expected value.[3]

A similar property holds for joint distributions, or equivalently, for random vectors.

In the simplest case, where the random variable X takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if X is a discrete random vector or even a discrete random element.

The case of a continuous random variable is more delicate, since the proof in full generality requires subtle forms of the change-of-variables formula for integration.

However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable is then a special case by making use of the Radon–Nikodym theorem.

Suppose that X is a discrete random variable taking values x1, x2, … with corresponding probabilities p1, p2, …. Let y1, y2, … denote the distinct values among the g(xj), and for each i let Ii denote the collection of all j with g(xj) = yi. Then, directly from the definition of expected value,

$$\operatorname{E}[g(X)] = \sum_i y_i \operatorname{P}(g(X) = y_i) = \sum_i y_i \sum_{j \in I_i} p_j = \sum_i \sum_{j \in I_i} g(x_j)\, p_j = \sum_j g(x_j)\, p_j.$$

However, if X takes on countably many values, the last equality given does not always hold, as seen by the Riemann series theorem.

Because of this, it is necessary to assume the absolute convergence of the sums in question.[5]
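A standard example, with the distribution chosen here purely for illustration, shows what goes wrong otherwise: let X take the values (−1)^{j+1} j for j = 1, 2, 3, …, with probabilities 6/(π²j²) (which sum to 1), and take g(x) = x. Then

$$\sum_j g(x_j)\, p_j = \frac{6}{\pi^2} \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j} = \frac{6}{\pi^2} \ln 2,$$

but the convergence is only conditional, since Σ_j |g(x_j)| p_j = (6/π²) Σ_j 1/j diverges. By the Riemann series theorem the sum can be made to take any value by re-enumerating the xj, yet the enumeration is arbitrary and does not change the distribution of X, so E[X] is not well defined.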

Suppose that X is a random variable whose distribution has a continuous density f. If g is a general function, then the probability that g(X) takes a value in a set of real numbers K equals the probability that X takes a value in g^{-1}(K), which is given by

$$\operatorname{P}(g(X) \in K) = \int_{g^{-1}(K)} f(x)\, dx.$$

In the simplest case, if g is differentiable with nowhere-vanishing derivative (so that g is monotone with differentiable inverse), then the above integral can be written as

$$\int_{g^{-1}(K)} f(x)\, dx = \int_K f(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| dy,$$

which exhibits f(g^{-1}(y)) |(g^{-1})'(y)| as the probability density of g(X). Hence

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} y\, f(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| dy = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx,$$

the second equality being the substitution y = g(x).

This shows that the expected value of g(X) is encoded entirely by the function g and the density f of X.[6]

The assumption that g is differentiable with nowhere-vanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as g(x) = x².

The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis such as Sard's theorem and the coarea formula.
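For instance, for g(x) = x² the real line can be split at 0 into the two branches on which g is monotone; the following standard computation, sketched here rather than taken from the text above, confirms the law in this case. For y > 0,

$$\operatorname{P}(X^2 \le y) = F_X(\sqrt{y}\,) - F_X(-\sqrt{y}\,),$$

so Y = X² has density f_Y(y) = (f(√y) + f(−√y)) / (2√y), and the substitution x = ±√y on the two branches gives

$$\operatorname{E}[X^2] = \int_0^{\infty} y\, f_Y(y)\, dy = \int_{-\infty}^{\infty} x^2 f(x)\, dx,$$

as the law asserts.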

In even greater generality, using the Lebesgue theory as below, one finds that the identity

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$$

holds true whenever X has a density f (which does not have to be continuous) and whenever g is a measurable function for which g(X) has finite expected value.

Furthermore, without modification to the proof, this holds even if X is a random vector (with density) and g is a multivariable function; the integral is then taken over the multi-dimensional range of values of X.
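The multivariable form can be sketched numerically in the same way as before; the joint density and the function g below are arbitrary choices made for the example, for which E[g(X, Y)] = E[X²] + E[X]E[Y] = 1 exactly:

```python
# Multivariable LOTUS check: (X, Y) standard bivariate normal with
# independent components, and g(x, y) = x**2 + x*y.

import numpy as np
from scipy import integrate, stats

g = lambda x, y: x ** 2 + x * y
f = lambda x, y: stats.norm.pdf(x) * stats.norm.pdf(y)  # joint density

# LOTUS: integrate g * f over (a truncation of) the plane; the Gaussian
# tails beyond |x|, |y| > 8 are negligible at this precision.
lotus, _ = integrate.dblquad(lambda y, x: g(x, y) * f(x, y), -8, 8, -8, 8)

# Direct estimate: sample the random vector and average g over the samples.
rng = np.random.default_rng(0)
xs, ys = rng.standard_normal((2, 1_000_000))
mc = g(xs, ys).mean()

print(lotus, mc)  # both approximately 1.0
```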

An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral.
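In standard measure-theoretic notation (not introduced explicitly above), with a measurable map X : Ω → Ω′, a measure μ on Ω, and X_*μ the pushforward measure on Ω′, this abstract form reads

$$\int_{\Omega'} g \, d(X_*\mu) = \int_{\Omega} g \circ X \, d\mu$$

for every measurable g : Ω′ → ℝ that is nonnegative or integrable with respect to X_*μ; when μ is a probability measure, the right-hand side is precisely E[g(X)].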

In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows therefrom by an application of the monotone convergence theorem.[7]
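In outline (a standard argument, condensed here): for a simple function g = Σᵢ aᵢ 1_{Aᵢ} the identity is immediate from the definition of the pushforward measure,

$$\int_{\Omega'} g \, d(X_*\mu) = \sum_i a_i \,(X_*\mu)(A_i) = \sum_i a_i \, \mu(X^{-1}(A_i)) = \int_{\Omega} g \circ X \, d\mu,$$

and for measurable g ≥ 0 one chooses simple functions g_n ↑ g pointwise, so that g_n ∘ X ↑ g ∘ X, and applies the monotone convergence theorem to both sides; the case of integrable g follows by splitting g = g⁺ − g⁻.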

Without any major changes, the result can also be formulated in the setting of outer measures.[8]

If μ is a σ-finite measure and the pushforward measure X_*μ is absolutely continuous with respect to a σ-finite measure ν on the target space Ω′, the theory of the Radon–Nikodym derivative is applicable.

In the further special case that Ω′ is the real number line, as in the contexts discussed above, it is natural to take ν to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever μ is a probability measure.
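Explicitly, writing f = d(X_*μ)/dν for the Radon–Nikodym derivative, the general law combines with the change-of-measure property of f to give (a standard consequence, stated here for completeness)

$$\int_{\Omega} g \circ X \, d\mu = \int_{\Omega'} g \, d(X_*\mu) = \int_{\Omega'} g f \, d\nu;$$

taking Ω′ = ℝ, ν the Lebesgue measure, and μ a probability measure makes f the probability density of X and reduces the right-hand side to ∫ g(x) f(x) dx.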