Completeness (statistics)

In statistics, completeness is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset.

It is closely related to the concept of a sufficient statistic, which contains all of the information that the dataset provides about the parameters.[1]

Consider a random variable X whose probability distribution belongs to a parametric model Pθ parametrized by θ.

Say T is a statistic; that is, the composition of a measurable function with a random sample X1,...,Xn.

The statistic T is said to be complete for the distribution of X if, for every measurable function g,[1]

\operatorname{E}_\theta[g(T)] = 0 \ \text{for all}\ \theta \quad\Longrightarrow\quad P_\theta(g(T) = 0) = 1 \ \text{for all}\ \theta.

The statistic T is said to be boundedly complete for the distribution of X if this implication holds for every measurable function g that is also bounded.

The Bernoulli model admits a complete statistic.

Let X1, ..., Xn be independent random variables, each with the same Bernoulli distribution with parameter p, where the parameter space for p is (0, 1), and let T = X1 + ... + Xn be the number of 1s observed; T has a binomial distribution with parameters (n, p). To see that T is complete, note that

\operatorname{E}_p[g(T)] = \sum_{t=0}^{n} g(t) \binom{n}{t} p^{t} (1-p)^{n-t}.

Since neither p nor 1 − p can be 0, E_p[g(T)] = 0 if and only if:

\sum_{t=0}^{n} g(t) \binom{n}{t} \left(\frac{p}{1-p}\right)^{t} = 0.

On denoting p/(1 − p) by r, one gets:

\sum_{t=0}^{n} g(t) \binom{n}{t} r^{t} = 0.

First, observe that the range of r is the positive reals. Also, E_p[g(T)] is a polynomial in r and, therefore, can only be identically 0 if all coefficients are 0, that is, g(t) = 0 for all t. Hence T is complete.

It is important to notice that the result that all coefficients must be 0 was obtained because of the range of r. Had the parameter space been finite, with a number of elements less than or equal to n, it might have been possible to solve the linear equations in g(t) obtained by substituting the values of r and get solutions different from 0.
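A quick numerical sketch of this argument (my own illustration, not part of the source; the sample size n = 5 and the chosen values of p are arbitrary): the conditions E_p[g(T)] = 0 at any n + 1 distinct values of p form a linear system in the unknowns g(0), ..., g(n) whose matrix is nonsingular, so the only solution is g = 0.

```python
import numpy as np
from scipy.stats import binom

n = 5  # number of Bernoulli trials (arbitrary choice for illustration)
p_values = np.linspace(0.1, 0.9, n + 1)  # any n + 1 distinct values of p in (0, 1)

# Row i holds the binomial pmf P(T = t | p_i) for t = 0, ..., n,
# so that (A @ g)[i] equals E_{p_i}[g(T)].
A = np.array([binom.pmf(np.arange(n + 1), n, p) for p in p_values])

# A is nonsingular (a rescaled Vandermonde matrix in r = p/(1-p)),
# so E_p[g(T)] = 0 at these n + 1 values of p already forces g = 0.
print("rank of A:", np.linalg.matrix_rank(A))                    # prints n + 1 = 6
print("unique solution of A g = 0:", np.linalg.solve(A, np.zeros(n + 1)))
```

With fewer than n + 1 admissible values of p the corresponding system is underdetermined, which is exactly the caveat about finite parameter spaces made above.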

This example will show that, in a sample X1, X2 of size 2 from a normal distribution with known variance, the statistic X1 + X2 is complete and sufficient.

Suppose X1, X2 are independent, identically distributed random variables, normally distributed with expectation θ and variance 1.

To show this, it is sufficient to demonstrate that there is no non-zero function g such that the expectation of g(X1 + X2) remains zero regardless of the value of θ.

The probability distribution of X1 + X2 is normal with expectation 2θ and variance 2.

Its probability density function in x is therefore proportional to

\exp\left(-\tfrac{(x - 2\theta)^2}{4}\right).

The expectation of g above would therefore be a constant times

\int_{-\infty}^{\infty} g(x) \exp\left(-\tfrac{(x - 2\theta)^2}{4}\right)\,dx.

A bit of algebra reduces this to

k(\theta) \int_{-\infty}^{\infty} h(x)\, e^{x\theta}\,dx,

where k(θ) is nowhere zero and h(x) = g(x) e^{-x^2/4}. As a function of θ this is a two-sided Laplace transform of h, and cannot be identically zero unless h, and hence g, is zero almost everywhere.
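A hedged numerical check of this conclusion (not part of the original proof): for any particular non-zero g, the expectation E_θ[g(X1 + X2)] should fail to vanish for some θ. The choice g(x) = sin(x) below is arbitrary; the integral is taken against the N(2θ, 2) density of X1 + X2.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def expectation_of_g(theta, g):
    # E_theta[g(X1 + X2)], where X1 + X2 ~ Normal(mean 2*theta, variance 2).
    density = lambda x: norm.pdf(x, loc=2 * theta, scale=np.sqrt(2))
    lo, hi = 2 * theta - 15, 2 * theta + 15  # effectively the whole real line
    value, _ = quad(lambda x: g(x) * density(x), lo, hi)
    return value

g = np.sin  # an arbitrary non-zero function, chosen only for illustration

for theta in [0.0, 0.5, 1.0, 2.0]:
    print(f"theta = {theta}: E[g(X1 + X2)] = {expectation_of_g(theta, g):.6f}")
# The printed values are not all zero, consistent with X1 + X2 being complete:
# only g = 0 (almost everywhere) makes the expectation vanish for every theta.
```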

Most parametric models have a sufficient statistic which is not complete.

Galili and Meilijson 2016 [3] propose the following didactic example.

Consider X1, ..., Xn, independent samples from the uniform distribution on (θ, 2θ), where θ > 0 is unknown. This model is a scale family (a specific case of a location-scale family) model: scaling the samples by a multiplier c > 0 multiplies the parameter θ by c.

Galili and Meilijson show that the minimum and maximum of the samples are together a sufficient statistic:

T = (X_{(1)}, X_{(n)}) = (\min_i X_i, \max_i X_i).

Indeed, conditional on these two values, the distribution of the rest of the sample is simply uniform on the range they define, (X_{(1)}, X_{(n)}).

However, T is not complete. Writing each observation as X_i = θ(1 + U_i), where the U_i are uniform on (0, 1), the extremes satisfy X_{(1)} = θ(1 + U_{(1)}) and X_{(n)} = θ(1 + U_{(n)}); from that distribution, we obtain:

\operatorname{E}_\theta[X_{(1)}] = \frac{n+2}{n+1}\,\theta, \qquad \operatorname{E}_\theta[X_{(n)}] = \frac{2n+1}{n+1}\,\theta,

so that

\operatorname{E}_\theta\big[(2n+1)\,X_{(1)} - (n+2)\,X_{(n)}\big] = 0 \ \text{for every}\ \theta > 0.

We have thus shown that there exists a function g, namely g(T) = (2n+1)X_{(1)} − (n+2)X_{(n)}, that is not identically zero yet has expectation zero for every value of θ; hence T is sufficient but not complete.
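Following the reconstruction above, a small Monte Carlo sketch (the sample size n and the values of θ are arbitrary choices of mine) checks that g(T) = (2n + 1)X_(1) − (n + 2)X_(n) averages to approximately zero for every θ while fluctuating widely around it, i.e. it is far from the zero function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5            # sample size (arbitrary choice)
reps = 200_000   # Monte Carlo replications

def g_of_T(samples):
    # g(T) = (2n + 1) * min - (n + 2) * max, a non-zero function of (min, max).
    x_min = samples.min(axis=1)
    x_max = samples.max(axis=1)
    return (2 * n + 1) * x_min - (n + 2) * x_max

for theta in [0.5, 1.0, 3.0]:
    samples = rng.uniform(theta, 2 * theta, size=(reps, n))
    values = g_of_T(samples)
    print(f"theta = {theta}: mean of g(T) = {values.mean():.4f}, "
          f"sd of g(T) = {values.std():.4f}")
# The mean is ~0 for every theta, yet g(T) itself is far from constant,
# so (min, max) is sufficient but not complete.
```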

Completeness occurs in the Lehmann–Scheffé theorem,[1] which states that if a statistic is unbiased, complete, and sufficient for some parameter θ, then it is the best mean-unbiased estimator of θ.

In other words, this statistic has a smaller expected loss for any convex loss function; in many practical applications with the squared-error loss function, it has a smaller mean squared error than any other estimator with the same expected value.
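As an illustrative sketch of this statement (my own example, not taken from the text): in the Bernoulli model discussed earlier, T = X1 + ... + Xn is complete and sufficient and the sample mean T/n is unbiased for p, so by the Lehmann–Scheffé theorem it is the minimum-variance unbiased estimator; a naive unbiased estimator such as X1 alone has visibly larger variance.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 20, 100_000   # arbitrary illustration parameters

samples = rng.binomial(1, p, size=(reps, n))   # Bernoulli(p) sample of size n

naive = samples[:, 0]          # X1 alone: unbiased, but ignores most of the data
umvue = samples.mean(axis=1)   # T/n: unbiased function of the complete sufficient T

print("bias (naive, umvue):", naive.mean() - p, umvue.mean() - p)   # both ~ 0
print("variance (naive):", naive.var())   # ~ p(1 - p)     = 0.21
print("variance (umvue):", umvue.var())   # ~ p(1 - p)/n   = 0.0105
```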

Examples exist in which the minimal sufficient statistic is not complete; in such cases several alternative statistics exist for unbiased estimation of θ, some of which have lower variance than others. A numerical illustration follows below.
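The uniform model above provides a concrete sketch of this (using the reconstructed expectations of the minimum and maximum; the parameter values below are arbitrary): both (n + 1)/(n + 2) · X_(1) and (n + 1)/(2n + 1) · X_(n) are unbiased for θ, yet their variances differ, so unbiasedness alone does not single out a best estimator here.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 1.0, 5, 200_000   # arbitrary illustration parameters

samples = rng.uniform(theta, 2 * theta, size=(reps, n))
x_min = samples.min(axis=1)
x_max = samples.max(axis=1)

est_min = (n + 1) / (n + 2) * x_min       # unbiased for theta via the minimum
est_max = (n + 1) / (2 * n + 1) * x_max   # unbiased for theta via the maximum

for name, est in [("min-based", est_min), ("max-based", est_max)]:
    print(f"{name}: mean = {est.mean():.4f}, variance = {est.var():.5f}")
# Both estimators are approximately unbiased for theta = 1,
# but their variances differ (the max-based one is smaller here).
```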

Bounded completeness occurs in Basu's theorem,[1] which states that a statistic that is both boundedly complete and sufficient is independent of any ancillary statistic.
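A quick Monte Carlo illustration of Basu's theorem (a sketch with arbitrary parameters of my choosing): for a normal sample with unknown mean θ and known variance 1, the sample mean is boundedly complete and sufficient while the sample variance is ancillary, so the theorem predicts they are independent; their sample correlation is correspondingly close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 10, 100_000   # arbitrary illustration parameters

samples = rng.normal(theta, 1.0, size=(reps, n))
sample_mean = samples.mean(axis=1)         # boundedly complete and sufficient for theta
sample_var = samples.var(axis=1, ddof=1)   # ancillary: its distribution is free of theta

corr = np.corrcoef(sample_mean, sample_var)[0, 1]
print(f"correlation(sample mean, sample variance) = {corr:.4f}")  # close to 0
```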

Bounded completeness also occurs in Bahadur's theorem, which states that a statistic that is both boundedly complete and sufficient is necessarily minimal sufficient.