Energy distance satisfies all axioms of a metric, and thus it characterizes the equality of distributions: $D(F, G) = 0$ if and only if $F = G$. Energy distance for statistical applications was introduced in 1985 by Gábor J. Székely, who proved that for real-valued random variables the squared energy distance is exactly twice Harald Cramér's distance:[1]

$$D^2(F, G) = 2\int_{-\infty}^{\infty} \bigl(F(x) - G(x)\bigr)^2 \, dx.$$

For a simple proof of this equivalence, see Székely (2002).
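This equivalence can be checked numerically on the empirical distributions of two samples, for which the identity holds exactly (both sides reduce to finite sums). The following Python sketch, assuming NumPy, implements the expectations as all-pairs means over the samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)   # sample with empirical cdf F_n
y = rng.normal(0.5, 2.0, size=300)   # sample with empirical cdf G_m

# Squared energy distance D^2 = 2 E|X - Y| - E|X - X'| - E|Y - Y'|
# for the empirical distributions (all-pairs means, diagonal included).
def mean_abs_diff(a, b):
    return np.abs(a[:, None] - b[None, :]).mean()

d2 = 2 * mean_abs_diff(x, y) - mean_abs_diff(x, x) - mean_abs_diff(y, y)

# Twice Cramér's distance, 2 * integral of (F_n - G_m)^2: both empirical
# cdfs are step functions, so the integral is a finite sum over the gaps
# between consecutive pooled order statistics.
t = np.sort(np.concatenate([x, y]))
F = np.searchsorted(np.sort(x), t[:-1], side="right") / len(x)
G = np.searchsorted(np.sort(y), t[:-1], side="right") / len(y)
cramer2 = 2 * np.sum((F - G) ** 2 * np.diff(t))

print(d2, cramer2)  # the two values agree up to floating-point error
```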
(Notice that Cramér's distance is not the same as the distribution-free Cramér–von Mises criterion.)
One can generalize the notion of energy distance to probability distributions on metric spaces.
Let $(M, d)$ be a metric space with its Borel sigma algebra $\mathcal{B}(M)$, and let $\mathcal{P}(M)$ denote the collection of all probability measures on the measurable space $(M, \mathcal{B}(M))$. If $\mu$ and $\nu$ are probability measures in $\mathcal{P}(M)$, then the energy distance $D$ of $\mu$ and $\nu$ can be defined as the square root of

$$D^2(\mu, \nu) = 2\,\mathbb{E}\,d(X, Y) - \mathbb{E}\,d(X, X') - \mathbb{E}\,d(Y, Y'),$$

where $X$ and $X'$ are independent random variables distributed according to $\mu$, and $Y$ and $Y'$ are independent random variables distributed according to $\nu$. This quantity is not necessarily non-negative, however.
If $(M, d)$ is a metric space of strong negative type, then $D$ is a metric; in this situation, the energy distance is zero if and only if $X$ and $Y$ are identically distributed.[4]
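Replacing $\mu$ and $\nu$ by the empirical measures of two samples turns the three expectations into all-pairs average distances, giving a plug-in estimate of $D^2$. The following Python sketch assumes samples stored as rows and a user-supplied pairwise distance function; the function names are illustrative, not from any particular library.

```python
import numpy as np

# Plug-in estimate of D^2(mu, nu) from samples xs ~ mu and ys ~ nu,
# given the metric as a function returning the matrix of pairwise distances.
def energy_dist_sq(xs, ys, pairwise_d):
    return (2 * pairwise_d(xs, ys).mean()
            - pairwise_d(xs, xs).mean()
            - pairwise_d(ys, ys).mean())

# Example metric: the taxicab (L1) distance on R^2.
def taxicab(a, b):
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)

rng = np.random.default_rng(1)
xs = rng.normal(0.0, 1.0, size=(400, 2))
ys = rng.normal(0.3, 1.0, size=(400, 2))
print(energy_dist_sq(xs, ys, taxicab))
```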
In the literature on kernel methods for machine learning, these generalized notions of energy distance are studied under the name of maximum mean discrepancy.
Equivalence of distance-based and kernel methods for hypothesis testing is covered by several authors.[5][6]
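Concretely, for the kernel induced by the metric via $k(a, b) = \tfrac{1}{2}\bigl(d(a, o) + d(b, o) - d(a, b)\bigr)$, with $o$ an arbitrary base point, the squared energy distance equals twice the squared maximum mean discrepancy. The sketch below (an illustration, not code from the cited papers) verifies this identity numerically for the Euclidean distance with base point $o = 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=(250, 3))
y = rng.normal(0.2, 1.5, size=(300, 3))

def pdist(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

# Squared energy distance (all-pairs / V-statistic form).
d2 = 2 * pdist(x, y).mean() - pdist(x, x).mean() - pdist(y, y).mean()

# Distance-induced kernel k(a, b) = (d(a, o) + d(b, o) - d(a, b)) / 2
# with base point o = 0; the choice of o does not affect the MMD.
def k(a, b):
    return (np.linalg.norm(a, axis=-1)[:, None]
            + np.linalg.norm(b, axis=-1)[None, :]
            - pdist(a, b)) / 2

# Biased (V-statistic) estimate of the squared MMD for this kernel.
mmd2 = k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

print(d2, 2 * mmd2)  # agree up to floating-point error
```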
A related statistical concept, the notion of E-statistic or energy statistic,[7] was introduced by Gábor J. Székely in the 1980s when he was giving colloquium lectures in Budapest, Hungary, and at MIT, Yale, and Columbia.
This concept is based on the notion of Newton’s potential energy.
Energy distance and the E-statistic were considered as N-distances and N-statistics in Zinger A. A., Kakosyan A. V., Klebanov L. B., "Characterization of distributions by means of mean values of some statistics in connection with some probability metrics", Stability Problems for Stochastic Models (in Russian); English translation: A. A. Zinger, A. V. Kakosyan, L. B. Klebanov, "A characterization of distributions by mean values of statistics and certain probabilistic metrics", Journal of Soviet Mathematics (1992).
The book[3] gives these results and their applications to statistical testing as well.
Consider the null hypothesis that two random variables, X and Y, have the same probability distribution: $\mu = \nu$. For statistical samples $x_1, \dots, x_n$ from X and $y_1, \dots, y_m$ from Y, the following arithmetic averages of distances are computed between the X and the Y samples:

$$A = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\|x_i - y_j\|, \qquad
B = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\|x_i - x_j\|, \qquad
C = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\|y_i - y_j\|.$$

The E-statistic of the underlying null hypothesis is defined as follows:

$$E_{n,m}(X, Y) = 2A - B - C.$$

One can prove[8][9] that $E_{n,m}(X, Y) \ge 0$ and that the corresponding population value is zero if and only if X and Y have the same distribution. Under this null hypothesis the test statistic

$$T = \frac{nm}{n + m}\, E_{n,m}(X, Y)$$

converges in distribution to a quadratic form of independent standard normal random variables; under alternatives $T$ tends to infinity, so the null is rejected for large values of $T$.
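Since the null distribution of $T$ depends on the underlying distributions, in practice the test is typically calibrated by permutation. The following Python sketch is a minimal illustration of that approach (the energy package for R provides production implementations; all names below are illustrative):

```python
import numpy as np

def e_stat(x, y):
    """E-statistic E_{n,m}(X, Y) = 2A - B - C, where A, B, C are the
    all-pairs average Euclidean distances defined above (rows = observations)."""
    def mean_dist(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()
    return 2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

def energy_test(x, y, n_perm=999, seed=0):
    """Permutation version of the energy test for equal distributions,
    based on the scaled statistic T = n m / (n + m) * E_{n,m}."""
    rng = np.random.default_rng(seed)
    n, m = len(x), len(y)
    scale = n * m / (n + m)
    t_obs = scale * e_stat(x, y)
    z = np.vstack([x, y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(n + m)          # relabel the pooled sample
        count += scale * e_stat(z[idx[:n]], z[idx[n:]]) >= t_obs
    return t_obs, (count + 1) / (n_perm + 1)  # statistic and p-value

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=(60, 2))
y = rng.normal(0.5, 1.0, size=(70, 2))
print(energy_test(x, y, n_perm=199))
```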
A multivariate goodness-of-fit measure is defined for distributions in arbitrary dimension (not restricted by sample size). Given an observed sample $y_1, \dots, y_n$, the energy goodness-of-fit statistic is

$$Q_n = n\left(\frac{2}{n}\sum_{i=1}^{n} \mathbb{E}\|y_i - X\|^{\alpha} - \mathbb{E}\|X - X'\|^{\alpha} - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\|y_i - y_j\|^{\alpha}\right),$$

where X and X′ are independent and identically distributed according to the hypothesized distribution, and $\alpha \in (0, 2)$. Under the null hypothesis, the asymptotic distribution of $Q_n$ is a quadratic form of centered Gaussian random variables. Under an alternative hypothesis, $Q_n$ tends to infinity stochastically, and thus determines a statistically consistent test.
For most applications the exponent $\alpha = 1$ (Euclidean distance) can be used. The important special case of testing multivariate normality[9] is implemented in the energy package for R. Tests have also been developed for heavy-tailed distributions, such as Pareto (power-law) or stable distributions, by applying exponents $\alpha \in (0, 1)$.
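In the univariate case with a standard normal null and $\alpha = 1$, the two expectations in $Q_n$ have closed forms: $\mathbb{E}|y - Z| = 2\varphi(y) + y\,(2\Phi(y) - 1)$ and $\mathbb{E}|Z - Z'| = 2/\sqrt{\pi}$ for independent standard normal Z and Z′. The following Python sketch assumes these formulas and is only an illustration of the statistic; the energy package implements the general multivariate test.

```python
import numpy as np
from scipy.stats import norm

def energy_gof_normal(y):
    """Energy goodness-of-fit statistic Q_n (alpha = 1) for the null
    hypothesis that y is an iid sample from the standard normal."""
    n = len(y)
    e_yz = 2 * norm.pdf(y) + y * (2 * norm.cdf(y) - 1)  # E|y_i - Z|
    e_zz = 2 / np.sqrt(np.pi)                           # E|Z - Z'|
    e_yy = np.abs(y[:, None] - y[None, :]).mean()       # mean |y_i - y_j|
    return n * (2 * e_yz.mean() - e_zz - e_yy)

rng = np.random.default_rng(4)
print(energy_gof_normal(rng.normal(size=500)))    # bounded in probability under the null
print(energy_gof_normal(rng.standard_t(3, 500)))  # typically much larger under a t(3) alternative
```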