In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension.
The population distance correlation coefficient is zero if and only if the random vectors are independent.
This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.
The classical measure of dependence, the Pearson correlation coefficient,[1] is mainly sensitive to a linear relationship between two variables.
Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables.
Distance correlation is built from distance analogues of the usual second-order moments: the distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.

To obtain the sample versions, take an observed sample (X_k, Y_k), k = 1, ..., n, from the joint distribution of the random vectors X and Y, and first compute the n × n distance matrices (a_{j,k}) and (b_{j,k}) of all pairwise distances

$$a_{j,k} = \|X_j - X_k\|, \qquad b_{j,k} = \|Y_j - Y_k\|, \qquad j, k = 1, \dots, n,$$

where ‖·‖ denotes the Euclidean norm. Then take all doubly centered distances

$$A_{j,k} := a_{j,k} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}, \qquad B_{j,k} := b_{j,k} - \bar{b}_{j\cdot} - \bar{b}_{\cdot k} + \bar{b}_{\cdot\cdot},$$

where $\bar{a}_{j\cdot}$ is the j-th row mean, $\bar{a}_{\cdot k}$ is the k-th column mean, and $\bar{a}_{\cdot\cdot}$ is the grand mean of the distance matrix of the X sample, with analogous notation for the b values. (In the matrices of centered distances (A_{j,k}) and (B_{j,k}) all rows and all columns sum to zero.) The squared sample distance covariance (a scalar) is simply the arithmetic average of the products A_{j,k} B_{j,k}:

$$\operatorname{dCov}^2_n(X, Y) := \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{j,k} B_{j,k}.$$

The statistic T_n = n·dCov²_n(X, Y) determines a consistent multivariate test of independence of random vectors in arbitrary dimensions.
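As a concrete illustration, the following R sketch computes the doubly centered matrices and the squared sample distance covariance directly from the definition above. The function names dcenter and dcov2_sample, and the data-generating model, are illustrative assumptions and not part of any package.

```r
# Minimal R sketch of the sample quantities defined above (illustrative names,
# not from any package). x and y are numeric matrices with one row per observation.
dcenter <- function(d) {
  # double-centering: subtract row means and column means, add back the grand mean
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

dcov2_sample <- function(x, y) {
  A <- dcenter(as.matrix(dist(x)))   # doubly centered Euclidean distances of the X sample
  B <- dcenter(as.matrix(dist(y)))   # doubly centered Euclidean distances of the Y sample
  mean(A * B)                        # arithmetic average of the products A_jk * B_jk
}

# Example: Y depends on X nonlinearly, while their Pearson correlation is near zero.
set.seed(1)
x <- matrix(rnorm(300), ncol = 1)
y <- x^2 + 0.1 * matrix(rnorm(300), ncol = 1)
v2 <- dcov2_sample(x, y)
v2                                   # squared sample distance covariance, clearly positive
nrow(x) * v2                         # the test statistic T_n = n * dCov^2_n(X, Y)
```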
For an implementation see the dcov.test function in the energy package for R.[4]

The population value of distance covariance can be defined along the same lines.
Let X be a random variable that takes values in a p-dimensional Euclidean space with probability distribution μ and let Y be a random variable that takes values in a q-dimensional Euclidean space with probability distribution ν, and suppose that X and Y have finite expectations.
Write

$$a_\mu(x) := \operatorname{E}\|x - X\|, \qquad D(\mu) := \operatorname{E}[a_\mu(X)], \qquad d_\mu(x, x') := \|x - x'\| - a_\mu(x) - a_\mu(x') + D(\mu),$$

and define d_ν analogously for Y. Finally, define the population value of squared distance covariance of X and Y as

$$\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X, X')\, d_\nu(Y, Y')\big].$$

One can show that this is equivalent to the following definition:

$$\operatorname{dCov}^2(X, Y) = \operatorname{E}\|X - X'\|\,\|Y - Y'\| + \operatorname{E}\|X - X'\|\,\operatorname{E}\|Y - Y'\| - 2\operatorname{E}\|X - X'\|\,\|Y - Y''\|,$$

where E denotes expected value, and (X, Y), (X′, Y′), and (X″, Y″) denote independent and identically distributed (iid) copies of the variables X and Y.
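The equivalent expectation formula lends itself to a direct Monte Carlo check. The sketch below, in plain R, estimates the three expectations from independent copies for an arbitrarily chosen dependent pair Y = X²; it only illustrates the formula and is not an estimator one would use in practice.

```r
# Monte Carlo sketch of dCov^2(X, Y) = E|X-X'||Y-Y'| + E|X-X'| E|Y-Y'| - 2 E|X-X'||Y-Y''|
# for the (dependent but uncorrelated) pair X ~ N(0,1), Y = X^2. Illustration only.
set.seed(2)
m  <- 1e5
x  <- rnorm(m); y  <- x^2     # (X , Y )
x1 <- rnorm(m); y1 <- x1^2    # (X', Y'), an independent copy
x2 <- rnorm(m); y2 <- x2^2    # (X'', Y''), another independent copy
t1 <- mean(abs(x - x1) * abs(y - y1))
t2 <- mean(abs(x - x1)) * mean(abs(y - y1))
t3 <- mean(abs(x - x1) * abs(y - y2))
t1 + t2 - 2 * t3              # positive, reflecting the dependence of Y on X
```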
Alternatively, the distance covariance can be defined as the weighted L² norm of the distance between the joint characteristic function of the random variables and the product of their marginal characteristic functions:[6]

$$\operatorname{dCov}^2(X, Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{\left| \varphi_{X,Y}(s, t) - \varphi_X(s)\,\varphi_Y(t) \right|^2}{|s|_p^{1+p}\, |t|_q^{1+q}} \, dt \, ds,$$

where φ_{X,Y}(s, t), φ_X(s), and φ_Y(t) are the characteristic functions of (X, Y), X, and Y, respectively, p, q denote the Euclidean dimension of X and Y, and thus of s and t, and c_p, c_q are the constants

$$c_d = \frac{\pi^{(1+d)/2}}{\Gamma\!\big((1+d)/2\big)}.$$

The weight function $\left(c_p c_q\, |s|_p^{1+p} |t|_q^{1+q}\right)^{-1}$ is chosen to produce a scale equivariant and rotation invariant measure that doesn't go to zero for dependent variables.[6][7]
One interpretation of the characteristic function definition is that the variables e^{isX} and e^{itY} are cyclic representations of X and Y with different periods given by s and t, and the expression φ_{X,Y}(s, t) − φ_X(s) φ_Y(t) in the numerator of the characteristic function definition of distance covariance is simply the classical covariance of e^{isX} and e^{itY}.
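This covariance interpretation can be checked numerically. The short R sketch below picks one pair (s, t) and compares the empirical value of φ_{X,Y}(s, t) − φ_X(s)φ_Y(t) with the empirical covariance of e^{isX} and e^{itY}; the data-generating model and the values of s and t are arbitrary choices for illustration.

```r
# Numeric sketch: phi_{X,Y}(s,t) - phi_X(s) phi_Y(t) equals the (complex) covariance
# of e^{isX} and e^{itY}. Model and the values of s, t are arbitrary illustrations.
set.seed(3)
n <- 1e5
x <- rnorm(n)
y <- x^2 + rnorm(n)
s <- 0.7; t <- -1.3
u <- exp(1i * s * x)                     # e^{isX}
v <- exp(1i * t * y)                     # e^{itY}
mean(u * v) - mean(u) * mean(v)          # empirical phi_{X,Y}(s,t) - phi_X(s) phi_Y(t)
mean((u - mean(u)) * (v - mean(v)))      # empirical covariance of e^{isX} and e^{itY}; the two agree
```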
The distance variance is a special case of distance covariance when the two variables are identical. The population value of distance variance is the square root of

$$\operatorname{dVar}^2(X) := \operatorname{E}\|X - X'\|^2 + \left(\operatorname{E}\|X - X'\|\right)^2 - 2\operatorname{E}\|X - X'\|\,\|X - X''\|,$$

where X, X′, X″ are independent and identically distributed. The sample distance variance is the square root of

$$\operatorname{dVar}^2_n(X) := \operatorname{dCov}^2_n(X, X) = \frac{1}{n^2} \sum_{j,k} A_{j,k}^2,$$

which is a relative of Corrado Gini's mean difference introduced in 1912 (but Gini did not work with centered distances). The distance standard deviation is the square root of the distance variance, and the distance correlation of two random variables is obtained by dividing their distance covariance by the product of their distance standard deviations:

$$\operatorname{dCor}(X, Y) = \frac{\operatorname{dCov}(X, Y)}{\sqrt{\operatorname{dVar}(X)\, \operatorname{dVar}(Y)}},$$

with the sample distance correlation defined by substituting the sample distance covariance and distance variances above. The population distance correlation satisfies 0 ≤ dCor(X, Y) ≤ 1, and dCor(X, Y) = 0 if and only if X and Y are independent; this last property is the most important effect of working with centered distances. For easy computation of the sample distance correlation see the dcor function in the energy package for R.[4]
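The energy package mentioned above provides these quantities directly. A possible R session is sketched below; the data are arbitrary, and the argument R in dcov.test is the number of permutation replicates.

```r
# Sketch of a session using the energy package referenced in the text
# (dcor and dcov.test are functions from that package; the data are arbitrary).
library(energy)
set.seed(4)
x <- matrix(rnorm(200), ncol = 2)
y <- x^2 + 0.1 * matrix(rnorm(200), ncol = 2)   # nonlinear, componentwise dependence
dcor(x, y)                # sample distance correlation dCor_n(X, Y)
dcov.test(x, y, R = 199)  # permutation test of independence; R = number of replicates
```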
If the random vectors X and Y are independent, then dVar(X + Y) ≤ dVar(X) + dVar(Y); equality holds if and only if one of the random variables X or Y is a constant.[10]
Distance covariance can be generalized by replacing the Euclidean distance with its α-th power, 0 < α < 2; the population coefficient dCov²(X, Y; α) is again zero if and only if X and Y are independent, and the sample distance covariance dCov_n(X, Y; α) can be defined as the nonnegative number for which

$$\operatorname{dCov}^2_n(X, Y; \alpha) := \frac{1}{n^2} \sum_{j,k} A_{j,k} B_{j,k},$$

where A_{j,k} and B_{j,k} are now the doubly centered matrices of the α-th powers of the pairwise distances.

One can extend dCov to random variables taking values in metric spaces: if X has law μ in a metric space with metric d, define a_μ(x) := E[d(x, X)], D(μ) := E[a_μ(X)], and, whenever these are finite, d_μ(x, x′) := d(x, x′) − a_μ(x) − a_μ(x′) + D(μ). Then, if Y has law ν (in a possibly different metric space with finite first moment), define

$$\operatorname{dCov}^2(X, Y) := \operatorname{E}\big[d_\mu(X, X')\, d_\nu(Y, Y')\big].$$

This is non-negative for all such X and Y if both metric spaces have negative type.[11]
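Because the sample construction only uses pairwise distances, the same double-centering recipe can be applied to distance matrices computed from another metric. The R sketch below reuses the centering step with Manhattan (L1) distances as an example of a metric of negative type; the function names are illustrative and nothing here comes from the energy package.

```r
# Sketch: the sample machinery applied to arbitrary distance matrices
# (here Manhattan / L1 distances). Function names are illustrative.
dcenter <- function(d) d - outer(rowMeans(d), colMeans(d), "+") + mean(d)

dcov2_from_dist <- function(dx, dy) {
  A <- dcenter(as.matrix(dx))
  B <- dcenter(as.matrix(dy))
  mean(A * B)
}

set.seed(5)
x <- matrix(rnorm(200), ncol = 2)
y <- sin(x) + 0.1 * matrix(rnorm(200), ncol = 2)
dcov2_from_dist(dist(x, method = "manhattan"),
                dist(y, method = "manhattan"))
```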
The original distance covariance has been defined as the square root of dCov²(X, Y), rather than the squared coefficient itself; alternately, one can take the squared coefficient dCov²(X, Y) itself as the definition.[10] Under these alternate definitions, the distance correlation is also defined as the square dCor²(X, Y), rather than the square root.
Brownian covariance is motivated by a generalization of the notion of covariance to stochastic processes. The square of the covariance of random variables X and Y can be written in the following form:

$$\operatorname{cov}(X, Y)^2 = \operatorname{E}\big[(X - \operatorname{E}X)(X' - \operatorname{E}X')(Y - \operatorname{E}Y)(Y' - \operatorname{E}Y')\big],$$

where E denotes the expected value and the prime denotes independent and identically distributed copies.
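The displayed identity itself can be sanity-checked by simulation. The R sketch below uses an arbitrary linear model for (X, Y) and an independent copy (X′, Y′); both sides come out close to the population value.

```r
# Simulation check of cov(X,Y)^2 = E[(X - EX)(X' - EX')(Y - EY)(Y' - EY')]
# using an arbitrary linear model; (x1, y1) plays the role of the iid copy (X', Y').
set.seed(6)
m <- 1e5
x  <- rnorm(m); y  <- 2 * x  + rnorm(m)
x1 <- rnorm(m); y1 <- 2 * x1 + rnorm(m)
cov(x, y)^2                                        # left-hand side, about 4
mean((x - mean(x)) * (x1 - mean(x1)) *
     (y - mean(y)) * (y1 - mean(y1)))              # right-hand side, about 4
```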
We need the following generalization of this formula. If U(s), V(t) are arbitrary random processes defined for all real s and t, then define the U-centered version of X by

$$X_U := U(X) - \operatorname{E}_X\big[U(X) \mid U\big]$$

whenever the subtracted conditional expected value exists, and denote by Y_V the V-centered version of Y. The (U, V) covariance of (X, Y) is then defined as the nonnegative number whose square is

$$\operatorname{cov}^2_{U,V}(X, Y) := \operatorname{E}\big[X_U X'_U Y_V Y'_V\big],$$

whenever the right-hand side is nonnegative and finite.
The most important example is when U and V are two-sided independent Brownian motions / Wiener processes with expectation zero and covariance |s| + |t| − |s − t| = 2 min(s, t) (for nonnegative s, t only).
(This is twice the covariance of the standard Wiener process; here the factor 2 simplifies the computations.)
In this case the (U, V) covariance is called Brownian covariance, denoted BCov(X, Y), and it coincides with the distance covariance: BCov(X, Y) = dCov(X, Y). On the other hand, if we replace the Brownian motion with the deterministic identity function id, then Cov_id(X, Y) is simply the absolute value of the classical Pearson covariance,

$$\operatorname{Cov}_{\mathrm{id}}(X, Y) = \left| \operatorname{cov}(X, Y) \right|.$$

Other correlational metrics, including kernel-based correlational metrics (such as the Hilbert–Schmidt Independence Criterion, or HSIC), can also detect linear and nonlinear interactions.
Both distance correlation and kernel-based metrics can be used in methods such as canonical correlation analysis and independent component analysis to yield stronger statistical power.
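For comparison with the distance-based sketches above, a biased HSIC estimate can be obtained by applying the same double-centering idea to kernel Gram matrices rather than distance matrices. The sketch below uses a Gaussian kernel with arbitrary bandwidths; the function names and data are illustrative assumptions, not taken from any package.

```r
# Sketch of a (biased) HSIC estimate: double-center Gaussian-kernel Gram matrices
# and average their elementwise product, mirroring the distance-covariance recipe.
# Bandwidths and data are arbitrary; function names are illustrative.
hsic <- function(x, y, sigma_x = 1, sigma_y = 1) {
  gram <- function(z, sigma) {
    d2 <- as.matrix(dist(z))^2               # squared Euclidean distances
    exp(-d2 / (2 * sigma^2))                 # Gaussian kernel Gram matrix
  }
  center <- function(k) {
    k - outer(rowMeans(k), colMeans(k), "+") + mean(k)
  }
  K <- center(gram(x, sigma_x))
  L <- center(gram(y, sigma_y))
  mean(K * L)                                # biased HSIC estimate (up to normalization convention)
}

set.seed(7)
x <- matrix(rnorm(200), ncol = 1)
y <- cos(2 * x) + 0.1 * matrix(rnorm(200), ncol = 1)
hsic(x, y)
```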