Euclidean distance matrix

The latter are easily analyzed using methods of linear algebra.

This allows to characterize Euclidean distance matrices and recover the points

In practical applications, distances are noisy measurements or come from arbitrary dissimilarity estimates (not necessarily metric).

The goal may be to visualize such data by points in Euclidean space whose distance matrix approximates a given dissimilarity matrix as well as possible — this is known as multidimensional scaling.

Alternatively, given two sets of data already represented by points in Euclidean space, one may ask how similar they are in shape, that is, how closely can they be related by a distance-preserving transformation — this is Procrustes analysis.

Some of the distances may also be missing or come unlabelled (as an unordered set or multiset instead of a matrix), leading to more complex algorithmic tasks, such as the graph realization problem or the turnpike problem (for points on a line).

[1][2] By the fact that Euclidean distance is a metric, the matrix A has the following properties.

In dimension k, a Euclidean distance matrix has rank less than or equal to k+2.

, that is, Gram matrices of some sequence of vectors (columns of

), are well understood — these are precisely positive semidefinite matrices.

Note that the Gram matrix contains additional information: distances from 0.

Theorem[4] (Schoenberg criterion,[5] independently shown by Young & Householder[6]) — A symmetric hollow n×n matrix A with real entries admits a realization in ℝk if and only if the (n-1)×(n-1) matrix

A more symmetric variant of the same theorem is the following: Corollary[8] — A symmetric hollow n×n matrix A with real entries admits a realization if and only if A is negative semidefinite on the hyperplane

In particular, these allow to show that a symmetric hollow n×n matrix is realizable in ℝk if and only if every (k+3)×(k+3) principal submatrix is.

[9] In practice, the definiteness or rank conditions may fail due to numerical errors, noise in measurements, or due to the data not coming from actual Euclidean distances.

Points that realize optimally similar distances can then be found by semidefinite approximation (and low rank approximation, if desired) using linear algebraic tools such as singular value decomposition or semidefinite programming.

Variants of these methods can also deal with incomplete distance data.

Unlabeled data, that is, a set or multiset of distances not assigned to particular pairs, is much more difficult to deal with.

Such data arises, for example, in DNA sequencing (specifically, genome recovery from partial digest) or phase retrieval.

Two sets of points are called homometric if they have the same multiset of distances (but are not necessarily related by a rigid transformation).

Deciding whether a given multiset of n(n-1)/2 distances can be realized in a given dimension k is strongly NP-hard.

In one dimension this is known as the turnpike problem; it is an open question whether it can be solved in polynomial time.

When the multiset of distances is given with error bars, even the one dimensional case is NP-hard.

Nevertheless, practical algorithms exist for many cases, e.g. random points.

[10][11][12] Given a Euclidean distance matrix, the sequence of points that realize it is unique up to rigid transformations – these are isometries of Euclidean space: rotations, reflections, translations, and their compositions.

be two sequences of points in k-dimensional Euclidean space ℝk.

Rigid transformations preserve distances so one direction is clear.

Q describes an orthogonal transformation of ℝk (a composition of rotations and reflections, without translations) which maps

In applications, when distances don't match exactly, Procrustes analysis aims to relate two point sets as close as possible via rigid transformations, usually using singular value decomposition.

The ordinary Euclidean case is known as the orthogonal Procrustes problem or Wahba's problem (when observations are weighted to account for varying uncertainties).