Geometric median

This generalizes the median, which has the property of minimizing the sum of distances or absolute differences for one-dimensional data.

It provides a measure of central tendency in higher dimensions and it is a standard problem in facility location, i.e., locating a facility to minimize the cost of transportation.

[3] The geometric median is an important estimator of location in statistics,[4] because it minimizes the sum of the L2 distances of the samples.

The more general k-median problem asks for the location of k cluster centers minimizing the sum of L2 distances from each sample point to its nearest center.

The special case of the problem for three points in the plane (that is, m = 3 and n = 2 in the definition below) is sometimes also known as Fermat's problem; it arises in the construction of minimal Steiner trees, and was originally posed as a problem by Pierre de Fermat and solved by Evangelista Torricelli.

See Fekete, Mitchell & Beurer (2005) for generalizations of the problem to non-discrete point sets.

, the geometric median is defined as the sum of the L2 distances minimizer Here, arg min means the value of the argument

Despite the geometric median's being an easy-to-understand concept, computing it poses a challenge.

The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each point, can be found by a simple formula — its coordinates are the averages of the coordinates of the points — but it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and kth roots, can exist in general for the geometric median.

Therefore, only numerical or symbolic approximations to the solution of this problem are possible under this model of computation.

Therefore, procedures that decrease the sum of distances at each step cannot get trapped in a local optimum.

One common approach of this type, called Weiszfeld's algorithm after the work of Endre Weiszfeld,[16] is a form of iteratively re-weighted least squares.

It can be modified to handle these cases so that it converges for all initial points.

[12] Bose, Maheshwari & Morin (2003) describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem.

Cohen et al. (2016) show how to compute the geometric median to arbitrary precision in nearly linear time.

Note also that the problem can be formulated as the second-order cone program which can be solved in polynomial time using common optimization solvers.

If y is distinct from all the given points, xi, then y is the geometric median if and only if it satisfies: This is equivalent to: which is closely related to Weiszfeld's algorithm.

In general, y is the geometric median if and only if there are vectors ui such that: where for xi ≠ y, and for xi = y, An equivalent formulation of this condition is It can be seen as a generalization of the median property, in the sense that any partition of the points, in particular as induced by any hyperplane through y, has the same and opposite sum of positive directions from y on each side.

The geometric median can be generalized from Euclidean spaces to general Riemannian manifolds (and even metric spaces) using the same idea which is used to define the Fréchet mean on a Riemannian manifold.

Example of geometric median (in yellow) of a series of points. In blue the Center of mass .