Ward's method

In statistics, Ward's method is a criterion applied in hierarchical cluster analysis.

Ward's minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward, Jr.[1] Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function.

Many of the standard clustering procedures are contained in this very general class.

The nearest-neighbor chain algorithm can be used to find the same clustering defined by Ward's method, in time proportional to the size of the input distance matrix and space linear in the number of points being clustered.

To implement this method, at each step find the pair of clusters whose merger leads to the minimum increase in total within-cluster variance.

This increase is a weighted squared distance between cluster centers.
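The following minimal Python sketch illustrates this identity on a small made-up data set. It assumes that "within-cluster variance" means the unnormalised within-cluster sum of squared deviations from the centroid. The helper names (`centroid`, `sse`, `ward_cost`) are hypothetical and chosen only for this illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def ward_cost(a, b):
    """Weighted squared distance between the centroids of clusters a and b."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


# Illustrative data (not from any real study).
A = [(0.0, 0.0), (2.0, 0.0)]
B = [(5.0, 4.0), (7.0, 4.0), (6.0, 7.0)]

direct = sse(A + B) - sse(A) - sse(B)  # increase computed from the definition
weighted = ward_cost(A, B)             # increase computed from the centroids
```

For this data both `direct` and `weighted` evaluate to 60 (up to floating-point rounding), so the cost of a candidate merge can be obtained from cluster sizes and centroids alone.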

To apply a recursive algorithm under this objective function, the initial distance between individual objects must be (proportional to) squared Euclidean distance.

The initial cluster distances in Ward's minimum variance method are therefore defined to be the squared Euclidean distance between points:

$$d_{ij} = d(\{X_i\}, \{X_j\}) = \|X_i - X_j\|^2.$$

Note: In software that implements Ward's method, it is important to check whether the function arguments should specify Euclidean distances or squared Euclidean distances.
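The two conventions give different numbers for the same data, which is why the check matters. As a minimal sketch, the initial squared distances can be built as follows in Python. The function names and the sample points are assumptions made for illustration and do not correspond to any particular library's API.

```python
from itertools import combinations


def squared_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initial_distances(points):
    """Map each index pair (i, j) with i < j to the squared Euclidean distance."""
    return {
        (i, j): squared_euclidean(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = initial_distances(points)
# d[(0, 1)] is 25.0 here, whereas the plain Euclidean distance would be 5.0.
```

Passing plain Euclidean distances to a routine that expects squared ones (or the reverse) changes the merge heights and may change the resulting hierarchy.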

Ward's minimum variance method can be defined and implemented recursively by a Lance–Williams algorithm.

The recursive formula simplifies finding the optimal pair.

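Suppose that clusters $C_i$ and $C_j$ are the next to be merged. Let $d_{ij}$, $d_{ik}$, and $d_{jk}$ be the pairwise distances between clusters $C_i$, $C_j$, and $C_k$, and let $d_{(ij)k}$ be the distance between the new cluster $C_i \cup C_j$ and $C_k$. An algorithm belongs to the Lance–Williams family if the updated cluster distance $d_{(ij)k}$ can be computed recursively by

$$d_{(ij)k} = \alpha_i d_{ik} + \alpha_j d_{jk} + \beta d_{ij} + \gamma |d_{ik} - d_{jk}|,$$

where $\alpha_i$, $\alpha_j$, $\beta$, and $\gamma$ are parameters, which may depend on cluster sizes, that together with the cluster distance function $d_{ij}$ determine the clustering algorithm.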

Several standard clustering algorithms such as single linkage, complete linkage, and group average method have a recursive formula of the above type.

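A table of parameters for standard methods is given by several authors.[2][3][4]

Ward's minimum variance method can be implemented by the Lance–Williams formula. For disjoint clusters $C_i$, $C_j$, and $C_k$ with sizes $n_i$, $n_j$, and $n_k$ respectively:

$$d(C_i \cup C_j, C_k) = \frac{n_i + n_k}{n_i + n_j + n_k}\, d(C_i, C_k) + \frac{n_j + n_k}{n_i + n_j + n_k}\, d(C_j, C_k) - \frac{n_k}{n_i + n_j + n_k}\, d(C_i, C_j).$$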
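A recursive implementation along these lines can be sketched in Python. The sketch assumes the commonly cited Ward update, in which the distances from the two merged clusters to a third cluster are weighted by cluster sizes and the distance between the merged clusters is subtracted, and it starts from squared Euclidean distances between points. It uses a naive search for the closest pair rather than the nearest-neighbor chain algorithm, and the names `ward_update` and `ward_linkage` are hypothetical.

```python
def ward_update(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Distance from the merged cluster (i with j) to cluster k under Ward's criterion."""
    total = n_i + n_j + n_k
    return ((n_i + n_k) * d_ik + (n_j + n_k) * d_jk - n_k * d_ij) / total


def ward_linkage(points):
    """Greedy agglomeration; returns a list of (members_a, members_b, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    # Initial distances: squared Euclidean, keyed by (smaller id, larger id).
    dist = {}
    for i in clusters:
        for j in clusters:
            if i < j:
                dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Naive scan for the closest pair of current clusters.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d_ij))
        n_i, n_j = len(clusters[i]), len(clusters[j])
        new_dist = {}
        for k in clusters:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # The new cluster id is larger than every existing id.
            new_dist[(k, next_id)] = ward_update(
                d_ik, d_jk, d_ij, n_i, n_j, len(clusters[k])
            )
        # Discard distances involving the two merged clusters, then add the new ones.
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


merges = ward_linkage([(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)])
```

On these four illustrative points the two vertical pairs are merged first, each at distance 1, and the resulting clusters are joined last. Because the update works only on stored distances and cluster sizes, centroids never need to be recomputed.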

Variations of Ward's method have also been proposed. For instance, Ward$_p$ introduces cluster-specific feature weights, following the intuitive idea that features could have different degrees of relevance at different clusters.