Dunn index

The Dunn index (DI) is a metric for evaluating clustering algorithms.[1][2] It belongs to a group of validity indices, which also includes the Davies–Bouldin index and the Silhouette index, and it is an internal evaluation scheme, meaning that the result is based on the clustered data itself.

One drawback of the Dunn index is its computational cost, which grows as the number of clusters and the dimensionality of the data increase.

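The size (diameter) of a cluster can be defined in several ways: as the distance between its two farthest points, as the mean of all pairwise distances between its points, or as the mean distance of its points from the cluster centroid. Each of these formulations is shown mathematically below. Let $C_i$ be a cluster of vectors.

Let $x$ and $y$ be any two $n$-dimensional feature vectors assigned to the same cluster $C_i$, and let $d(x, y)$ be the distance between them.

Maximum distance between any two points:

$$\Delta_i = \max_{x, y \in C_i} d(x, y)$$

Mean distance between all pairs of points:

$$\Delta_i = \frac{1}{|C_i| \, (|C_i| - 1)} \sum_{x, y \in C_i,\; x \neq y} d(x, y)$$

Mean distance of all points from the cluster mean $\mu$:

$$\Delta_i = \frac{\sum_{x \in C_i} d(x, \mu)}{|C_i|}, \qquad \mu = \frac{\sum_{x \in C_i} x}{|C_i|}$$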
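To make the notion of cluster size concrete, the following is a minimal Python sketch of three common choices for the diameter of a cluster $C_i$: the largest pairwise distance, the mean pairwise distance, and the mean distance from the centroid. The function names and the use of the Euclidean distance are assumptions made for illustration; any other distance could be substituted.

```python
from itertools import combinations
from math import dist  # Euclidean distance (assumed here), Python 3.8+


def max_diameter(cluster):
    """Largest distance between any two points of the cluster."""
    return max((dist(x, y) for x, y in combinations(cluster, 2)), default=0.0)


def mean_pairwise_diameter(cluster):
    """Mean distance over all pairs of distinct points of the cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a single point has no pairs
    return sum(dist(x, y) for x, y in pairs) / len(pairs)


def centroid_diameter(cluster):
    """Mean distance of the points from the cluster centroid."""
    n = len(cluster)
    centroid = [sum(coords) / n for coords in zip(*cluster)]
    return sum(dist(x, centroid) for x in cluster) / n


# Hypothetical example cluster: the four corners of a square with side 2.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(max_diameter(cluster))
print(mean_pairwise_diameter(cluster))
print(centroid_diameter(cluster))
```

The three definitions generally give different values for the same cluster, so the choice of diameter changes the resulting index.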

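With the above notation, if there are $m$ clusters, then the Dunn index for the set is defined as:

$$DI_m = \frac{\min_{1 \le i < j \le m} \delta(C_i, C_j)}{\max_{1 \le k \le m} \Delta_k}$$

where $\delta(C_i, C_j)$ is the inter-cluster distance between clusters $C_i$ and $C_j$, and $\Delta_k$ is the size (diameter) of cluster $C_k$.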
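The ratio of the smallest inter-cluster distance to the largest cluster size can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a reference implementation: it assumes the Euclidean distance, takes the inter-cluster distance to be the distance between the two closest points of a pair of clusters, and takes the cluster size to be the largest pairwise distance within a cluster. The function names are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance (assumed here), Python 3.8+


def diameter(cluster):
    """Cluster size: largest distance between two points of the cluster."""
    return max((dist(x, y) for x, y in combinations(cluster, 2)), default=0.0)


def separation(a, b):
    """Inter-cluster distance: distance between the two closest points."""
    return min(dist(x, y) for x in a for y in b)


def dunn_index(clusters):
    """Smallest inter-cluster distance divided by the largest cluster size."""
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    min_separation = min(separation(a, b) for a, b in combinations(clusters, 2))
    max_diameter = max(diameter(c) for c in clusters)
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; the index is undefined")
    return min_separation / max_diameter


# Hypothetical data: two clusters of size 1.0 that are 5.0 apart.
clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(5.0, 0.0), (5.0, 1.0)],
]
print(dunn_index(clusters))  # 5.0
```

Because every pair of points is compared, the cost of this direct approach grows quadratically with the number of points.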

Defined in this way, the Dunn index depends on $m$, the number of clusters in the set.

This formulation has a peculiar weakness: because the denominator contains a 'max' term rather than an average, a single badly behaved cluster makes the Dunn index for the whole set of clusters uncharacteristically low, even when all the other clusters are tightly packed.
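The effect of the 'max' term can be illustrated with a small Python sketch. It assumes the Euclidean distance, closest-point inter-cluster distance, and largest pairwise distance as the cluster size. The `summarize` parameter and the mean-based variant are hypothetical, introduced only for comparison; the mean-based variant is not the Dunn index.

```python
from itertools import combinations
from math import dist  # Euclidean distance (assumed here), Python 3.8+
from statistics import mean


def dunn_like(clusters, summarize=max):
    """Smallest inter-cluster distance divided by a summary of cluster sizes.

    summarize=max gives the Dunn index; summarize=mean is a hypothetical
    variant used only to show the effect of the 'max' term.
    """
    separation = min(
        min(dist(x, y) for x in a for y in b)
        for a, b in combinations(clusters, 2)
    )
    diameters = [max(dist(x, y) for x, y in combinations(c, 2)) for c in clusters]
    return separation / summarize(diameters)


# Hypothetical data: three tight clusters, each of size 1.0, spaced 10.0 apart.
tight = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 0.0), (10.0, 1.0)],
    [(20.0, 0.0), (20.0, 1.0)],
]
# Same layout, but the third cluster is stretched to size 8.0.
one_loose = tight[:2] + [[(20.0, 0.0), (20.0, 8.0)]]

print(dunn_like(tight))                      # 10.0
print(dunn_like(one_loose))                  # 1.25
print(dunn_like(one_loose, summarize=mean))  # mean-based variant, for comparison
```

Stretching one cluster leaves the separation unchanged at 10.0 but cuts the index from 10.0 to 1.25, while the mean-based variant falls less sharply.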

Ready-made implementations of the Dunn index are available for vector-based environments such as MATLAB and R, as well as in Apache Mahout.