CURE algorithm

The popular K-means clustering algorithm minimizes the sum of squared errors criterion: Given large differences in sizes or geometries of different clusters, the square error method could split the large clusters to minimize the square error, which is not always correct.

[citation needed] Using only the centroid to redistribute the data has problems when clusters lack uniform sizes and shapes.

To avoid the problems with non-uniform sized or shaped clusters, CURE employs a hierarchical clustering algorithm that adopts a middle ground between the centroid based and all point extremes.

This enables CURE to correctly identify the clusters and makes it less sensitive to outliers.

The algorithm cannot be directly applied to large databases because of the high runtime complexity.