It is similar to the first of three seeding methods proposed, in independent work, in 2006[3] by Rafail Ostrovsky, Yuval Rabani, Leonard Schulman and Chaitanya Swamy.
To illustrate the potential of the k-means algorithm to perform arbitrarily poorly with respect to the objective function of minimizing the sum of squared distances of cluster points to the centroid of their assigned clusters, consider the example of four points in ℝ² that form an axis-aligned rectangle whose width is greater than its height. If k = 2 and the two initial cluster centers lie at the midpoints of the top and bottom line segments of the rectangle formed by the four data points, the k-means algorithm converges immediately, without moving these cluster centers. Consequently, the two bottom points are clustered together and the two top points are clustered together, even though splitting the points into a left pair and a right pair would yield a lower cost, because the rectangle's width exceeds its height.
The standard k-means algorithm will continue to cluster the points suboptimally, and by increasing the horizontal distance between the two data points in each cluster, we can make the algorithm perform arbitrarily poorly with respect to the k-means objective function.
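The rectangle example above can be reproduced in a few lines of Python. The specific coordinates are illustrative choices, not taken from the text: a 4×1 rectangle with the initial centers placed at the midpoints of its bottom and top edges. One Lloyd iteration leaves the centers unchanged, and the resulting cost (16) is far worse than the optimal left/right split (cost 1).

```python
# Four points forming a wide axis-aligned rectangle (width 4 > height 1).
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def lloyd_step(points, centers):
    """One k-means (Lloyd) iteration: assign each point to its nearest
    center, then recompute each center as the centroid of its cluster."""
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[i].append(p)
    return [tuple(sum(coord) / len(pts) for coord in zip(*pts))
            for pts in clusters]

# Initial centers: midpoints of the bottom and top edges.
centers = [(2.0, 0.0), (2.0, 1.0)]
print(lloyd_step(points, centers) == centers)   # True: converged immediately

cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
print(cost)   # 16.0, versus 1.0 for the optimal left/right split
```

Widening the rectangle (e.g. moving the right pair to x = 100) leaves the converged top/bottom clustering unchanged while the optimal cost stays at 1, which is the sense in which the performance ratio is unbounded.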
The exact algorithm is as follows:

1. Choose one center uniformly at random among the data points.
2. For each data point x not chosen yet, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)².
4. Repeat steps 2 and 3 until k centers have been chosen.
5. Now that the initial centers have been chosen, proceed using standard k-means clustering.

This seeding method yields considerable improvement in the final error of k-means.
This is in contrast to vanilla k-means, which can generate clusterings arbitrarily worse than the optimum.
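A minimal Python sketch of this seeding (the function name and the list-of-tuples point representation are illustrative): the first center is drawn uniformly at random, and each subsequent center is drawn with probability proportional to D(x)², the squared distance from x to its nearest already-chosen center.

```python
import random

def kmeans_pp_seed(points, k, rng=random):
    """k-means++ seeding: first center uniform at random; each later
    center drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2 for every point; a point equal to an already-chosen
        # center gets weight 0 and so cannot be picked again.
        weights = [min(sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                       for c in centers)
                   for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

random.seed(0)
data = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
print(kmeans_pp_seed(data, 2))
```

On the rectangle example, the D(x)² weighting makes it likely that the second center lands on the far side of the rectangle from the first, so the seeding tends to separate the left and right pairs before standard k-means even runs.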
Lee et al.[9] report an application of k-means++ to create geographical clusters of photographs based on the latitude and longitude information attached to the photos.