OPTICS algorithm

It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander.

[2] Its basic idea is similar to DBSCAN,[3] but it addresses one of DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying density.

Like DBSCAN, OPTICS requires two parameters: ε, which describes the maximum distance (radius) to consider, and MinPts, describing the number of points required to form a cluster.

Given a sufficiently large ε, this never happens, but then every ε-neighborhood query returns the entire database, resulting in

Hence, the ε parameter is required to cut off the density of clusters that are no longer interesting, and to speed up the algorithm.

OPTICS abstracts from DBSCAN by removing this parameter, at least to the extent of only having to give the maximum value.

The basic approach of OPTICS is similar to DBSCAN, but instead of maintaining known, but so far unprocessed cluster members in a set, they are maintained in a priority queue (e.g. using an indexed heap).

Using a reachability-plot (a special kind of dendrogram), the hierarchical structure of the clusters can be obtained easily.

It is a 2D plot, with the ordering of the points as processed by OPTICS on the x-axis and the reachability distance on the y-axis.

In its upper left area, a synthetic example data set is shown.

The yellow points in this image are considered noise, and no valley is found in their reachability plot.

Extracting clusters from this plot can be done manually by selecting ranges on the x-axis after visual inspection, by selecting a threshold on the y-axis (the result is then similar to a DBSCAN clustering result with the same

Additional care must be taken to the last points in a valley to assign them to the inner or outer cluster, this can be achieved by considering the predecessor.

[4] Clusterings obtained this way usually are hierarchical, and cannot be achieved by a single DBSCAN run.

The authors of the original OPTICS paper report an actual constant slowdown factor of 1.6 compared to DBSCAN.

DeLi-Clu,[6] Density-Link-Clustering combines ideas from single-linkage clustering and OPTICS, eliminating the

HiSC[7] is a hierarchical subspace clustering (axis-parallel) method based on OPTICS.

HiCO[8] is a hierarchical correlation clustering algorithm based on OPTICS.

HDBSCAN*[11] is based on a refinement of DBSCAN, excluding border-points from the clusters and thus following more strictly the basic definition of density-levels by Hartigan.

[12] Java implementations of OPTICS, OPTICS-OF, DeLi-Clu, HiSC, HiCO and DiSH are available in the ELKI data mining framework (with index acceleration for several distance functions, and with automatic cluster extraction using the ξ extraction method).

Other Java implementations include the Weka extension (no support for ξ cluster extraction).

The R package "dbscan" includes a C++ implementation of OPTICS (with both traditional dbscan-like and ξ cluster extraction) using a k-d tree for index acceleration for Euclidean distance only.