SUBCLU

SUBCLU is an algorithm for clustering high-dimensional data by Karin Kailing, Hans-Peter Kriegel and Peer Kröger.

[1] It is a subspace clustering algorithm that builds on the density-based clustering algorithm DBSCAN.

SUBCLU can find clusters in axis-parallel subspaces, and uses a bottom-up, greedy strategy to remain efficient.

SUBCLU uses a monotonicity criteria: if a cluster is found in a subspace

This downward-closure property is utilized by SUBCLU in a way similar to the Apriori algorithm: first, all 1-dimensional subspaces are clustered.

SUBCLU hence recursively produces

After pruning irrelevant candidates, DBSCAN is applied to the candidate subspace to find out if it still contains clusters.

In order to improve the runtime of DBSCAN, only the points known to belong to clusters in one

-dimensional subspace (which is chosen to contain as little clusters as possible) are considered.

Due to the downward-closure property, other point cannot be part of a

, which serve the same role as in DBSCAN.

In a first step, DBSCAN is used to find 1D-clusters in each subspace spanned by a single attribute:

{\displaystyle {\mathtt {SUBCLU}}(DB,eps,MinPts)}

contains the sets of clusters found in the subspaces.

is chosen to minimize the runs of DBSCAN (and the number of points that need to be considered in each run) for finding the clusters in the candidate subspaces.

Candidate subspaces are generated much alike the Apriori algorithm generates the frequent itemset candidates: Pairs of the

-dimensional subspaces are compared, and if they differ in one attribute only, they form a

However, a number of irrelevant candidates are found as well; they contain a

Hence, these candidates are removed in a second step:

An example implementation of SUBCLU is available in the ELKI framework.