iDistance

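The iDistance is designed to process kNN queries in high-dimensional spaces efficiently, and it is particularly effective for skewed data distributions, which usually occur in real-life data sets.[1]

Building the iDistance index has two steps. First, a number of reference points are chosen in the data space. Second, each data point is assigned a one-dimensional key computed from its distance to its closest reference point, and these keys are indexed in a B+-tree. The figure on the right shows an example where three reference points (O1, O2, O3) are chosen.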
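The mapping from multi-dimensional points to one-dimensional keys can be illustrated with a minimal Python sketch. It assumes the commonly described key form i * c + dist(p, O_i), where O_i is the closest reference point and c is a constant larger than any point-to-reference distance, so that the key ranges of different partitions do not overlap. The function name `build_idistance_index` and the use of a sorted list in place of a B+-tree are illustrative assumptions, not part of the original technique's specification.

```python
import math

def build_idistance_index(points, refs, c):
    """Return (key, point) pairs sorted by key.

    key = i * c + dist(point, refs[i]), where refs[i] is the reference
    point closest to the point. The sorted list is a stand-in for the
    B+-tree used by a real implementation (assumption for illustration).
    """
    entries = []
    for p in points:
        dists = [math.dist(p, o) for o in refs]
        i = min(range(len(refs)), key=dists.__getitem__)  # closest reference
        entries.append((i * c + dists[i], p))
    entries.sort(key=lambda e: e[0])
    return entries
```

With two reference points and c = 100, points near the first reference receive keys in [0, 100) and points near the second receive keys in [100, 200), so each partition occupies its own contiguous key range.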

Various extensions have been proposed that select reference points so as to improve query performance, including approaches that use machine learning to identify suitable reference points.

Instead of scanning records from the beginning to the end of the data file, the iDistance starts the scan from spots where the nearest neighbors can be obtained early with a very high probability.
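The idea of starting the scan where neighbors are likely to be found can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: keys have the form i * c + dist(p, O_i), a sorted list stands in for the B+-tree, and the search radius is doubled and the ranges rescanned each round instead of being extended incrementally. The names `build_index`, `knn`, and `r0` are hypothetical.

```python
import bisect
import heapq
import math

def build_index(points, refs, c):
    """Sort points by key i*c + dist(p, refs[i]); track each partition's radius."""
    radius = [0.0] * len(refs)
    entries = []
    for p in points:
        d = [math.dist(p, o) for o in refs]
        i = min(range(len(refs)), key=d.__getitem__)  # closest reference
        radius[i] = max(radius[i], d[i])
        entries.append((i * c + d[i], p))
    entries.sort()
    return [k for k, _ in entries], [p for _, p in entries], radius

def knn(q, k, refs, c, index, r0=1.0):
    """Return up to k (distance, point) pairs nearest to q, closest first."""
    keys, pts, radius = index
    dq = [math.dist(q, o) for o in refs]
    # Once r reaches r_max, every partition is scanned in full.
    r_max = max(d + rad for d, rad in zip(dq, radius))
    r = r0
    while True:
        cand = []
        for i in range(len(refs)):
            if dq[i] - r > radius[i]:
                continue  # query sphere does not reach this partition
            # By the triangle inequality, any point within r of q has a
            # distance to refs[i] inside [dq[i] - r, dq[i] + r].
            lo = i * c + max(0.0, dq[i] - r)
            hi = i * c + min(radius[i], dq[i] + r)
            a = bisect.bisect_left(keys, lo)
            b = bisect.bisect_right(keys, hi)
            cand.extend((math.dist(q, p), p) for p in pts[a:b])
        best = heapq.nsmallest(k, cand)
        # If k candidates lie within r, no unscanned point can be closer.
        if (len(best) == k and best[-1][0] <= r) or r >= r_max:
            return best
        r *= 2
```

Each round touches only the key ranges that the current query sphere can intersect, so partitions far from the query are skipped entirely until the radius grows large enough to reach them.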

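The iDistance was first proposed by Cui Yu, Beng Chin Ooi, Kian-Lee Tan and H. V. Jagadish in 2001.[7] Later, together with Rui Zhang, they improved the technique and carried out a more comprehensive study of it in 2005.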