Random indexing

Random indexing is a dimensionality reduction method and computational framework for distributional semantics. It is based on the insight that very-high-dimensional vector space model implementations are impractical, that models need not grow in dimensionality when new items (e.g. new terminology) are encountered, and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics, provided the resulting dimensions are chosen appropriately.
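The basic scheme can be sketched as follows: each item (e.g. each word) is assigned a fixed sparse ternary "index vector" with a few random +1/−1 entries, and an item's "context vector" accumulates the index vectors of the items it co-occurs with. This is a minimal illustrative sketch; the dimensionality, sparsity, and window parameters below are arbitrary choices, not values prescribed by any particular random indexing implementation.

```python
import numpy as np

def make_index_vector(dim, n_nonzero, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, rest zero."""
    v = np.zeros(dim)
    pos = rng.choice(dim, size=n_nonzero, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=n_nonzero)
    return v

def random_indexing(corpus, dim=256, n_nonzero=8, window=2, seed=42):
    """Accumulate each word's context vector as the sum of the index
    vectors of words co-occurring within `window` positions of it."""
    rng = np.random.default_rng(seed)
    index, context = {}, {}
    for sentence in corpus:
        for w in sentence:
            if w not in index:
                index[w] = make_index_vector(dim, n_nonzero, rng)
                context[w] = np.zeros(dim)
    for sentence in corpus:
        for i, w in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[w] += index[sentence[j]]
    return context

def cosine(a, b):
    """Cosine similarity between two context vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the index vectors are sparse and nearly orthogonal in a high-dimensional space, words appearing in similar contexts end up with similar context vectors, while the model's dimensionality stays fixed as new vocabulary arrives.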

The TopSig technique[9] extends the random indexing model to produce bit vectors for comparison with the Hamming distance similarity function.
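The idea of comparing bit vectors with Hamming distance can be illustrated with a generic sign-binarization step, shown below. This is a hedged sketch of the general approach, not TopSig's published algorithm: the thresholding rule and the normalized similarity are assumptions made for illustration.

```python
import numpy as np

def to_bit_vector(v):
    """Binarize a real-valued vector by sign: 1 where positive, else 0."""
    return (np.asarray(v) > 0).astype(np.uint8)

def hamming_similarity(a, b):
    """Fraction of bit positions that agree (1 - normalized Hamming distance)."""
    a, b = to_bit_vector(a), to_bit_vector(b)
    return 1.0 - np.count_nonzero(a != b) / a.size
```

Bit vectors are attractive here because the Hamming distance between two signatures reduces to an XOR and a population count, which is far cheaper than floating-point cosine comparisons.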

In a similar line of research, Random Manhattan Integer Indexing (RMII)[10] has been proposed to improve the performance of methods that employ the Manhattan distance between text units.

Many random indexing methods primarily generate similarity from the direct co-occurrence of items in a corpus.

Reflexive Random Indexing (RRI)[11] generates similarity both from direct co-occurrence and from shared occurrence with other items, so that items which never co-occur can still become similar through common neighbours.
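One way to picture this indirect, "reflexive" similarity is an iterative retraining loop in which document vectors are built from term vectors and term vectors are then rebuilt from the documents containing them. The toy sketch below illustrates that feedback idea only; it is not the published RRI procedure, and the normalization and cycle count are assumptions.

```python
import numpy as np

def rri_sketch(docs, dim=256, n_nonzero=8, cycles=1, seed=0):
    """Toy reflexive retraining: term and document vectors are
    rebuilt from each other for a number of cycles."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for d in docs for w in d})
    # start from sparse ternary random index vectors
    term = {}
    for w in vocab:
        v = np.zeros(dim)
        pos = rng.choice(dim, size=n_nonzero, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=n_nonzero)
        term[w] = v
    for _ in range(cycles):
        # document vectors: sum of the vectors of their terms
        doc_vecs = [np.sum([term[w] for w in d], axis=0) for d in docs]
        # retrain term vectors from the documents that contain them
        new_term = {w: np.zeros(dim) for w in vocab}
        for dv, d in zip(doc_vecs, docs):
            for w in set(d):
                new_term[w] += dv
        # normalize to keep magnitudes comparable across cycles
        term = {w: v / (np.linalg.norm(v) or 1.0) for w, v in new_term.items()}
    return term
```

After one cycle, two terms that share a document neighbour acquire correlated vectors even though they never co-occur directly, which is the effect RRI exploits.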