Compressed suffix array

[1][2] These data structures enable quick search for an arbitrary string with a comparatively small index.

is the k-th order empirical entropy of the text T. The time and space to construct a compressed suffix array are normally ⁠

The original presentation of a compressed suffix array[1] solved a long-standing open problem by showing that fast pattern matching was possible using only a linear-space data structure, namely, one proportional to the size of the text T, which takes

The basis for the data structure is a recursive decomposition using the "neighbor function," which allows a suffix array to be represented by one of half its length.

The construction is repeated multiple times until the resulting suffix array uses a linear number of bits.

[3] The space usage is extremely competitive in practice with other state-of-the-art compressors,[5] and it also supports fast in-situ pattern matching.

The memory accesses made by compressed suffix arrays and other compressed data structures for pattern matching are typically not localized, and thus these data structures have been notoriously hard to design efficiently for use in external memory.

Recent progress using geometric duality takes advantage of the block access provided by disks to speed up the I/O time significantly[6] In addition, potentially practical search performance for a compressed suffix array in external-memory has been demonstrated.

[7] There are several open source implementations of compressed suffix arrays available (see External Links below).

Bowtie and Bowtie2 are open-source compressed suffix array implementations of read alignment for use in bioinformatics.