Succinct data structure

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations.

The concept was originally introduced by Jacobson[1] to encode bit vectors, (unlabeled) trees, and planar graphs.

Suppose that Z is the information-theoretic optimal number of bits needed to store some data. A representation of this data is called implicit if it takes Z + O(1) bits of space, succinct if it takes Z + o(Z) bits of space, and compact if it takes O(Z) bits of space.

Implicit structures are thus usually reduced to storing information using some permutation of the input data; the most well-known example of this is the heap.
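The array-based binary heap is the classic illustration: the tree shape is encoded entirely by index arithmetic (the children of the node at index i sit at indices 2i + 1 and 2i + 2), so beyond a permutation of the input values only O(1) extra words are needed. A minimal sketch of the standard textbook construction:

```python
# An array-based binary min-heap: an implicit data structure, since the
# only storage is a permutation of the input plus O(1) bookkeeping.

def sift_down(heap, i):
    """Restore the min-heap property below index i."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2   # children via index arithmetic
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def heapify(values):
    """Rearrange values in place into a min-heap (a permutation of the input)."""
    for i in range(len(values) // 2 - 1, -1, -1):
        sift_down(values, i)
    return values
```

Because no pointers are stored, the space used is exactly the Z bits of the values themselves plus a constant number of words.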

Suppose that B is a bit string of length n, and define rank(x) as the number of 1 bits among the first x positions of B and select(j) as the position of the j-th 1 bit. There is a simple representation which uses n + o(n) bits of storage (the original bit string and an o(n) auxiliary structure) and supports rank and select in constant time.

It uses an idea similar to that for range-minimum queries; there are a constant number of recursions before stopping at a subproblem of a limited size.

The bit string is split into large blocks and, within each large block, into small blocks. For each large block, the rank of its first bit is stored in a separate table; within each large block, a second directory stores the rank of each small block relative to the start of the large block containing it.

A lookup table can then be used that stores the answer to every possible rank query on a bit string of length (lg n)/2 (the size of a small block); since there are only 2^((lg n)/2) = √n such strings, the table takes o(n) bits.

In practice, the lookup table can be replaced by bitwise operations and smaller tables that can be used to find the number of bits set in the small blocks.
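The block scheme can be sketched as follows, assuming for simplicity a single level of blocks; a popcount on the residual bits stands in for the small-block lookup table, and the block size and function names are illustrative:

```python
# Block-based rank on a bit string: cumulative ranks are precomputed at
# block boundaries, and the residual count inside one block is done with
# a popcount (standing in for the small-block lookup table).

BLOCK = 64  # bits per block (illustrative; the real structure uses lg^2 n)

def build_rank_index(bits):
    """bits: list of 0/1 values. Returns cumulative ranks at block boundaries."""
    index = [0]
    for start in range(0, len(bits), BLOCK):
        index.append(index[-1] + sum(bits[start:start + BLOCK]))
    return index

def rank1(bits, index, i):
    """Number of 1 bits in bits[0:i]."""
    b = i // BLOCK
    word = 0
    for bit in bits[b * BLOCK:i]:            # residual bits within the block
        word = (word << 1) | bit
    return index[b] + bin(word).count("1")   # popcount replaces the table
```

The precomputed index answers the query up to the last block boundary in O(1); only the residual bits of one block are examined per query.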

This is often beneficial, since succinct data structures find their uses in large data sets, in which case cache misses become much more frequent and the chance of the lookup table being evicted from closer CPU caches becomes higher.

[5] Select queries can be easily supported by doing a binary search on the same auxiliary structure used for rank; however, this takes O(lg n) time in the worst case. A more complicated structure using o(n) bits of additional storage can be used to support select in constant time.
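The binary-search approach to select can be sketched as below; a naive rank is used here as a stand-in (in a real structure it would be the constant-time rank above), so each query costs O(lg n) rank evaluations:

```python
# Select via binary search over rank: find the smallest prefix length i
# such that rank(bits, i) >= j; the j-th 1 bit then sits at index i - 1.

def rank(bits, i):
    """Number of 1 bits in bits[0:i] (naive stand-in for constant-time rank)."""
    return sum(bits[:i])

def select(bits, j):
    """Index of the j-th 1 bit (1-indexed), or -1 if there are fewer than j ones."""
    lo, hi = 0, len(bits)
    if rank(bits, hi) < j:
        return -1
    while lo < hi:                      # invariant: answer lies in [lo, hi]
        mid = (lo + hi) // 2
        if rank(bits, mid) < j:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1                       # the j-th 1 sits at index lo - 1
```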

In practice, many of these solutions have hidden constants in the O(·) notation which dominate before any asymptotic advantage becomes apparent; implementations using broadword operations and word-aligned blocks often perform better.

When m elements are stored out of a universe of size n, B(m, n) = ⌈lg (n choose m)⌉ is an information-theoretic lower bound on the number of bits needed to store the set.

[8] This structure can be extended to support rank and select queries, taking only o(n) bits of space beyond the information-theoretic lower bound.

[2] Correct rank queries in this structure are, however, limited to elements contained in the set, analogous to how minimal perfect hash functions work.

[9] It is also possible to construct an indexable dictionary supporting rank (but not select) that uses fewer bits than this information-theoretic lower bound.

Such a dictionary is called a monotone minimal perfect hash function, and can be implemented using as few as O(m lg lg lg n) bits while supporting membership queries in constant expected time.

The first dynamic succinct hash table was due to Raman and Rao in 2003.

[14][15] The latter solution supports all operations in worst-case constant time with high probability.

[19] The final solution also requires access to a lookup table of size n^ε for some ε > 0, but this lookup table is independent of the set of elements being stored.

If there is a maximum length – which is the case in practice, since 2^32 = 4 GiB of data is a very long string, and 2^64 = 16 EiB of data is larger than any string in practice – then a length-prefixed string is also implicit, taking Z + k space, where k is the number of bits needed to represent the maximum length (e.g., 64 bits).
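A minimal sketch of such a length-prefixed encoding, assuming a fixed 64-bit (k = 64) length field; the function names are illustrative:

```python
# Length-prefixed string: the raw Z bytes of data preceded by a fixed
# 8-byte (64-bit) big-endian length field, making the encoding
# self-delimiting at a cost of exactly k = 64 extra bits.

import struct

def encode(data: bytes) -> bytes:
    return struct.pack(">Q", len(data)) + data   # Z bytes + 64-bit length

def decode(blob: bytes) -> bytes:
    (length,) = struct.unpack(">Q", blob[:8])
    return blob[8:8 + length]
```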

When a sequence of variable-length items (such as strings) needs to be encoded, there are various possibilities.

A direct approach is to store a length and an item in each record – these can then be placed one after another.

An alternative is to place the items in order with a delimiter (e.g., null-terminated string).

An alternative approach is out-of-band separation: the items can simply be placed one after another, with no delimiters.

A separate binary string, consisting of 1s in the positions where an item begins and 0s everywhere else, is then encoded along with the items.
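This out-of-band scheme can be sketched as follows (function names are illustrative, and items are assumed non-empty so that each contributes at least one marker bit):

```python
# Out-of-band separation: items are concatenated with no delimiters, and
# a separate marker string has a 1 at each position where an item begins
# and a 0 everywhere else.

def encode(items):
    """items: list of non-empty strings. Returns (concatenation, markers)."""
    text = "".join(items)
    markers = "".join("1" + "0" * (len(item) - 1) for item in items)
    return text, markers

def decode(text, markers):
    """Recover the items from the concatenation and its marker string."""
    items, start = [], 0
    for i in range(1, len(text) + 1):
        if i == len(text) or markers[i] == "1":   # boundary reached
            items.append(text[start:i])
            start = i
    return items
```

With a rank/select structure on the marker string, the start of the i-th item can even be located in constant time.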

For example, a binary tree of n nodes can be represented in 2n + o(n) bits while supporting a variety of operations on any node, including finding its parent, its left and right child, and returning the size of its subtree, each in constant time.
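The constant-time navigation operations require the full succinct-tree machinery, but the space bound itself is easy to illustrate: writing a 1 for each node and a 0 for each absent child in preorder encodes any binary tree of n nodes in exactly 2n + 1 bits. A hypothetical sketch (class and function names are illustrative):

```python
# Preorder bit encoding of a binary tree: "1" per node, "0" per absent
# child. An n-node tree always produces exactly 2n + 1 bits, matching
# the 2n + o(n) space bound up to lower-order terms.

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def encode(root):
    if root is None:
        return "0"
    return "1" + encode(root.left) + encode(root.right)

def decode(bits, pos=0):
    """Returns (tree, next unread position)."""
    if bits[pos] == "0":
        return None, pos + 1
    left, pos = decode(bits, pos + 1)
    right, pos = decode(bits, pos)
    return Node(left, right), pos
```

This sketch only shows that 2n + 1 bits suffice; supporting parent, child, and subtree-size queries in constant time additionally needs o(n)-bit auxiliary directories over this bit string.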