Trie

While basic trie implementations can be memory-intensive, various optimization techniques such as compression and bitwise representations have been developed to improve their efficiency.

While tries commonly store character strings, they can be adapted to work with any ordered sequence of elements, such as permutations of digits or shapes.

A notable variant is the bitwise trie, which uses individual bits from fixed-length binary data (such as integers or memory addresses) as keys.

[4][3][5]: 336 The idea was independently described in 1960 by Edward Fredkin,[6] who coined the term trie, pronouncing it /ˈtriː/ (as "tree"), after the middle syllable of retrieval.

[1] Tries can be efficacious on string-searching algorithms such as predictive text, approximate string matching, and spell checking in comparison to binary search trees.

[15]: 135 If a null link is encountered prior to reaching the last character of the string key, a new node is created (line 3).

A trie can be used to replace a hash table, over which it has the following advantages:[12]: 358 However, tries are less efficient than a hash table when the data is directly accessed on a secondary storage device such as a hard disk drive that has higher random access time than the main memory.

[6] Tries are also disadvantageous when the key value cannot be easily represented as string, such as floating point numbers where multiple representations are possible (e.g. 1 is equivalent to 1.0, +1.0, 1.00, etc.

[18] Bitwise tries are used to address the enormous space requirement for the trie nodes in a naive simple pointer vector implementations.

The implementations for these types of trie use vectorized CPU instructions to find the first set bit in a fixed-length key input (e.g. GCC's __builtin_clz() intrinsic function).

Accordingly, the set bit is used to index the first item, or child node, in the 32- or 64-entry based bitwise tree.

[19] This procedure is also cache-local and highly parallelizable due to register independency, and thus performant on out-of-order execution CPUs.

[19] Radix tree, also known as a compressed trie, is a space-optimized variant of a trie in which any node with only one child gets merged with its parent; elimination of branches of the nodes with a single child results in better metrics in both space and time.

[20][21]: 452 This works best when the trie remains static and set of keys stored are very sparse within their representation space.

[15]: 140-141 A naive implementation of a trie consumes immense storage due to larger number of leaf-nodes caused by sparse distribution of keys; Patricia trees can be efficient for such cases.

[24]: 3 The skip number 1 at node 0 corresponds to the position 1 in the binary encoded ASCII where the leftmost bit differed in the key set

[24]: 3-4 The skip number is crucial for search, insertion, and deletion of nodes in the Patricia tree, and a bit masking operation is performed during every iteration.

[15]: 143 Trie data structures are commonly used in predictive text or autocomplete dictionaries, and approximate matching algorithms.

[11] Tries enable faster searches, occupy less space, especially when the set contains large number of short strings, thus used in spell checking, hyphenation applications and longest prefix match algorithms.

This is because DAFSAs and radix trees can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored.

String dictionaries are also utilized in natural language processing, such as finding lexicon of a text corpus.

[27] Tries are also fundamental data structures for burstsort, which is notable for being the fastest string sorting algorithm as of 2007,[28] accomplished by its efficient use of CPU cache.

[25]: 75 Compressed variants of tries, such as databases for managing Forwarding Information Base (FIB), are used in storing IP address prefixes within routers and bridges for prefix-based lookup to resolve mask-based operations in IP routing.