Hash table

In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps keys to values.

Many hash table designs also allow arbitrary insertions and deletions of key–value pairs, at amortized constant average cost per operation.

For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

In January 1953, Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining.

[8]: 124 Open addressing with linear probing is credited to Amdahl, although Andrey Ershov independently had the same idea.

[8]: 124–125 The term "open addressing" was coined by W. Wesley Peterson in his article which discusses the problem of search in large files.

[11] Separate chaining hash tables suffer gradually declining performance as the load factor grows, and no fixed point beyond which resizing is absolutely needed.

[19] For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots.

Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent.

A number of K-independence results are known for collision resolution schemes such as linear probing and cuckoo hashing.

The first part is computing a hash function which transforms the search key into an array index.

[7]: 458 In separate chaining, the process involves building a linked list with key–value pair for each search array index.

[7]: 464 Collision resolution through chaining with linked list is a common method of implementation of hash tables.

be the hash table and the node respectively, the operation involves as follows:[16]: 258 If the element is comparable either numerically or lexically, and inserted into the list by maintaining the total order, it results in faster termination of the unsuccessful searches.

[21]: 520–521 If the keys are ordered, it could be efficient to use "self-organizing" concepts such as using a self-balancing binary search tree, through which the theoretical worst case could be brought down to

[22] A study shows array-based separate chaining to be 97% more performant when compared to the standard linked list method under heavy load.

[23]: 99 Techniques such as using fusion tree for each buckets also result in constant time for all operations with high probability.

[23]: 91 In cache-conscious variants of collision resolution through separate chaining, a dynamic array found to be more cache-friendly is used in the place where a linked list or self-balancing binary search trees is usually deployed, since the contiguous allocation pattern of the array could be exploited by hardware-cache prefetchers—such as translation lookaside buffer—resulting in reduced access time and memory consumption.

[10][23]: 93 The probing results in an infinite loop if the load factor reaches 1, in the case of a completely filled table.

[29] Coalesced hashing is a hybrid of both separate chaining and open addressing in which the buckets or nodes link within the table.

[31]: 8 Cuckoo hashing is a form of open addressing collision resolution technique which guarantees

The process continues until every key has its own spot in the empty buckets of the tables; if the procedure enters into infinite loop—which is identified through maintaining a threshold loop counter—both hash tables get rehashed with newer hash functions and the procedure continues.

[33]: 353 Robin Hood hashing is an open addressing based collision resolution algorithm; the collisions are resolved through favouring the displacement of the element that is farthest—or longest probe sequence length (PSL)—from its "home location" i.e. the bucket to which the item was hashed into.

[39] If a hash table becomes "too empty" after deleting some elements, resizing may be performed to avoid excessive memory usage.

The process of rehashing a bucket's items in accordance with the new hash function is termed as cleaning, which is implemented through command pattern by encapsulating the operations such as

[44]: 1 However, construction of such a hash function is practically infeasible, that being so, implementations depend on case-specific collision resolution techniques in achieving higher performance.

[30] Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications.

[49] Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules.

[50] ECMAScript 2015 also added the Map data structure, which accepts arbitrary values as keys.

[53] Java programming language includes the HashSet, HashMap, LinkedHashSet, and LinkedHashMap generic collections.