Double hashing

Double hashing is a computer programming technique used in conjunction with open addressing in hash tables to resolve hash collisions, by using a secondary hash of the key as an offset when a collision occurs.

Double hashing with open addressing is a classical data structure on a table $T$.

The double hashing technique uses one hash value as an index into the table and then repeatedly steps forward an interval until the desired value is located, an empty location is reached, or the entire table has been searched; but this interval is set by a second, independent hash function.

Unlike the alternative collision-resolution methods of linear probing and quadratic probing, the interval depends on the data, so that values mapping to the same location have different bucket sequences; this minimizes repeated collisions and the effects of clustering.

Given two random, uniform, and independent hash functions $h_1$ and $h_2$, the $i$th location in the bucket sequence for value $k$ in a hash table of $|T|$ buckets is

    $h(i,k) = (h_1(k) + i \cdot h_2(k)) \bmod |T|.$

Generally, $h_1$ and $h_2$ are selected from a set of universal hash functions; $h_1$ is selected to have a range of $\{0, 1, \ldots, |T|-1\}$ and $h_2$ to have a range of $\{1, 2, \ldots, |T|-1\}$.

Double hashing approximates a random distribution; more precisely, pair-wise independent hash functions yield a probability of $(n/|T|)^2$ that any pair of keys will follow the same bucket sequence.
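For illustration, here is a minimal C sketch of this probe computation; the names hash1 and hash2 and the string key type are assumptions for the example, not part of the original text:

    #include <stddef.h>

    /* Placeholder hash functions standing in for h1 and h2; any pair of
     * independent universal hash functions can be used. */
    extern size_t hash1(const char *key);
    extern size_t hash2(const char *key);

    /* i-th location in the bucket sequence for `key` in a table of m
     * buckets: h(i,k) = (h1(k) + i*h2(k)) mod m.  The step is forced
     * into {1, ..., m-1} so the sequence actually advances.
     * (Sketch only: assumes i*step does not overflow size_t.) */
    size_t probe(const char *key, size_t i, size_t m)
    {
        size_t step = 1 + hash2(key) % (m - 1);
        return (hash1(key) % m + i * step) % m;
    }

If $|T|$ is prime, every nonzero step is coprime to the table size, so the probe sequence visits every bucket before repeating.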

The secondary hash function $h_2(k)$ should be nonzero and relatively prime to the table size $|T|$, or the probe sequence will not cover the whole table; choosing $|T|$ prime and $h_2(k) \in \{1, \ldots, |T|-1\}$ guarantees this. Let $n$ be the number of elements stored in $T$; then $T$'s load factor is $\alpha = n/|T|$.

That is, start by randomly, uniformly and independently selecting two universal hash functions $h_1$ and $h_2$ to build a double hashing table $T$. Given a key $k$, the $(i+1)$-st hash location is computed by:

    $h(i,k) = (h_1(k) + i \cdot h_2(k)) \bmod |T|.$

Let $T$ have a fixed load factor $\alpha$, with $0 < \alpha < 1$.

Bradford and Katehakis[2] showed that the expected number of probes for an unsuccessful search in $T$, still using these initially chosen hash functions, is $\tfrac{1}{1-\alpha}$, regardless of the distribution of the inputs.

Pair-wise independence of the hash functions suffices.
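For example, at a load factor of $\alpha = 0.75$ (the common limit mentioned below), the expected number of probes for an unsuccessful search is $1/(1-0.75) = 4$, independent of the table size.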

Like all other forms of open addressing, double hashing degenerates toward a linear scan of the table as the hash table approaches maximum capacity.

The usual heuristic is to limit the table loading to 75% of capacity.

Eventually, rehashing to a larger size will be necessary, as with all other open addressing schemes.
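A minimal C sketch of insertion under these conventions follows; the table globals, the EMPTY sentinel, and the resize_and_rehash helper are assumptions for illustration, not part of the original text:

    #include <stddef.h>

    #define EMPTY NULL  /* unused slots hold NULL keys (assumed convention) */

    extern size_t hash1(const char *key);
    extern size_t hash2(const char *key);
    extern void resize_and_rehash(void);  /* hypothetical: grows the table */

    extern const char **table;  /* the m bucket slots */
    extern size_t m;            /* current capacity */
    extern size_t n;            /* elements currently stored */

    /* Insert by double hashing, keeping the load factor at or below 75%. */
    void insert(const char *key)
    {
        if (4 * (n + 1) > 3 * m)            /* (n+1)/m > 0.75 */
            resize_and_rehash();

        size_t h = hash1(key) % m;
        size_t step = 1 + hash2(key) % (m - 1);  /* step in {1, ..., m-1} */

        while (table[h] != EMPTY)           /* terminates: load factor < 1 */
            h = (h + step) % m;
        table[h] = key;
        n++;
    }

The sketch omits duplicate detection and deletion; it only shows how the load-factor guard and the probe loop fit together.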

Peter Dillinger's PhD thesis[3] points out that double hashing produces unwanted equivalent hash functions when the hash functions are treated as a set, as in Bloom filters: if $h_2(y) = -h_2(x) \pmod{|T|}$ and $h_1(y) = h_1(x) + k \cdot h_2(x) \pmod{|T|}$, then $h(i,y) = h(k-i, x)$, so the two keys produce the same set of hash values, merely in reverse order. This makes a collision twice as likely as the hoped-for $1/|T|^2$.
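The equivalence is immediate from the probe formula, with all arithmetic taken mod $|T|$:

    $h(i,y) = h_1(y) + i \cdot h_2(y) = h_1(x) + k \cdot h_2(x) - i \cdot h_2(x) = h_1(x) + (k-i) \cdot h_2(x) = h(k-i, x).$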

There are additionally a significant number of mostly-overlapping hash sets; if $h_2(y) = h_2(x)$ and $h_1(y) = h_1(x) \pm h_2(x)$, then $h(i,y) = h(i \pm 1, x)$, and comparing additional hash values (expanding the range of $i$) is of no help, since the sequences remain shifted copies of each other.

Adding a quadratic term $i^2$, or $(i^2+i)/2$ (a triangular number), or even $i^2 \cdot h_3(x)$ (triple hashing)[5] to the hash function improves the hash function somewhat[4] but does not fix this problem; if

    $h_1(y) = h_1(x) + h_2(x) + 1$ and $h_2(y) = h_2(x) + 1$,

then (with the triangular-number term) $h(i,y) = h(i+1, x)$, so shifted probe sequences still occur.

Adding a cubic term $i^3$, or $(i^3-i)/6$ (a tetrahedral number),[1] does solve the problem, a technique known as enhanced double hashing.
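The shifted-sequence claim can be checked directly for the triangular-number variant $h(i,k) = h_1(k) + i \cdot h_2(k) + (i^2+i)/2$:

    $h(i,y) = h_1(x) + h_2(x) + 1 + i \cdot (h_2(x)+1) + (i^2+i)/2 = h_1(x) + (i+1) \cdot h_2(x) + ((i+1)^2 + (i+1))/2 = h(i+1, x).$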

This can be computed efficiently by forward differencing, as in the sketch below. In addition to rectifying the collision problem, enhanced double hashing also removes double hashing's numerical restrictions on $h_2(x)$'s properties, allowing a hash function similar in property to (but still independent of) $h_1$ to be used.
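A minimal C sketch of the forward-differencing computation follows; the function and hash names are placeholders, not from the original text. The cubic $h_1(x) + i \cdot h_2(x) + (i^3-i)/6$ is advanced one step at a time: its first difference is itself advanced by its second difference, which grows by a constant, so the loop needs only additions and one modulo per index.

    #include <stddef.h>

    /* Placeholder hash functions standing in for h1 and h2. */
    extern size_t hash1(const char *key);
    extern size_t hash2(const char *key);

    /* Fill hashes[0..k-1] with the enhanced double hashing sequence
     *     h(i) = hash1(x) + i*hash2(x) + (i^3 - i)/6   (mod m)
     * via forward differencing.  Unsigned overflow in C wraps mod 2^w,
     * so the running sums remain correct. */
    void enhanced_double_hash(const char *x, size_t m, size_t k,
                              size_t hashes[])
    {
        size_t a = hash1(x);    /* running value of the cubic         */
        size_t b = hash2(x);    /* running first difference           */

        hashes[0] = a % m;
        for (size_t i = 1; i < k; i++) {
            a += b;             /* advance cubic by first difference  */
            b += i;             /* advance first difference by second */
            hashes[i] = a % m;
        }
    }

Note that the effective step size changes on every iteration; this is what lifts the usual requirement that $h_2(x)$ be nonzero and coprime to the table size.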