Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.
The main advantage of a DHT is that nodes can be added or removed with a minimum of work spent redistributing keys.
[1] Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data.
[2] Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption.
DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems.
Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Kad network, the Storm botnet, the Tox instant messenger, Freenet, the YaCy search engine, and the InterPlanetary File System.
DHT research was originally motivated, in part, by peer-to-peer (P2P) systems such as Freenet, Gnutella, BitTorrent and Napster, which took advantage of resources distributed across the Internet to provide a single useful application.
In particular, they took advantage of increased bandwidth and hard disk capacity to provide a file-sharing service.
Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results.
Later versions of Gnutella clients moved from the original flooding-based query model to a dynamic querying model, which vastly improved efficiency.
Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and the efficiency and guaranteed results of Napster.
A project called the Infrastructure for Resilient Internet Systems (IRIS) was funded by a $12 million grant from the United States National Science Foundation in 2002.
[9] Researchers included Sylvia Ratnasamy, Ion Stoica, Hari Balakrishnan and Scott Shenker.
[10] Outside academia, DHT technology has been adopted as a component of BitTorrent and in PlanetLab projects such as the Coral Content Distribution Network.
[11] DHTs characteristically emphasize the following properties: autonomy and decentralization, fault tolerance, and scalability. A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system (most commonly, O(log n) of the n participants; see below), so that only a limited amount of work needs to be done for each change in membership.
The foundation of a DHT is an abstract keyspace, such as the set of 160-bit strings, and a keyspace partitioning scheme splits ownership of this keyspace among the participating nodes. An overlay network then connects the nodes, allowing them to find the owner of any given key in the keyspace.
To store a file with a given filename, a client hashes the filename to produce a key k and sends a message put(k, data) to any participating node, which forwards it until it reaches the node responsible for k; that node stores the data. Any other client can then retrieve the contents of the file by again hashing the filename to produce k and asking any DHT node to find the data associated with k with a message get(k).
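The put/get interface above can be sketched in a few lines. The following is a toy, in-process model: the node IDs, the closest-ID ownership rule, and the use of SHA-1 are illustrative assumptions, and a real DHT exchanges put and get as network messages between peers rather than method calls.

```python
import hashlib

def key_for(filename: str) -> int:
    """Hash a filename into a 160-bit key k (SHA-1, as in many DHTs)."""
    return int.from_bytes(hashlib.sha1(filename.encode()).digest(), "big")

class Node:
    """One participant; `store` holds its portion of the key-value mapping."""
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.store = {}

class ToyDHT:
    """Delivers put/get to the node whose ID is numerically closest to k."""
    def __init__(self, node_ids):
        self.nodes = [Node(i) for i in sorted(node_ids)]

    def _owner(self, k: int) -> Node:
        return min(self.nodes, key=lambda n: abs(n.node_id - k))

    def put(self, k: int, data) -> None:
        self._owner(k).store[k] = data

    def get(self, k: int):
        return self._owner(k).store.get(k)

dht = ToyDHT(node_ids=[2**150, 2**155, 2**158])  # arbitrary example IDs
k = key_for("example.txt")
dht.put(k, b"file contents")
print(dht.get(k))  # any client recomputes k from the filename to fetch
```

Because every client derives k the same way, no central index is needed to find which node holds the data.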
The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.
Most DHTs use some variant of consistent hashing or rendezvous hashing to map keys to nodes; the two algorithms appear to have been devised independently and simultaneously to solve the distributed hash table problem.
Contrast this with a traditional hash table in which addition or removal of one bucket causes nearly the entire keyspace to be remapped.
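The contrast can be illustrated with a consistent-hashing sketch, in which each key belongs to the first node encountered clockwise from the key's position on a hash ring. The hash function and node names here are illustrative assumptions; the point is that adding a node reassigns only the keys falling into the new node's arc.

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    """Place a name on a 160-bit ring via SHA-1 (an assumed choice)."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

def owner(node_names, key: str) -> str:
    """A key is owned by the first node clockwise from its ring position."""
    points = sorted((h(n), n) for n in node_names)
    i = bisect_right([p for p, _ in points], h(key)) % len(points)
    return points[i][1]

keys = [f"key-{i}" for i in range(1000)]
before = {k: owner(["a", "b", "c"], k) for k in keys}
after = {k: owner(["a", "b", "c", "d"], k) for k in keys}

# Only keys in the new node's arc change owner (and they all move to "d");
# a traditional mod-N table would remap nearly every key instead.
moved = sum(before[k] != after[k] for k in keys)
print(moved, "of", len(keys), "keys moved")
```

Every key whose owner changed is now held by the new node "d"; all other keys stay put, which is exactly the minimal-disruption property described above.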
[17] Sorting ensures that similar keys are stored by neighboring nodes and that discovery procedures, including range queries, can be performed in logarithmic time.
Oscar constructs a navigable small-world network based on random-walk sampling, also assuring logarithmic search time.
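The benefit of an order-preserving partition for range queries can be sketched as follows. The boundary values and per-node dict stores are illustrative assumptions; the point is that a query over [lo, hi) touches only the few neighboring nodes whose contiguous key ranges overlap it, rather than every node.

```python
from bisect import bisect_right

boundaries = [0, 250, 500, 750]        # node i owns [boundaries[i], next boundary)
stores = [dict() for _ in boundaries]  # one local key-value store per node

def node_for(key: int) -> int:
    """Locate the node owning `key` by binary search over the boundaries."""
    return bisect_right(boundaries, key) - 1

def put(key: int, value) -> None:
    stores[node_for(key)][key] = value

def range_query(lo: int, hi: int):
    """Visit only the nodes whose ranges overlap [lo, hi)."""
    first, last = node_for(lo), node_for(hi - 1)
    out = []
    for i in range(first, last + 1):
        out.extend(sorted(k for k in stores[i] if lo <= k < hi))
    return out

for k in range(0, 1000, 10):
    put(k, f"v{k}")
print(range_query(240, 260))  # spans two neighboring nodes -> [240, 250]
```

A hash-partitioned DHT scatters adjacent keys across unrelated nodes, so the same query would have to contact every node; the sorted layout is what makes the logarithmic-time discovery mentioned above possible.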
It is then easy to route a message to the owner of any key k using the following greedy algorithm (which is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k. When no neighbor is closer to k than the current node, the message must have arrived at the closest node, which is the owner of k as defined above.
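This greedy rule can be sketched on a small ring keyspace. The ring size, node IDs, and Chord-style neighbor links (the successor of each power-of-two offset, plus the predecessor) are illustrative assumptions; closeness is measured as circular distance on the ring.

```python
RING = 2 ** 8                              # tiny 8-bit keyspace for readability
nodes = [5, 20, 60, 100, 150, 200, 240]   # example node IDs (assumed)

def cw(a: int, b: int) -> int:
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING

def circ(a: int, b: int) -> int:
    """Circular (either-direction) distance between a and b."""
    return min(cw(a, b), cw(b, a))

def owner(k: int) -> int:
    """The owner of key k is the node whose ID is closest to k."""
    return min(nodes, key=lambda n: circ(n, k))

def neighbours(n: int):
    """Chord-style links: successor of n + 2^i for each i, plus predecessor."""
    succ = lambda t: min(nodes, key=lambda m: cw(t, m))
    out = {succ((n + 2 ** i) % RING) for i in range(8)}
    out.add(min((m for m in nodes if m != n), key=lambda m: cw(m, n)))
    out.discard(n)
    return out

def route(start: int, k: int):
    """Greedy forwarding: hop to the neighbour closest to k; stop at a minimum."""
    path, cur = [start], start
    while True:
        nxt = min(neighbours(cur), key=lambda m: circ(m, k))
        if circ(nxt, k) >= circ(cur, k):
            return path        # no neighbour is closer: cur is the owner of k
        path.append(nxt)
        cur = nxt

print(route(200, 77))  # e.g. [200, 100, 60]: reaches owner(77) in a few hops
```

Each node keeps only O(log n) links, yet the power-of-two spacing lets every hop roughly halve the remaining distance, which is where the O(log n) route lengths discussed above come from.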
Many DHTs use that flexibility to pick neighbors that are close in terms of latency in the physical underlying network.
Clearly, the network's worst case route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff[20] that is fundamental in graph theory.
[24] Because of the decentralization, fault tolerance, and scalability of DHTs, they are inherently more resilient against a hostile attacker than a centralized system.
Open systems for distributed data storage that are robust against massive hostile attackers are feasible.
[28] Petar Maymounkov, one of the original authors of Kademlia, has proposed a way to circumvent the weakness to the Sybil attack by incorporating social trust relationships into the system design.
[29] The new system, codenamed Tonika (also known by its domain name, 5ttt), is based on an algorithm design known as "electric routing", co-authored with the mathematician Jonathan Kelner.
Most notable differences encountered in practical instances of DHT implementations include at least the following: