Kademlia

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002.

[1][2] It specifies the structure of the network and the exchange of information through node lookups.

Further advantages are found particularly in the decentralized structure, which increases the resistance against a denial-of-service attack.

The protocols that these nodes use to communicate, and locate information, have become more efficient over time.

This distance is computed as the exclusive or (XOR) of the two node IDs, taking the result as an unsigned integer number.

Keys and node IDs have the same format and length, so distance can be calculated among them in exactly the same way.

Specifically: These three conditions are enough to ensure that XOR captures all of the essential, important features of a "real" distance function, while being cheap and simple to calculate.

A basic Kademlia search algorithm has complexity of O(log2 (n)), that means for network with

An actual Kademlia implementation does not have a fixed-size routing table, but a dynamically sized one.

This includes store and retrieval operations and even helping other nodes to find a key.

This keeps the network constantly updated and adds resilience to failures or attacks.

Since the quantity of possible IDs is much larger than any node population can ever be, some of the k-buckets corresponding to very short distances will remain empty.

[5][6] Due to this statistical distribution, Kademlia selects long connected nodes to remain stored in the k-buckets.

This increases the number of known valid nodes at some time in the future and provides for a more stable network.

This ensures that when the response is received it corresponds to the request previously sent (see magic cookie).

When the iterations stop, the best k nodes in the results list are the ones in the whole network that are the closest to the desired key.

Caching nodes will drop the value after a certain time depending on their distance from the key.

The node that is providing the file will periodically refresh the information onto the network (perform FIND_NODE and STORE messages).

When all of the nodes having the file go offline, nobody will be refreshing its values (sources and keywords) and the information will eventually disappear from the network.

The XOR metric allows Kademlia to extend routing tables beyond single bits.

An m-bit prefix reduces the maximum number of lookups from log2 n to log2m n. These are maximum values and the average value will be far less, increasing the chance of finding a node in a k-bucket that shares more bits than just the prefix with the target key.

Nodes can use mixtures of prefixes in their routing table, such as the Kad Network used by eMule.

[citation needed] The Kademlia network could even be heterogeneous in routing table implementations, at the expense of complicating the analysis of lookups.

While the XOR metric is not needed to understand Kademlia, it is critical in the analysis of the protocol.

The XOR arithmetic forms an abelian group allowing closed analysis.

Other DHT protocols and algorithms require simulation or complicated formal analysis in order to predict network behavior and correctness.

Thus routing can be seen as jumping among the leaves along these pointers such that each step goes towards the target ID as much as possible, i.e., in a greedy way.

By making Kademlia keyword searches, one can find information in the file-sharing network so it can be downloaded.

Since a key can correspond to many values, e.g. many sources of the same file, every storing node may have different information.

Since every filename in the list has its hash attached, the chosen file can then be obtained in the normal way.