How Not to Implement a Dht

March 7, 2024

What is a DHT?

A DHT is a type of database that scales to 10s of millions of nodes. In a DHT the users also contribute to the storage of the data.

DHT is an acronym for Distributed Hash Table.

Distributed refers to the fact that DHTs involve many independent nodes spread across a network.

A hash is the value returned by a hash function, which is any function that maps arbitrary data to a fixed length value.

A hash table is a common data structure that uses hashes as address for values. Hash tables are able locate a value without having to search for it. For a hash table distributed over many nodes, this means nodes can store values without regard for the values stored by other nodes, and without storing the key.

Commonly used data structures that use in-memory hash tables include Go’s map, Python’s dict, C++’s unordered_map, and Rust’s HashMap.

Compared to a conventional in-memory hash table, a DHT:

Scales to many machines.
Doesn’t guarantee consistency.
Does not provide an operation allowing traversal of all values.

With a hash function that distributes keys uniformly over the address space, a DHT can allow for the load of hosting and serving data to be spread evenly across the entire network of participants.

Which DHT?

The primary DHT used in BitTorrent is known as the mainline DHT, first released in 2005.

The following describes DHTs from the perspective of the BitTorrent mainline DHT.

Nodes have a fixed length byte array used as the local ID. Keys are mapped to a hash of the same length with a hash function.

A query is a single message sent to another node. It includes:

A target hash.
An operation.
Some arguments.
The sender’s node ID.

A reply includes:

The queried node’s node ID.
An error or a response to the query.
A list of nodes closer to the target.

Nodes store the node ID and network location of other nodes in the network in a routing table. Each node’s routing table reserves exponentially more space for nodes with IDs closer to the node’s own ID. This means that when doing a lookup on the DHT, nodes with IDs closer to the target hash are likely to know significantly more nodes closer to their own ID, and closer to the query’s target hash than those that are further away.

Most operations benefit from traversing the DHT network toward nodes closer to a target hash. To do this, queries are sent to known nodes with the closest IDs to the target hash, and then sending subsequent queries if they respond with the addresses of closer nodes. Typically some level of concurrency is employed, a number of outstanding queries are sent, and further queries are sent after a response is received or there’s a timeout.

The node IDs returned will converge to the closest node IDs to the target hash. As this occurs, we will start to see responses with overlapping values. We do not query a given node more than once. Eventually there will be no closer node IDs. For write operations, the operation may be performed on multiple peers as a form of redundancy. Similarly on read operations, extra reads may be performed to ensure a satisfactory level of consistency in the network.

Some general purpose operations in the DHT are ping, find_node, put and get.

A ping is a single query sent to another node. That node simply replies. This operation is commonly used to check the health of nodes in the routing table.

find_node is an operation used for bootstrapping. Bootstrapping is the process of populating the routing table. Since a routing table stores more nodes closer to the local node ID, the target hash for a find_node operation is the local node ID, or a part of the routing table that is sparse. This way we can traverse the DHT network toward nodes with IDs for which there is more space available in the routing table.

Puts and gets are complimentary operations that comes in two forms.

Immutable puts target the hash of the value. Since an immutable value can be verified and can’t change, the get for an immutable value can stop traversing as soon as node responds with a value that hashes to the target.

In a mutable put, the target is a hash of a publisher’s public key and a salt allowing for them to publish multiple values with a single key. The value contains a sequence number to allow updating the value, and a signature to prove the publisher authorized the value. A mutable get requires traversing the DHT and returning the value with the highest sequence number and a valid signature.

There is a dilemma when the routing table is empty. In this case there are no nodes to query, including for bootstrapping. Normally, the routing table is stored between sessions, but when there is no known nodes, implementations typically query nodes with static address known as bootstrap or router nodes. Storing these nodes in your routing table is discouraged, and they usually only respond to certain kinds of queries and rate limit responses.

BitTorrent’s use of the DHT

DHT nodes do not discriminate BitTorrent traffic. The most common operations are related to BitTorrent use, but do not have to be used for that purpose. Any user of the DHT must only conform to using target addresses and node IDs of the same length, and using the same message encoding.

BitTorrent v2 uses the same DHT network, by truncating v2 infohashes to fit.

A swarm is set of peers participating in BitTorrent around a particular torrent. A torrent is identified by an infohash, a cryptographic hash of the info bytes that describe a torrent. The cryptographic hash has the property that infohashes are uniformly distributed across all the possible keys in the DHT.

Common BitTorrent specific operations are get_peers, and announce_peer.

get_peers is an operation that returns a set of BitTorrent client peer addresses for a swarm. The target for the query is the infohash. Nodes prefer to store peer addresses for infohashes that are closest to their own node ID, so get_peers responses also include a list of closer nodes for use in a traversal. get_peers responses also include a token so the querier can add themselves to the list of peers.

announce_peer is an operation that requests a node add the querier to the set of peers for a swarm. A token is included that was observed in a previous get_peers response to prevent spoofing.

Some less common additions are:

Scraping (counting the active members of) a swarm, by including a bloom filter in the get_peers response. The bloom filter design allows combining responses from multiple nodes.

Sampling (sample_infohashes), a query that asks a node what infohashes it’s holding peers for. This can be used to build an index of what’s on the network, working around (at some expense) the aforementioned property that DHT nodes do not need to know about nearby addresses of values.

How not to implement a DHT

Implementing a DHT that scales requires strict adherence to some principles that do not seem to be intuitive.

All nodes should be considered equal, and operating a node should require minimal resources. If some nodes are special, those nodes are obvious targets for attack. Similarly if special nodes go down, the network will suffer as the load will fall to less able nodes that can’t meet the special requirements. The default choice should always be to actively participate in the network as it comes with neglible cost. There should not be booster nodes, or super nodes of any kind.

If part of the address space receives a lot of traffic (due to perhaps there being some popular content at a nearby address), and nodes begin failing or refusing to respond to queries, reads and writes will migrate to nodes farther from the address. This occurs immediately: Nodes that do not respond to queries are removed from routing tables, and do not count toward write replication. If a node has only partial availability or capacity, this will be reflected by that node appearing in fewer routing tables and receiving fewer writes as a result.

Queries should not be retried. Failing to respond to a query is a vital signal to the network that a node is under stress and traffic should be reduced. When a node is stressed, retries will exacerbate the situation by increasing traffic and ignoring the signal. Retries require more state on the part of the initiater.

Stream-oriented and stateful protocols like TCP should not be used. Maintaining these stateful connections requires considerably more resources and raises the requirements for a node to participate in the network. This compounds the resource requirements as there are less nodes with sufficient resources to participate as full members to spread the load. Stream-oriented transports also use retries, which violates the requirement that retries not be used above.