
The Hash Ring

[Interactive visualization: a hash ring (0 → 2³²) comparing hash(key) % N with virtual node placement. Step or Play through the simulation to inspect the hash distribution.]

the problem

You have 5 cache servers. You decide where each key lives with:

server = hash(key) % 5

Simple. Fast. Works great, until a server dies. Now it’s hash(key) % 4, and roughly 80% of your keys map to a different server. Every one of those is a cache miss. Your database gets slammed with traffic that should have been cached.

It gets worse: when the dead server comes back, you switch to % 5 again, and another 80% of keys shuffle. Every topology change triggers a near-total redistribution.

# before: 5 servers
hash("user:42") % 5  # → server 2

# server 3 dies, now 4 servers
hash("user:42") % 4  # → server 0  (different!)

The modulus operator distributes keys evenly, but it distributes them differently for every value of N. That’s the fundamental issue.
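A quick sanity check on that 80% figure, as a minimal sketch with MD5 standing in for hash(). A key stays put only when hash(key) % 5 == hash(key) % 4, which happens only when hash(key) % 20 < 4, i.e. for about 20% of keys:

import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if h(k) % 5 != h(k) % 4)
print(f"{moved / len(keys):.1%} of keys remapped")   # ≈ 80%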

the ring

Instead of a modulus, imagine the entire hash output space (0 to 2³² − 1) bent into a circle. Each server gets hashed to a position on this ring. Each key also gets hashed to a position. To find which server owns a key, walk clockwise from the key’s position until you hit a server.

def find_server(key_hash, server_hashes):
    """Walk clockwise to the first server at or past the key."""
    ring = sorted(server_hashes)
    for server_hash in ring:
        if server_hash >= key_hash:
            return server_hash
    return ring[0]  # nothing past the key: wrap around to the first server
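
With a handful of made-up ring positions (the numbers below are arbitrary, just for illustration):

servers = [800, 2500, 3900]            # ring positions for servers A, B, C
print(find_server(1000, servers))      # 2500: the key walks clockwise to B
print(find_server(3950, servers))      # 800:  past the last server, wraps around to A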

The critical insight: each server is responsible for the arc between itself and the previous server. A key’s assignment depends only on its two nearest neighbors, not on the total count of servers.

adding and removing

When server C dies, only the keys in C’s arc need to move: they walk clockwise to the next surviving server. Every other key stays put. Instead of ~80% redistribution, only about 1/N of the keys move.

When a new server F joins, it claims part of the arc from its clockwise neighbor. Only those keys move. The rest of the ring is untouched.

# Ring: A, B, C, D, E at their hash positions
# C dies → only keys between B and C move to D
# F joins between A and B → only some of B's keys move to F

This is the core guarantee: adding or removing a server moves O(K/N) keys, where K is the total number of keys and N the number of servers, which is the minimum possible.
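
You can check the guarantee numerically. A sketch that reuses find_server from above, with MD5 standing in for the hash; the exact percentage depends on where the five servers happen to land, since there are no virtual nodes yet:

import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

servers = [h(name) for name in "ABCDE"]
keys = [f"user:{i}" for i in range(100_000)]

before = {k: find_server(h(k), servers) for k in keys}
servers.remove(h("C"))                   # server C dies
after = {k: find_server(h(k), servers) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.1%} of keys moved")   # only C's arc: ~1/5 on average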

virtual nodes

There’s a catch. With only 5 points on a 2³² ring, the arcs are uneven. One server might own 40% of the space while another owns 5%. The fix is virtual nodes: each physical server places many positions on the ring instead of just one.

# Server A gets positions:
# hash("A-vn0"), hash("A-vn1"), ..., hash("A-vn149")

With 150 virtual nodes per server, each server’s total arc coverage converges to 1/N, and the standard deviation of load drops sharply as the virtual node count grows. In practice, 100-200 virtual nodes per server gives a near-even distribution.
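
To see the effect, a sketch that measures how evenly 100,000 keys spread over five servers with 1 versus 150 virtual nodes (the positions, and therefore the exact percentages, are whatever MD5 produces for these names):

import hashlib
from bisect import bisect_left
from collections import Counter

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(vnodes):
    ring = {h(f"{name}-vn{i}"): name for name in "ABCDE" for i in range(vnodes)}
    return ring, sorted(ring)

def owner(key, ring, positions):
    idx = bisect_left(positions, h(key)) % len(positions)   # clockwise walk, with wrap
    return ring[positions[idx]]

for vnodes in (1, 150):
    ring, positions = build_ring(vnodes)
    load = Counter(owner(f"user:{i}", ring, positions) for i in range(100_000))
    shares = sorted(f"{count / 100_000:.1%}" for count in load.values())
    print(f"{vnodes:>3} vnode(s) per server: {shares}")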

code

A minimal consistent hash ring in Python:

import hashlib
from bisect import bisect_right

class HashRing:
    def __init__(self, nodes=None, vnodes=150):
        self.vnodes = vnodes
        self.ring = {}        # hash → node
        self.sorted_keys = [] # sorted hash positions
        for node in (nodes or []):
            self.add(node)

    def _hash(self, key: str) -> int:
        # full 128-bit MD5 space; the 2³² ring in the prose is the same idea, just drawn smaller
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        # place `vnodes` virtual positions for this node, then re-sort the ring
        for i in range(self.vnodes):
            h = self._hash(f"{node}-vn{i}")
            self.ring[h] = node
        self.sorted_keys = sorted(self.ring.keys())

    def remove(self, node: str):
        # drop all of this node's virtual positions
        for i in range(self.vnodes):
            h = self._hash(f"{node}-vn{i}")
            del self.ring[h]
        self.sorted_keys = sorted(self.ring.keys())

    def get(self, key: str) -> str:
        # first virtual node clockwise of the key's position, wrapping past the top of the ring
        h = self._hash(key)
        idx = bisect_right(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]
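
Wiring it together (node and key names here are arbitrary):

ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d", "cache-e"])
print(ring.get("user:42"))               # one of the five nodes, e.g. 'cache-b'

# drop a node and count how many keys actually move
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get(k) for k in keys}
ring.remove("cache-c")
after = {k: ring.get(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.1%} of keys moved")   # close to 1/5, thanks to virtual nodes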

where it shows up

  • Amazon DynamoDB: partitions data across nodes using consistent hashing. Their 2007 Dynamo paper popularized the technique.
  • Apache Cassandra: uses a token ring where each node owns a range of the hash space.
  • CDNs (Akamai, Cloudflare): route requests to edge servers without reshuffling everything when a node goes down.
  • Discord: routes guild data across database shards using a hash ring.
  • Memcached clients: libketama implements consistent hashing for cache key distribution.