The Hash Ring
hash(key) % N reshuffles almost every key when a server joins or leaves. A hash ring maps servers and keys onto a circle: only neighbors are affected. Virtual nodes smooth the load.
the problem
You have 5 cache servers. You decide where each key lives with:
server = hash(key) % 5

Simple. Fast. Works great, until a server dies. Now it’s hash(key) % 4, and roughly 80% of your keys map to a different server. Every one of those is a cache miss. Your database gets slammed with traffic that should have been cached.
It gets worse: when the dead server comes back, you switch to % 5 again, and another 80% of keys shuffle. Every topology change triggers a near-total redistribution.
# before: 5 servers
hash("user:42") % 5 # → server 2
# server 3 dies, now 4 servers
hash("user:42") % 4 # → server 0 (different!)The modulus operator distributes keys evenly, but it distributes them differently for every value of N. That’s the fundamental issue.
the ring
Instead of a modulus, imagine the entire hash output space (0 to 2³² − 1) bent into a circle. Each server gets hashed to a position on this ring. Each key also gets hashed to a position. To find which server owns a key, walk clockwise from the key’s position until you hit a server.
def find_server(key_hash, server_hashes):
    """Walk clockwise to the first server."""
    for server_hash in sorted(server_hashes):
        if server_hash >= key_hash:
            return server_hash
    return sorted(server_hashes)[0]  # wrap around

The critical insight: each server is responsible for the arc between itself and the previous server. A key’s assignment depends only on its two nearest neighbors, not on the total count of servers.
adding and removing
When server C dies, only the keys in C’s arc need to move: they walk clockwise to the next server. Every other key stays put. Instead of ~80% redistribution, you get ~1/N.
When a new server F joins, it claims part of the arc from its clockwise neighbor. Only those keys move. The rest of the ring is untouched.
# Ring: A, B, C, D, E at their hash positions
# C dies → only keys between B and C move to D
# F joins between A and B → only some of B's keys move to F

This is the core guarantee: adding or removing a server moves O(K/N) keys, the minimum possible.
virtual nodes
There’s a catch. With only 5 points on a 2³² ring, the arcs are uneven. One server might own 40% of the space while another owns 5%. The fix is virtual nodes: each physical server places K positions on the ring.
# Server A gets positions:
# hash("A-vn0"), hash("A-vn1"), ..., hash("A-vn149")With K=150, each server’s total arc coverage converges to 1/N. The standard deviation of load drops dramatically as K increases. In practice, 100-200 virtual nodes per server gives near-perfect distribution.
code
A minimal consistent hash ring in Python:
import hashlib
from bisect import bisect_right
class HashRing:
    def __init__(self, nodes=None, vnodes=150):
        self.vnodes = vnodes
        self.ring = {}         # hash → node
        self.sorted_keys = []  # sorted hash positions
        for node in (nodes or []):
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}-vn{i}")
            self.ring[h] = node
        self.sorted_keys = sorted(self.ring.keys())

    def remove(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}-vn{i}")
            del self.ring[h]
        self.sorted_keys = sorted(self.ring.keys())

    def get(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect_right(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
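To see the O(K/N) guarantee in action with the class above (a quick sketch; node and key names here are arbitrary):

ring = HashRing(nodes=["A", "B", "C", "D", "E"])
keys = [f"user:{i}" for i in range(100_000)]

before = {k: ring.get(k) for k in keys}
ring.remove("C")  # one server dies
after = {k: ring.get(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.1%} of keys moved")  # ≈ 20%, i.e. ~1/N

where it shows up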
- Amazon DynamoDB: partitions data across nodes using consistent hashing. Their 2007 Dynamo paper popularized the technique.
- Apache Cassandra: uses a token ring where each node owns a range of the hash space.
- CDNs (Akamai, Cloudflare): route requests to edge servers without reshuffling everything when a node goes down.
- Discord: routes guild data across database shards using a hash ring.
- Memcached clients: libketama implements consistent hashing for cache key distribution.
the problem in depth
The naive approach hash(key) % N is a special case of the pigeonhole principle: N pigeonholes, K keys, each key deterministically assigned. The issue isn’t the distribution (which is uniform) but the stability: changing N remaps almost everything.
Formally, when going from N to N-1 servers, the fraction of keys that keep their assignment is approximately 1/N. For 5 servers, only ~20% stay put. For 100 servers, 99% move. The larger your cluster, the worse the storm.
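A quick empirical check of that figure (a sketch; key names and cluster sizes are arbitrary):

import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(100_000)]
for n in (5, 100):
    stay = sum(1 for k in keys if h(k) % n == h(k) % (n - 1))
    print(f"N={n}: {stay / len(keys):.1%} of keys keep their server")  # ≈ 1/N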
the ring in depth
David Karger et al. introduced consistent hashing in 1997 at MIT. The key properties:
- Balance: keys are distributed roughly evenly across servers
- Monotonicity: when a server is added, keys only move to it, never between existing servers
- Spread: a key maps to a small number of servers across different client views
- Load: no server is assigned disproportionately many keys
The ring achieves property 2 naturally: a new server only steals keys from its clockwise neighbor.
virtual nodes in depth
Without virtual nodes, each server owns a single arc whose length has mean 1/N but standard deviation also on the order of 1/N, so the relative spread of load is Θ(1). With K virtual nodes per server, the standard deviation drops to roughly 1/(N√K): a factor of √K improvement.
The tradeoff: more virtual nodes means a larger ring to search. With N servers and K vnodes each, lookup is O(log(NK)) using binary search. For N=100, K=150, that’s 15,000 ring positions, still fast with bisect.
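A sketch that measures the effect using the HashRing class from the code section (node and key counts are arbitrary):

from collections import Counter
from statistics import mean, pstdev

nodes = [f"node{j}" for j in range(10)]
keys = [f"user:{i}" for i in range(100_000)]

for k in (1, 10, 150):
    ring = HashRing(nodes=nodes, vnodes=k)
    load = Counter(ring.get(key) for key in keys)
    counts = [load[n] for n in nodes]
    print(f"vnodes={k:>3}: stddev/mean = {pstdev(counts) / mean(counts):.2f}")

The ratio should shrink roughly like 1/√K as K grows.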
+ hash function selection
The hash function needs to be fast and distribute uniformly. Cryptographic hashes (SHA-256, MD5) work but are overkill; you don’t need collision resistance for load balancing.
MurmurHash3 is the most common choice in production systems. It’s fast (3-5 GB/s), has excellent distribution, and produces 128-bit hashes. Cassandra uses it.
xxHash is even faster (~30 GB/s) with comparable distribution quality. Good for high-throughput systems.
FNV-1a is simple (~10 lines of code, sketched below) and fast enough for most uses.
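For reference, a minimal 32-bit FNV-1a (the constants are the standard FNV offset basis and prime):

def fnv1a_32(data: bytes) -> int:
    h = 0x811C9DC5  # 32-bit FNV offset basis
    for byte in data:
        h ^= byte                           # xor the byte in first…
        h = (h * 0x01000193) & 0xFFFFFFFF   # …then multiply by the FNV prime
    return h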
The important thing: avoid hash functions with clustering (like naive string hashing with hash = hash * 31 + char). Clustering on the ring means uneven load.
+ bounded-load consistent hashing
Google’s 2017 paper introduced bounded-load consistent hashing: each server has a capacity cap of ⌈(1 + ε) * average_load⌉. When a server is full, keys spill clockwise to the next available server.
This gives a worst-case load bound of (1 + ε) times the average, at the cost of slightly more key movement during rebalancing. With ε = 0.25, no server ever has more than 25% above average load.
The algorithm is simple: during lookup, walk clockwise as usual, but skip servers that are at capacity. Google applied it in Cloud Pub/Sub, and Vimeo implemented it in HAProxy for its video CDN.
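A sketch of that lookup on top of the HashRing class above; the loads dict (node → current key count) and the exact capacity formula are assumptions of this sketch:

import math
from bisect import bisect_right

def get_bounded(ring, key, loads, total_keys, epsilon=0.25):
    """Ring lookup that skips servers already at the (1 + ε) cap."""
    num_nodes = len(set(ring.ring.values()))
    cap = math.ceil((1 + epsilon) * total_keys / num_nodes)
    idx = bisect_right(ring.sorted_keys, ring._hash(key))
    for step in range(len(ring.sorted_keys)):
        pos = (idx + step) % len(ring.sorted_keys)  # walk clockwise
        node = ring.ring[ring.sorted_keys[pos]]
        if loads[node] < cap:                       # skip full servers
            return node
    raise RuntimeError("every server is at capacity")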
+ rendezvous hashing (HRW)
Highest Random Weight hashing is an alternative to ring-based consistent hashing. For each key, compute hash(key, server) for every server. The server with the highest hash wins.
def hrw_lookup(key, servers):
    # use a stable hash: Python's built-in hash() is salted per process
    return max(servers, key=lambda s: int(hashlib.md5(f"{key}-{s}".encode()).hexdigest(), 16))

Pros: simpler than a ring, no virtual nodes needed, naturally balanced. Cons: O(N) lookup per key (must hash against every server). For small N (< 100), this is fine. For large N, the ring’s O(log N) lookup wins.
Rendezvous hashing is used in Microsoft’s Azure, Twitter’s caching layer, and Oracle’s NFS.
+ jump consistent hashing
Google’s jump consistent hash (2014) uses no ring at all. It maps a key to one of N buckets using a clever mathematical trick:
def jump_hash(key: int, num_buckets: int) -> int:
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b

O(ln N) time, zero memory, perfect balance. The catch is that it only works when servers are numbered 0..N-1 and you only add/remove from the end. You can’t remove server 3 out of 5. This makes it great for sharded databases but unsuitable for dynamic clusters where arbitrary nodes fail.
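Growing from 5 to 6 buckets should move only ~1/6 of the keys. A quick check (sequential integers stand in for hashed keys here):

moved = sum(1 for k in range(100_000) if jump_hash(k, 5) != jump_hash(k, 6))
print(f"{moved / 100_000:.1%} of keys moved")  # ≈ 16.7% = 1/6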
production stories
Amazon DynamoDB’s partitioning evolution
The 2007 Dynamo paper used consistent hashing with virtual nodes, but they ran into problems. With random token assignment, adding a node required copying data from many other nodes simultaneously: a “bootstrap storm.” New nodes were slow to become useful.
Their fix (documented in the 2022 DynamoDB paper): abandon random tokens. Instead, split the hash space into fixed partitions upfront and assign partitions to nodes. Adding a node just transfers whole partitions. This is technically a step away from classic consistent hashing toward pre-partitioned hash ranges, but it keeps the O(K/N) key movement guarantee.
Cassandra’s token ring in practice
Cassandra originally used random token assignment (like the 1997 paper). Operators had to manually calculate tokens for even distribution when adding nodes. The nodetool move command would rebalance by shifting tokens: a slow, error-prone process.
Cassandra 1.2 introduced virtual nodes (called “vnodes” in their docs) with a default of 256 per node. Adding a node automatically claims vnodes from all existing nodes, spreading the bootstrap load. But 256 vnodes meant each node stored data ranges from 256 different parts of the key space, making range queries slower (more disk seeks).
Cassandra 4.0 reduced the default to 16 vnodes with a new token allocation algorithm that explicitly optimizes for balance. The lesson: more virtual nodes isn’t always better. There’s a tradeoff between load balance and operational efficiency.
Maglev hashing
Google’s Maglev load balancer (2016) uses lookup-table-based consistent hashing. Instead of a ring, it builds a table of size M (a large prime, like 65,537) where each entry is a server.
Each server has a permutation of [0, M). Servers take turns filling the table at their preferred positions. The result: perfectly even distribution, O(1) lookup, and minimal disruption. Removing one server changes only ~1/N of the table entries.
The tradeoff is that the table must be rebuilt when membership changes. For Google’s use case (routing packets at line rate with rare membership changes), this is ideal.
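A sketch of the table-building loop; the offset/skip permutation scheme follows the paper, while the function names and hash choices here are illustrative:

import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def maglev_table(servers, m=65537):
    # each server's preference list: (offset + i·skip) mod M, with M prime
    perms = {}
    for s in servers:
        offset = _h(f"{s}-offset") % m
        skip = _h(f"{s}-skip") % (m - 1) + 1  # in [1, M-1], coprime with prime M
        perms[s] = [(offset + i * skip) % m for i in range(m)]
    table = [None] * m
    nexts = {s: 0 for s in servers}
    filled = 0
    while filled < m:
        for s in servers:        # servers take turns...
            while True:          # ...claiming their next free preferred slot
                slot = perms[s][nexts[s]]
                nexts[s] += 1
                if table[slot] is None:
                    table[slot] = s
                    filled += 1
                    break
            if filled == m:
                return table

The trace below walks the same process by hand.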
Table size M = 7, servers A, B, C:
A's preferences: [3, 0, 4, 1, 5, 2, 6]
B's preferences: [0, 2, 4, 6, 1, 3, 5]
C's preferences: [1, 3, 5, 0, 2, 4, 6]
Round 1: A→3, B→0, C→1
Round 2: A→4, B→2, C→5
Round 3: A skips 1, 5, 2 (all taken), takes 6 → table full

Result: [B, C, B, A, A, C, A]

when NOT to use consistent hashing
Consistent hashing solves key-to-server mapping with minimal disruption. But it’s not always the right tool:
- Small, fixed clusters: if you have 3 database replicas that rarely change, modulo hashing is fine. The complexity of a ring isn’t worth it.
- Ordered data: if you need range queries (all users with IDs 1000-2000), you want range-based partitioning, not hash-based. Hash rings destroy ordering.
- Single-writer systems: if each key is written by exactly one node (like a leader-based system), consistent hashing adds complexity without benefit. Direct assignment is simpler.
- Data locality matters: hash rings scatter related keys across the cluster. If your access pattern is “read all keys for user X,” co-locating them on one node (via range partitioning or explicit placement) is better.
The right question isn’t “should I use consistent hashing?” but “do I need stable key-to-node mapping that survives topology changes?” If yes, and your access pattern is point lookups, consistent hashing is the tool.