Consistency Models
Consistency models define which values a read may return after a write. Linearizability means every operation appears atomic and in real-time order. Sequential consistency relaxes real-time order to per-process program order. Eventual consistency promises only convergence; in the meantime, reads can return stale values. Stronger models cost more latency.
the problem
You write x = 5 to a distributed database. A millisecond later, you read x. What do you get?
On a single machine, the answer is obvious: 5. But in a distributed system with replicas, the write might have reached one replica but not another. Your read might hit a stale replica and return the old value. Or it might return 5. Or, depending on the system, it might return a value that was never written at all.
Consistency models are the contract between the system and the programmer. They define exactly which values a read is allowed to return, and when.
linearizability
The strongest practical model. Every operation appears to take effect at a single, instantaneous point between its start and end. Once a write completes, every subsequent read must see it.
Think of it as a single global timeline where operations snap into place one at a time. Even though operations take real time (network round trips, disk writes), the system behaves as if each one happened at a single atomic instant.
```
Client A: |---W(x=1)---|
Client B:                  |---R(x)---|   -> must return 1
           (write completed before read started)
```

If a write finishes before a read starts, the read must see it. If they overlap, the read can return either the old or the new value, but once any read returns the new value, all subsequent reads must also return it.
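To make the real-time rule concrete, here is a minimal sketch of one necessary condition (not a full linearizability checker), assuming a single register whose versions only ever increase: a read that starts after a write has completed must observe that version or a newer one.

```python
from dataclasses import dataclass

@dataclass
class Op:
    start: float   # when the client issued the operation
    end: float     # when the client received the response
    version: int   # version written (for writes) or observed (for reads)

def stale_reads(writes, reads):
    """Flag reads that start after some write completed yet still observe
    an older version. Operations that overlap a write may legally return
    either value, so they are not checked here."""
    violations = []
    for r in reads:
        completed = [w.version for w in writes if w.end < r.start]
        if completed and r.version < max(completed):
            violations.append(r)
    return violations

# The write of version 1 completes at t=2; a read starting at t=3 that
# still sees version 0 is a stale, non-linearizable read.
writes = [Op(start=0.0, end=2.0, version=1)]
reads = [Op(start=3.0, end=4.0, version=0)]
print(stale_reads(writes, reads))   # flags the stale read
```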
Linearizability is expensive. It typically requires coordination (consensus) on every operation. Spanner achieves it globally using GPS and atomic clocks (TrueTime). Most systems offer it only within a single partition or not at all.
sequential consistency
Slightly weaker. Operations from each individual process appear in program order, but there is no guarantee about real-time ordering across processes.
```
Client A: W(x=1) then R(x)   -> must see 1 (program order)
Client B: W(x=2) then R(x)   -> must see 2 (program order)
Client C: R(x)=2 then R(x)=1 -> valid if the global order is [B:W(2), A:W(1)]
```

Client C reading 2 then 1 looks wrong, but it is valid under sequential consistency. There exists a total ordering of all operations (B writes 2, then A writes 1) that is consistent with each process's program order. C just sees the effects in that global order.
But if C reads 1 then 2, it has committed to seeing A’s write before B’s. All of C’s future reads must be consistent with that ordering.
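A brute-force way to check this: enumerate every interleaving that respects each client's program order and ask whether any of them explains all the reads. This toy checker (exponential, only for tiny histories) accepts the history above.

```python
def interleavings(seqs):
    """Yield every interleaving of the per-process operation sequences
    that preserves each process's program order."""
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail

def explains_reads(order, initial=0):
    """True if every read in the total order returns the value of the
    most recent preceding write to the same variable."""
    state = {}
    for op, var, val in order:
        if op == "W":
            state[var] = val
        elif state.get(var, initial) != val:
            return False
    return True

def sequentially_consistent(histories):
    """histories: one list of (op, var, value) tuples per process."""
    return any(explains_reads(o) for o in interleavings(histories))

A = [("W", "x", 1), ("R", "x", 1)]
B = [("W", "x", 2), ("R", "x", 2)]
C = [("R", "x", 2), ("R", "x", 1)]   # reads 2 then 1
print(sequentially_consistent([A, B, C]))   # True: order [B..., A...] explains it
```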
eventual consistency
The weakest useful model. The only guarantee: if no new writes happen, all replicas will eventually converge to the same value. During propagation, reads can return any previously written value.
```
Write x=1 to Replica 1

Read from Replica 2: x=0   (stale, but allowed)
Read from Replica 3: x=0   (stale, but allowed)

... time passes, replication propagates ...

Read from Replica 2: x=1   (caught up)
Read from Replica 3: x=1   (caught up)
```

Eventual consistency is fast. Writes return immediately after hitting one replica. Reads are local. No coordination, no waiting for quorum. The price: your application must handle stale data.
the spectrum
From strongest to weakest:
- Linearizable: Real-time ordering. Gold standard. Highest latency.
- Sequential: Program order per process. No real-time guarantees.
- Causal: Operations that are causally related maintain their order. Concurrent operations can be seen in any order.
- Eventual: Convergence only. No ordering guarantees during propagation.
Each step down trades correctness for performance. Linearizability requires coordination on every operation. Eventual consistency requires none. Most real systems land somewhere in between, offering tunable consistency or different guarantees for different operations.
where it shows up
- Spanner: Linearizable reads and writes globally, using TrueTime (GPS + atomic clocks) to bound clock uncertainty. The gold standard for strong consistency at global scale.
- PostgreSQL: Serializable isolation (close to linearizable) within a single node. Streaming replication to replicas is asynchronous by default (eventual for reads from replicas).
- Cassandra: Tunable consistency. `QUORUM` reads and writes give you something close to linearizable; `ONE` gives you eventual consistency with lower latency.
- DynamoDB: Eventually consistent reads by default. Strongly consistent reads available at 2x the cost (reads from the leader replica).
- Redis: Single-threaded, so linearizable on a single node. Replication is asynchronous, so reads from replicas are eventually consistent.
+ causal consistency
Causal consistency sits between sequential and eventual. It preserves the order of causally related operations while allowing concurrent (unrelated) operations to be seen in any order.
Two operations are causally related if:
- They are from the same process (program order)
- One reads a value written by the other
- There is a transitive chain of such dependencies
```
A: W(x=1)
B: R(x)=1 then W(y=2)       // B's write to y is causally after A's write to x
C: R(y)=2 -> must see x=1   // if C sees y=2, it must also see x=1
```

Causal consistency can be implemented without global coordination. Each operation carries a vector clock or a dependency list, and replicas delay delivering an operation until all of its causal dependencies are satisfied.
COPS (Clusters of Order-Preserving Servers) and MongoDB’s causal sessions implement variants of causal consistency. It provides stronger guarantees than eventual consistency while remaining scalable across datacenters.
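A minimal sketch of the dependency-list approach (a toy, not COPS itself): each update names the update ids it depends on, and a replica buffers an update until everything it depends on has been applied locally.

```python
from collections import deque

class CausalReplica:
    """Toy replica: buffers incoming updates until all of their causal
    dependencies have already been applied locally."""
    def __init__(self):
        self.store = {}          # key -> value
        self.applied = set()     # ids of updates applied locally
        self.pending = deque()   # updates still waiting on dependencies

    def receive(self, update):
        # update = {"id": ..., "key": ..., "value": ..., "deps": set_of_ids}
        self.pending.append(update)
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for u in list(self.pending):
                if u["deps"] <= self.applied:      # all dependencies satisfied
                    self.store[u["key"]] = u["value"]
                    self.applied.add(u["id"])
                    self.pending.remove(u)
                    progress = True

# B's write y=2 depends on A's write x=1, so a replica that receives the
# y update first holds it until the x update arrives.
r = CausalReplica()
r.receive({"id": "b1", "key": "y", "value": 2, "deps": {"a1"}})
print(r.store)   # {} -- y=2 is buffered, x=1 not seen yet
r.receive({"id": "a1", "key": "x", "value": 1, "deps": set()})
print(r.store)   # {'x': 1, 'y': 2}
```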
+ read-your-writes, monotonic reads, and session guarantees
Between eventual and causal, there are useful session guarantees that most applications need:
Read-your-writes: After you write a value, your subsequent reads see that write (or a later one). Without this, you update your profile name and the page still shows the old name. Infuriating.
Monotonic reads: Once you read a value, you never see an older value in subsequent reads. Without this, you refresh the page and comments disappear, then reappear. Confusing.
Monotonic writes: Your writes are applied in the order you issued them. Without this, you set x=1 then x=2, but another reader sees x=2 then x=1.
Writes-follow-reads: If you read a value and then write based on it, your write is ordered after the read. This prevents “replying to a message that hasn’t arrived yet” anomalies.
Most databases offer these guarantees within a session (a single client connection routed to a single replica). DynamoDB’s strongly consistent reads, PostgreSQL’s synchronous replication, and MongoDB’s causal sessions all provide some combination of these properties.
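Here is a sketch of how a client library can provide read-your-writes and monotonic reads without server changes: track the newest timestamp the session has written or observed per key, and refuse to accept older answers. The replica interface (reads return a value plus timestamp, writes return a timestamp) is hypothetical.

```python
class Session:
    """Toy session guard: never observe anything older than what this
    session has already written or read."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.min_ts = {}   # key -> lowest timestamp this session may observe

    def write(self, key, value):
        ts = self.primary.write(key, value)            # hypothetical API
        self.min_ts[key] = max(self.min_ts.get(key, 0), ts)

    def read(self, key):
        for replica in self.replicas:
            value, ts = replica.read(key)              # hypothetical API
            if ts >= self.min_ts.get(key, 0):          # replica has caught up
                self.min_ts[key] = ts                  # enforce monotonic reads
                return value
        # No replica has caught up: fall back to the primary.
        value, ts = self.primary.read(key)
        self.min_ts[key] = ts
        return value
```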
+ linearizability vs serializability
These terms sound similar but describe different things.
Linearizability is a property of single operations on single objects. It says: every read/write on register X appears atomic and in real-time order.
Serializability is a property of transactions (groups of operations on multiple objects). It says: the result of executing transactions concurrently is equivalent to some serial (one-at-a-time) execution.
You can have one without the other:
- A system can be serializable but not linearizable (transactions execute in some serial order, but that order might not match real-time)
- A system can be linearizable but not serializable (single operations are atomic, but multi-object transactions are not)
Strict serializability (also called “linearizable + serializable”) gives you both: transactions appear atomic and in real-time order. Spanner, CockroachDB, and FoundationDB provide strict serializability. It is the strongest isolation level and the most expensive.
+ CAP theorem: the real story
The CAP theorem (Brewer, 2000) states that a distributed system can provide at most two of three guarantees: Consistency (linearizability), Availability (every request gets a response), and Partition tolerance (the system works despite network partitions).
Since network partitions are inevitable in distributed systems, the real choice is between C and A during a partition:
- CP systems (etcd, ZooKeeper, Spanner): Reject requests when they cannot guarantee consistency. Prefer correctness over availability.
- AP systems (Cassandra, DynamoDB in eventual mode, Riak): Accept requests even during partitions, at the risk of returning stale data. Prefer availability over consistency.
The nuance: CAP is about behavior during partitions only. When the network is healthy, you can have both consistency and availability. Most systems are consistent and available most of the time; they only diverge during the (rare) partition events.
Also, CAP’s definition of “availability” is binary: every non-failing node must respond. In practice, systems make more nuanced tradeoffs. A system might be unavailable for writes but available for reads, or available with degraded consistency. PACELC extends CAP: during Partitions choose A or C; Else (normal operation) choose Latency or Consistency.
+ tunable consistency in practice
Many modern databases let you choose your consistency level per operation.
Cassandra uses quorum-based consistency:
- `ONE`: Write/read from one replica. Fastest, weakest.
- `QUORUM`: Majority of replicas. Good balance.
- `ALL`: Every replica. Strongest, slowest, least available.
- `LOCAL_QUORUM`: Majority within the local datacenter. Good for multi-DC deployments.
If W + R > N (write quorum + read quorum > total replicas), reads are guaranteed to see the latest write. Common setup: N=3, W=2, R=2. This gives you something close to linearizable at the cost of higher latency than ONE.
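With the Python cassandra-driver, the consistency level is chosen per statement. The node address, keyspace, and table below are made up for illustration; the `ConsistencyLevel` constants are the driver's own.

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")          # hypothetical keyspace

# Critical path: quorum write + quorum read, so W + R > N and they overlap.
write = SimpleStatement(
    "UPDATE inventory SET count = 41 WHERE sku = 'abc'",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write)

read = SimpleStatement(
    "SELECT count FROM inventory WHERE sku = 'abc'",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read).one()

# Non-critical path: ConsistencyLevel.ONE trades freshness for latency.
```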
DynamoDB offers two modes:
- Eventually consistent reads (default): May return stale data. Half the cost.
- Strongly consistent reads: Always return the latest write. Served by the leader. 2x the read cost. (Both modes are shown in the sketch below.)
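With boto3, the difference is a single flag on the read. The table and key names are made up; `ConsistentRead` is the real parameter.

```python
import boto3

# Assumes an existing DynamoDB table; "inventory" and "sku" are made up.
table = boto3.resource("dynamodb").Table("inventory")

# Default: eventually consistent read -- cheaper, may be slightly stale.
stale_ok = table.get_item(Key={"sku": "abc"})

# Strongly consistent read -- reflects the latest committed write,
# at twice the read-capacity cost.
latest = table.get_item(Key={"sku": "abc"}, ConsistentRead=True)
```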
MongoDB offers:
- Write concern: from `w:1` (acknowledged by the primary only) to `w:majority` (acknowledged by a majority of replicas)
- Read concern: `local`, `available`, `majority`, `linearizable`, `snapshot` (see the sketch below)
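In pymongo, both knobs can be set per collection handle. The connection string, database, and collection names are made up; `WriteConcern` and `ReadConcern` are pymongo's own classes.

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.shop

# Critical path: writes acknowledged by a majority, majority reads.
orders_strong = db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("majority"),
)
orders_strong.insert_one({"_id": "o1", "total": 42})
orders_strong.find_one({"_id": "o1"})

# Non-critical path: the defaults (w:1 writes, "local" reads) are faster.
orders_fast = db.get_collection("orders")
orders_fast.find_one({"_id": "o1"})
```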
The pattern: stronger consistency costs latency and throughput. Applications that need both strong and weak consistency often use strong for critical paths (payments, inventory) and eventual for everything else (recommendations, analytics).
production stories
Spanner and TrueTime
Google Spanner provides linearizable reads and writes across the globe. The secret: TrueTime, a clock API that returns an interval [earliest, latest] bounding the true current time. GPS receivers and atomic clocks in every datacenter keep this interval tight (usually under 7 milliseconds).
When Spanner commits a transaction, it picks a timestamp and then waits until it is certain that timestamp is in the past everywhere. This “commit wait” is bounded by the TrueTime uncertainty. With tight clocks, the wait is just a few milliseconds.
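A conceptual sketch of commit wait (not Spanner's actual code): `truetime_now()` below is a stand-in that fakes a small uncertainty interval; the point is that the commit is only acknowledged once the chosen timestamp is certainly in the past everywhere.

```python
import time

def truetime_now(eps=0.004):
    """Stand-in for TrueTime: return (earliest, latest) bounds on the
    true current time, here with a pretend uncertainty of ~4 ms."""
    t = time.time()
    return t - eps, t + eps

def commit(apply_transaction):
    _, latest = truetime_now()
    commit_ts = latest               # a timestamp guaranteed >= true time now
    apply_transaction(commit_ts)
    # Commit wait: block until even the earliest possible true time has
    # moved past commit_ts, i.e. the timestamp is in the past everywhere.
    while truetime_now()[0] < commit_ts:
        time.sleep(0.001)
    return commit_ts                 # only now acknowledge the client
```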
Without TrueTime, achieving the same guarantee requires either round trips to a centralized timestamp oracle or a different mechanism entirely, like CockroachDB's hybrid logical clocks with a bounded clock-skew assumption. Spanner's hardware investment in clocks buys it lower latency for globally consistent transactions.
DynamoDB’s consistency options
DynamoDB defaults to eventually consistent reads. When you read an item, you might get a version from a few hundred milliseconds ago. For most applications, this is fine. Product catalogs, user preferences, session data can all tolerate brief staleness.
Strongly consistent reads hit the leader replica directly. They cost twice as much and have higher latency, but you always get the latest write. DynamoDB automatically routes these to the correct node.
The operational insight: most DynamoDB tables use eventual consistency for 90%+ of reads. Only the critical paths (checking inventory before purchase, reading a distributed lock) use strong consistency. This hybrid approach gives both performance and correctness where it matters.
MongoDB’s causal sessions
MongoDB 3.6 introduced causal consistency through client sessions. Within a session, MongoDB guarantees read-your-writes, monotonic reads, monotonic writes, and writes-follow-reads.
The implementation uses cluster time, a logical clock that advances with each operation. Each operation carries the cluster time of its causal dependencies. Secondaries (replicas) delay returning results until they have caught up to the required cluster time.
Without causal sessions, a common MongoDB anti-pattern: write to the primary, immediately read from a secondary, get stale data. Causal sessions fix this by making the read wait until the secondary has replicated past the session's last write.
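With pymongo that looks like the sketch below. The connection string, database, and collection are made up; `start_session(causal_consistency=True)` is the real API.

```python
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
profiles = client.appdb.profiles

# Inside a causally consistent session, the read below observes the
# preceding write even when a secondary serves it: the secondary waits
# until it has caught up to the session's cluster time. (MongoDB
# recommends majority read/write concerns for the full guarantees.)
with client.start_session(causal_consistency=True) as session:
    profiles.update_one({"_id": "u1"},
                        {"$set": {"name": "New Name"}},
                        session=session)
    doc = profiles.with_options(
        read_preference=ReadPreference.SECONDARY_PREFERRED
    ).find_one({"_id": "u1"}, session=session)
    print(doc["name"])   # "New Name", not a stale value
```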
when eventual is enough
Many workloads do not need strong consistency, and the performance gains from eventual consistency are substantial:
- Social media feeds: Seeing a post a few seconds late is acceptable. Showing it to everyone simultaneously is not worth the coordination cost.
- Product catalogs: A price update propagating in under a second is fine. Blocking all reads until every replica has the update is not.
- Caching layers: Redis replication is asynchronous. Stale cache hits are the norm, and applications handle them with TTLs and invalidation.
- DNS: The entire internet runs on eventual consistency. TTLs propagate changes over minutes to hours. It works because exact real-time accuracy is not needed for name resolution.
The rule of thumb: if incorrect data causes financial loss or safety issues, use strong consistency. If it causes a slightly outdated UI, eventual is probably fine.