One Writer, Many Readers
Multiple threads can read shared data at the same time without conflict. But when one thread needs to write, it must have exclusive access. A read-write lock allows concurrent reads while serializing writes. Multiple readers hold the lock simultaneously, but a writer waits for all readers to finish, then blocks everyone else until it’s done. The tricky part is fairness: when both readers and writers are waiting, who goes next? Get it wrong and one side starves.
the pattern
A regular threading.Lock() forces one thread at a time. That’s fine when
every operation modifies shared state. But what about a config cache that
hundreds of threads read and one thread updates every few minutes? With a
plain lock, every reader blocks every other reader. That’s a bottleneck
for no reason. Readers don’t conflict with each other. Only writes need
exclusion.
A read-write lock captures this insight:
- Read lock (shared). Multiple threads can hold it at once.
- Write lock (exclusive). Only one thread can hold it. No readers and no other writers allowed.
import threading

class ConfigCache:
    def __init__(self):
        self.config = {"model": "v1", "timeout": 30}
        self.rwlock = RWLock()  # built in the next section

    def read_config(self, key):
        with self.rwlock.read_lock():    # shared: readers overlap freely
            return self.config.get(key)

    def update_config(self, key, value):
        with self.rwlock.write_lock():   # exclusive: everyone else waits
            self.config[key] = value

Ten threads calling read_config run in parallel. When one thread calls update_config, it waits for current readers to finish, then gets exclusive access.
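A quick demonstration of the shape (the thread and iteration counts are arbitrary, and RWLock is the class built in the next section):

cache = ConfigCache()

def keep_reading():
    for _ in range(10_000):
        cache.read_config("model")

readers = [threading.Thread(target=keep_reading) for _ in range(10)]
for t in readers:
    t.start()
cache.update_config("model", "v2")   # blocks only until in-flight reads finish
for t in readers:
    t.join()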
building a read-write lock
Python’s standard library doesn’t include a read-write lock. Here’s one
built with threading.Condition:
import threading
from contextlib import contextmanager

class RWLock:
    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0      # readers currently inside
        self._writer = False   # whether a writer is currently inside

    @contextmanager
    def read_lock(self):
        with self._cond:
            while self._writer:          # wait out an active writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:   # last reader out wakes waiters
                    self._cond.notify_all()

    @contextmanager
    def write_lock(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()

The writer waits until there are zero readers and no other writer. Readers wait only if a writer is active.
the fairness problem
The lock above has a subtle bug. Not a correctness bug. A starvation bug.
Imagine a steady stream of readers. Reader 1 holds the lock. Reader 2 arrives and also gets in. Before Reader 1 finishes, Reader 3 arrives. The reader count never hits zero. A waiting writer never gets its turn.
Three fairness policies:
Reader preference. That’s what we built above. Writers can starve if readers are continuous.
Writer preference. When a writer is waiting, new readers queue behind it. Prevents writer starvation but can starve readers.
class WriterPreferRWLock:
    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0   # writers queued but not yet inside

    @contextmanager
    def read_lock(self):
        with self._cond:
            # New readers queue behind any waiting writer.
            while self._writer or self._writers_waiting > 0:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write_lock(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()

FIFO (fair). Threads get access in the order they requested it. No starvation, but less concurrency. Readers arriving after a waiting writer must wait even though they could safely run concurrently (see the sketch below).
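Python's standard library doesn't offer a FIFO read-write lock either. Here's a minimal sketch built on a queue of per-waiter events; the FairRWLock name and the choice to admit runs of consecutive readers as a batch are assumptions of this sketch, not part of the pattern:

import threading
from collections import deque
from contextlib import contextmanager

class FairRWLock:
    def __init__(self):
        self._lock = threading.Lock()
        self._queue = deque()   # waiting (kind, ticket) pairs, arrival order
        self._readers = 0
        self._writer = False

    def _grant(self):
        # Admit from the head of the queue. Consecutive readers get in
        # together; a writer at the head blocks everyone behind it.
        while self._queue:
            kind, ticket = self._queue[0]
            if kind == "r" and not self._writer:
                self._queue.popleft()
                self._readers += 1
                ticket.set()
            elif kind == "w" and not self._writer and self._readers == 0:
                self._queue.popleft()
                self._writer = True
                ticket.set()
                break
            else:
                break

    @contextmanager
    def read_lock(self):
        ticket = threading.Event()
        with self._lock:
            self._queue.append(("r", ticket))
            self._grant()
        ticket.wait()
        try:
            yield
        finally:
            with self._lock:
                self._readers -= 1
                self._grant()

    @contextmanager
    def write_lock(self):
        ticket = threading.Event()
        with self._lock:
            self._queue.append(("w", ticket))
            self._grant()
        ticket.wait()
        try:
            yield
        finally:
            with self._lock:
                self._writer = False
                self._grant()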
Try the different fairness modes in the simulation above.
the pattern
A regular lock is too coarse for read-heavy workloads. Readers don’t conflict with each other. A read-write lock allows concurrent reads while giving writers exclusive access. The fairness question (reader preference, writer preference, or FIFO) determines who goes next when both sides are waiting. There’s no universally correct choice.
model weight serving
This pattern shows up in production ML systems. An inference server has many reader threads handling prediction requests, all reading the same model weights. Periodically, a single writer thread swaps in new weights.
During the swap, no reader can be partway through an inference using half-old, half-new weights. The writer needs exclusive access. So the server grabs a read lock for each inference and a write lock for updates.
Writer preference makes sense here. When a new model is ready, you want it deployed quickly. Letting inference requests delay the swap means serving stale predictions. In practice, inference requests are fast (milliseconds), so the writer won’t wait long. But the preference ensures it doesn’t get indefinitely delayed by continuous requests.
Some systems avoid the lock entirely by keeping two copies of the weights. The writer updates the inactive copy, then atomically swaps a pointer. Readers on the old copy finish naturally, new readers pick up the new copy. This is a read-copy-update pattern. More memory, less contention.
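In Python, that pointer swap can be a single attribute rebind. A minimal sketch, assuming weights are immutable once published (the ModelServer name and the toy linear model are illustrative):

class ModelServer:
    def __init__(self, weights):
        self._weights = weights   # the currently active copy

    def predict(self, features):
        weights = self._weights   # one reference read, no lock
        # In-flight requests keep whichever copy they grabbed.
        return sum(w * f for w, f in zip(weights["coef"], features))

    def publish(self, new_weights):
        # Build the new copy off to the side, then swap the reference.
        # In CPython, rebinding an attribute is atomic under the GIL, and
        # garbage collection plays the role of the grace period: the old
        # dict is freed once the last reader drops its reference.
        self._weights = new_weights

server = ModelServer({"coef": [0.5, 1.5]})
print(server.predict([2.0, 4.0]))    # 7.0, using the old weights
server.publish({"coef": [1.0, 1.0]})
print(server.predict([2.0, 4.0]))    # 6.0, using the new weights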
database MVCC
Databases face the read-write problem at massive scale. Hundreds of concurrent queries reading while transactions write.
Postgres solves this with Multi-Version Concurrency Control (MVCC). Instead of locking rows, every write creates a new version of the row. Readers see a snapshot from when their transaction started. Writers create new versions without touching old ones.
Readers never block writers. Writers never block readers. A SELECT
running a complex report doesn’t slow down concurrent INSERT
operations.
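The versioning mechanism fits in a toy sketch (the MVCCStore class is illustrative and not itself thread-safe; real MVCC layers transaction state, rollback, and visibility rules on top):

class MVCCStore:
    def __init__(self):
        self._versions = {}   # key -> list of (txid, value), oldest first
        self._clock = 0

    def begin(self):
        # A new transaction sees everything written up to this point.
        self._clock += 1
        return self._clock

    def write(self, key, value):
        # Writes append a new version; old versions are never touched.
        txid = self.begin()
        self._versions.setdefault(key, []).append((txid, value))

    def read(self, key, snapshot):
        # Newest version that already existed when the snapshot was taken.
        for txid, value in reversed(self._versions.get(key, [])):
            if txid <= snapshot:
                return value
        return None

store = MVCCStore()
store.write("row", "v1")
snap = store.begin()
store.write("row", "v2")          # a later write, after our snapshot
print(store.read("row", snap))    # "v1": the reader keeps its snapshot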
The cost is garbage collection. Old row versions accumulate and Postgres
must VACUUM them away. Long-running transactions hold back cleanup
because their snapshot still references old versions. A forgotten
BEGIN with no COMMIT can cause table bloat that degrades performance
for everyone.
MVCC is the read-write lock idea pushed to its logical extreme. Instead of blocking, you give each reader its own version of the truth.
the upgrade deadlock
A thread holds a read lock and realizes it needs to write. Can it upgrade to a write lock without releasing the read lock first?
No. A write lock requires zero readers. But the thread trying to upgrade IS a reader. It waits for all readers to release. It can’t release its own read lock because it’s waiting. Deadlock.
With two threads it’s worse. Thread A holds a read lock and wants to upgrade. Thread B also holds a read lock and wants to upgrade. Each waits for the other to release. Neither can.
The safe approach: release the read lock, acquire the write lock, then re-validate your assumptions. The data may have changed in the gap between the release and the acquire; the decision you made as a reader might no longer hold by the time your write lands.
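With the RWLock from earlier, the pattern looks like this (get_or_compute and expensive_compute are illustrative names, not from any library):

def expensive_compute(key):
    return key.upper()            # stand-in for real work

def get_or_compute(cache, rwlock, key):
    with rwlock.read_lock():
        if key in cache:
            return cache[key]
    # Gap: another thread can fill this key between the two locks.
    with rwlock.write_lock():
        if key not in cache:      # re-validate under the write lock
            cache[key] = expensive_compute(key)
        return cache[key]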
Some implementations offer tryUpgrade() that succeeds only if this
thread is the sole reader. If another reader exists, it fails
immediately instead of deadlocking. Safer, but you still need a
fallback path.
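Nothing like this ships in Python's standard library, so here's a sketch of the idea with explicit acquire/release methods to keep the state transitions visible (the UpgradableRWLock name and the try_upgrade signature are assumptions):

import threading

class UpgradableRWLock:
    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def try_upgrade(self):
        # Trade our read lock for the write lock, but only if we are
        # the sole reader. Fails fast instead of deadlocking.
        with self._cond:
            if self._readers == 1 and not self._writer:
                self._readers = 0
                self._writer = True
                return True
            return False

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = UpgradableRWLock()
lock.acquire_read()
if lock.try_upgrade():
    # we now hold the write lock
    lock.release_write()
else:
    # fallback path: release, reacquire as a writer, re-validate
    lock.release_read()
    lock.acquire_write()
    lock.release_write()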
The simulation shows you the basic mechanics. Readers share, writers exclude. In practice, read-write locks are a middle point on a spectrum that stretches from “lock everything” to “lock nothing at all.” What follows is the far end of that spectrum.
RCU: read-copy-update
The Linux kernel faces an extreme version of this problem. Routing tables and firewall rules are read millions of times per second. Writes happen rarely. Even a read-write lock is too expensive. Acquiring a shared lock means modifying the lock’s state (incrementing a reader count), which forces a cache line to bounce between CPU cores. On a 128-core server, that bouncing kills performance.
Read-Copy-Update eliminates the read-side cost entirely. Readers don’t acquire any lock. They don’t modify any shared state. They just read the pointer to the current data structure and use it.
Writers do the hard work. They copy the data structure, modify the copy, then atomically swap the pointer. Old readers still reference the old version and finish undisturbed. The writer waits for all pre-existing readers to finish (a “grace period”), then frees the old version. In the kernel, a grace period is detected by tracking context switches. If every CPU has context-switched since the swap, no thread can still hold a reference to the old data.
For workloads where reads outnumber writes by 1000:1 or more, this is worth it. The read path has zero overhead. No atomic operations, no cache line bouncing, no memory barriers on most architectures.
seqlocks: when the data is small
For simple values like a timestamp or a pair of coordinates, there’s an even lighter mechanism. A seqlock uses a sequence counter. Writers increment it before and after the update (odd during the write, even when done). Readers check the counter before and after reading. If it changed or is odd, they retry.
import threading

class SeqLock:
    def __init__(self):
        self._seq = 0                    # even: stable; odd: write in progress
        self._lock = threading.Lock()    # serializes writers only
        self.x = 0
        self.y = 0

    def write(self, x, y):
        with self._lock:
            self._seq += 1    # now odd, write in progress
            self.x = x
            self.y = y
            self._seq += 1    # now even, write complete

    def read(self):
        while True:
            seq1 = self._seq
            if seq1 % 2 == 1:
                continue              # write in progress, retry
            x, y = self.x, self.y
            if self._seq == seq1:     # nothing changed while we read
                return x, y           # consistent read

Readers never block and never modify shared state. The limitation: readers can see garbage during a write. For pointers, that's dangerous. Seqlocks only work when the data can be safely read even if temporarily inconsistent, because the reader detects the conflict and discards the result.
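To watch the retry mechanism earn its keep, a small demo (the counts are arbitrary): the writer always stores a matching pair, so any torn read that slipped through would trip the assertion.

sl = SeqLock()

def writer():
    for i in range(100_000):
        sl.write(i, i)

t = threading.Thread(target=writer)
t.start()
while t.is_alive():
    x, y = sl.read()
    assert x == y   # torn reads are detected and retried, never returned
t.join()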
the tradeoff space
Every concurrency primitive sits somewhere on a spectrum between consistency and throughput.
A mutex is the simplest. Full consistency, lowest concurrency. One thread at a time. A read-write lock relaxes this for readers. More concurrency for read-heavy workloads, more complexity in fairness policy. MVCC relaxes further, letting readers and writers operate on different versions. The cost shifts to garbage collection. RCU pushes furthest: zero read-side cost, with all the complexity on the write path. Seqlocks take yet another path, where readers do speculative work and discard it if a write interfered.
The pattern is consistent: the more you optimize for reads, the more complex writes become. There’s no trick that makes both sides cheaper. You’re moving cost from one side to the other, choosing where to pay based on your workload. A config cache read by thousands of threads and updated once a minute has a very different optimal point than a collaborative document edited by dozens of users simultaneously.
The right question isn’t “which lock should I use?” It’s “how does my workload split between reads and writes, and how much complexity am I willing to put on the write path to make reads faster?”