
Memory Models and Happens-Before

[interactive memory inspector: two cores, each with an L1 cache and a store buffer, sharing main memory (x = 0, y = 0, lock FREE); two threads step through the simulation]

the broken flag

This is the most common anti-pattern in concurrent code. One thread sets a flag, another thread watches for it:

import threading

data = 0
ready = False

def writer():
    global data, ready
    data = 42
    ready = True

def reader():
    while not ready:
        pass
    print(data)  # might print 0!

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t2.start()
t1.start()
t1.join()
t2.join()

The programmer’s intent is clear: write data first, then set ready. The reader spins until ready is True, then prints data. It should always print 42.

It might print 0.

Two things can go wrong. First, the CPU has store buffers. When the writer sets data = 42, that value sits in Core 0’s store buffer before it reaches main memory, so Core 1 (running the reader) can still see the stale value. On weakly ordered hardware, the store-buffer entry for ready = True can drain to memory before the entry for data = 42, so the reader sees ready as True while data is still 0.

Second, the CPU (and the compiler) can reorder instructions. data = 42 and ready = True don’t depend on each other from the CPU’s perspective. The processor is free to execute them in either order. If ready = True executes first, the reader thread wakes up and reads data before it’s been written.

Both problems have the same root cause: without explicit synchronization, one thread’s writes have no guaranteed order when observed by another thread.
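One way to repair the flag is with threading.Event, one of the primitives covered in the happens-before rules below. A minimal sketch: the set()/wait() pair supplies exactly the ordering edge the broken version lacks.

```python
import threading

data = 0
ready = threading.Event()

def writer():
    global data
    data = 42      # write the payload first
    ready.set()    # set() publishes every write made before it

def reader():
    ready.wait()   # returns only after set(); prior writes are visible
    print(data)    # 42, guaranteed

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t2.start()
t1.start()
t1.join()
t2.join()
```

Unlike the spin loop, wait() also blocks instead of burning CPU while the flag is unset.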

why locks fix it

threading.Lock() does more than mutual exclusion. It also acts as a memory fence:

import threading

lock = threading.Lock()
data = 0

def writer():
    global data
    with lock:
        data = 42

def reader():
    with lock:
        print(data)  # always 42

t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader)
t1.start()
t1.join()
t2.start()
t2.join()

Conceptually, when Thread 1 releases the lock, all of its pending writes are flushed to main memory. When Thread 2 acquires the lock, its stale cached values are invalidated, forcing fresh reads. Every write before the release is visible to every read after the next acquire. (Real hardware uses cache-coherence protocols and memory barriers rather than literal flushes, but the visibility guarantee is the same.)

This is the happens-before relationship. Lock release happens-before the next lock acquire of the same lock. It’s not just about preventing two threads from entering a critical section at the same time. It’s about making writes visible across cores.
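The release/acquire pairing also holds when the threads genuinely race, unlike the listing above, which is serialized by join(). A hedged sketch (the done flag is an illustrative name, not part of the original example): the reader polls under the lock, and whichever acquire first observes done is guaranteed to see data = 42 as well, because both writes preceded the same release.

```python
import threading

lock = threading.Lock()
data = 0
done = False

def writer():
    global data, done
    with lock:
        data = 42
        done = True      # both writes happen before the lock release

def reader(results):
    while True:
        with lock:       # this acquire pairs with the writer's release
            if done:     # if done is visible, data = 42 is visible too
                results.append(data)
                return

results = []
t2 = threading.Thread(target=reader, args=(results,))
t1 = threading.Thread(target=writer)
t2.start()
t1.start()
t1.join()
t2.join()
```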

happens-before rules

Python doesn’t have a formal memory model specification (yet), but the threading primitives provide these ordering guarantees in CPython:

  • Lock release happens-before the next acquire of the same lock. All writes before the release are visible after the acquire.
  • Thread start happens-before any operation in the started thread. If you set x = 10 before calling t.start(), the new thread sees x = 10.
  • Thread join happens-before the code after join() returns. If the thread wrote result = 42 before finishing, the parent thread sees it after join().
  • Queue.put() happens-before Queue.get() for the same item. Data passed through a Queue is always visible to the consumer.
  • Event.set() happens-before Event.wait() returns. Any writes before set() are visible to the thread that returns from wait().
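To make the Queue rule concrete, a minimal sketch (payload and the "token" item are illustrative names): the consumer never touches a lock, yet the put()/get() edge makes the producer’s earlier write visible.

```python
import queue
import threading

q = queue.Queue()
payload = {}

def producer():
    payload["value"] = 42   # plain write, no lock held
    q.put("token")          # put() happens-before the matching get()

def consumer(out):
    q.get()                 # once this returns, producer's writes are visible
    out.append(payload["value"])

out = []
c = threading.Thread(target=consumer, args=(out,))
p = threading.Thread(target=producer)
c.start()
p.start()
p.join()
c.join()
```

This is why passing data through a Queue, rather than sharing mutable state directly, is the idiomatic pattern in Python threading.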

If two operations are not connected by any of these rules, you have no visibility guarantee. The operations are concurrent, and either thread might see stale data. This is why the broken flag pattern fails: there’s no happens-before edge between the writer and the reader.
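As a closing contrast, the start and join rules alone make the simplest handoff safe with no lock at all. A minimal sketch:

```python
import threading

result = 0

def worker():
    global result
    result = 42    # written inside the worker thread

t = threading.Thread(target=worker)
t.start()
t.join()           # join happens-before everything after it returns
print(result)      # 42, guaranteed by the join rule
```

The parent reads result only after join() returns, so there is a happens-before edge from the worker’s write to the parent’s read; no other synchronization is needed.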