Deadlock at the Dinner Table

[interactive simulation: four philosophers (P1-P4) sharing four forks (F1-F4) around a table, with a mode selector, Step/Play controls, a deadlock inspector, and a simulation log]

the bug

Here are two threads that will never finish:

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:
        time.sleep(0.1)  # simulate some work
        with lock_b:     # blocks forever
            print("worker 1 done")

def worker_2():
    with lock_b:
        time.sleep(0.1)
        with lock_a:     # blocks forever
            print("worker 2 done")

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start()
t2.start()
t1.join()  # hangs forever

Run this and it hangs. No error message, no crash, no stack trace. Just silence. Both threads are alive but neither can make progress. That’s a deadlock.

The time.sleep(0.1) makes the deadlock nearly deterministic. Without it, you might get lucky and have one thread grab both locks before the other gets its first. But in production, “getting lucky” isn’t a concurrency strategy.
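When a script hangs like this, you can at least find out where each thread is stuck. One low-effort option is the standard library’s faulthandler module; the five-second timeout and the choice to kill the process after dumping are mine, a sketch rather than anything the example above requires:

import faulthandler

# If the process is still running after 5 seconds, dump every thread's
# stack trace to stderr, then exit. Each trace stops at the line where
# that thread is blocked waiting for a lock.
faulthandler.dump_traceback_later(5, exit=True)

Drop that at the top of the script above and the silence turns into a per-thread traceback pointing at the line where each worker is blocked on its second lock.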

why it happens

In 1971, Edward Coffman Jr. and his co-authors identified four conditions that must all hold simultaneously for deadlock to occur. Map them onto the code above:

  1. Mutual exclusion. Only one thread can hold a Lock at a time. That’s the entire point of a lock.
  2. Hold and wait. worker_1 holds lock_a while waiting for lock_b. It won’t release what it has.
  3. No preemption. Python won’t forcibly take a lock from a thread. Once acquired, a lock stays held until the thread explicitly releases it.
  4. Circular wait. worker_1 wants lock_b (held by worker_2). worker_2 wants lock_a (held by worker_1). A cycle.

All four must be true simultaneously. Break any one and deadlock is impossible. That gives you four potential strategies. In practice, two of them actually work well.
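Circular wait is the condition that’s easiest to state and hardest to spot in real code, so here is a toy model of it; the dictionaries and the has_cycle helper are mine, purely illustrative. Each thread holds one lock and wants another, and the question is whether following “waits for the owner of” ever loops back to where you started.

# Who holds what and who is waiting for what, taken from the example above.
holds = {"worker_1": "lock_a", "worker_2": "lock_b"}
wants = {"worker_1": "lock_b", "worker_2": "lock_a"}

def has_cycle(holds, wants):
    # Wait-for graph: a thread waits for the owner of the lock it wants.
    owner = {lock: thread for thread, lock in holds.items()}
    for start in wants:
        seen, t = set(), start
        while t in wants and wants[t] in owner:
            if t in seen:        # revisited a thread: circular wait
                return True
            seen.add(t)
            t = owner[wants[t]]  # follow the edge to the lock's owner
    return False

print(has_cycle(holds, wants))   # True: worker_1 -> worker_2 -> worker_1

Deadlock detectors in databases and runtimes do essentially this on a live wait-for graph; here it just confirms that the two workers form a cycle.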

the fix: lock ordering

def worker_1():
    with lock_a:        # always acquire a first
        with lock_b:
            print("worker 1 done")

def worker_2():
    with lock_a:        # same order as worker_1
        with lock_b:
            print("worker 2 done")

Both threads acquire lock_a first, then lock_b. No circular wait is possible. If worker_2 can’t get lock_a, it waits without holding any other lock. No hold-and-wait either. Two conditions broken at once.

This is the most common fix in practice because it’s simple to reason about and has no runtime overhead. Switch to “Resource Ordering” mode in the simulation to see this in action.
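The two-lock version is easy; the technique generalizes as long as every code path agrees on one global order. A common convention, sketched here with a helper of my own (the name acquire_in_order and the choice of id() as the sort key aren’t from any library), is to sort the locks before taking them:

from contextlib import ExitStack

def acquire_in_order(*locks):
    # Sort by id() so every caller takes the same locks in the same
    # global order, no matter what order they were passed in.
    stack = ExitStack()
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)
    return stack

def worker_1():
    with acquire_in_order(lock_a, lock_b):
        print("worker 1 done")

def worker_2():
    with acquire_in_order(lock_b, lock_a):  # argument order no longer matters
        print("worker 2 done")

id() is only consistent within a single process, but that’s all lock ordering needs. In larger codebases people often rank locks explicitly instead, so the order survives refactoring and is visible in code review.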

the fix: timeout

def worker_1():
    lock_a.acquire()
    if not lock_b.acquire(timeout=1):
        lock_a.release()  # back off
        return worker_1()  # retry
    try:
        print("worker 1 done")
    finally:
        lock_b.release()
        lock_a.release()

If a thread can’t get the second lock within a timeout, it releases the first lock and retries. This breaks the “hold and wait” condition. The thread refuses to hold a lock indefinitely while waiting for another.

The trade-off: retries add latency, and under high contention you can get livelock (both threads repeatedly backing off and retrying without making progress). But it’s a solid fallback when lock ordering is impractical. Switch to “Timeout” mode in the simulation to watch the retry behavior.
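One way to take the edge off the livelock risk is to retry in a loop with a randomized backoff, so the two threads stop colliding in lockstep. A sketch of worker_1 reworked that way, with a loop structure and a 10-100 ms jitter range that are my choices rather than anything prescribed:

import random

def worker_1():
    while True:
        lock_a.acquire()
        if lock_b.acquire(timeout=1):
            break                # got both locks
        lock_a.release()         # back off
        # Sleep a random amount so both threads don't retry in lockstep.
        time.sleep(random.uniform(0.01, 0.1))
    try:
        print("worker 1 done")
    finally:
        lock_b.release()
        lock_a.release()

Randomized backoff doesn’t eliminate livelock, it just makes repeated collisions improbable. If you find yourself tuning jitter windows, that’s usually a sign lock ordering was the right fix after all.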