Actors and Message Passing
Instead of threads sharing memory and coordinating with locks, actors are independent units that communicate by sending messages. Each actor has private state that only it can modify. No shared memory means no locks, no races, no deadlocks.
the idea
Every concurrency bug we’ve looked at so far has the same root cause: shared mutable state. Two threads touch the same memory, and things go wrong. Locks fix it, but locks bring deadlocks, priority inversion, and code that’s hard to reason about.
The actor model takes a different approach. Instead of sharing memory and coordinating access, actors are independent units that communicate by sending messages. Each actor has four things:
- An address, so others can send to it.
- A mailbox, a queue of incoming messages.
- Private state that no one else can touch.
- Behavior, how it processes each message.
When an actor processes a message, it can do three things: send messages to other actors, create new actors, or change its own state for the next message. That’s it. No reaching into another actor’s state. No shared variables. No locks.
a python actor
Python doesn’t have actors built in, but you can build a minimal actor system with threads and queues:
```python
import threading
from queue import Queue

class Actor:
    def __init__(self):
        self.mailbox = Queue()
        self.state = {}
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while self._running:
            message = self.mailbox.get()
            if message is None:
                break
            self.handle(message)

    def handle(self, message):
        raise NotImplementedError

    def send(self, message):
        self.mailbox.put(message)

    def stop(self):
        self._running = False
        self.mailbox.put(None)
```

Each actor gets its own thread and its own mailbox. The _loop method pulls one message at a time and calls handle. Subclasses define what handle does. send drops a message into the mailbox. stop sends a poison pill (None) to shut the loop down.
Here’s a counter built on top of it:
```python
class CounterActor(Actor):
    def __init__(self):
        super().__init__()
        self.state = {"count": 0}

    def handle(self, message):
        if message["type"] == "add":
            self.state["count"] += message["value"]
        elif message["type"] == "get":
            message["reply_to"].send({
                "type": "count",
                "value": self.state["count"]
            })
```

No locks anywhere. The actor's _loop processes one message at a time on a single thread. State is private. Only the actor's own thread ever reads or writes self.state. This is thread-safe by design, not by discipline.

Send a thousand add messages from ten different threads. Every single one is processed sequentially through the mailbox. No lost updates, no race conditions.
```python
counter = CounterActor()

def spam_adds():
    for _ in range(1000):
        counter.send({"type": "add", "value": 1})

threads = [threading.Thread(target=spam_adds) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

import time
time.sleep(1)  # let the actor drain its mailbox

class Printer(Actor):
    def handle(self, message):
        print(f"count: {message['value']}")  # always 10000

printer = Printer()
counter.send({"type": "get", "reply_to": printer})
time.sleep(0.5)  # the actors run on daemon threads; give the reply time to print before exit
```

when actors don't fit
Actors aren’t free. Message passing is slower than a direct function call. Every message goes through a queue, gets dispatched, and the reply comes back through another queue. For tight computational loops where you need nanosecond-level performance, this overhead matters.
Debugging is harder too. Instead of a stack trace, you’re tracing messages across actors. When something goes wrong, you need to reconstruct the sequence of messages that led to the bad state. Logging and tracing become essential.
Sometimes a simple lock is the right answer. If two threads need to bump a counter a few times during startup, pulling in an actor framework is overkill. Actors shine when you have many independent entities with their own state that need to coordinate over time.
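For contrast, here's what that simple case looks like with a plain lock. A minimal sketch, nothing more:

```python
import threading

class LockedCounter:
    # The boring alternative: fine when contention is rare and the
    # critical section is a single, obvious operation.
    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def add(self, value):
        with self._lock:
            self.count += value
```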
Ray actors
Ray turns the actor model into a distributed computing framework. The
@ray.remote decorator turns a regular Python class into an actor that
can run on any machine in a cluster. Method calls become messages. Return
values become futures.
```python
import ray

ray.init()  # start or connect to a Ray cluster

@ray.remote
class ModelServer:
    def __init__(self, model_path):
        self.model = load_model(model_path)  # load_model stands in for your loading code

    def predict(self, input_data):
        return self.model(input_data)

server = ModelServer.remote("model.pt")
future = server.predict.remote(data)
result = ray.get(future)
```

ModelServer.remote() creates the actor, possibly on a different machine. server.predict.remote() sends a message and returns a future immediately. ray.get() blocks until the result is ready. You never touch the actor's state directly. Ray handles serialization, scheduling, and fault tolerance.
The same pattern scales from a laptop to a thousand-node cluster. Ray
actors are used heavily in ML pipelines for model serving, distributed
training, and reinforcement learning. Each actor processes requests
sequentially (just like our Actor class), so there are no race
conditions on model weights or internal state.
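Sequential per actor also means throughput comes from adding actors. A sketch, assuming the ModelServer above and a hypothetical list inputs: create replicas and fan requests across them.

```python
# Four replicas of the actor; Ray may place them on different machines.
servers = [ModelServer.remote("model.pt") for _ in range(4)]

# Round-robin the requests across the pool; each replica still handles
# its own messages one at a time. `inputs` is assumed to exist.
futures = [servers[i % len(servers)].predict.remote(x)
           for i, x in enumerate(inputs)]

results = ray.get(futures)  # blocks until every future resolves
```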
Erlang's 'let it crash'
Erlang took the actor model further than anyone. In Erlang/OTP, actors (called “processes”) are expected to crash. Not as a failure mode, but as a design principle.
Instead of writing defensive code with try/catch everywhere, you let actors crash and recover. Supervisors watch their children and restart them with clean state. The supervision strategies determine what happens when a child dies:
- one-for-one: restart just the crashed child. Other children are unaffected.
- one-for-all: restart all children. Use this when children depend on each other.
- rest-for-one: restart the crashed child and every child started after it. Use this when children have sequential dependencies.
The key insight: recovery is easier than prevention. Writing code that handles every possible error is hard and makes the code brittle. Restarting an actor with clean state is simple and gives you fault isolation for free. If one actor corrupts its state, it crashes, the supervisor restarts it, and the rest of the system never notices.
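To make that concrete in this chapter's Python terms, here is a minimal one-for-one supervisor sketched on top of our Actor class. It only illustrates the shape of the idea; Erlang's real supervision (links, monitors, restart intensity) is far richer.

```python
class Supervised(Actor):
    # A child that reports its own crash instead of dying silently.
    def __init__(self, supervisor):
        self.supervisor = supervisor  # set before the actor thread starts
        super().__init__()

    def _loop(self):
        try:
            super()._loop()
        except Exception:
            self.supervisor.send({"type": "crashed", "child": self})

class Supervisor(Actor):
    # one-for-one: restart only the crashed child, with clean state.
    def __init__(self, child_factory):
        super().__init__()
        self.child_factory = child_factory
        self.child = child_factory(self)

    def handle(self, message):
        if message["type"] == "crashed":
            self.child = self.child_factory(self)  # fresh state, fresh thread
```

A worker subclasses Supervised and simply raises on bad input; the supervisor replaces it: sup = Supervisor(lambda s: Worker(s)).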
Erlang built telephone switches that run for years with 99.9999999% uptime (nine nines). The actor model and supervision trees are a big part of why.
actors vs CSP
Go chose a different model: CSP (Communicating Sequential Processes). Both avoid shared mutable state, but they work differently.
In the actor model, each actor has an identity (address) and a mailbox. Messages go directly to a specific actor. Communication is asynchronous by default: the sender drops the message in the mailbox and moves on.
In CSP, processes communicate through named channels. Both sender and receiver must be ready at the same time (synchronous by default). Channels are typed and directional. A process doesn’t send to another process; it sends to a channel that another process reads from.
```python
# Actor model: send directly to an actor
counter.send({"type": "add", "value": 1})

# CSP style (if Python had Go-like channels):
# channel.send({"type": "add", "value": 1})
# The sender doesn't know (or care) who reads from the channel
```

CSP is more structured. Channels enforce a communication topology at compile time. Actors are more flexible. Any actor can send to any other actor if it has the address.
Go chose CSP. Erlang chose actors. Both work. The choice often comes down to whether you want the structure of channels or the flexibility of direct addressing.
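Python has no Go-style channels, but you can approximate the CSP shape with a plain queue standing in for the channel. A rough sketch only; Go's unbuffered channels also synchronize sender and receiver, which a Queue does not:

```python
import threading
from queue import Queue

channel = Queue()  # stands in for a channel; buffered, not a true rendezvous

def producer():
    # Sends to the channel, not to a named process.
    for i in range(3):
        channel.put({"type": "add", "value": i})
    channel.put(None)  # signal: no more messages

def consumer():
    # Reads from the channel without knowing who wrote to it.
    while (msg := channel.get()) is not None:
        print("got", msg)

threading.Thread(target=producer).start()
consumer()
```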
So far we've watched actors passing messages in a controlled environment. Clean, predictable, sequential. What follows is what happens when actors leave a single machine and spread across a network.
the CAP theorem through actors
Actors naturally model distributed systems. Each actor is an independent node with its own state. Messages between actors are like network messages. On a single machine, the mailbox is a thread-safe queue. Across machines, the mailbox is a network socket. The programming model doesn’t change, but the failure modes do.
If actor A sends a message to actor B and the network partitions, you have a choice. You can block and wait for the partition to heal (sacrificing availability). Or you can proceed without B’s acknowledgment (sacrificing consistency). This is the CAP theorem playing out at the actor level.
Most actor systems default to availability. Messages are fire-and-forget. The sender doesn’t block waiting for a response unless it explicitly asks for one. This means actors can keep processing even when parts of the system are unreachable. But it also means state can diverge. Actor A thinks the count is 10. Actor B thinks it’s 12. Neither is wrong from its own perspective.
CRDTs (conflict-free replicated data types) offer a middle path. These are data structures designed so that replicas can be updated independently and merged without coordination. A CRDT counter doesn’t store a single number. Each actor stores its own increment count, and the total is the sum across all actors. Merging is just addition, which is commutative and associative. The order of messages doesn’t matter. The result converges regardless.
```python
class CRDTCounter:
    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.counts = {}

    def increment(self):
        self.counts[self.actor_id] = self.counts.get(self.actor_id, 0) + 1

    def merge(self, other_counts):
        # Per-actor max is safe because each actor only ever grows its own entry.
        for actor_id, count in other_counts.items():
            self.counts[actor_id] = max(
                self.counts.get(actor_id, 0),
                count
            )

    def value(self):
        return sum(self.counts.values())
```

Each actor increments its own entry. Merging takes the max per actor. No coordination needed. No lost updates. Counters, sets, and maps can all be built this way. The tradeoff: CRDTs only work for data structures where operations commute. Not everything fits.
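A quick convergence check (the replica IDs are arbitrary):

```python
a = CRDTCounter("a")
b = CRDTCounter("b")

a.increment(); a.increment()  # a's local view: {"a": 2}
b.increment()                 # b's local view: {"b": 1}

# Merge in either order; both replicas converge to the same total.
a.merge(b.counts)
b.merge(a.counts)
assert a.value() == b.value() == 3
```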
location transparency
A powerful property of actor systems is location transparency. The code that sends a message doesn’t need to know whether the receiving actor is in the same process, on another machine, or in another datacenter. Akka, Orleans, and Ray all provide this. An actor reference looks the same regardless of where the actor lives.
```python
# These could be local or remote. The caller doesn't know.
inventory.send({"type": "reserve", "item": "widget", "qty": 5})
payment.send({"type": "charge", "amount": 49.99, "currency": "USD"})
shipping.send({"type": "ship", "address": customer_address})
```

The implication is powerful: you can develop locally with all actors in one process, then deploy to a distributed cluster without changing your application logic. Actor frameworks handle serialization, routing, and discovery. Your code just sends messages.
But location transparency has a cost. When every message send looks the same, you lose the ability to reason about latency. A “local” call that takes microseconds looks identical in code to a cross-datacenter call that takes 50 milliseconds. You can’t tell from reading the code whether a sequence of three message sends takes 3 microseconds or 150 milliseconds.
This is the fundamental tension in distributed actor systems. The abstraction makes the code clean and portable. But networks are not reliable, not fast, and not uniform in latency. Pretending otherwise leads to systems that work perfectly on a developer’s laptop and fall apart in production.
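One concrete defense, sketched against the Actor class from earlier (the ask helper is ours, not a standard API): make request/reply explicit and put a deadline on it, so a slow or partitioned actor fails loudly instead of hanging the caller.

```python
from queue import Queue, Empty

def ask(actor, message, timeout=1.0):
    # Request/reply with an explicit deadline. The reply lands on a
    # one-shot queue rather than a full actor mailbox.
    reply = Queue()

    class _ReplyInbox:
        def send(self, msg):
            reply.put(msg)

    actor.send({**message, "reply_to": _ReplyInbox()})
    try:
        return reply.get(timeout=timeout)
    except Empty:
        raise TimeoutError(f"no reply within {timeout}s")

# Against the CounterActor from earlier:
# count = ask(counter, {"type": "get"})["value"]
```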
The pragmatic approach: use location transparency for the programming model, but never forget the network underneath. Instrument message latencies. Set timeouts. Design for partial failure. The actor model gives you a clean abstraction for concurrency. It doesn’t give you a clean abstraction for physics. Messages take time. Networks partition. Actors crash. The best actor systems (Erlang/OTP, Akka, Orleans) bake this reality into their design with supervision trees, delivery guarantees, and failure detectors. The model is simple. Deploying it well is not.