async/await Under the Hood
async/await is concurrency on a single thread. Coroutines yield control
at await points. The event loop runs other coroutines while one waits
for I/O. No threads, no locks, no data races between await points. But
block the event loop with CPU work and everything stalls.
the event loop
At the center of asyncio is a loop. An actual while True that does two
things: check which I/O operations are ready, then run the coroutines
waiting on them. One thread, checking for ready work, executing it, repeat.
When a coroutine hits an await, it tells the loop “I’m waiting for
something, go run other stuff and come back to me when it’s ready.”
import asyncio

async def say_hello():
    print("hello")
    await asyncio.sleep(1)  # yields control back to the loop
    print("world")

asyncio.run(say_hello())

asyncio.run() creates an event loop, runs the coroutine, then shuts
down. async def defines a coroutine function. await suspends the
coroutine and gives control back to the loop. Between await points,
your code runs uninterrupted. This is cooperative scheduling.
concurrent I/O
The real power shows when you run multiple I/O operations at once:
import asyncio
import aiohttp

async def fetch(session, url):
    # minimal helper: GET the URL and return the body as text
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            fetch(session, "https://api.example.com/users"),
            fetch(session, "https://api.example.com/posts"),
            fetch(session, "https://api.example.com/comments"),
        )
        print(f"Got {len(results)} responses")

asyncio.run(main())

asyncio.gather() runs all three concurrently. When fetch hits an
await for the first URL, the loop starts the second request. Then the
third. All three are in flight at the same time, on a single thread. Three
requests that each take 200ms finish in roughly 200ms total, not 600ms.
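A quick way to check that claim, with asyncio.sleep standing in for the
200ms network calls:

import asyncio
import time

async def work(name):
    await asyncio.sleep(0.2)  # stands in for a 200ms request
    return name

async def main():
    start = time.perf_counter()
    await asyncio.gather(work("users"), work("posts"), work("comments"))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~0.20, not 0.60

asyncio.run(main())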
the blocking mistake
This is where people get burned:
import asyncio
import time

async def bad_sleep():
    print("starting")
    time.sleep(3)  # blocks the ENTIRE event loop
    print("done")

async def ticker():
    for i in range(5):
        print(f"tick {i}")
        await asyncio.sleep(1)

async def main():
    await asyncio.gather(bad_sleep(), ticker())

asyncio.run(main())

You’d expect ticks while bad_sleep waits. They don’t happen.
time.sleep(3) freezes the thread. Since the event loop runs on that
thread, everything stops. The same applies to requests.get(), heavy
file I/O, and CPU-bound computation. If it doesn’t await, it blocks.
The fix: use asyncio.sleep() instead. It yields control to the loop.
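For contrast, bad_sleep with the fix applied, which frees the ticker to
run during the wait:

async def good_sleep():
    print("starting")
    await asyncio.sleep(3)  # suspends this coroutine; the loop keeps ticking
    print("done")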
For blocking libraries, use run_in_executor():
import asyncio
import requests

async def fetch_sync(url):
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(
        None,  # default ThreadPoolExecutor
        requests.get, url,
    )
    return response.text

The blocking call runs in a separate thread. The event loop stays free.
coroutines are generators
Python’s coroutines evolved from generators over three PEPs and a decade.
PEP 342 (2005) added .send() and .throw() to generators. You
could now send values in, not just yield them out.
def old_style_coroutine():
    value = yield "ready"
    print(f"received: {value}")

g = old_style_coroutine()
print(next(g))   # prints "ready"
g.send("hello")  # prints "received: hello", then raises StopIteration

PEP 380 (2012) added yield from for delegating to sub-generators.
This enabled composing coroutines without manually forwarding values.
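A minimal sketch of that delegation, with hypothetical inner and outer
generators:

def inner():
    value = yield "inner ready"
    return f"inner got {value}"

def outer():
    # yield from forwards .send()/.throw() to inner
    # and captures inner's return value
    result = yield from inner()
    yield result

g = outer()
print(next(g))       # "inner ready", yielded by inner through outer
print(g.send("hi"))  # "inner got hi", inner's return value re-yielded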
PEP 492 (2015) introduced async def and await as dedicated
syntax. await is essentially yield from with type checking. Under
the hood, when you await, the coroutine suspends and returns control
to the event loop, just like yield returns control to the caller. The
loop drives coroutines by calling .send(None) when their I/O is ready.
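You can watch this by driving a coroutine by hand, the way the loop
does. A sketch with no real I/O, so asyncio.sleep(0) stands in for a
suspension point:

import asyncio

async def answer():
    await asyncio.sleep(0)  # suspends once, like an immediately-ready wait
    return 42

c = answer()
c.send(None)  # run to the first suspension point
try:
    c.send(None)  # resume; the coroutine finishes
except StopIteration as stop:
    print(stop.value)  # 42, delivered like a generator's return value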
when async is wrong
CPU-bound work. If your function crunches numbers for 500ms without
an await, it blocks the loop for 500ms. Use run_in_executor() with
a ProcessPoolExecutor:
import asyncio
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=4)

async def good_compute():
    loop = asyncio.get_running_loop()
    # cpu_heavy_fn runs in a worker process; the loop stays responsive
    return await loop.run_in_executor(pool, cpu_heavy_fn)

Blocking libraries. requests, psycopg2, sqlite3. Wrapping
them in async def doesn’t make them async. Use aiohttp, asyncpg, or
aiosqlite instead (see the sketch after this list).
Mixing sync and async. Calling asyncio.run() inside a running
loop raises RuntimeError. The sync/async boundary is painful.
run_in_executor() helps but adds complexity.
Simple scripts. If you’re making three sequential API calls,
synchronous requests is simpler. Use async when you actually need
concurrent I/O.
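A sketch of the async-native route, using aiosqlite in place of the
blocking sqlite3 module (the table and values are illustrative):

import asyncio
import aiosqlite  # each call awaits instead of blocking the loop

async def main():
    async with aiosqlite.connect(":memory:") as db:
        await db.execute("CREATE TABLE t (x INTEGER)")
        await db.execute("INSERT INTO t VALUES (1), (2)")
        async with db.execute("SELECT x FROM t") as cursor:
            async for row in cursor:
                print(row)  # (1,) then (2,)

asyncio.run(main())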
structured concurrency
asyncio.gather() has a problem. If one task raises, others keep running
in the background. You might not know they exist.
Python 3.11 introduced TaskGroup:
async with asyncio.TaskGroup() as tg:
    tg.create_task(fetch(url_1))
    tg.create_task(fetch(url_2))
    tg.create_task(fetch(url_3))
# all tasks guaranteed done here

If any task raises, TaskGroup cancels all others and waits for them
to finish. Exceptions propagate as an ExceptionGroup (PEP 654):
try:
    async with asyncio.TaskGroup() as tg:
        tg.create_task(might_fail())
        tg.create_task(slow_task())
except* ValueError as eg:
    for exc in eg.exceptions:
        print(f"caught: {exc}")
# slow_task was cancelled, not left dangling

The principle: concurrent tasks should have the same lifetime guarantees
as function calls. When a task group exits, all its tasks are done.
We’ve watched coroutines bounce between running and waiting. But what makes the event loop fast? The answer is in the OS kernel.
I/O multiplexing
When the event loop checks “which I/O operations are ready?”, it asks the kernel. The mechanism has evolved over decades.
select() (1983). Give the kernel a list of file descriptors, it tells
you which are ready. Problem: linear scan, capped at 1024 descriptors.
poll() (1986). Removes the 1024 limit, keeps the O(n) scan.
epoll (Linux, 2002) / kqueue (BSD/macOS, 2000). Register interest
once, get notified only about ready descriptors. O(1) per ready event.
This is what makes tens of thousands of connections practical.
asyncio picks the best mechanism per platform automatically. The loop’s
inner cycle looks roughly like this:
while True:
    ready = epoll.poll(timeout=next_timer_deadline)
    for fd, event in ready:
        registered_callbacks[fd]()
    while ready_queue:
        ready_queue.popleft().send(None)  # resume coroutine

Every await on a network operation registers a file descriptor with
epoll/kqueue and suspends the coroutine. When data arrives, the kernel
marks it ready, the loop resumes the coroutine.
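You can poke at this layer directly through Python’s selectors module,
which asyncio’s default loop builds on. A minimal sketch, using a
socketpair as a stand-in for a network connection:

import selectors
import socket

# DefaultSelector resolves to EpollSelector on Linux,
# KqueueSelector on macOS/BSD, with select() as the fallback
sel = selectors.DefaultSelector()
print(type(sel).__name__)

r, w = socket.socketpair()
sel.register(r, selectors.EVENT_READ)  # register interest once

w.send(b"ping")                   # make r readable
for key, events in sel.select():  # kernel reports only ready descriptors
    print(key.fileobj.recv(4))    # b'ping'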
the C10K problem
In 1999, Dan Kegel asked: how do you handle 10,000 simultaneous connections on one server? The standard approach was one thread per connection. 10,000 threads meant a megabyte-scale stack for each one, gigabytes in total, plus brutal context switching.
The answer: event-driven I/O. One thread with epoll/kqueue monitors all
connections. nginx (2004), Node.js (2009), and Python’s asyncio (2014,
PEP 3156) all took this path. Modern servers handle C100K or C1M using the
same principles. io_uring on Linux pushes further by batching I/O
submissions through shared ring buffers.
different languages, different bets
Node.js (libuv). JavaScript was single-threaded from birth. Node leaned
in with libuv wrapping epoll/kqueue/IOCP. Everything is async by
default. You can’t accidentally block with a standard library call because
blocking APIs barely exist. The cost: callback hell (before async/await),
CPU work needs worker_threads, one unhandled exception crashes the process.
Go (goroutines). Go hides the event loop entirely. You write
synchronous-looking code and the runtime multiplexes goroutines onto OS
threads. No async, no await, no colored functions. Goroutines are cheap
(few KB of stack), so you spawn millions. The tradeoff: you don’t control
scheduling, and debugging requires understanding the M:N scheduler.
Python (threads + GIL, then asyncio). Python started with OS threads
and the GIL. Threads work for I/O (GIL releases during system calls) but
they’re heavy and the GIL prevents CPU parallelism. asyncio added an
event loop in 3.4, but it’s opt-in. Python now has two concurrency worlds
that don’t mix cleanly.
The tension is between explicitness and convenience. Python’s await makes
suspension points visible. Go hides suspension, which is convenient but
means any function call might yield. Node made everything a callback (then
added async/await). Each model trades visibility for convenience in
different places.