async/await Under the Hood
async/await is concurrency on a single thread. Coroutines yield control
at await points. The event loop runs other coroutines while one waits
for I/O. No threads, no locks, no data races between await points. But
block the event loop with CPU work and everything stalls.
the event loop
At the center of asyncio is a loop. An actual while True that does two
things: check which I/O operations are ready, then run the coroutines
waiting on them. One thread, checking for ready work, executing it, repeat.
When a coroutine hits an await, it tells the loop “I’m waiting for
something, go run other stuff and come back to me when it’s ready.”
import asyncio

async def say_hello():
    print("hello")
    await asyncio.sleep(1)  # yields control back to the loop
    print("world")

asyncio.run(say_hello())

asyncio.run() creates an event loop, runs the coroutine, then shuts
down. async def defines a coroutine function. await suspends the
coroutine and gives control back to the loop. Between await points,
your code runs uninterrupted. This is cooperative scheduling.
concurrent I/O
The real power shows when you run multiple I/O operations at once:
import asyncio
import aiohttp

async def fetch(session, url):
    # minimal helper: GET the URL and return the body as text
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            fetch(session, "https://api.example.com/users"),
            fetch(session, "https://api.example.com/posts"),
            fetch(session, "https://api.example.com/comments"),
        )
        print(f"Got {len(results)} responses")

asyncio.run(main())

asyncio.gather() runs all three concurrently. When fetch hits an
await for the first URL, the loop starts the second request. Then the
third. All three are in flight at the same time, on a single thread. Three
requests that each take 200ms finish in roughly 200ms total, not 600ms.
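A quick way to check that claim, with asyncio.sleep standing in for the
200ms network calls:

import asyncio
import time

async def work(name):
    await asyncio.sleep(0.2)  # stands in for a 200ms request
    return name

async def main():
    start = time.perf_counter()
    await asyncio.gather(work("users"), work("posts"), work("comments"))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # ~0.20, not 0.60

asyncio.run(main())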
the blocking mistake
This is where people get burned:
import asyncio
import time

async def bad_sleep():
    print("starting")
    time.sleep(3)  # blocks the ENTIRE event loop
    print("done")

async def ticker():
    for i in range(5):
        print(f"tick {i}")
        await asyncio.sleep(1)

async def main():
    await asyncio.gather(bad_sleep(), ticker())

asyncio.run(main())

You’d expect ticks while bad_sleep waits. They don’t happen.
time.sleep(3) freezes the thread. Since the event loop runs on that
thread, everything stops. The same applies to requests.get(), heavy
file I/O, and CPU-bound computation. If it doesn’t await, it blocks.
The fix: use asyncio.sleep() instead. It yields control to the loop.
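For contrast, bad_sleep with the fix applied, which frees the ticker to
run during the wait:

async def good_sleep():
    print("starting")
    await asyncio.sleep(3)  # suspends this coroutine; the loop keeps ticking
    print("done")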
For blocking libraries, use run_in_executor():
import asyncio
import requests

async def fetch_sync(url):
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(
        None,  # default ThreadPoolExecutor
        requests.get, url,
    )
    return response.text

The blocking call runs in a separate thread. The event loop stays free.
coroutines are generators
Python’s coroutines evolved from generators over three PEPs and a decade.
PEP 342 (2005) added .send() and .throw() to generators. You
could now send values in, not just yield them out.
def old_style_coroutine():
    value = yield "ready"
    print(f"received: {value}")

g = old_style_coroutine()
print(next(g))   # prints "ready"
g.send("hello")  # prints "received: hello", then raises StopIteration

PEP 380 (2012) added yield from for delegating to sub-generators.
This enabled composing coroutines without manually forwarding values.
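A minimal sketch of that delegation, with hypothetical inner and outer
generators:

def inner():
    value = yield "inner ready"
    return f"inner got {value}"

def outer():
    # yield from forwards .send()/.throw() to inner
    # and captures inner's return value
    result = yield from inner()
    yield result

g = outer()
print(next(g))       # "inner ready", yielded by inner through outer
print(g.send("hi"))  # "inner got hi", inner's return value re-yielded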
PEP 492 (2015) introduced async def and await as dedicated
syntax. await is essentially yield from with type checking. Under
the hood, when you await, the coroutine suspends and returns control
to the event loop, just like yield returns control to the caller. The
loop drives coroutines by calling .send(None) when their I/O is ready.
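You can watch this by driving a coroutine by hand, the way the loop
does. A sketch with no real I/O, so asyncio.sleep(0) stands in for a
suspension point:

import asyncio

async def answer():
    await asyncio.sleep(0)  # suspends once, like an immediately-ready wait
    return 42

c = answer()
c.send(None)  # run to the first suspension point
try:
    c.send(None)  # resume; the coroutine finishes
except StopIteration as stop:
    print(stop.value)  # 42, delivered like a generator's return value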
when async is wrong
CPU-bound work. If your function crunches numbers for 500ms without
an await, it blocks the loop for 500ms. Use run_in_executor() with
a ProcessPoolExecutor:
import asyncio
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=4)

async def good_compute():
    loop = asyncio.get_running_loop()
    # cpu_heavy_fn runs in a worker process; the loop stays responsive
    return await loop.run_in_executor(pool, cpu_heavy_fn)

Blocking libraries. requests, psycopg2, sqlite3. Wrapping
them in async def doesn’t make them async. Use aiohttp, asyncpg, or
aiosqlite instead (see the sketch after this list).
Mixing sync and async. Calling asyncio.run() inside a running
loop raises RuntimeError. The sync/async boundary is painful.
run_in_executor() helps but adds complexity.
Simple scripts. If you’re making three sequential API calls,
synchronous requests is simpler. Use async when you actually need
concurrent I/O.
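A sketch of the async-native route, using aiosqlite in place of the
blocking sqlite3 module (the table and values are illustrative):

import asyncio
import aiosqlite  # each call awaits instead of blocking the loop

async def main():
    async with aiosqlite.connect(":memory:") as db:
        await db.execute("CREATE TABLE t (x INTEGER)")
        await db.execute("INSERT INTO t VALUES (1), (2)")
        async with db.execute("SELECT x FROM t") as cursor:
            async for row in cursor:
                print(row)  # (1,) then (2,)

asyncio.run(main())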
structured concurrency
asyncio.gather() has a problem. If one task raises, others keep running
in the background. You might not know they exist.
Python 3.11 introduced TaskGroup:
async with asyncio.TaskGroup() as tg:
    tg.create_task(fetch(url_1))
    tg.create_task(fetch(url_2))
    tg.create_task(fetch(url_3))
# all tasks guaranteed done here

If any task raises, TaskGroup cancels all others and waits for them
to finish. Exceptions propagate as an ExceptionGroup (PEP 654):
try:
    async with asyncio.TaskGroup() as tg:
        tg.create_task(might_fail())
        tg.create_task(slow_task())
except* ValueError as eg:
    for exc in eg.exceptions:
        print(f"caught: {exc}")
# slow_task was cancelled, not left dangling

The principle: concurrent tasks should have the same lifetime guarantees
as function calls. When a task group exits, all its tasks are done.
We’ve watched coroutines bounce between running and waiting. But what makes the event loop fast? The answer is in the OS kernel.
I/O multiplexing
When the event loop checks “which I/O operations are ready?”, it asks the kernel. The mechanism has evolved over decades.
select() (1983). Give the kernel a list of file descriptors, it tells
you which are ready. Problem: linear scan, capped at 1024 descriptors.
poll() (1986). Removes the 1024 limit, keeps the O(n) scan.
epoll (Linux, 2002) / kqueue (BSD/macOS, 2000). Register interest
once, get notified only about ready descriptors. O(1) per ready event.
This is what makes tens of thousands of connections practical.
asyncio picks the best mechanism per platform automatically. The loop’s
inner cycle looks roughly like this:
while True:
    ready = epoll.poll(timeout=next_timer_deadline)
    for fd, event in ready:
        registered_callbacks[fd]()
    while ready_queue:
        ready_queue.popleft().send(None)  # resume coroutine

Every await on a network operation registers a file descriptor with
epoll/kqueue and suspends the coroutine. When data arrives, the kernel
marks it ready, the loop resumes the coroutine.
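You can poke at this layer directly through Python’s selectors module,
which asyncio’s default loop builds on. A minimal sketch, using a
socketpair as a stand-in for a network connection:

import selectors
import socket

# DefaultSelector resolves to EpollSelector on Linux,
# KqueueSelector on macOS/BSD, with select() as the fallback
sel = selectors.DefaultSelector()
print(type(sel).__name__)

r, w = socket.socketpair()
sel.register(r, selectors.EVENT_READ)  # register interest once

w.send(b"ping")                   # make r readable
for key, events in sel.select():  # kernel reports only ready descriptors
    print(key.fileobj.recv(4))    # b'ping'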
the C10K problem
In 1999, Dan Kegel asked: how do you handle 10,000 simultaneous connections on one server? The standard approach was one thread per connection. 10,000 threads meant a megabyte-scale stack for each one, gigabytes in total, plus brutal context switching.
The answer: event-driven I/O. One thread with epoll/kqueue monitors all
connections. nginx (2004), Node.js (2009), and Python’s asyncio (2014,
PEP 3156) all took this path. Modern servers handle C100K or C1M using the
same principles. io_uring on Linux pushes further by batching I/O
submissions through shared ring buffers.
different languages, different bets
Node.js (libuv). JavaScript was single-threaded from birth. Node leaned
in with libuv wrapping epoll/kqueue/IOCP. Everything is async by
default. You can’t accidentally block with a standard library call because
blocking APIs barely exist. The cost: callback hell (before async/await),
CPU work needs worker_threads, one unhandled exception crashes the process.
Go (goroutines). Go hides the event loop entirely. You write
synchronous-looking code and the runtime multiplexes goroutines onto OS
threads. No async, no await, no colored functions. Goroutines are cheap
(few KB of stack), so you spawn millions. The tradeoff: you don’t control
scheduling, and debugging requires understanding the M:N scheduler.
Python (threads + GIL, then asyncio). Python started with OS threads
and the GIL. Threads work for I/O (GIL releases during system calls) but
they’re heavy and the GIL prevents CPU parallelism. asyncio added an
event loop in 3.4, but it’s opt-in. Python now has two concurrency worlds
that don’t mix cleanly.
The tension is between explicitness and convenience. Python’s await makes
suspension points visible. Go hides suspension, which is convenient but
means any function call might yield. Node made everything a callback (then
added async/await). Each model trades visibility for convenience in
different places.