The Cache
A cache stores frequently accessed data closer to the consumer so reads skip the slow path. Write-through keeps cache and database in sync on every write. Write-back batches writes for speed but risks data loss. When the cache is full, eviction policies like LRU decide what stays and what goes. Cache invalidation is the hard part.
the problem
Databases are slow. Not because they are badly written, but because durable storage is physically slow. A network round trip to a database takes milliseconds. A lookup in local memory takes nanoseconds. That is a difference of six orders of magnitude.
A cache sits between the consumer and the source of truth, storing copies of hot data in a faster medium. When the data you need is already cached, you skip the slow path entirely. When it is not, you pay the full cost plus the overhead of checking.
The question is never whether to cache. The question is what to cache, when to invalidate, and what consistency guarantees you need.
cache-aside
The simplest pattern. The application checks the cache first. On a miss, it reads from the database, stores the result in the cache, and returns it. On a write, the application updates the database and either invalidates or updates the cache entry.
```python
def read(key, cache, db):
    value = cache.get(key)
    if value is None:            # miss
        value = db.get(key)      # slow path: the source of truth
        cache.set(key, value)    # populate for next time
    return value                 # hit or freshly cached
```

Cache-aside gives the application full control. It works well when read patterns are predictable and cache misses are tolerable. The downside is that the application must manage both the cache and the database, and there is always a window where the two can diverge.
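The write path is the mirror image. A minimal sketch, assuming the same generic `cache` and `db` objects with get/set/delete methods; whether to invalidate or update the cached entry is a design choice:

```python
def write(key, value, cache, db):
    db.set(key, value)     # source of truth first
    cache.delete(key)      # invalidate; the next read repopulates
    # alternative: cache.set(key, value) to update in place, at the risk of
    # racing with a concurrent write and caching the losing value
```

Invalidation is the safer default because an in-place update can race with another writer and leave the cache holding the older value.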
write-through vs write-back
Write-through writes to the cache and the database on every write. The cache is always consistent with the database, but every write pays the full database latency. Simple and safe.
Write-back (also called write-behind) writes to the cache only, marking the entry as dirty. Dirty entries are flushed to the database later, in batches or on a timer. Writes are fast because they never wait for the database. But if the cache crashes before flushing, those writes are gone.
The tradeoff is clear: write-through trades write latency for durability. Write-back trades durability for write latency. Most systems default to write-through and only use write-back when write throughput is critical and some data loss is acceptable.
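A minimal sketch of both write paths, assuming the same generic `cache` and `db` objects as above and an illustrative `dirty` set standing in for the write-back bookkeeping:

```python
def write_through(key, value, cache, db):
    db.set(key, value)       # pay database latency on every write
    cache.set(key, value)    # cache and database stay in sync

dirty = set()  # keys written to the cache but not yet persisted

def write_back(key, value, cache):
    cache.set(key, value)    # fast: no database round trip
    dirty.add(key)           # remember to flush later

def flush(cache, db):
    # runs on a timer or when the dirty set grows large;
    # anything still in `dirty` is lost if the cache crashes first
    for key in list(dirty):
        db.set(key, cache.get(key))
        dirty.discard(key)
```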
eviction
Caches have finite capacity. When the cache is full and a new entry needs space, something must go. Eviction policies decide what to remove.
LRU (Least Recently Used) evicts the entry that has not been accessed for the longest time. The assumption is that recently accessed data is likely to be accessed again. Implemented with a doubly-linked list and a hash map for O(1) lookups and evictions.
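In Python, `collections.OrderedDict` (a hash map backed by a doubly-linked list) gives exactly this shape; a minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # miss
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
```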
LFU (Least Frequently Used) evicts the entry with the fewest accesses. Better for workloads with stable hot sets, but more complex to implement and slower to adapt to changing access patterns.
TTL (Time To Live) expires entries after a fixed duration regardless of access patterns. Simple and predictable. Often combined with LRU or LFU.
LRU wins in practice because it is simple, fast, and adapts well to changing workloads. Memcached uses LRU, Redis offers an approximated LRU among its eviction policies, and most application caches default to LRU or a variant.
cache invalidation
There are only two hard things in computer science: cache invalidation and naming things.
The core problem: when the source of truth changes, the cache must be updated or invalidated. If it is not, reads return stale data. Strategies include:
- TTL-based expiry: entries expire after a fixed time. Simple, but stale data persists until expiry.
- Event-driven invalidation: the database publishes change events, and the cache subscribes. Accurate, but adds infrastructure complexity.
- Write-through invalidation: every write updates the cache. Consistent, but couples writes to cache availability.
No strategy is perfect. TTL is easy but laggy. Event-driven is accurate but complex. Write-through is consistent but slow. Most systems combine two or more approaches.
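A sketch of one common combination: TTL as the safety net, event-driven deletes for freshness. The `cache` here is hypothetical (per-key TTLs, get/set/delete), and `on_change` stands in for whatever subscribes to the database's change events:

```python
TTL_SECONDS = 300  # safety net: nothing is staler than five minutes

def read_with_ttl(key, cache, db):
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        cache.set(key, value, ttl=TTL_SECONDS)
    return value

def on_change(key, cache):
    # called from the change feed (e.g. a pub/sub channel);
    # deleting the entry forces the next read to fetch fresh data
    cache.delete(key)
```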
where it shows up
- Redis: In-memory data store used as a cache layer in front of databases. Supports TTL, LRU eviction, and pub/sub for invalidation.
- Memcached: Simple, fast, distributed memory cache. No persistence, no data structures beyond key-value. Pure caching.
- CDNs (Cloudflare, CloudFront): Cache static and dynamic content at edge locations. TTL-based with cache-control headers. Reduces latency by serving content from the nearest edge.
- CPU caches (L1/L2/L3): Hardware caches that exploit spatial and temporal locality. Write-back by default (dirty cache lines flushed on eviction). The original cache.
- Browser cache: HTTP caching with Cache-Control, ETag, and Last-Modified headers. Reduces network requests for static assets.
+ cache stampede and thundering herd
When a popular cache entry expires, many concurrent requests discover the miss simultaneously. All of them hit the database at once, potentially overwhelming it. This is a cache stampede (also called thundering herd).
Mitigation strategies:
Locking: The first request to discover the miss acquires a lock. Other requests wait for the lock holder to populate the cache. Simple, but adds latency for waiting requests.
Probabilistic early recomputation: Each request has a small probability of refreshing the cache before expiry. The busier the key, the more likely someone refreshes it early. No locks, no thundering herd.
Stale-while-revalidate: Return the stale value immediately while refreshing in the background. The user gets fast (slightly stale) data, and the cache updates asynchronously.
Facebook’s Memcached paper describes “lease” tokens that prevent stampedes. When a miss occurs, the cache issues a lease token. Only the token holder can populate the entry. Others wait or get a stale value.
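Probabilistic early recomputation is surprisingly little code. A sketch of the usual formulation (sometimes called XFetch), assuming the cache stores the value together with how long it took to compute and when it expires:

```python
import math
import random
import time

BETA = 1.0  # values above 1 favor earlier recomputation

def fetch(key, cache, db, ttl):
    entry = cache.get(key)  # entry is (value, delta, expiry) or None
    if entry is not None:
        value, delta, expiry = entry
        # refresh early with a probability that rises as expiry approaches
        # and as the recomputation cost (delta) grows
        if time.time() - delta * BETA * math.log(random.random()) < expiry:
            return value
    start = time.time()
    value = db.get(key)                  # the expensive recomputation
    delta = time.time() - start
    cache.set(key, (value, delta, time.time() + ttl), ttl=ttl)
    return value
```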
+ distributed caching and consistent hashing
A single cache server runs out of memory. Distributed caching spreads data across multiple servers. The question: which server holds which key?
Consistent hashing maps keys to a ring of servers. Adding or removing a server only remaps a fraction of keys (roughly 1/N). Without consistent hashing, a server change remaps nearly every key, causing a cache miss storm.
Redis Cluster uses hash slots (16384 slots distributed across nodes). Memcached clients use consistent hashing with virtual nodes. Both approaches solve the same problem: stable key distribution across a changing set of servers.
The connection to the consistent hashing post in this series is direct. The same ring structure that distributes data across shards distributes cache entries across cache nodes.
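A minimal ring with virtual nodes, in the spirit of what Memcached client libraries do (a sketch under those assumptions, not any particular client's implementation):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=100):
        # each server appears at many points on the ring to smooth the distribution
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        # walk clockwise from the key's position to the next virtual node
        i = bisect.bisect(self.ring, (self._hash(key),))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
ring.lookup("user:42")  # the same key maps to the same server until the ring changes
```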
+ cache-aside vs read-through vs write-through
These terms describe who manages the cache:
Cache-aside: The application manages reads and writes to both cache and database. Most flexible, most code.
Read-through: The cache itself fetches from the database on a miss. The application only talks to the cache. Less code, but the cache needs a database loader function.
Write-through: The cache writes through to the database on every write. The application writes to the cache, and the cache handles persistence.
Write-behind (write-back): The cache writes to the database asynchronously. Fastest writes, weakest durability.
In practice, cache-aside is the most common because it gives full control. Read-through and write-through are useful when the cache library supports them natively (like Guava’s LoadingCache or Caffeine in Java).
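A read-through wrapper is mostly glue; a sketch, assuming a `loader` callable that knows how to fetch from the database (roughly the shape of a loading cache, not Guava's or Caffeine's actual API):

```python
class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader   # called on a miss, e.g. lambda key: db.get(key)
        self.store = {}

    def get(self, key):
        if key not in self.store:
            # the cache, not the application, goes to the source of truth
            self.store[key] = self.loader(key)
        return self.store[key]

# hypothetical loader: any callable that fetches from the real database works
users = ReadThroughCache(loader=lambda user_id: db.get(f"user:{user_id}"))
```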
+ TTL strategies and stale-while-revalidate
Choosing the right TTL is a balancing act. Too short: high miss rates, frequent database hits. Too long: stale data persists.
Fixed TTL: Every entry expires after the same duration. Simple but inflexible. A product listing and a stock price have very different freshness requirements.
Adaptive TTL: Entries that change frequently get shorter TTLs. Entries that rarely change get longer ones. More accurate, but requires tracking change frequency.
Stale-while-revalidate: Serve the stale entry immediately while triggering an asynchronous refresh. The next request gets fresh data. This is used in HTTP caching (the stale-while-revalidate Cache-Control directive) and in application caches.
Jittered TTL: Add a small random offset to TTLs to prevent many entries from expiring simultaneously (which would cause a stampede). Instead of TTL=300s for everyone, use TTL=300 + random(0,30).
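The jitter itself is one line; a sketch:

```python
import random

BASE_TTL = 300  # seconds

def jittered_ttl():
    # spread expirations over a 30-second window instead of a single instant
    return BASE_TTL + random.uniform(0, 30)
```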
The best practice is to combine TTL with event-driven invalidation. TTL acts as a safety net (data is never staler than N seconds), while events provide real-time updates for critical changes.
production stories
Redis at scale
Redis is the default caching layer for most web applications. Its single-threaded event loop handles hundreds of thousands of operations per second. Data structures beyond simple key-value (sorted sets, streams, bitmaps) make it useful for more than just caching.
At large scale, Redis Cluster distributes data across multiple nodes using 16384 hash slots. Each node owns a subset of slots. Client libraries know the slot mapping and route requests directly. When a node fails, its replica is promoted and takes over its slots.
The operational challenge is memory management. Redis stores everything in RAM. When memory is full, the eviction policy kicks in. The allkeys-lru policy is the safe default for caches. The volatile-lru policy only evicts keys with TTLs, which protects important keys but can cause out-of-memory errors if too many keys lack TTLs.
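Wiring this up through redis-py looks roughly like the following (the same two directives can also live in redis.conf):

```python
import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory", "2gb")                 # cap memory before eviction kicks in
r.config_set("maxmemory-policy", "allkeys-lru")  # any key is fair game, approximated LRU
```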
CDN caching
CDNs cache content at hundreds of edge locations worldwide. When a user in Tokyo requests a page, the edge server in Tokyo serves it from cache instead of fetching it from the origin server in Virginia. Latency drops from hundreds of milliseconds to single digits.
Cache-Control headers govern CDN behavior. max-age=3600 caches for one hour. s-maxage=86400 tells the CDN to cache for 24 hours even if the browser cache is shorter. stale-while-revalidate=60 serves stale content for 60 seconds while refreshing in the background.
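Put together, a response that browsers may cache for an hour, CDNs for a day, and that tolerates a minute of background revalidation carries something like:

```
Cache-Control: public, max-age=3600, s-maxage=86400, stale-while-revalidate=60
```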
Cache invalidation at CDN scale is expensive. Purging a single URL from hundreds of edge servers takes seconds to minutes. This is why immutable asset URLs (with content hashes in the filename) are standard practice: the URL changes when the content changes, so you never need to invalidate.
when not to cache
Caching adds complexity. Before adding a cache, consider whether the problem can be solved by making the database faster:
- Add an index: A missing database index is often the real problem. Adding the right index can turn a 500ms query into a 5ms query, eliminating the need for caching entirely.
- Denormalize: If your query joins five tables, consider storing the pre-joined result in the database. Trades storage for read speed.
- Read replicas: Route read traffic to database replicas. There is no cache invalidation to worry about, and reads are only as stale as the replication lag.
Cache when: reads vastly outnumber writes, data is expensive to compute, and some staleness is acceptable. Do not cache when: data changes frequently, consistency is critical, or the working set does not fit in memory.