Cache Invalidation — System Guide

Section 1

TL;DR — Cache Invalidation in Plain English

Why cache invalidation is the hard half of caching — and exactly what goes wrong when you skip it or get it wrong (wrong prices, phantom inventory, double-charged payments)
The 4 canonical invalidation strategies — TTL, explicit purge, write-through, and CDC/event-driven — what each does, its consistency guarantee, and its killer weakness
Why TTL-only is dangerous for anything with financial or inventory consequences, and how to reason about the right maximum staleness for any business requirement
How Change Data Capture (CDC) works as an event-driven invalidation pipeline and why it solves the dual-write problem that kills write-through in distributed systems
The production patterns (versioned keys, surrogate cache tags, generational caching) that real engineering teams use to keep invalidation sane at scale

Phil Karlton, a principal engineer at Netscape in the mid-1990s, once quipped: "There are only two hard things in computer science: cache invalidation and naming things." The joke has survived three decades because it's accurate. Reading data from a cache is trivial. Knowing when the cached copy no longer reflects reality — and discarding it before a user acts on the wrong data — is where caching systems fall apart in production.

A cache stores a copy of data from a source of truth (usually a database). The moment that source-of-truth data changes, the cached copy becomes stale — it's a snapshot that no longer matches reality. If a user reads the stale copy and acts on it (buys at an old price, orders out-of-stock inventory, transfers money twice) you have a real production incident. Cache invalidation is the set of strategies that decide: "when should we throw away the cached copy so the next read fetches the fresh version?" Every strategy is a trade-off between consistency (how fresh the data is), performance (how often we hit the database), and operational complexity (how hard it is to build and maintain).

TTL (Time-To-Live) puts a timer on every cached entry; when the timer expires, the entry is deleted automatically and the next read fetches a fresh copy. It is simple and low-ops, but reads can be stale for up to TTL seconds after every update. Explicit Purge means the application or an operator deletes a cached entry the moment the source data changes. Staleness is near zero, but the writer has to know which cache keys to invalidate (a hard problem at scale). Write-Through updates both the cache and the database on the write path, so there is no staleness when both updates succeed, but every write pays the extra cache round trip, and in distributed systems the two updates form a dual write that can fail halfway. CDC (Change Data Capture) listens to the database's change stream (the binary log or WAL) and fires invalidation events whenever a row changes. It decouples the cache from the writer, runs in near real time, and avoids the dual write, but it requires a streaming pipeline (Debezium, Kafka Connect) and the operational complexity that comes with one. Each strategy has a home; the art is picking the right one for each data type in your system.

The difficulty comes from three compounding factors. First, a single logical piece of data can live in many cache entries at once — the price of product #1234 might be in a product-detail cache entry, a search-results cache entry, a "you might also like" entry, and a cart subtotal entry. Changing the price in the database means invalidating all four, and you have to know all four exist. Second, writer and cache are often different services — the service that writes the price and the service that reads from the cache may not share code or even an RPC boundary. Third, invalidation in a distributed system (multi-region, multi-node) must propagate to every cache replica, and network partitions mean some replicas might not receive the invalidation. This page goes deep on the strategies, the failure modes, and the patterns that production teams actually use to tame all three dimensions of the problem.

Cache invalidation is the discipline of deciding when a cached copy of data is no longer trustworthy and must be discarded. The four canonical strategies — TTL, purge, write-through, and CDC — each make a different trade-off between consistency, performance, and complexity. Picking the right strategy depends on how much staleness the business can tolerate, who performs writes, and how the cache relates to its source of truth.

Section 2

Why You Need This — When Stale Data Becomes a Bug

Most engineers learn caching from the happy path: add a cache, hit ratio goes from 0% to 95%, database stops sweating, latency drops from 80 ms to 2 ms. That story is real and it's great. The part that doesn't make it into blog posts is what happens six months later, when a price change lands in the database but the cached version of the product page is still serving the old price — and 40,000 customers are actively browsing.

The Production Story: A Price Update and $85,000 in Refunds

Here's a scenario that plays out at e-commerce companies regularly. A product manager updates the price of a laptop from $1,499 to $1,299 — a flash sale. The database update lands in under a millisecond. But the product detail page is cached in Redis with a 24-hour TTL set six hours ago. That cached page still shows $1,499. Meanwhile, a promotional email goes out advertising the new $1,299 price. Customers click the email link, which loads the cached product page at $1,499. Some close the tab. Some, confused, try a different URL — which also hits the cache. A fraction just trust the email and call support. And some go to checkout — which hits a different cache layer that did receive the update — and pay $1,299, but their order confirmation email (generated from the cached product object) says $1,499. Now the support queue is full, legal is nervous, and someone is manually issuing refunds.

That's the staleness bug in its friendliest form — a price discrepancy. The same failure pattern produces consequences that range from embarrassing to catastrophic:

Phantom inventory — an item is sold out in the database, but the cached product page still shows "In Stock." Customers add it to their cart, check out, and you have to cancel orders.
Double payments — a payment state is cached as "pending." A retry API call reads the cached state and submits a second payment. The database has the correct idempotency key but the cached view doesn't.
Discontinued items — a product is removed from the catalog. The product page is gone but a cached "you might also like" carousel still links to it. Customers click a dead link on your own site.
Authorization bypass — a user's permission token is cached. The user is demoted or banned. The cached token still grants access until the entry expires, which can be up to the full TTL. In security-sensitive systems, this is an incident.

The Math: At 99% Hit Ratio, 99% of Reads for an Updated Key Are Stale

Here's the number that hits engineers hard when they first see it. Suppose your product catalog has 100,000 items and you've set a 5-minute TTL on every product cache entry. At any given moment, each entry is between 0 and 5 minutes old. Your cache hit ratio is 99%, meaning 99 out of every 100 product-page reads are served from the cached copy. If you update one product's price, 99% of reads for that product see the old price until its entry expires, which can take up to 5 minutes. The damage scales with how hot the product is: if that one product is a flash-sale item receiving 10,000 reads per second, roughly 9,900 reads per second serve the wrong price until the TTL expires. With a 5-minute TTL, that is up to 9,900 × 300 = 2.97 million stale reads before the cache clears. Whether that matters depends entirely on what the data is: for a blog post, 5 minutes of staleness is fine; for a financial instrument price, it can be illegal.
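The stale-read arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name and parameters are illustrative, and it assumes the worst case, where every read in question targets the updated key and the entry was cached just before the update, so it stays stale for the full TTL.

```python
def worst_case_stale_reads(reads_per_second: float, hit_ratio: float, ttl_seconds: float) -> float:
    """Upper bound on reads that see the old value after a single update.

    Assumption (illustrative): all reads target the updated key, and the
    entry was refreshed just before the write, so it lives the full TTL.
    """
    stale_reads_per_second = reads_per_second * hit_ratio
    return stale_reads_per_second * ttl_seconds


total = worst_case_stale_reads(reads_per_second=10_000, hit_ratio=0.99, ttl_seconds=5 * 60)
print(f"{total:,.0f} stale reads in the worst case")
```

Halving either the TTL or the read rate halves the bound, which is why the staleness window and the traffic on the key both belong in the risk estimate.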

The diagram above makes the gap visible. The database reflects the correct price from the moment the write lands (T=0). The cache continues serving the old price for the entire staleness window — up to the full TTL duration. Every read that hits the cache during that window sees wrong data. The width of that window is the TTL. The fraction of reads that see wrong data is the cache hit ratio. Both numbers together determine how much business risk your TTL choice carries.

The Core Question Invalidation Answers

Every invalidation strategy is an answer to one question: "What is the maximum amount of time that can pass between a source-of-truth update and the moment the cache starts serving the new value — and what mechanism enforces that bound?" For a product description, the answer might be "24 hours, enforced by TTL." For a product price, it might be "5 seconds, enforced by explicit purge on update." For a bank balance, it might be "0 seconds, enforced by bypassing the cache entirely on reads or write-through on writes." The business requirement drives the acceptable staleness window; the staleness window drives the strategy choice.

The key insight: Caching and consistency are in direct tension. A perfect cache (100% hit ratio, zero database load) means you never read fresh data. A perfectly consistent system (always reads from DB) means you get no caching benefit. Cache invalidation is the engineering discipline that navigates this tension — picking the right point on the consistency-performance curve for each type of data in your system.

Stale cache data causes real production bugs: wrong prices, phantom inventory, double payments, and authorization bypasses. The size of the staleness window (TTL duration) combined with the cache hit ratio determines how many reads serve wrong data after any given update. Cache invalidation is the discipline that answers "what is the maximum acceptable staleness window, and what mechanism enforces it?" — and the answer differs for every data type in the system.

Section 3

Mental Model — The Source-of-Truth Pyramid

Before diving into specific invalidation strategies, it helps to have a single mental model that explains why the problem exists at all. Here it is: data in a modern web system lives in layers, arranged like a pyramid. The database at the bottom is the source of truth. Caches sit above it — faster, closer to the user, but derived. CDN edges sit at the top — fastest, most numerous, but furthest from the source and therefore the most likely to be stale.

The Pyramid: Distance from Source = Lag + Harder Invalidation

Think of the pyramid this way: the further a layer is from the database, the faster reads are from that layer (because it's closer to the user), and the harder invalidation becomes (because there are more copies to invalidate, farther away, with less reliable delivery). A database update takes one write. Invalidating a Redis cluster takes one delete per key across the cluster. Invalidating a CDN edge cache may require an API call to dozens of edge nodes in different regions, some of which may be temporarily unreachable. The further up the pyramid you go, the larger the staleness window tends to be — and the more that staleness costs when it's wrong.

The pyramid diagram shows the fundamental structure of the problem. At the bottom is the database — slow to read but always correct by definition. At every layer above it, you trade some correctness (you might read stale data) for speed (the read is faster). As you move up the pyramid, the lag between "source updated" and "this layer serves the new value" grows. And critically, the number of copies that need to be invalidated grows too — one database row becomes one Redis key becomes a key cached in 64 CDN edge nodes in 20 countries becomes a value cached in browser local storage on millions of client devices. The further up the pyramid the data is, the harder it is to invalidate across all copies quickly.

The Contract: Maximum Acceptable Lag

The way to use this mental model practically: for every type of data in your system, define a maximum acceptable lag — the longest time that can pass between a database update and the moment every reader starts seeing the new value. This is a business decision, not a technical one. "Prices must be current within 10 seconds" is a business decision. "Blog post content can be stale for up to an hour" is a business decision. Once you have the number, you choose the invalidation strategy that can enforce it at each pyramid layer. TTL enforces lag = TTL duration. Explicit purge enforces lag ≈ 0 at the app-cache layer but not at the CDN layer unless you also issue a CDN purge. CDC enforces near-real-time lag at the app-cache layer. There is no single strategy that works at all pyramid layers simultaneously — that's why real systems mix strategies: short TTLs at the CDN edge combined with event-driven invalidation at the Redis layer, for example.
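One way to make the "maximum acceptable lag" contract concrete is to write it down per data type and derive a starting strategy from it. The sketch below is only an illustration: the function name and the 10-second threshold are assumptions chosen to match the examples in this guide, not rules.

```python
def suggest_strategy(max_lag_seconds: float) -> str:
    """Map a business-defined maximum acceptable lag to a starting strategy.

    The thresholds are illustrative assumptions, not rules from the guide.
    """
    if max_lag_seconds <= 0:
        return "bypass the cache on reads, or write-through on writes"
    if max_lag_seconds <= 10:
        return "explicit purge or CDC, with a TTL as a safety net"
    return f"TTL of {int(max_lag_seconds)} seconds"


# Example contracts, using the data types discussed in this guide.
contracts = {
    "product description": 24 * 60 * 60,
    "product price": 5,
    "bank balance": 0,
}
for data_type, max_lag in contracts.items():
    print(f"{data_type}: {suggest_strategy(max_lag)}")
```

The useful part is less the function than the table it forces you to fill in: every cached data type gets an explicit number that someone on the business side has agreed to.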

Practical exercise: Take the three most critical data types in any system you work on — a price, a user permission, a count of something. For each one, answer: (a) what is the maximum acceptable lag in seconds? (b) which pyramid layer is it cached at? (c) which strategy currently enforces that lag? If you don't know the answer to (c), you probably have an implicit TTL somewhere that nobody has tuned.

Data in a web system lives in a pyramid of layers — database (source of truth), application cache, CDN edges, client-side — where each layer trades some correctness for speed. The further from the source of truth, the greater the potential staleness and the harder the invalidation. The practical tool for reasoning about this is defining a "maximum acceptable lag" for each data type, then choosing invalidation strategies that can enforce that lag at every pyramid layer the data inhabits.

Section 4

Core Concepts — The Vocabulary of Invalidation

Before we can talk about why write-through fails in a distributed system or why CDC solves the dual-write problem, we need a shared vocabulary. Twelve terms appear constantly in any serious discussion of cache invalidation. For each one, the plain English meaning comes first — then the precise technical term you'll see in papers, documentation, and production code reviews.

The Twelve Terms You Must Know

Stale data — When the cached copy of a value no longer matches what's in the database. Think of it like a printout of a live Google Sheet: the moment someone edits the Sheet, your printout is stale. In caching, the technical term is staleness. The word "stale" has a specific meaning here: it's not corrupted data. It was correct when written. It's just an old snapshot.

Time-To-Live (TTL) — Every cached entry can be given a lifetime in seconds. When that countdown hits zero, the entry is automatically deleted from the cache. The next read for that key misses, fetches from the database, and repopulates the cache with a fresh copy. TTL is the simplest invalidation mechanism — you don't have to detect changes or coordinate between services. The downside is that within the TTL window, reads can be stale, and you don't control exactly when the staleness window ends.

Staleness window — The period between a database update and the moment the cache stops serving the old value. With TTL-based invalidation, the worst-case staleness window equals the TTL. With explicit purge, the staleness window can be close to zero (you purge the moment you write). Understanding the staleness window for each data type is the first step in designing a correct invalidation strategy.

Consistency — In cache invalidation, "consistent" means every reader sees the same value and that value matches the source of truth. There are two levels: strong consistency (every read always sees the latest value — requires bypassing or immediately invalidating the cache on every write) and eventual consistency (all readers will see the latest value eventually, but there's a window when they might still see old data — that window is the staleness window). Most caching systems are eventually consistent by design.

Explicit purge — A direct command to delete a cache entry, issued by the application at the moment the source data changes. "The price changed — delete cache:product:1234." Purge can enforce a near-zero staleness window, but it requires the writer to know every cache key that holds a derived view of the data it's changing — which is the hard part.

Write-through — A pattern where every write updates both the cache and the database in the same operation. If the database write succeeds, the cache is also updated (or the old cache entry is deleted) before the write returns. This means the cache is always fresh — but it adds latency to every write, because you can't return until both the database write and the cache operation have succeeded.

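Write-around — The opposite approach: writes go directly to the database, bypassing the cache entirely, and the cache is populated only on reads (the cache-aside pattern). The write never puts a new value into the cache, but any copy cached before the write is now stale, so write-around is normally paired with a delete of the affected key or a short TTL. Once the old entry is gone, the first read after a write always misses and pays the database round trip. That makes write-around a good fit for data that is written often but rarely read back soon afterwards (bulk imports, logs, archival records), and a poor fit for data that is read immediately after a write (like a user's own profile right after they edit it).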

Write-back (write-behind) — Writes land in the cache first and are written to the database asynchronously. Very fast writes, but the cache now holds data the database hasn't persisted yet — a window where a crash loses data. Write-back trades durability for write speed and is used in situations where you can tolerate a small window of data loss (like game leaderboards or analytics counters).

Dual-write problem — When a system tries to update two separate stores (cache + database) as two separate operations, without a distributed transaction. If the database write succeeds but the cache update fails (or vice versa), the two stores are now inconsistent. The dual-write problem is the reason write-through is fragile in distributed systems — and the reason CDC (Change Data Capture) is the preferred approach for systems that need near-zero staleness without the dual-write risk.

Change Data Capture (CDC) — A technique for observing and reacting to every change in a database by reading the database's own internal change log (the binary log in MySQL, the WAL in PostgreSQL). Instead of the application explicitly updating the cache on write, a CDC agent (like Debezium) reads every database mutation as an event and can trigger cache invalidation automatically. Because the events come from the database itself, there's no dual-write: the database is the single writer, and cache invalidation is a derived reaction. CDC adds operational complexity (a streaming pipeline, typically Kafka + Debezium) but is the gold standard for high-consistency, low-staleness cache invalidation without coupling writers and cache layers.

Surrogate key / cache tag — Instead of invalidating one cache entry at a time, a surrogate key (or cache tag) is a label attached to a group of cache entries. Invalidating the surrogate key invalidates all entries that carry that label. For example, all product-page cache entries for category "laptops" might carry the tag category:laptops. When a laptop's price changes, you invalidate the tag — which clears all related cache entries without needing to enumerate them individually. Surrogate keys solve the enumeration problem — the "I know the price changed but I don't know every cache key that displays this price" problem that makes explicit purge hard to scale.

Generational (versioned) key — Instead of deleting a cache entry, you increment a version number embedded in the cache key. All new reads and writes use the new key; old cache entries (which have the old version in their key) are now "orphaned": they still live in the cache but will never be read again, because no code generates the old key anymore. Generational keys avoid the need for explicit deletes, which is useful in CDN contexts where purge APIs are rate-limited or expensive. The trade-off is that orphaned entries waste cache space until their TTL expires or they are evicted.

The vocabulary map above organizes the 12 concepts into three groups: the problems that make invalidation hard (stale data, staleness windows, the dual-write trap), the strategies that solve those problems (TTL, purge, write-through, CDC, surrogate keys), and the consistency models that describe the guarantees each strategy provides. When you read about cache invalidation in production postmortems or design docs, these are the terms you'll see — and now you have the plain-English grounding for each.
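To make the surrogate key idea from the vocabulary concrete, here is a minimal in-memory sketch. The class name TaggedCache and its methods are hypothetical; real systems keep the tag index in the cache server or CDN, and this toy version only shows the bookkeeping: each entry registers under its tags, and purging a tag removes every entry registered under it.

```python
from collections import defaultdict
from typing import Iterable, Optional


class TaggedCache:
    """Toy cache where entries carry tags and a tag can be purged as a unit."""

    def __init__(self) -> None:
        self._values = {}                      # cache key -> value
        self._keys_by_tag = defaultdict(set)   # tag -> set of cache keys

    def set(self, key: str, value: str, tags: Iterable[str]) -> None:
        self._values[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def purge_tag(self, tag: str) -> int:
        """Delete every entry registered under the tag; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if self._values.pop(key, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.set("page:product:1234", "<html>laptop</html>", tags=["product:1234", "category:laptops"])
cache.set("page:search:laptops", "<html>results</html>", tags=["category:laptops"])
cache.set("page:product:9", "<html>mouse</html>", tags=["product:9"])
cache.purge_tag("category:laptops")  # removes both laptop-related pages, leaves the mouse page
```

The writer only needs to know the tag, not the list of pages that happen to render the product, which is the enumeration problem the tag is meant to remove.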

The twelve invalidation terms cluster into three families: the problems (staleness, the staleness window, the dual-write trap) that motivate everything, the strategies (TTL, purge, write-through, write-around, write-back, CDC, surrogate keys, generational keys) that attack those problems, and the consistency models (strong, eventual) that describe what guarantees each strategy gives you. Picking a strategy without naming the consistency level it implies is how teams end up with mystery production incidents — every strategy choice silently commits you to a specific freshness contract.
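The generational key idea is also easy to sketch in memory. Everything here is illustrative: the class name, the key layout namespace:vN:id, and the use of a plain dict in place of a real cache. In a real deployment the orphaned entries would age out through their TTL; in this toy version they simply stay in the dict.

```python
from typing import Optional


class GenerationalCache:
    """Toy cache that invalidates a whole namespace by bumping its version."""

    def __init__(self) -> None:
        self._store = {}        # full versioned key -> value
        self._generation = {}   # namespace -> current version number

    def _key(self, namespace: str, item_id: str) -> str:
        version = self._generation.get(namespace, 0)
        return f"{namespace}:v{version}:{item_id}"

    def set(self, namespace: str, item_id: str, value: str) -> None:
        self._store[self._key(namespace, item_id)] = value

    def get(self, namespace: str, item_id: str) -> Optional[str]:
        return self._store.get(self._key(namespace, item_id))

    def invalidate(self, namespace: str) -> None:
        # No deletes: old keys become unreachable because nothing builds them anymore.
        self._generation[namespace] = self._generation.get(namespace, 0) + 1


cache = GenerationalCache()
cache.set("catalog", "1234", '{"price":1499}')
cache.invalidate("catalog")
print(cache.get("catalog", "1234"))  # the old entry is orphaned, so this is a miss
```

Invalidation costs one counter increment regardless of how many entries the namespace holds, at the price of the orphaned entries occupying memory until they expire.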

Section 5

The 4 Canonical Invalidation Strategies — Overview

Four strategies account for most of the ways a cache is kept in sync with its source of truth, and nearly every invalidation system you encounter in production is either one of them or a hybrid. This section gives you the overview: the big picture of each strategy before the deep dives in Sections 6–9. Read this section first so you have a map; the later sections fill in the territory.

Strategy at a Glance

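The four strategies are ordered roughly from least to most operationally complex: TTL, explicit purge, write-through, CDC. Staleness generally shrinks as you move along that list, with one caveat: write-through gives the tightest freshness when both writes succeed, while CDC is near-real-time but avoids the dual-write risk. They're also ordered from "every team accidentally uses this" to "teams build this intentionally when they've been burned by the others."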

The four-panel diagram above shows each strategy's core mechanism at a glance, and the comparison table below maps each to the dimensions that matter for choosing between them. Notice the trade-off pattern: strategies with lower staleness require more write-side coordination (write-through) or more operational infrastructure (CDC). TTL requires nothing from the writer but accepts staleness up to the full TTL duration. This is why real systems mix strategies — you don't pick one strategy for the whole system, you pick one per data type based on how much staleness that data type can tolerate and who owns the writes.
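As a small illustration of the explicit purge strategy, the sketch below pairs a cache-aside read path with a delete on write. It is a toy under stated assumptions: plain dicts stand in for the database and for Redis, and the class name and key format are hypothetical.

```python
class PriceService:
    """Toy cache-aside reader with explicit purge on write."""

    def __init__(self) -> None:
        self.db = {}        # stands in for the source of truth
        self.cache = {}     # stands in for Redis
        self.db_reads = 0   # counts how often reads fall through to the database

    @staticmethod
    def _key(product_id: int) -> str:
        return f"product:{product_id}:price"

    def read_price(self, product_id: int) -> int:
        key = self._key(product_id)
        if key in self.cache:
            return self.cache[key]        # cache hit
        self.db_reads += 1                # cache miss: go to the source of truth
        price = self.db[product_id]
        self.cache[key] = price
        return price

    def update_price(self, product_id: int, price: int) -> None:
        self.db[product_id] = price                   # 1. write the source of truth
        self.cache.pop(self._key(product_id), None)   # 2. purge the derived copy


service = PriceService()
service.db[1234] = 1499
service.read_price(1234)          # miss, populates the cache
service.update_price(1234, 1299)  # purge on write
print(service.read_price(1234))   # miss again, now reads the new price
```

Note that update_price still performs two separate operations. If step 2 fails after step 1 succeeds, the stale entry survives, which is the dual-write weakness the write-through and CDC discussions return to.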

How to Read the Sections Ahead

Sections 6 through 9 go deep on each strategy in turn: the mechanics, the math, the failure modes, and the patterns that production teams use to make each one robust. Section 6 covers TTL — by far the most commonly used strategy, and one with more failure modes than most engineers realize. Section 7 covers explicit purge — deceptively simple until you need to invalidate composite cache keys. Section 8 covers write-through and its sibling write-back — why they feel right on paper and why they cause pain in distributed systems. Section 9 covers CDC — the most powerful strategy and the one that requires the most infrastructure. After Section 9, you'll have enough detail to evaluate any invalidation system you encounter in the wild and know exactly which strategy it uses, why, and what failure modes to watch for.

Four canonical strategies cover most of the space of cache invalidation: TTL (timer-based auto-expiry), explicit purge (writer-triggered delete), write-through (cache and DB updated together on the write path), and CDC (database-change-stream-driven invalidation). Each makes a different consistency-vs-complexity trade-off. No single strategy is right for all data types; real systems pick a strategy per data type based on its maximum acceptable staleness and who performs writes.
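For the CDC strategy, the consumer side usually reduces to "for each change event, delete the cache keys derived from that row". The sketch below shows only that step. The event is a plain dict shaped loosely like a Debezium change event (op, before, after, source.table); treat those field names, the products table, and the key formats as assumptions for illustration, and note that a dict stands in for the cache.

```python
def cache_keys_for(table: str, row: dict) -> list:
    """Hypothetical mapping from a changed row to the cache keys derived from it."""
    if table == "products":
        return [f"product:{row['id']}", f"product:{row['id']}:price"]
    return []


def handle_change_event(event: dict, cache: dict) -> list:
    """Delete every cache key derived from the changed row; return the keys purged."""
    # A delete event has no "after" image, so fall back to the "before" image.
    row = event.get("after") or event.get("before") or {}
    table = event.get("source", {}).get("table", "")
    purged = []
    for key in cache_keys_for(table, row):
        if cache.pop(key, None) is not None:
            purged.append(key)
    return purged


cache = {"product:1234": "{...}", "product:1234:price": "1499", "product:9": "{...}"}
event = {
    "op": "u",
    "before": {"id": 1234, "price": 1499},
    "after": {"id": 1234, "price": 1299},
    "source": {"table": "products"},
}
print(handle_change_event(event, cache))
```

The application that changed the price never touches the cache here; the handler reacts to what the database committed, which is what removes the second write from the request path.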

Section 6

TTL Deep Dive — Eventual Consistency by Wall-Clock

TTL is the first caching strategy every engineer learns and the one almost every system ships with by default. It's a beautiful idea in its simplicity: every cached entry gets a countdown timer. When the timer hits zero, the cache automatically deletes the entry. The next request for that key misses the cache, fetches from the database, and repopulates the cache with a fresh copy and a new TTL. The writer never has to touch the cache. The cache operator never has to think about which keys to invalidate. The system just… works — up to the point that it doesn't.

The Mechanics: How TTL Works in Redis

In Redis, setting a value and its TTL can be done as two commands or as one atomic command. Redis stores the expiry as an absolute Unix timestamp in milliseconds and tracks it separately from the value itself. It deletes expired keys in two ways: lazily (when you next access the key, Redis checks whether it has expired and deletes it before returning a miss) and actively (a background task runs about ten times per second by default, samples keys that have TTLs, and deletes the expired ones it finds). Because the active pass only samples, an expired key that nobody reads can linger in memory for a while after its expiry time. The lazy check means a client never reads an expired value, so this affects memory usage rather than correctness.

# Option 1: SET with EX (expiry in seconds): atomic, one command
SET product:1234 '{"price":1299}' EX 60
# This key will auto-delete 60 seconds from now.

# Option 2: SET then EXPIRE: two commands, not atomic
SET product:1234 '{"price":1299}'
EXPIRE product:1234 60

# Option 3: SET with PX for millisecond precision (PEXPIRE is the two-command equivalent)
SET product:1234 '{"price":1299}' PX 60000

# Check how long a key has to live
TTL product:1234         # returns seconds remaining, -1 if no TTL, -2 if missing

# Check exact expiry timestamp (Redis 7.0+)
EXPIRETIME product:1234  # returns Unix timestamp in seconds

The first command is the preferred form — SET key value EX seconds sets the value and the TTL atomically in one operation. Why does atomicity matter here? Because if you SET then crash before EXPIRE, you've written a key with no TTL — it will live forever. In production, always use the atomic form.
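The two expiry paths (lazy on access, active by sampling) are easier to see in a toy model. The sketch below is not how Redis is implemented; it is a minimal in-memory cache, with a hypothetical class name and an injectable clock, that mimics the same two mechanisms so their effect on reads and on memory can be observed.

```python
import random
import time
from typing import Callable, Optional


class ToyTTLCache:
    """In-memory cache with lazy expiry on read and sampled active expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data = {}  # key -> (value, absolute expiry time)

    def __len__(self) -> int:
        return len(self._data)  # entries still held in memory, expired or not

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        # Value and expiry are stored together, like the atomic SET ... EX form.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: delete on access, report a miss
            return None
        return value

    def active_expire_cycle(self, sample_size: int = 20) -> int:
        """Check a random sample of keys and delete the expired ones."""
        now = self._clock()
        sample = random.sample(list(self._data), min(sample_size, len(self._data)))
        removed = 0
        for key in sample:
            if self._data[key][1] <= now:
                del self._data[key]
                removed += 1
        return removed
```

With a fake clock you can advance time past an entry's expiry and see that get returns a miss immediately, while an unread expired entry keeps occupying memory until a sampling cycle happens to pick it.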

For HTTP responses served through a CDN or browser cache, TTL is expressed in the Cache-Control header:

# Cache this response for 60 seconds in any intermediate cache
Cache-Control: public, max-age=60

# Cache for 60 seconds in the browser only (not in CDN)
Cache-Control: private, max-age=60

# Cache for 60 seconds, but revalidate after expiry (send If-None-Match)
Cache-Control: public, max-age=60, must-revalidate

# No caching at all
Cache-Control: no-store

max-age=60 is HTTP's equivalent of Redis's EX 60: the browser or CDN may serve the cached copy for up to 60 seconds after the response was generated. After 60 seconds, it either fetches a fresh copy unconditionally or sends a conditional GET with an If-None-Match header (containing the ETag of the cached copy). If the data hasn't changed, the server can return 304 Not Modified with no body, which saves bandwidth even though the cached copy had expired.
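The conditional GET exchange can be sketched from the server's side as a pure function. This is a simplified illustration, not a complete HTTP implementation: the function names are hypothetical, the ETag is just a truncated content hash, and the comparison handles only a single exact ETag (real If-None-Match headers can carry a list, weak validators, or "*").

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (a quoted, truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str], max_age: int = 60) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    if if_none_match == etag:
        # The client's copy is still current: confirm it without resending the body.
        return 304, headers, b""
    return 200, headers, body


status, headers, payload = respond(b'{"price":1299}', if_none_match=None)
revalidated = respond(b'{"price":1299}', if_none_match=headers["ETag"])
print(status, revalidated[0])
```

The 304 path still costs a round trip to the origin; what it saves is the response body, which matters most for large pages that change rarely.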

Why TTL Works: The Bounded Staleness Contract

TTL is not "wrong" — it's a deliberate trade-off. You're saying: "I accept up to N seconds of staleness in exchange for zero write-side coordination complexity." For a huge fraction of data, this trade-off is correct. Think about:

Blog post content — updates maybe once a day. A 1-hour TTL means at worst readers see content that's 1 hour old. Acceptable for an editorial site.
User profile data — changes infrequently (display name, bio, avatar). A 5-minute TTL means profile pages are at most 5 minutes stale. Fine for social media.
Feature flags / configuration — typically change rarely. A 30-second TTL means the worst case is that a feature flag change takes 30 seconds to propagate. Acceptable for most deploys.
Search index metadata — new items appear in search results within TTL seconds of being indexed. A 60-second TTL is typical for e-commerce search.

The math for choosing a TTL is: TTL = maximum acceptable staleness in seconds. If the business says "product descriptions can be up to 30 minutes stale," set TTL = 1800. If it says "prices must be current within 5 seconds," you need a better strategy than TTL alone (or a 5-second TTL, which dramatically increases database load). The business requirement drives the number; the number drives the strategy choice.

The Hidden Killer #1 — The TTL Stampede

Here's the failure mode that bites every team at some point. Imagine you have 10,000 product pages, all cached with a 60-second TTL. You deploy your application at 14:00:00. All 10,000 cache entries are written in the first few seconds of the deploy. They all expire at approximately 14:01:00 — 60 seconds later. At 14:01:00, all 10,000 cache entries expire simultaneously. 10,000 concurrent requests for product pages all miss the cache at the same time. All 10,000 hit the database simultaneously. The database receives 10,000 concurrent queries when it was handling 500. It falls over. This is the thundering herd, also called a TTL stampede. It's especially vicious on cold deployments, after outages (when the cache is empty), and on applications that batch-populate the cache at startup.

The diagram shows the difference visually. Without jitter, all 10,000 expiries land at exactly T=60 seconds: one brutal spike of database queries. With jitter, the expiries are spread across a window instead (T=50 to T=70 in the diagram, which corresponds to ±10 seconds around the 60-second base; the additive formula in the next section gives 60 to 72 seconds), so the load is distributed smoothly across time. The total number of cache misses is the same; the difference is whether they arrive all at once (catastrophic) or spread out (harmless).

The Jitter Fix

The solution to TTL stampedes is to add random jitter to every TTL. Instead of setting all entries to exactly 60 seconds, you set each entry to a random duration in a range around your target TTL. A common approach: TTL = base_ttl + random(0, base_ttl * 0.2). So for a 60-second TTL, each entry gets a lifetime between 60 and 72 seconds. The entries expire at different times, and the database load is smoothed out across the jitter window. Note that this formula only adds time, so if base_ttl is already your maximum acceptable staleness, either lower base_ttl so that base_ttl * 1.2 stays within the bound or subtract the jitter instead. Why 20% and not, say, 200%? Because you want enough spread to break synchronization but not so much that some entries live up to three times as long as intended; 10–20% is the sweet spot for most workloads.

import random
import redis

r = redis.Redis()

def cache_set_jittered(key: str, value: str, base_ttl: int, jitter_fraction: float = 0.2) -> None:
    """
    Set a cache entry with jittered TTL to avoid stampedes.

    base_ttl: target lifetime in seconds
    jitter_fraction: fraction of base_ttl to use as jitter range

    Example: base_ttl=60, jitter_fraction=0.2 → TTL is uniformly random in [60, 72] seconds
    """
    jitter = int(base_ttl * jitter_fraction)
    ttl = base_ttl + random.randint(0, jitter)
    r.set(key, value, ex=ttl)

Line-by-line: jitter = int(base_ttl * jitter_fraction) computes the maximum extra seconds we'll add — for a 60-second TTL with 20% jitter, this is 12 seconds. random.randint(0, jitter) picks a random offset between 0 and 12 seconds. ttl = base_ttl + random.randint(0, jitter) gives each entry a unique lifetime between 60 and 72 seconds. Every caller to cache_set_jittered gets a slightly different TTL, so expiries are naturally spread across time.
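To see what jitter does to the expiry spike, a small simulation helps. The sketch below assumes the scenario from the stampede example (all keys written at the same instant, whole-second TTLs) and uses the same additive jitter formula; the function name and the fixed random seed are illustrative choices.

```python
import random
from collections import Counter


def peak_expiries_per_second(num_keys: int, base_ttl: int, jitter_fraction: float, seed: int = 0) -> int:
    """Largest number of keys expiring within any single second.

    Assumes every key is written at t=0 with TTL = base_ttl + randint(0, jitter).
    """
    rng = random.Random(seed)
    max_jitter = int(base_ttl * jitter_fraction)
    expiries = Counter(base_ttl + rng.randint(0, max_jitter) for _ in range(num_keys))
    return max(expiries.values())


without_jitter = peak_expiries_per_second(10_000, base_ttl=60, jitter_fraction=0.0)
with_jitter = peak_expiries_per_second(10_000, base_ttl=60, jitter_fraction=0.2)
print(f"peak expiries in one second: {without_jitter} without jitter, {with_jitter} with 20% jitter")
```

With 20% jitter on a 60-second TTL the expiries fall into 13 one-second buckets (60 through 72), so the worst second carries roughly a thirteenth of the keys instead of all of them.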

The Hidden Killer #2 — Tail Latency on Cache Miss

Even with jitter, TTL has a second, subtler failure mode. When a popular cache entry expires, the next request has to refetch from the database and repopulate the cache, and every other request for that key that arrives during the repopulation window also misses. If repopulation takes 50 ms (a database roundtrip) and 200 requests per second target this key, about 10 requests land inside that window (200 × 0.05); at 20,000 requests per second it is about 1,000. Only one of those requests needs to repopulate the cache, but the others have nothing to wait on, so each fires its own identical database query.

This is sometimes called a cache stampede or dog-pile effect. One fix is a pattern called probabilistic early expiration: instead of waiting for the TTL to hit zero before refetching, each request has a small chance, growing as expiry approaches, of triggering a refresh before the TTL expires. It is related to, but not the same as, cache warming (populating entries before traffic arrives) and stale-while-revalidate (serving the expired copy while a refresh runs in the background). In all three cases the goal is the same: the cache stays warm, and requests for a hot key rarely see a miss.
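One well-known formulation of probabilistic early expiration (often called XFetch) decides per request whether to refresh early, using how long a recompute takes and a random draw. The sketch below shows just that decision. The function and parameter names are illustrative, and beta = 1.0 is a conventional default rather than a value taken from this guide.

```python
import math
import random
from typing import Callable


def should_refresh_early(
    now: float,
    expires_at: float,
    recompute_seconds: float,
    beta: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if this request should refresh the entry before it expires.

    The random term is an exponentially distributed head start scaled by the
    recompute time, so refreshes become likelier as expiry approaches.
    """
    u = 1.0 - rng()  # random.random() is in [0, 1); shift to (0, 1] so log() is defined
    head_start = -recompute_seconds * beta * math.log(u)
    return now + head_start >= expires_at


# An entry that expires at t=100 and takes 50 ms to recompute:
early_refreshes = sum(
    should_refresh_early(now=99.95, expires_at=100.0, recompute_seconds=0.05)
    for _ in range(1000)
)
print(f"{early_refreshes} of 1000 requests would refresh 50 ms before expiry")
```

Because only a fraction of requests draw a large enough head start, the refresh is typically triggered by one of the first few requests in the final stretch rather than by every request at the moment of expiry. Raising beta makes refreshes start earlier.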

Stale-while-revalidate in HTTP: The Cache-Control: stale-while-revalidate=30 directive (defined in RFC 5861) tells CDNs and browsers to serve the stale cached copy for up to 30 seconds past its max-age expiry while asynchronously fetching a fresh copy in the background. The user gets an instant response (stale but fast); the fresh copy arrives and replaces it for the next request. This is the HTTP-level analogue of probabilistic early expiration and is supported by most modern CDNs and browsers.

The Hidden Killer #3 — Fixed-Period Scheduling Clumping

A more subtle stampede variant: your application runs a background job every 60 seconds that batch-refreshes a set of cache entries. The job writes all entries with TTL=60, so they all expire at the very moment the job is due to run again. If the job is slow or fails, every entry expires before the refresh completes, and real user traffic catches a mass miss. The fix is the same: add jitter to the TTL, and decouple the refresh job's schedule from the TTL duration so they don't align (for example, a TTL of two or three refresh intervals, so one late or failed run is absorbed).

The lifecycle diagram shows the three phases of every TTL-managed cache entry: live (cache serves reads), expired and fetching (the miss window where the database gets hit), and repopulated (cache is live again with a fresh TTL). The miss window is typically just one database roundtrip — 5–50 ms. But during that window, every concurrent read for the same key also misses and fires its own database query. For high-traffic keys, this brief window can translate to hundreds of simultaneous database queries. Jitter prevents multiple entries from entering the miss window at the same time; probabilistic early expiry prevents popular entries from ever fully entering the miss window.

When TTL-Only Is Sufficient (and When It's Not)

TTL-only invalidation is sufficient when: (a) the data changes infrequently relative to the TTL, (b) the business can tolerate staleness up to the TTL duration, and (c) the data is not financially or legally sensitive. It is not sufficient when: (d) data changes are frequent and unpredictable (e.g., a live inventory count that can hit zero at any time), (e) the consequences of stale data are customer-facing financial errors (e.g., prices, discount codes, payment state), or (f) the data is a security credential (e.g., session tokens, API keys, permission sets), where a revoked credential keeps working until the cache entry expires. For these cases, explicit purge, write-through, or CDC must be layered on top of or in place of TTL.

TTL is not a substitute for an invalidation strategy. Setting a TTL on every cache entry is a good practice even when you use explicit purge or write-through — as a safety net, so stale entries eventually clear even if the purge event was missed. But a TTL alone is not an invalidation strategy for anything that requires bounded staleness tighter than the TTL duration.

TTL is the simplest invalidation mechanism: every entry gets a timer, and the cache deletes the entry when the timer expires. It requires zero coordination between writer and cache, making it the default for most systems. Its two main failure modes are the thundering herd (many entries expiring simultaneously, flooding the database) and the cache stampede (concurrent reads on a popular key all missing simultaneously). Random jitter on TTL values addresses the first; probabilistic early expiration or stale-while-revalidate addresses the second. TTL-only is appropriate for read-heavy, low-stakes data; for financial or security-sensitive data it must be complemented by explicit purge or CDC.

Section 7

Explicit Purge — "Delete It When the Data Changes"

The simplest idea in cache invalidation: when you change data in the database, immediately delete the corresponding cache entry. No timer. No lag. The next read that comes in finds nothing in the cache, goes to the database, fetches the fresh value, and repopulates the cache. You are in direct control — you decide exactly when the cached copy is discarded.

The appeal is obvious. You're not waiting for a TTL to tick down. The moment the product price changes, the cache is cleared, and the very next read gets the new price. Compare this to TTL: with a 5-minute TTL you might serve the old price to millions of reads before the entry expires. With explicit purge, the stale window collapses to milliseconds — just the time between the database write and the cache delete completing.

The Basic Pattern: Write DB → Delete Cache

The classic implementation looks simple in code. When your application updates a row, it fires two operations: the database write, then the cache delete.

    # On any write to product data:
    def update_product_price(product_id: int, new_price: float):
        # Step 1: update the source of truth
        db.execute(
            "UPDATE products SET price = %s WHERE id = %s",
            (new_price, product_id)
        )
        # Step 2: delete the cached copy so the next read fetches fresh
        cache.delete(f"product:{product_id}")
        cache.delete(f"product:{product_id}:detail")
        cache.delete(f"product:{product_id}:card")
        # ⚠ You have to know EVERY cache key that touches this product

The walkthrough: Step 1 writes the new price to the database. Step 2 deletes every cache key that might contain that product's price. The next read for product:42 misses the cache, hits the database, and gets the current price. Clean, immediate, correct, as long as you remembered every key.

That last caveat is where the strategy begins to crack. In a real system, a single product's price might appear in a product-detail cache entry, a category-listing entry, a search-results entry, a recommendations carousel entry, and a cart subtotal entry. All of them need to be purged atomically when the price changes. And the set of keys grows every time a new feature is built — silently, without updating the purge logic.

The Dual-Write Race: Why "Simple" Purge Fails Under Concurrency

Even if you know every cache key, there is a subtler bug lurking in the write-then-delete ordering. Picture two concurrent requests: Writer A is updating the price. Reader B has hit a cache miss, is reading from the database, and is about to repopulate the cache. If these requests interleave in the wrong order, you end up with a stale cache entry that nothing will ever clear unless you set a TTL as a fallback.

The diagram traces the race step by step. Reader B gets a cache miss and queries the database — but does so a hair before Writer A's transaction commits. So Reader B reads the old price. Then Writer A commits its write and fires its DELETE, leaving the cache empty. Then Reader B, blissfully unaware, writes the old price back into the cache. Now the cache holds the stale value indefinitely — the DELETE already fired, there's nothing left to trigger another one. If there's no TTL as a fallback, this stale entry lives forever. At high request rates this race is not theoretical; it happens regularly.
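The interleaving described above can be replayed deterministically with two dictionaries standing in for the database and the cache. This is only an illustration of the ordering, not a concurrency test; the key name and prices are made up for the example.

```python
def replay_race():
    db = {"product:42": 100}   # source of truth, old price
    cache = {}                 # the entry is absent, so Reader B will miss

    # 1. Reader B: cache miss, so it reads the database BEFORE Writer A commits.
    reader_b_value = cache.get("product:42")
    if reader_b_value is None:
        reader_b_value = db["product:42"]      # old price

    # 2. Writer A: commits the new price.
    db["product:42"] = 120

    # 3. Writer A: deletes the cache entry (it is already empty, so a no-op).
    cache.pop("product:42", None)

    # 4. Reader B: repopulates the cache with the value it read in step 1.
    cache["product:42"] = reader_b_value

    return db["product:42"], cache["product:42"]


print(replay_race())  # prints (120, 100): database is new, cache is stale
```

Nothing in steps 1 to 4 is a bug on its own; the stale entry comes purely from the order in which two correct code paths ran.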

Fix 1: Delete Before Write (Cache-Aside with Pre-Delete)

One approach: reverse the order. Delete the cache entry before writing to the database. The benefit shows up in the failure case: if the application crashes between the two steps, the cache is empty rather than stale, and the next read simply reloads from the database. The concurrency race, however, is still there. Any Reader B that arrives after the delete but before the write commits finds a cache miss, queries the database, reads the old value, and writes it back into the cache. Once the write commits, that entry is stale until its TTL (if any) or another purge clears it. A common mitigation is to delete a second time shortly after the commit, which narrows the window but does not close it.

Pre-delete is still not race-free. If the database write fails after you've already deleted the cache, the empty entry is repopulated from the (unchanged) database. That's fine for correctness: the reader just gets the old, still-correct value, at the cost of one unnecessary cache miss. The real remaining risk is the concurrent-reader problem described above, and it is most likely in read-heavy systems, where a read landing between the delete and the commit is close to guaranteed.

Fix 2: Retry Queues for Failed Deletes

Sometimes the cache delete itself fails — the Redis node is temporarily unreachable, a network blip drops the command, or the application crashes between the DB commit and the cache delete. The result: the database has the new value but the cache still holds the old one. The fix is a retry queue. After every successful database write, publish an invalidation event to a durable queue. A separate worker consumes the queue and fires the cache delete. If the delete fails, the event stays in the queue and is retried with exponential backoff.

    # Pattern: publish invalidation event to a durable queue after every DB write
    def update_product_price(product_id: int, new_price: float):
        db.execute(
            "UPDATE products SET price = %s WHERE id = %s",
            (new_price, product_id)
        )
        # The write path never calls cache.delete() itself, so a cache outage
        # cannot fail it; the durable queue guarantees eventual delivery.
        invalidation_queue.publish({
            "type": "invalidate",
            "keys": [
                f"product:{product_id}",
                f"product:{product_id}:detail",
                f"product:{product_id}:card",
            ]
        })

    # Separate worker:
    def invalidation_worker():
        for event in invalidation_queue.consume():
            try:
                for key in event["keys"]:
                    cache.delete(key)
            except CacheUnavailable:
                # Redeliver the whole event later; repeating a delete is harmless
                invalidation_queue.nack(event)
            else:
                # Ack only after every key in the event has been deleted
                invalidation_queue.ack(event)

The key insight: by moving the delete to an async worker backed by a durable queue, you decouple the write path from the cache's availability. A Redis outage no longer breaks the write path — it just causes a brief delay in invalidation. The cache will be corrected as soon as the queue worker succeeds.

Fix 3: Transactional Outbox — Atomicity Without Distributed Transactions

The retry queue approach still has a gap: if the application crashes between the database commit and the queue publish, the invalidation event is never sent. The fix is to make the "queue this event" step part of the same database transaction as the data change itself — so they either both commit or both roll back. When you write the invalidation event to a regular table in your database, inside the same transaction as the actual data update, that's called the Transactional Outbox pattern. It closes the gap because there's no longer a moment when the data is committed but the invalidation event isn't — they share the same atomic commit boundary.

The outbox pattern is elegant because it borrows atomicity from the database itself. The outbox table is in the same database as the data. The same transaction that writes the new price also writes the invalidation event. If the transaction commits, both are durable. If it rolls back, neither exists. A separate poller (running on a schedule or a CDC stream of the outbox table itself) reads pending rows and fires the cache deletes. Because the poller runs independently, a temporary Redis outage just delays processing — it doesn't lose events.
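A minimal sketch of the outbox, using SQLite from the standard library so it runs anywhere. The table names (products, cache_outbox), the keys_json column, and the cache_delete callback are assumptions made for the example; a production system would use its own schema and a real cache client.

```python
import json
import sqlite3


def init_schema(conn):
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
    conn.execute(
        "CREATE TABLE cache_outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " keys_json TEXT NOT NULL,"
        " processed INTEGER NOT NULL DEFAULT 0)"
    )


def update_price(conn, product_id, new_price):
    keys = [f"product:{product_id}", f"product:{product_id}:detail"]
    # "with conn" commits both statements together, or rolls both back
    # if either one raises: data change and event share one transaction.
    with conn:
        conn.execute("UPDATE products SET price = ? WHERE id = ?",
                     (new_price, product_id))
        conn.execute("INSERT INTO cache_outbox (keys_json) VALUES (?)",
                     (json.dumps(keys),))


def drain_outbox(conn, cache_delete):
    """Poller: fire deletes for pending rows; returns how many rows were handled."""
    rows = conn.execute(
        "SELECT id, keys_json FROM cache_outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, keys_json in rows:
        for key in json.loads(keys_json):
            cache_delete(key)  # if this raises, the row stays pending for the next poll
        with conn:
            conn.execute("UPDATE cache_outbox SET processed = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)
```

Because a row is marked processed only after its deletes succeed, a crash mid-row means the same deletes run again on the next poll, which is safe because deleting a key twice has no extra effect.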

Fix 4: Distributed Lock on Key Repopulation

For systems where even a millisecond of stale data after a delete is unacceptable, a distributed lock can prevent the concurrent-reader race. When a cache miss occurs, the reader acquires a lock on the key before querying the database. Other readers that miss on the same key while the lock is held briefly spin or return a fallback value. Once the lock holder repopulates the cache and releases the lock, all subsequent readers hit the cache. To close the writer/reader race as well, the writer has to take the same lock around its database write and cache delete; locking only the readers merely collapses duplicate database queries. This is complex to implement correctly and adds latency to cache misses, so it's only used when the dual-write race is genuinely causing incidents.
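The reader side of such a lock might look like the sketch below. It assumes a redis-py style client created with decode_responses=True (so get returns str) and uses only get, set with nx/ex, and delete. The function name, the lock: key prefix, and the retry settings are illustrative assumptions, and the lock release here is deliberately simplified.

```python
import time
import uuid


def get_with_lock(client, key, load_from_db, ttl=60, lock_ttl=5,
                  wait=0.05, retries=20):
    """Read-through with a per-key lock so only one caller repopulates."""
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = client.get(key)
        if cached is not None:
            return cached
        token = uuid.uuid4().hex
        # SET ... NX EX succeeds only if nobody else currently holds the lock;
        # the expiry frees the lock if the holder crashes.
        if client.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                value = load_from_db()
                client.set(key, value, ex=ttl)
                return value
            finally:
                # Simplified release: check-then-delete is not atomic.
                # Production code would do this compare-and-delete in one step.
                if client.get(lock_key) == token:
                    client.delete(lock_key)
        time.sleep(wait)  # someone else is repopulating; wait and re-check
    return load_from_db()  # gave up waiting: fall back to the database
```

The token guards against deleting a lock that expired and was re-acquired by another caller, and lock_ttl should comfortably exceed the slowest expected database read.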

Redis Commands: DEL, UNLINK, and SCAN

Redis provides three commands you'll use in explicit-purge implementations. Each does something slightly different, and picking the right one matters at scale.

    # DEL: synchronous delete. Blocks the Redis event loop until the key is freed.
    # Safe for small keys. On a large hash (100k fields), it can block for milliseconds.
    DEL product:42

    # UNLINK: asynchronous delete. Returns immediately; Redis frees memory in the background.
    # Prefer UNLINK for large keys or high-frequency purge workloads.
    UNLINK product:42

    # SCAN + pattern: iterate over keys matching a glob pattern WITHOUT blocking.
    # NEVER use KEYS in production: KEYS blocks while scanning the entire keyspace.
    SCAN 0 MATCH "product:42:*" COUNT 100
    # Returns a cursor + a batch of matching keys. Call again with the returned cursor until cursor = 0.

    # Practical multi-key purge in Python (using redis-py):
    # cursor = 0
    # while True:
    #     cursor, keys = redis.scan(cursor=cursor, match="product:42:*", count=100)
    #     if keys:
    #         redis.unlink(*keys)
    #     if cursor == 0:
    #         break

The critical lesson here: never use KEYS in production. The KEYS command scans the entire Redis keyspace in a single blocking operation. On a Redis instance with millions of keys, this can freeze the server for hundreds of milliseconds, effectively causing a brief outage for every application that uses that Redis instance. SCAN does the same job incrementally — it returns a cursor and a small batch of keys per call — so it spreads the scanning work across many small, non-blocking steps.

When Explicit Purge Is the Right Choice

Explicit purge is the right default for data that changes infrequently but is read many times per second, where any staleness is costly. User account settings, product catalog prices, permission configs, feature flags — these change rarely but are read constantly, and the consequences of serving a stale value are either a confusing user experience or a business error. For these types, the complexity of maintaining purge logic is worth the consistency guarantee.

It becomes the wrong choice when: the set of cache keys that reference any given piece of data is hard to enumerate (fan-out), when the write frequency is so high that the cache is being purged faster than it can be repopulated (cache thrashing), or when the writer service doesn't know which cache keys exist (service isolation). For those cases, CDC-driven invalidation (Section 9) or surrogate keys (Section 11) are better fits.

Explicit purge deletes cache entries on write, giving near-zero staleness, but it has two weak points: a delete that never happens (cache outage or application crash) and the dual-write race, in which a concurrent reader repopulates the cache with a value it read before the write committed. Retry queues and the transactional outbox make sure the delete is eventually delivered; pre-delete changes which failure you are exposed to; distributed locks serialize cache repopulation. Keep a fallback TTL in every case. Redis UNLINK is preferred over DEL for large keys; SCAN is mandatory over KEYS for pattern-based invalidation.

Section 8

Write-Through — Synchronous Co-Updates

Write-through is a different philosophy from explicit purge. Instead of deleting the cache on a write, you update both the cache and the database in the same write operation. Every write goes through the cache: the application writes the new value to the cache first (or alongside), then writes to the database. When the next read comes in, the cache already has the correct value — no repopulation, no cache miss, no staleness at all.

The intuition: you treat the cache as a synchronous write target, not just a read shortcut. Reads from the cache are always fresh because every write also hit the cache. The cache is never a "stale copy" — it's a simultaneous copy.

The diagram shows the write-through flow. Both the cache and the database receive the write. Reads always hit the cache and always get the current value — because the last write already updated it. There's no staleness window because there's no "write to DB, then separately update cache later" gap. They're updated together.

The Hidden Cost: Doubled Write Latency

Write-through sounds perfect on paper, but it comes with a mandatory tax. In a naive synchronous implementation the two writes happen sequentially: write cache → write DB → return to caller. The caller waits for the sum of both: the database write, which might be 5–20 ms, plus the cache round trip. That's two round trips in the write path where before there was one.

You can parallelize them by issuing both writes concurrently, which brings the latency down to the slower of the two, but now you have a consistency problem: what if the cache write succeeds but the database write fails? The cache holds a "new" value that was never durably committed. The safer ordering is to write to the database first, then update the cache. If the cache write fails, the cache still holds the old value, so delete the key (or rely on its TTL) to force the next read back to the database. But that ordering collapses back into the same kind of race conditions as explicit purge; for instance, two concurrent writers can update the cache in the opposite order from their database commits.
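The database-first ordering and its fallbacks can be sketched like this. The three callables are stand-ins for a real database and cache client, and the built-in ConnectionError is used as a placeholder for whatever exception the cache client raises when it is unreachable; both are assumptions for the example.

```python
def write_through(db_write, cache_set, cache_delete, key, value):
    """Write to the database first, then the cache; report the cache outcome."""
    # 1. Source of truth first. If this raises, the cache is never touched.
    db_write(key, value)
    try:
        # 2. Cache now matches the committed value.
        cache_set(key, value)
        return "updated"
    except ConnectionError:
        pass
    try:
        # 3. Could not update: evict so the next read reloads from the database.
        cache_delete(key)
        return "evicted"
    except ConnectionError:
        # 4. Cache unreachable: the old value survives until its TTL expires.
        return "stale-until-ttl"
```

The last branch is the reason to keep a TTL on every entry even under write-through: it is the only thing that bounds staleness when the cache cannot be reached at write time.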

The Failure Mode Tree: Cache as a Hard Dependency

The failure mode tree illustrates the dilemma cleanly. When the cache is healthy, write-through works perfectly. When the cache goes down, you face an impossible choice: either you block all writes (making the cache a hard dependency that can take your entire write path offline), or you fall back to DB-only writes (accepting that the cache is now stale, which means write-through's consistency promise is broken and you're effectively back to needing purge logic or TTLs). Neither option is graceful. This is why write-through is not the universal answer it might initially appear to be.

The Cache-Pollution Problem

There's a second, quieter problem: write-through caches every write, whether or not the written item will ever be read from the cache. A batch import of 500,000 product records that nobody will ever browse individually fills the cache with cold data — evicting warm data that was actually being served to users. Write-through works best when the write rate is modest and when you can reasonably expect each written item to be read soon afterward. User profiles, shopping carts, session data — these are good candidates. Large bulk imports, log writes, event streams — bad candidates.

When Write-Through Is the Right Tool

Write-through earns its place when every write will genuinely be read soon, and when the read latency guarantee is strict enough that you can't afford even a single cache miss. The canonical examples are user profile updates (the user will immediately see their own profile after updating it), shopping cart mutations (the user is about to view the cart), and feature flag updates (every server will read the new value on the next request cycle).

The write-through sweet spot: user-facing write-then-immediately-read patterns. If the user changes their display name and then navigates to their profile page, write-through ensures they see the new name without a round-trip to the database. If you're importing 50,000 product records at 2 a.m., write-through is just wasting cache space.

Write-through updates both the cache and the database in the same write operation, eliminating staleness entirely — but it doubles write latency, makes the cache a hard dependency of the write path, and pollutes the cache with data that may never be read. It's the right choice for write-then-read patterns where every write is soon followed by a read of the same item: user profiles, shopping carts, feature flags.

Section 9

CDC-Driven Invalidation — Event-Sourced Truth

Both explicit purge and write-through require the writer to be responsible for cache management. The application that changes the data also has to know which cache keys to delete or update. This coupling is the root cause of most invalidation bugs in production — a new feature adds a cache layer, the writer isn't updated to purge it, and now you have stale data with no mechanism to fix it.

There's a clever way out: instead of making the writer responsible for cache management, you tap into the database's own internal change log and treat every row change as an event you can react to. The application just writes to the database like normal; a separate listener notices each change and fires the matching cache invalidations. When you do this — capturing every insert, update, and delete as a stream of events derived from the database's own journal — that's called Change Data Capture (CDC). The database already records every change it makes to disk (this is how it survives crashes — it's called the Write-Ahead Log in Postgres, or the binary log in MySQL). CDC tools tail this log and publish a stream of every row change to a message bus like Kafka. A cache-invalidation subscriber reads the stream and fires deletes for any key affected by each change.

The Architecture: Tail the Log, Publish to a Stream

The architecture diagram shows the key insight. The application only ever writes to the database — it has no cache logic at all. Debezium (a Kafka Connect plugin) tails the Postgres WAL and publishes every row change to a Kafka topic. A separate cache-invalidation service subscribes to that topic and fires cache.delete() for every affected key. If you add a new cache layer next month, you just add a new subscriber to the Kafka topic. The writer service never changes. This is the decoupling win that makes CDC the right choice for large systems where many services cache the same underlying data.

Debezium Connector Config — Real Syntax

Debezium is the most widely deployed CDC tool for relational databases. It runs as a Kafka Connect plugin and is configured with a JSON connector definition. Here's a real Postgres connector config:

    {
      "name": "products-cdc-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres.prod.internal",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "${DB_PASSWORD}",
        "database.dbname": "ecommerce",
        "database.server.name": "ecommerce",
        "table.include.list": "public.products,public.inventory",
        "plugin.name": "pgoutput",
        "slot.name": "debezium_products",
        "publication.name": "debezium_publication",
        "topic.prefix": "db",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "transforms.unwrap.drop.tombstones": "false",
        "heartbeat.interval.ms": "30000"
      }
    }

Walking through the key fields: table.include.list tells Debezium which tables to monitor, so only changes to products and inventory are published (you don't want to publish changes from every table in the DB). slot.name is the Postgres replication slot. Postgres uses it to track how far Debezium has read the WAL, so no events are missed even if Debezium restarts. plugin.name: pgoutput selects Postgres's built-in logical decoding output plugin, which translates raw WAL entries into clean change events; Postgres has shipped it since version 10. The ExtractNewRecordState transform flattens the event envelope so your consumer sees just the row's new state (the envelope's after field) rather than the full Debezium envelope with its before, after, and source metadata. topic.prefix sets the first segment of every topic name (in Debezium 2.x; on 1.x, database.server.name played that role), so the resulting Kafka topic name will be db.public.products.
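On the consuming side, the invalidation logic reduces to a mapping from topic and primary key to cache keys. The sketch below assumes the Kafka message key has already been deserialized into a dict of primary-key columns, that products is keyed by id and inventory by product_id, and that the cache key formats match the earlier examples; all of these are assumptions about a hypothetical schema, not Debezium behavior to rely on blindly.

```python
# Hypothetical mapping: topic name -> function from primary-key dict to cache keys.
TOPIC_TO_KEYS = {
    "db.public.products": lambda pk: [
        f"product:{pk['id']}",
        f"product:{pk['id']}:detail",
        f"product:{pk['id']}:card",
    ],
    "db.public.inventory": lambda pk: [f"inventory:{pk['product_id']}"],
}


def keys_to_invalidate(topic, message_key):
    """Return the cache keys to delete for one change event.

    Only the message key is used, so inserts, updates, deletes, and
    tombstones (messages with a null value) are all handled the same way.
    """
    build = TOPIC_TO_KEYS.get(topic)
    if build is None:
        return []  # a table nobody caches: nothing to do
    return build(message_key)


def handle_event(topic, message_key, cache_delete):
    """Delete every affected key; safe to run twice for the same event."""
    for key in keys_to_invalidate(topic, message_key):
        cache_delete(key)
```

Keeping the handler delete-only is what makes replays harmless: if Kafka redelivers an event after a consumer restart, the second run deletes keys that are already gone.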

The CDC Tooling Landscape

Debezium + Kafka Connect

The most battle-tested open-source CDC stack. Debezium connectors exist for Postgres (via logical replication / pgoutput), MySQL (via binlog), MongoDB (via change streams), SQL Server, Oracle, and others. Kafka Connect handles connector lifecycle, offset tracking, and fault tolerance. The Kafka topic gives you a replayable, ordered stream of every change — consumers can replay from the beginning if they fall behind or are added later.

Best for: self-managed infrastructure, mixed-database environments (Postgres + MySQL + MongoDB side by side), teams already running Kafka.

Postgres Logical Replication (direct)

Postgres has native logical replication built in since version 10. You can consume a publication directly in application code by opening a logical replication connection that decodes with pgoutput (or with an output plugin such as wal2json), without Debezium. This removes Kafka from the stack, which simplifies operations but means you lose Kafka's replay and fan-out capabilities. A good choice when you have a single cache consumer and want to minimize infrastructure.

Best for: single-consumer CDC, teams that want to avoid Kafka complexity, Postgres-only setups.

MySQL Binlog (direct or via Debezium)

MySQL's binary log is the equivalent of Postgres's WAL for replication purposes. You can read it directly with libraries like python-mysql-replication or via Debezium's MySQL connector. binlog_format=ROW must be set. ROW has been the default since MySQL 5.7.7, but older servers default to STATEMENT, which logs SQL statements rather than row-level changes, and MariaDB defaults to MIXED. CDC requires ROW format to see exact before/after values.

Best for: MySQL/MariaDB systems needing row-level change events.

AWS Database Migration Service (DMS)

AWS DMS can run in CDC mode, streaming changes from RDS, Aurora, or on-premises databases to Kinesis, Kafka, S3, or another database. It abstracts away the replication slot management and connector configuration, at the cost of being AWS-specific and having less flexibility in event transformation. DMS CDC is a practical choice when you're already on AWS and want managed infrastructure.

Best for: AWS-native stacks, teams that want managed CDC without running Kafka.

MongoDB Change Streams

MongoDB has native change streams built in since version 3.6, backed by the oplog (operations log). collection.watch() returns a cursor that yields every insert, update, delete, and replace in real time. You can filter change streams server-side by passing an aggregation pipeline, reducing network traffic. Debezium also has a MongoDB connector if you want the Kafka fan-out.

Best for: MongoDB-native stacks needing real-time invalidation without external CDC tooling.

The Hard Parts: Ordering Across Shards

CDC sounds like it solves everything, but it introduces its own hard problems. The most challenging is ordering. In a single-shard database, the WAL is a totally ordered sequence — every change has a position, and changes are replayed in that order. But in a sharded database (Vitess, CockroachDB, Citus, or a manually sharded MySQL setup), each shard has its own WAL. Changes to different shards arrive at your Kafka consumer independently, with no global ordering guarantee.

The ordering diagram shows the problem. A cross-shard operation (moving a product from one shard to another, for example) produces a DELETE on shard 1 and an INSERT on shard 2, both at the same logical time. These events land in separate Kafka partitions, which have no ordering guarantee between them. The cache invalidator may therefore process the INSERT before the DELETE, and in the gap between the two a reader can repopulate the key from a view of the data in which only half of the move has happened, leaving the cache briefly wrong. For most applications this sub-second window is acceptable. For applications that require strict consistency (financial ledgers, inventory counts), cross-shard CDC requires explicit transaction markers or a distributed transaction coordinator, both of which add significant complexity.

Exactly-Once Delivery and Schema Evolution

Two more operational challenges in CDC pipelines. First: when you'd like every change event to be processed exactly one time (not skipped, not duplicated), that's called exactly-once delivery. Kafka can give you this, but only with careful configuration: idempotent, transactional producers and consumers that read with isolation.level=read_committed. Even then the guarantee covers processing that stays inside Kafka, not side effects on an external system such as a Redis delete. The default Kafka configuration is at-least-once (events may be replayed on consumer restart), so your invalidation handler should be safe to run twice: calling cache.delete(key) twice is harmless, but calling cache.set(key, value) twice can be dangerous if the second call uses a stale value.

Second: schema evolution. When you add a column to the products table, the Debezium event schema changes. Consumers that parse event fields by position (rather than name) will break the moment the new column lands. Schema Registry (part of the Confluent Platform) manages schema versions and validates compatibility before publishing, so downstream consumers fail loudly at deploy time rather than silently dropping events in production.

CDC vs. explicit purge — the deciding question: Does the writer always know all the cache keys that reference the data it's changing? If yes, explicit purge is simpler. If no (fan-out, service isolation, future-proofing), CDC is the better choice because it doesn't require the writer to know anything about the cache.

CDC-driven invalidation tails the database's internal change log (WAL or binlog) and publishes row changes to a stream (usually Kafka). Cache invalidators subscribe to that stream and fire deletes automatically — the writer service has zero cache logic. This decoupling is the key win: new cache consumers can be added without touching the writer. The hard problems are ordering across shards, exactly-once semantics, and schema evolution. Tools: Debezium + Kafka Connect (most flexible), Postgres logical replication (simpler, single consumer), AWS DMS (managed), MongoDB change streams (native).

Section 10

Versioned Keys & Generational Caching: Make Invalidation Free

Here's a question: what if you never had to delete a cache entry at all? What if "invalidation" was just a side-effect of naming cache keys differently after each update, and the old entries quietly aged out via TTL on their own?

That's the idea behind versioned keys. Instead of storing data at a stable key like product:42, you embed a version number in the key: product:42:v7. When the product changes, you bump the version. The new data lives at product:42:v8. The old entry at product:42:v7 is now orphaned: nobody will ever request it again (because the version counter has moved on), so it ages out naturally when its TTL expires. You never had to fire a DELETE. The old key just becomes invisible.

How Version Tracking Works

The version number has to live somewhere, typically in the database alongside the entity, or in a lightweight metadata key in Redis. Here's the pattern:

    import json

    # On WRITE: increment version in DB, then write new cache entry (old key auto-orphaned)
    def update_product(product_id: int, new_data: dict):
        # Increment version atomically in the DB
        version = db.execute(
            "UPDATE products SET version = version + 1, data = %s WHERE id = %s RETURNING version",
            (json.dumps(new_data), product_id)
        ).fetchone()["version"]
        # Write at the new versioned key (old key is now orphaned; it will TTL out).
        # redis-py style call: setex(name, time_in_seconds, value)
        cache.setex(f"product:{product_id}:v{version}", 3600, json.dumps(new_data))
        # No delete needed. product:{product_id}:v{version - 1} will expire on its own.

    # On READ: fetch the current version from the DB (or a version-tracking Redis key), then read data
    def get_product(product_id: int) -> dict:
        # Get current version (this can also be cached with a short TTL)
        version = db.execute(
            "SELECT version FROM products WHERE id = %s",
            (product_id,)
        ).fetchone()["version"]
        cache_key = f"product:{product_id}:v{version}"
        cached = cache.get(cache_key)
        if cached:
            return json.loads(cached)
        # Cache miss: fetch from DB, populate at versioned key.
        # Cache the same JSON document the write path stores (data is assumed to be a JSON text column).
        row = db.execute("SELECT data FROM products WHERE id = %s", (product_id,)).fetchone()
        cache.setex(cache_key, 3600, row["data"])
        return json.loads(row["data"])

The walkthrough: on a write, the database atomically increments the version and returns the new value. The application writes the new data at the new versioned key. The old key is implicitly abandoned: nobody holds a reference to it, so it will expire via TTL. On a read, the application first looks up the current version (from DB or a short-lived Redis key), constructs the versioned cache key, and reads from it. If the version was just bumped by a write, this will be a cache miss and the fresh value will be fetched from the database.

Versioned Key vs. Purge Timeline

The timeline makes the pattern clear. Before the update, all readers look up product:42:v6 and find it in cache. When the update fires at T=10s, a new product:42:v7 entry is written. From that moment, all readers look up v7. The v6 entry stays in memory until its own TTL runs out, at most 3600 s after it was last written, so no later than about T=3610s. It is evicted silently, with zero developer effort and no delete-versus-repopulate race to reason about. No DELETE was ever fired.

Global Generation Bumps: Invalidate Whole Categories at Once

Per-entity versioning solves the "one item changed" case. But sometimes you need to invalidate an entire category of cache entries at once: a global price override affects every product, a CSS bundle update affects every page, a configuration change affects every API response that includes config data. Per-entity versioning requires bumping the version on every item individually, which is expensive if you have millions.

The solution: a global generation counter. Every cache key for a given category includes the current generation: cache:gen42:product:123. When the category changes, you increment the generation counter from 42 to 43 in a single atomic operation. Every key in that category instantly becomes unreachable: they all have gen42 in them, but the generation pointer now says 43. Old entries expire via TTL. Rails' key-based cache expiration (cache_key together with cache_version) is a well-known relative of this idea, although it versions individual records rather than whole categories.

The generation bump is elegant. A single INCR generation_key command in Redis atomically invalidates the entire category. Old-generation keys are still in memory and will expire via TTL, so there's a temporary memory bump while both generations coexist, but this is usually acceptable. The trade-off: you must always read the generation counter before constructing a cache key, adding a Redis round-trip to every read. For read-heavy systems, the generation counter itself can be cached locally in application memory with a very short TTL (1–5 seconds), which removes most of that cost at the price of a bump taking up to that long to reach every process.
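The key scheme described above can be sketched with an in-memory stand-in for Redis. This is a minimal illustration, not a production client: the class name GenerationalCache, the default generation of 1, and the dictionary-backed store are assumptions made for the example; with Redis, bump() would be an INCR and the generation lookup a GET.

```python
import time
from typing import Dict, Optional, Tuple


class GenerationalCache:
    """In-memory stand-in for Redis, used only to illustrate the key scheme."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self._generations: Dict[str, int] = {}          # category -> generation

    def _generation(self, category: str) -> int:
        # Plays the role of GET gen:<category>; assumed to start at 1.
        return self._generations.get(category, 1)

    def _key(self, category: str, entity_id: int) -> str:
        # The current generation is baked into every key for the category.
        return f"cache:gen{self._generation(category)}:{category}:{entity_id}"

    def set(self, category: str, entity_id: int, value: str, ttl: float) -> None:
        self._store[self._key(category, entity_id)] = (time.monotonic() + ttl, value)

    def get(self, category: str, entity_id: int) -> Optional[str]:
        entry = self._store.get(self._key(category, entity_id))
        if entry is None or entry[0] <= time.monotonic():
            return None
        return entry[1]

    def bump(self, category: str) -> int:
        # Plays the role of INCR gen:<category>: one write invalidates the category.
        self._generations[category] = self._generation(category) + 1
        return self._generations[category]
```

After bump("product"), every earlier product entry is still stored but can no longer be addressed, which is exactly the orphan-then-expire behaviour the text describes.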

Memory overhead is the main cost. With versioned keys, old entries don't disappear immediately: they coexist with the new entries until their TTLs expire. If your TTL is 1 hour and you update a product 60 times per hour, you might have up to 60 orphaned cache entries per product consuming memory. Set TTLs short enough that orphaned entries don't pile up, or explicitly UNLINK the previous version's key for high-churn items.

The real win: versioned keys eliminate the dual-write race entirely. There is no DELETE that can interleave with a concurrent reader's SET, because there is no DELETE. The versioning mechanism makes invalidation purely additive: you write a new key, never delete an old one. The only coordination required is the version pointer, which is a single atomic read-modify-write.

Versioned keys embed a version number in the cache key itself. On update, the version increments and the new value is written at the new key. Old keys are orphaned (unreachable but harmless) and expire via TTL. This eliminates the dual-write race because no DELETE is ever fired. Global generation bumps extend the idea to entire categories: one atomic increment makes all old keys unreachable simultaneously. The main cost is temporary memory overhead while old and new versions coexist.

Section 11

Surrogate Keys & Cache Tags: Many-to-Many Invalidation

All the strategies so far assume you can enumerate the cache keys that reference a given piece of data. But what happens when one piece of data (a product, a user, a category) appears in dozens or hundreds of different cached responses, and you can't predict in advance which ones those are?

Imagine a product with ID 42. It appears in: the product detail page cache, the category listing for "laptops," the search results for "ultrabook," the "you might also like" recommendations on several other product pages, the sitemap cache, and the homepage featured-products section. When the product's price changes, all of them need to be invalidated. Calling cache.delete() one by one requires knowing every key in advance, and that knowledge goes stale every time a new feature adds a new cache layer.

The Idea: Tag Every Cached Entry at Write Time

Surrogate keys (called "cache tags" in many frameworks) invert the problem. Instead of trying to enumerate cache keys at invalidation time, you annotate each cached response at write time with the set of "things it depends on." When one of those things changes, you say "invalidate everything tagged with this thing," and the cache system handles the fan-out.

Here's the concrete form: when you cache the laptop category page, you tag it with category:laptops, product:42, product:99, and product:104, because those are the products on that page. When product 42's price changes, you call cache.purge_tag("product:42"). The cache finds and deletes every entry tagged with product:42, however many there are.

The fan-out diagram shows the power of the pattern. A single purge_tag("product:42") call reaches across every cache entry that included product 42, regardless of the cache key names or how many there are. An entry that was not tagged with product:42 (in the diagram, the sitemap entry) is untouched. Only entries that were tagged as depending on this product are cleared. This is precise, automatic, and scales to thousands of entries.

Implementation: How Tags Are Stored in Redis

The cache system needs a data structure that maps from tag to the set of cache keys bearing that tag. In Redis, this is typically a Set per tag: tag:product:42 is a Redis Set whose members are the cache keys tagged with that product. On write, you add the cache key to each tag set. On purge, you read the tag set and delete every listed key, then clear the set itself.

# Writing a cache entry with tags
def cache_set_with_tags(key: str, value: str, ttl: int, tags: list[str]):
    pipe = cache.pipeline()
    pipe.setex(key, ttl, value)
    for tag in tags:
        pipe.sadd(f"tag:{tag}", key)            # add key to the tag's member set
        # Keep the tag set alive longer than this entry. Note: EXPIRE overwrites
        # any longer expiry set by an earlier write with a larger ttl.
        pipe.expire(f"tag:{tag}", ttl + 3600)
    pipe.execute()

# Invalidating all entries tagged with a given tag
def purge_tag(tag: str):
    tag_key = f"tag:{tag}"
    keys = cache.smembers(tag_key)
    if keys:
        pipe = cache.pipeline()
        pipe.unlink(*keys)      # async-delete all tagged entries
        pipe.delete(tag_key)    # remove the tag index
        pipe.execute()

# Example usage:
cache_set_with_tags(
    key="page:category:laptops",
    value=render_category_page("laptops"),
    ttl=3600,
    tags=["category:laptops", "product:42", "product:99", "product:104"]
)

# Later, when product 42 is updated:
purge_tag("product:42")
# Deletes "page:category:laptops" + every other key tagged with product:42

The pipeline approach matters for performance: all the SADD operations to the tag sets, plus the main SETEX, happen in a single round-trip to Redis. On purge, reading the tag set with SMEMBERS is one round-trip, and the UNLINK of all tagged keys plus the DELETE of the tag set go out together in a second, pipelined round-trip. In Redis Cluster, a multi-key UNLINK only works when all of its keys hash to the same slot, so for very large tag sets (thousands of keys) either colocate the tag set and its tagged keys with a shared hash tag (for example {product:42} in every key name) or delete the keys in per-slot batches.

Fastly Surrogate Keys: Native CDN Tag Invalidation

This pattern is so valuable that CDN providers have built it natively. Fastly calls them Surrogate Keys, implemented via a response header:

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=3600
Surrogate-Key: product-42 category-laptops featured-products
Surrogate-Control: max-age=86400

Your origin server adds the Surrogate-Key header to the response. Fastly strips it before sending to the browser (users never see it), but stores the tag-to-object mapping internally at every edge node. When product 42 changes, you fire a purge via the Fastly API:

# Purge all Fastly-cached responses tagged with "product-42"
# Propagates to all edge nodes globally in ~150ms
curl -X POST "https://api.fastly.com/service/$FASTLY_SERVICE_ID/purge/product-42" \
  -H "Fastly-Key: $FASTLY_API_KEY"

# Batch purge (multiple tags at once):
curl -X POST "https://api.fastly.com/service/$FASTLY_SERVICE_ID/purge" \
  -H "Fastly-Key: $FASTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"surrogate_keys": ["product-42", "category-laptops"]}'

Fastly propagates the purge to all edge nodes globally in roughly 150 milliseconds. This is how large e-commerce platforms can cache HTML pages at the CDN edge with long TTLs (for high cache efficiency) while still invalidating almost immediately when a price changes. Varnish, the open-source HTTP cache that Fastly is based on, supports the same pattern with its ban mechanism: if the backend stores the tags in a response header such as X-Tag, then ban obj.http.X-Tag ~ "product-42" invalidates all cached objects whose tag header matches that pattern. (The test has to be on obj.http, the stored object's headers; req.http would test the incoming request instead.)

The lifecycle diagram shows why this is powerful at the CDN layer. You can set a 24-hour TTL on HTML pages (meaning extremely high cache efficiency and almost no origin traffic) while still being able to purge specific pages within roughly 150 milliseconds when their content changes. You get the performance of a long TTL with the consistency of an explicit purge. That combination is hard to achieve without surrogate keys, because wildcard CDN purges by URL pattern are slow and imprecise.

Surrogate keys (cache tags) annotate cached responses with their logical dependencies at write time. On invalidation, you purge by tag, and the cache fans out the delete to every entry bearing that tag. Fastly implements this natively via the Surrogate-Key response header and a Purge API, propagating invalidations globally in ~150ms. Redis-based implementations store a Set of cache keys per tag and delete all members on purge. The pattern is essential for any system where one data change can affect many cached responses at once.

Section 12

Lease & Time-Bounded Consistency: Hybrid Approaches

Every strategy so far is a pure form: pure TTL, pure purge, pure write-through, pure CDC, pure versioning, pure tags. Real production systems rarely use any strategy in isolation. The most robust caching systems combine two or more strategies in layers, so that a failure in one layer is caught by the other.

The most common hybrid is deceptively simple: TTL as fallback + explicit purge as fast path. The purge handles the normal case: an update fires, the cache is immediately cleared. The TTL handles the abnormal case: the purge failed silently, the application crashed, the network dropped the delete command. In the normal case, staleness is near-zero. In the worst case, staleness is bounded by the TTL. You get the consistency guarantee of explicit purge with the safety net of TTL.

The Math: Hybrid Staleness Guarantee

Let's quantify it. Suppose you set a 60-second TTL on every cache entry and also fire explicit purges on every write. In the normal case (purge succeeds): staleness ≈ 0 seconds, just the time the delete takes to propagate, typically under a millisecond within the same region. In the worst case (purge fails for any reason): staleness is at most 60 seconds. Compare this to pure TTL (always up to 60 seconds of staleness) or pure purge (potentially infinite staleness if a purge is silently dropped and no TTL is set). The hybrid gives you the best of both worlds. The 60-second fallback TTL costs little in the normal path (at most one extra cache miss per key per minute); it's a safety valve that matters only when something goes wrong.
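To make the bound concrete, here is a small simulation of the hybrid, offered as a sketch under stated assumptions: TtlCache is a hypothetical in-memory stand-in for Redis, the clock is injected so time can be advanced by hand, and the purge_succeeds flag exists only to simulate a dropped delete.

```python
from typing import Callable, Dict, Optional, Tuple

FALLBACK_TTL = 60.0  # seconds; the safety net from the text


class TtlCache:
    """Tiny in-memory cache with per-entry expiry; stands in for Redis here."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: int, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[int]:
        entry = self._entries.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._entries.pop(key, None)  # expired entries behave like misses
            return None
        return entry[1]

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


def read_price(cache: TtlCache, db: Dict[str, int], key: str) -> int:
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db[key]                       # miss: go to the source of truth
    cache.set(key, value, FALLBACK_TTL)   # every entry gets the fallback TTL
    return value


def write_price(cache: TtlCache, db: Dict[str, int], key: str, value: int,
                purge_succeeds: bool = True) -> None:
    db[key] = value
    if purge_succeeds:
        cache.delete(key)  # fast path: staleness is roughly zero
    # If the purge is lost, nothing else happens here; the TTL bounds the damage.
```

With a purge that succeeds, the next read sees the new price immediately. With a lost purge, reads keep returning the old price only until the entry's 60-second TTL runs out.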

Stale-While-Revalidate: Background Refresh

The second hybrid pattern is stale-while-revalidate, standardized in RFC 5861. The idea: when a cached entry's TTL expires, instead of blocking the current request on a database fetch, serve the stale entry immediately and trigger a background refresh in parallel. The next request will get the fresh value.

This is particularly valuable for high-read endpoints where simultaneous cache misses are expensive. Without stale-while-revalidate, the moment a popular cache entry expires, every concurrent request races to the database â€” this is the thundering herd problem. Stale-while-revalidate collapses this to one background query while all concurrent requests are served the stale value.

HTTP stale-while-revalidate in Practice

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: max-age=60, stale-while-revalidate=300

# Meaning:
# - max-age=60: serve from cache for 60s (fully fresh)
# - stale-while-revalidate=300: after 60s, the entry is stale BUT is still served
#   immediately for up to 300 MORE seconds while a background fetch refreshes it
# - After 360s total: the entry is hard-expired, and the next request blocks on a full re-fetch

Walking through the directive: for the first 60 seconds, every request gets the cached response instantly, with no database hit. From 60 to 360 seconds, requests still get a fast response (the stale value), but the cache asynchronously fetches a fresh copy from the origin in the background. The first request that arrives after 60s triggers the background fetch; subsequent requests get the stale value while the fetch is in flight (usually milliseconds); once the fetch completes, the fresh value replaces the stale one. After 360 seconds, if the background fetch never succeeded, the entry is fully expired and the next request blocks on a synchronous re-fetch.
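The three phases in that walkthrough can be expressed as a small classifier. This is an illustrative sketch, not how any particular cache implements it: the function names and the returned labels are invented for the example, and the parser handles only simple numeric directives.

```python
from typing import Dict


def parse_directives(header: str) -> Dict[str, int]:
    """Extract numeric Cache-Control directives such as max-age=60."""
    out: Dict[str, int] = {}
    for part in header.split(","):
        name, sep, value = part.strip().partition("=")
        if sep and value.strip().isdigit():
            out[name.strip().lower()] = int(value)
    return out


def swr_state(age_seconds: float, max_age: int, swr: int) -> str:
    """Classify a cached response by its age."""
    if age_seconds < max_age:
        return "fresh"              # serve from cache, no origin contact
    if age_seconds < max_age + swr:
        return "stale-revalidate"   # serve stale now, refresh in the background
    return "expired"                # block on a synchronous re-fetch


directives = parse_directives("max-age=60, stale-while-revalidate=300")
for age in (10, 61, 400):
    print(age, swr_state(age, directives["max-age"],
                         directives["stale-while-revalidate"]))
```

For the header in the example, an entry aged 10 seconds is fresh, one aged 61 seconds is served stale while it revalidates, and one aged 400 seconds is expired.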

The timeline makes the value clear. Without stale-while-revalidate, every request that arrives just after T=60s blocks on the database: if 500 requests arrive simultaneously at T=61s, all 500 hit the database in parallel. With stale-while-revalidate, all 500 get the stale value immediately and one background fetch runs. The database sees one query instead of 500. The 500 users see a response that is about 61 seconds old, one second past its freshness lifetime and well within the 300-second SWR window.

Application-Level Lease in Redis

NGINX, Varnish, and CDNs have stale-while-revalidate built in. In application-level caches (Redis), you implement the equivalent pattern yourself using a "must-revalidate-after" timestamp embedded in the cache value alongside the data.

import asyncio, json, time

LEASE_TTL_SECONDS = 60    # how long before we background-refresh
HARD_TTL_SECONDS = 3600   # hard expiry (fallback if background refresh never runs)

# `cache` is assumed to be a redis.asyncio client.
# In-flight refreshes in this process, keyed by cache key.
_refreshing: dict[str, asyncio.Task] = {}

async def get_with_lease(key: str, fetch_fn) -> dict:
    raw = await cache.get(key)
    if raw:
        entry = json.loads(raw)
        if time.time() < entry["revalidate_after"]:
            return entry["data"]   # hot path: return immediately
        # Lease expired but hard TTL hasn't: serve stale, kick off ONE background refresh.
        # Tracking the task deduplicates refreshes and keeps a reference to it,
        # so it cannot be garbage-collected before it finishes.
        if key not in _refreshing:
            task = asyncio.create_task(background_refresh(key, fetch_fn))
            _refreshing[key] = task
            task.add_done_callback(lambda _t: _refreshing.pop(key, None))
        return entry["data"]       # return stale; don't wait for the refresh

    # Hard TTL expired or cold start: must fetch synchronously
    data = await fetch_fn()
    await cache_set_with_lease(key, data)
    return data

async def background_refresh(key: str, fetch_fn):
    data = await fetch_fn()
    await cache_set_with_lease(key, data)

async def cache_set_with_lease(key: str, data: dict):
    entry = {
        "data": data,
        "revalidate_after": time.time() + LEASE_TTL_SECONDS,
    }
    await cache.setex(key, HARD_TTL_SECONDS, json.dumps(entry))

The walkthrough: on a cache hit where the revalidate_after timestamp hasn't passed (the hot path), return the data immediately, sub-millisecond. If the timestamp has passed but the Redis key still exists (hard TTL hasn't fired): serve the stale data immediately and fire a background coroutine to refresh it. The current request doesn't wait. If the Redis key is gone (hard TTL expired or cold start): fetch synchronously. The hard TTL is the safety net: even if the background refresh never ran, stale data will be refetched from the database within HARD_TTL_SECONDS.

The Worst-Case Staleness Tree

The staleness tree makes the lesson unavoidable: a TTL is not optional even when you're using explicit purge. If the purge succeeds (which it will in the vast majority of cases) staleness is near-zero and the TTL rarely matters. But if the purge fails for any reason, the TTL is the only mechanism that prevents stale data from persisting indefinitely. The TTL is cheap (just a number stored alongside the cache entry) and adds only an occasional extra miss in the normal path. Skipping it to save code complexity is a mistake that will eventually produce a production incident.

The Production Decision Matrix

After working through all six strategies, here's the practical decision framework. Think of your data along two axes: how often it changes (write frequency) and how costly staleness is (business impact). Most data falls into one of four quadrants:

High write frequency + High staleness cost

Examples: real-time inventory counts, live prices on a trading platform, financial balances.

Strategy: CDC-driven invalidation (the writer can't enumerate all the keys; decouple invalidation from writes entirely) + short TTL as fallback. For the most sensitive data, skip the cache on reads and hit the database directly: some data is simply too volatile to cache safely.

Low write frequency + High staleness cost

Examples: product prices (change during sales, not constantly), permission configurations, feature flags.

Strategy: Explicit purge (you can enumerate the keys) + TTL as fallback (60–300 seconds). Add surrogate keys if the data fans out to many cache entries.

High write frequency + Low staleness cost

Examples: article view counts, non-critical recommendation scores, aggregated analytics.

Strategy: TTL-only with stale-while-revalidate. The writes are too frequent for explicit purge to keep up, and the staleness window is acceptable. Use SWR to prevent thundering herds on expiry.

Low write frequency + Low staleness cost

Examples: blog post content, documentation pages, static configuration data.

Strategy: Long TTL at CDN edge (hours or days) + surrogate key purge for when updates happen. Or versioned keys for content that changes rarely but must be instantly consistent when it does change.

The Non-Negotiable Rules

Rule 1: Always set a TTL on every cache entry, even when using explicit purge. TTL is your safety net: it ensures stale data can never persist forever if a purge is dropped.

Rule 2: Use stale-while-revalidate for high-read endpoints where a thundering herd on TTL expiry would be disruptive. It eliminates the stampede at zero cost in the normal case.

Rule 3: For data that fans out to many cache entries, use surrogate keys or CDC; don't try to enumerate all the keys at purge time.

Rule 4: The consistency guarantee you need must be driven by the business requirement, not by what's easiest to implement. Start with the acceptable staleness window; pick the strategy that enforces it.

The most robust caching systems combine strategies in layers: explicit purge handles the fast path, TTL handles the fallback, and stale-while-revalidate prevents thundering herds on expiry. The hybrid's worst-case staleness is bounded by the TTL (e.g., 60 seconds), versus infinite for pure purge with a missed delete, or always-TTL-max for pure TTL. The four-quadrant decision matrix maps most data types to the right strategy: CDC or bypass for high-frequency + high-cost; purge + TTL for low-frequency + high-cost; TTL + SWR for high-frequency + low-cost; long TTL + tag purge for low-frequency + low-cost.

Section 13

Consistency Models for Caches — From Strong to Eventual

When you add a cache to your system, you are not just adding speed — you are also silently choosing a consistency model. A consistency model answers the question: "after a write happens, how quickly and how reliably can a reader see the new value?" Most engineers pick a caching strategy without consciously thinking about this, and then they're surprised when users see stale data in ways they didn't expect.

There are four consistency models that matter most for caches, ranging from the strictest guarantee to the most relaxed:

Strong Consistency

This means every read, from any node, always sees the most recently committed write. If you write a new price to the database, the very next read of that price — from any user, on any server — returns the new value. There is no window of staleness at all.

In a cached system, this is only achievable if every write synchronously updates both the database and every cache replica before returning success. In practice, this requires write-through caching with a lock that prevents reads from being served while the update propagates. The cost is high: write latency goes up, the system can't tolerate cache node failures gracefully, and multi-region deployments make synchronous propagation impractically slow. Strong consistency in a cache is rare in production — it only makes sense for financial ledger reads or authentication tokens where any staleness is unacceptable.

Read-Your-Writes Consistency

A softer guarantee: after you write something, you will always see your own write when you read. Other users might still see stale data, but the user who performed the write will never see an older version of the thing they just changed. This is the consistency model most users implicitly expect — "I just updated my profile picture, surely I see the new one." It's achievable without making writes globally synchronous. The two common techniques are sticky session routing and per-user cache bypass — for N seconds after a write, that user's reads skip the cache and go straight to the database. After the window passes, the cache has caught up and reads can resume normally.
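The per-user bypass technique can be sketched as follows. This is a minimal, single-process illustration: RecentWriteTracker and read_value are hypothetical names, the cache and database are plain dictionaries, and a real deployment would keep the bypass markers somewhere shared (for example in the user's session) rather than in process memory.

```python
import time
from typing import Callable, Dict, Tuple


class RecentWriteTracker:
    """Remembers which (user, key) pairs were written recently."""

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_seconds
        self._clock = clock
        self._bypass_until: Dict[Tuple[str, str], float] = {}

    def record_write(self, user_id: str, key: str) -> None:
        # For the next window_seconds, this user's reads of key skip the cache.
        self._bypass_until[(user_id, key)] = self._clock() + self._window

    def should_bypass(self, user_id: str, key: str) -> bool:
        deadline = self._bypass_until.get((user_id, key))
        if deadline is None:
            return False
        if self._clock() >= deadline:
            del self._bypass_until[(user_id, key)]  # window over: back to the cache
            return False
        return True


def read_value(user_id: str, key: str, cache: Dict[str, str],
               db: Dict[str, str], tracker: RecentWriteTracker) -> str:
    if not tracker.should_bypass(user_id, key) and key in cache:
        return cache[key]   # normal path: possibly stale for other users
    return db[key]          # the writer reads the source of truth directly
```

Only the user who performed the write pays the extra database read, and only for the length of the window; everyone else keeps the cache's normal hit rate.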

Monotonic Reads

This means that once you've seen version N of a value, you will never see a version older than N. You might not see version N+1 immediately, but you'll never go backward: no flip-flopping between old and new values. The regression this rules out shows up in distributed caches when one replica has been updated but another hasn't, and your requests are load-balanced between them: request 1 hits the updated replica (sees the new value), request 2 hits the stale replica (sees the old value), and so on. Monotonic reads prevent that regression. The guarantee is achievable either by routing all reads for a given user to the same cache shard, or by attaching a read token (a version or timestamp) that the cache respects.
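The read-token approach can be sketched like this. It is a simplified illustration under several assumptions: each replica is modelled as a dictionary of (version, value) pairs, Session and monotonic_read are hypothetical names, and falling back to the origin is just one possible reaction to a replica that is behind the client's token.

```python
from typing import Dict, Tuple

Versioned = Tuple[int, str]  # (version, value)


class Session:
    """Per-client read token: the highest version seen for each key."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}


def monotonic_read(session: Session, key: str,
                   replica: Dict[str, Versioned],
                   origin: Dict[str, Versioned]) -> str:
    floor = session.last_seen.get(key, 0)
    entry = replica.get(key)
    if entry is None or entry[0] < floor:
        # The replica is missing the key or is older than what this client
        # has already seen, so its answer is rejected.
        entry = origin[key]
    session.last_seen[key] = max(floor, entry[0])
    return entry[1]
```

A client that has already read version 7 is never handed version 6 again, while a brand-new client may still receive version 6 from a lagging replica, which is permitted under monotonic reads.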

Eventual Consistency

The most common model in practice: the cache will converge to the current truth within a bounded time window, but there are no guarantees about any individual read during that window. TTL-based caching is the purest form — the cache is stale for up to TTL seconds, then automatically refreshes. Event-driven invalidation (CDC + Kafka) is also eventual — the invalidation message will arrive, but it might be delayed by milliseconds to seconds depending on lag.

Eventual consistency is fine for a huge range of use cases (blog post content, product descriptions, user preferences, analytics dashboards) and terrible for a different range (live inventory, payment states, access control lists). The key word is "bounded" — you should always be able to say "our cache is eventually consistent with a maximum staleness of X seconds under normal conditions." If you can't put a number on X, you don't really have a consistency model — you just have a cache.

The diagram above shows the four consistency levels from strictest at top (most guarantees, highest cost) to most relaxed at bottom (fewest guarantees, lowest cost). Most production systems actually use a mix — eventual consistency for product content, read-your-writes for user profiles, and a cache bypass for payment states. The art is matching each data type to the consistency level the business actually needs.

The key insight from this comparison: explicit purge without a fallback TTL can produce infinite staleness if the purge request fails and is never retried. In practice, every invalidation strategy except write-through should have a backstop TTL — not as the primary mechanism, but as insurance against bugs, network failures, or missed events in the invalidation pipeline.

Choosing a Model: The Decision Framework

The staleness budget conversation: For each data type in your system, ask the product owner: "If this data is wrong for N seconds, what is the worst that can happen?" If the answer is "a customer pays the wrong price" or "an unauthorized user gets access," N should be 0 — use write-through or cache bypass. If the answer is "the blog post shows an old headline," N can be 300 or more — TTL is fine.

Every caching decision implies a consistency model. Strong consistency requires synchronous write-through and is rarely practical at scale. Read-your-writes guarantees users see their own changes without full synchrony. Monotonic reads prevents version regression across replicas. Eventual consistency — the default for TTL and CDC — is fine for most data types but must be paired with an explicit staleness budget the business has agreed to. Always pair best-effort invalidation strategies with a backstop TTL.

Section 14

The Thundering Herd & Stampede Mitigation

Here's a failure pattern that catches many teams off guard the first time they experience it in production. You have a popular product page (maybe a celebrity's concert tickets) and its cache entry expires at 11:59:58 AM. At noon, a promotional email lands in 500,000 inboxes. Within two seconds, 40,000 people click the link. All 40,000 requests arrive at your application layer. All 40,000 do a cache lookup. All 40,000 see a miss. All 40,000 fire a database query. Your database, which comfortably handles 2,000 queries per second, receives 40,000 queries in about two seconds, ten times its capacity. It falls over. This is the thundering herd problem, also called a cache stampede.

The Math Behind the Pain

The burst is predictable once you do the arithmetic. If you have 10,000 requests per second hitting a cache entry for a popular item, and that entry has a 60-second TTL, then every 60 seconds you have a potential stampede. The severity depends on how long a single database query takes.

If each DB query takes 50 ms, then in the 50 ms window between the first miss and the first response being cached, all requests that arrive will also miss. At 10,000 req/s and 50 ms latency, that's 500 simultaneous DB queries before the first one returns. For a single hot key.

And the worst part: the DB is now under load, so those queries now take 200 ms, meaning 2,000 simultaneous queries, so the load goes even higher — a positive feedback loop that melts the database. The stampede doesn't just spike load; it makes the spike worse by making each query slower.
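The arithmetic in the last two paragraphs is simply arrival rate multiplied by query latency. A short sketch, with concurrent_misses as an illustrative helper name and latency given in whole milliseconds to keep the numbers exact:

```python
def concurrent_misses(requests_per_second: int, query_latency_ms: int) -> float:
    """Requests that arrive (and miss) while the first DB query is still running."""
    return requests_per_second * query_latency_ms / 1000


# The two data points from the text: healthy latency, then latency under load.
for latency_ms in (50, 200):
    in_flight = concurrent_misses(10_000, latency_ms)
    print(f"{latency_ms} ms -> {in_flight:.0f} in-flight queries")
# 50 ms -> 500 in-flight queries
# 200 ms -> 2000 in-flight queries
```

Because the result scales linearly with latency, anything that slows the database down enlarges the herd by the same factor, which is the feedback loop described above.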

Notice the positive feedback loop: the stampede increases DB load, which increases query latency, which means more requests pile up before the first response is cached, which increases load further. The only escape is external — the DB falls over and stops accepting connections, at which point the stampede also stops (because every request errors out immediately). That is not a recovery plan.

Fix 1: Per-Key Locking with singleflight

The most elegant fix is singleflight. The idea is beautifully simple: instead of letting every concurrent miss fire its own DB query, you let only the first one through, and every other request blocks and waits for that first request to finish. When the DB result returns and is stored in the cache, all waiting requests get the same result simultaneously. The database sees one query instead of 40,000.

Here's how singleflight looks in Go, where the Go team maintains it in the golang.org/x/sync/singleflight package (one of the supplementary golang.org/x modules, not part of the standard library):

Go singleflight — per-key request deduplication

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
	"golang.org/x/sync/singleflight"
)

// Product, db, serialize, and deserialize are assumed to be defined elsewhere.
var (
	rdb   *redis.Client
	group singleflight.Group // one group per cache layer
)

// GetProduct returns a product, using singleflight to collapse stampedes.
// If 10,000 goroutines call GetProduct("1234") simultaneously after a cache miss,
// only ONE fires the DB query; the other 9,999 wait and receive the same result.
func GetProduct(ctx context.Context, id string) (*Product, error) {
	cacheKey := "product:" + id

	// Step 1: fast path. Try the cache first (no singleflight needed here).
	if val, err := rdb.Get(ctx, cacheKey).Result(); err == nil {
		return deserialize(val), nil
	}

	// Step 2: slow path. singleflight.Do collapses all concurrent misses into 1.
	v, err, _ := group.Do(cacheKey, func() (interface{}, error) {
		// Only ONE goroutine executes this function.
		// All others block inside group.Do until it returns.
		p, dbErr := db.QueryProduct(ctx, id)
		if dbErr != nil {
			return nil, dbErr
		}
		// Populate the cache so future reads are fast.
		rdb.Set(ctx, cacheKey, serialize(p), 5*time.Minute)
		return p, nil
	})
	if err != nil {
		return nil, err
	}
	return v.(*Product), nil
}

The third return value from group.Do is a shared boolean — it's true if the result was shared with other callers. You can use this for metrics: if 99% of calls have shared=true, your stampede protection is working hard. If it's always false, your cache hit ratio is high and you're never stampeding — good news either way.
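For asyncio-based Python services, the same per-key deduplication can be sketched in a few lines. This is an illustration of the idea rather than a port of the Go package: the SingleFlight class and its do method are hypothetical names, it only deduplicates within one process and one event loop, and it has no equivalent of the shared flag.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Task[Any]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key: start the work and publish the task.
            task = asyncio.create_task(fn())
            self._inflight[key] = task
            # Forget the task once it finishes so later misses run fn again.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # Leader and followers all wait on the same task. shield() keeps one
        # caller's cancellation from cancelling the shared work.
        return await asyncio.shield(task)


async def demo() -> int:
    calls = 0

    async def load_product() -> dict:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stands in for the slow database query
        return {"id": "1234"}

    sf = SingleFlight()
    await asyncio.gather(*(sf.do("product:1234", load_product) for _ in range(100)))
    return calls  # number of times the loader actually ran


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

As in the Go version, the cache lookup would normally come first and only misses would go through do(); errors raised by the loader propagate to every waiting caller.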

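Java's Caffeine library addresses the same problem with refreshAfterWrite: once an entry is older than the refresh interval, the next read returns the current value and triggers an asynchronous reload, so a hot key is refreshed before it ever expires and readers do not hit a cold miss: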

Java Caffeine — refreshAfterWrite to prevent cold misses

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

LoadingCache<String, Product> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(10, TimeUnit.MINUTES)   // hard eviction after 10 min
    .refreshAfterWrite(5, TimeUnit.MINUTES)   // eligible for async reload after 5 min, while still serving
    .build(key -> db.queryProduct(key));      // CacheLoader: called on miss or refresh

// refreshAfterWrite means:
// - First read after minute 5: cache serves the old value AND fires an async DB query in the background
// - Shortly afterwards: cache serves the fresh value (no cold miss)
// - At minute 10: if no refresh has succeeded, the entry expires and the next access loads synchronously
// Result: keys that are read regularly do not produce a synchronous DB call under normal operation

The key difference from singleflight: refreshAfterWrite is proactive — it refreshes before expiry rather than reacting to a miss. This is better for ultra-hot keys. The trade-off is that entries may be slightly stale for up to the refresh interval, which is acceptable for the use cases where you'd have stampede risk (high-traffic, non-financial data).

Fix 2: Probabilistic Early Expiration (XFetch)

What if you don't control the cache library and can't add singleflight? The XFetch algorithm (Vattani, Chierichetti & Lowenstein, "Optimal Probabilistic Cache Stampede Prevention," VLDB 2015) attacks the stampede problem statistically: instead of refreshing exactly when the TTL expires, each request has a small random probability of triggering an early refresh, with that probability increasing as the entry approaches expiry. The aim is that, in expectation, roughly one request refreshes the entry shortly before it expires, instead of every request refreshing it just after.

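The formula for deciding whether to proactively refresh on a given read:

refresh early if   now − δ · β · ln(rand()) ≥ expiry

In the XFetch formula, now is the current time, expiry is the time at which the cached entry's TTL runs out, β is a tuning parameter (β = 1 is the usual starting point; larger = more aggressive early refresh), δ is the time the last recomputation took (so expensive DB queries trigger earlier refreshes), and rand() is a uniform random number in (0, 1]. Because ln(rand()) is never positive, the term −δ · β · ln(rand()) pushes "now" forward by a random amount. The result: as remaining TTL shrinks, the probability of triggering a refresh on any given read grows, distributing the refresh work across multiple clients and avoiding the hard expiry cliff.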
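A compact Python sketch of the XFetch check follows, using the rule from the paper (refresh when now − δ·β·ln(rand()) ≥ expiry). The function names, the dictionary-backed cache, and the injectable clock and rng parameters are assumptions made so the example is self-contained and testable; they are not part of the published algorithm.

```python
import math
import random
import time
from typing import Any, Callable, Dict, Tuple

Entry = Tuple[Any, float, float]  # (value, delta_seconds, expiry_timestamp)


def should_refresh_early(now: float, expiry: float, delta: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    # random.random() returns a value in [0.0, 1.0); using 1 - r gives (0.0, 1.0],
    # so the logarithm is always defined and never positive.
    r = 1.0 - rng()
    return now - delta * beta * math.log(r) >= expiry


def xfetch_get(cache: Dict[str, Entry], key: str,
               recompute: Callable[[], Any], ttl: float, beta: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               rng: Callable[[], float] = random.random) -> Any:
    entry = cache.get(key)
    if entry is not None and not should_refresh_early(
            clock(), entry[2], entry[1], beta, rng):
        return entry[0]              # still good: serve the cached value
    started = clock()
    value = recompute()              # early refresh, real expiry, or cold miss
    delta = clock() - started        # remember how long the recomputation took
    cache[key] = (value, delta, clock() + ttl)
    return value
```

Once the real expiry has passed, the check is always true, so the same function also covers ordinary expiry; no separate TTL test is needed.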

Fix 3: Jittered TTLs

The simplest fix that costs almost nothing: add a small random value to every TTL. Instead of setting all product pages to exactly 300 seconds, set each one to 300 + rand(0, 30) seconds. This staggers expiry times across the key space so that at any given second, only a small fraction of entries expire. No single second sees a mass-expiry event. The downside is that it adds up to 10% more staleness (about 5% on average); for most content, this is completely acceptable.

// Without jitter: ALL product pages expire at exactly the same offset from when
// they were populated, so a cold deployment fills them all, and they all expire
// simultaneously 5 minutes later: stampede city.
rdb.Set(ctx, key, value, 5*time.Minute)

// With jitter: each entry expires at a slightly different time.
// The jitter range should be ~10-20% of the base TTL.
// (rand here is math/rand.)
jitter := time.Duration(rand.Int63n(int64(30 * time.Second)))
rdb.Set(ctx, key, value, 5*time.Minute+jitter)

Fix 4: Request Coalescing at the Proxy Layer

Request coalescing (sometimes called request collapsing) is a feature built into reverse proxies like Varnish and many CDNs. When a cache miss occurs on a popular resource, instead of immediately passing all waiting requests to the origin, Varnish sends exactly one request upstream and serves the other clients from the stale copy (if one is available and within its grace period) or queues them. When the single upstream response returns, all queued clients receive it. This is singleflight implemented at the HTTP proxy layer, with no code changes in your application. In Varnish, coalescing of concurrent misses is built in, the stale-serving window is grace mode (set with beresp.grace in VCL), and bereq.is_bgfetch tells backend-side VCL whether a given fetch is a background refresh.

The thundering herd problem occurs when a popular cached entry expires and all concurrent readers simultaneously miss and hit the database. The mitigations (singleflight per-key locking, Caffeine refreshAfterWrite, the XFetch probabilistic early-refresh algorithm, jittered TTLs, and request coalescing at a proxy layer) each attack the problem differently. Singleflight and refreshAfterWrite are the most robust for origin caches. Jittered TTLs are the easiest win. Request coalescing at a proxy layer provides stampede protection without any application code changes.

Section 15

Edge Invalidation — CDN & Browser Caches

Everything we've discussed so far deals with caches close to your application — Redis, Memcached, in-process caches. But there are two more cache layers that are further from your control and genuinely harder to invalidate: CDN edge caches and browser caches. The further you are from the origin, the harder invalidation becomes — and browser caches are almost impossible to forcibly clear once content is in them.

CDN Invalidation: Fast But Not Instant

CDNs like Cloudflare, Fastly, and AWS CloudFront distribute your content to edge nodes around the world — there might be 200+ edge locations, each holding their own copy of your cached content. When you change a piece of content, you need that change to reach all 200+ nodes. That propagation takes time:

Cloudflare: purge API typically propagates globally in about 150ms–2 seconds for most requests. The CDN documentation describes this as near-instant, and for practical purposes, it usually is.
Fastly: offers instant purge (sub-second) via surrogate keys; its architecture is specifically designed for high-frequency programmatic purging. This is why high-traffic sites like The New York Times and GitHub use Fastly.
AWS CloudFront: invalidations typically complete in under 2 minutes (well over 90% of edge locations within seconds), but can occasionally take longer for all global locations to clear. CloudFront also charges per invalidation path (the first 1,000 paths per month are free, then $0.005 per path).

The practical implication: if you push a critical bug fix or a pricing error correction, you need to account for the CDN propagation window. If CloudFront takes a couple of minutes and your flash sale price went live incorrectly at noon, users served by some edge locations may see the wrong price for up to that window even after you fire the invalidation API call.

The takeaway: no CDN provides truly instant global invalidation. "Instant purge" means "faster than TTL expiry," not "zero propagation time." For content where any staleness after a purge is unacceptable, you need cache-busting URLs instead of relying on purge propagation.

The Cloudflare Purge API

Here's how to call the Cloudflare Cache Purge API from your application after a content update. The key point: purging by URL is simple but expensive at scale; purging by cache tag (which maps to Fastly's surrogate keys) is the production pattern for large catalogs.

# Purge specific URLs: simple, but only works for known URLs
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{"files": ["https://example.com/products/1234", "https://example.com/category/electronics"]}'

# Purge by cache tag: powerful. Tag every response with the Cache-Tag header,
# then a single purge call invalidates ALL cached responses for that tag.
# Example: tag product pages with "product-1234", category pages with "category-electronics"
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{"tags": ["product-1234"]}'

To make tag-based purging work, you need to add the Cache-Tag header to every response your origin sends. Cloudflare reads this header and builds an internal index of which cache entries have which tags, so a single API call can purge thousands of cached entries at once — all product pages, all category pages, all search results that include a specific product.
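As a rough mental model of that tag index, here is a toy in-memory version in Python. It is only a sketch of the idea: TaggedCache and cache_tag_header are hypothetical names, and a real CDN's index is distributed and far more involved.

```python
from collections import defaultdict


def cache_tag_header(tags) -> str:
    """Build a comma-separated Cache-Tag header value for an origin response."""
    return ",".join(sorted(set(tags)))


class TaggedCache:
    """Toy cache that remembers which URLs carry which tags."""
    def __init__(self):
        self._entries = {}                    # url -> cached body
        self._urls_by_tag = defaultdict(set)  # tag -> urls carrying that tag

    def put(self, url, body, tags):
        self._entries[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url):
        return self._entries.get(url)

    def purge_tag(self, tag) -> int:
        """Drop every cached URL carrying the tag; return how many were removed."""
        removed = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if url in self._entries:
                del self._entries[url]
                removed += 1
        return removed


cache = TaggedCache()
cache.put("/products/1234", "<html>product</html>", ["product-1234"])
cache.put("/category/electronics", "<html>category</html>",
          ["category-electronics", "product-1234"])
cache.put("/products/9999", "<html>other</html>", ["product-9999"])

# One purge call clears the product page and the category page that lists it.
print(cache.purge_tag("product-1234"))
```

The point of the design is that the caller names a concept (product 1234) and the index resolves it to concrete URLs, so the origin never has to enumerate them.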

Browser Caches: The Uninvalidatable Problem

Here is the hardest reality of edge caching: you cannot forcibly clear content from a user's browser cache. Once a user has downloaded app.js and their browser has cached it according to the Cache-Control headers you sent, you have no mechanism to reach into their browser and delete that file. HTTP does not have a "push invalidation" for browsers. The only way to make them download a new version is to change the URL.

This is why the production pattern for static assets (JavaScript, CSS, images) is content-hash URLs. Your build tool (webpack, Vite, esbuild) automatically renames every file to include a hash of its contents: app.a3f9c2.js. You then serve it with the longest possible TTL and a special marker that tells browsers "this file at this URL will never change, don't even bother checking back" — that marker is called immutable, and it lives in the Cache-Control header:

# For content-hashed assets: use the longest TTL, mark as immutable.
# "immutable" tells the browser: this file will never change at this URL.
# The browser won't even send a conditional request (If-None-Match) during the TTL.
Cache-Control: max-age=31536000, immutable

# For the HTML entry point (which references the hashed assets): short TTL or no-cache.
# The HTML must always be fresh so browsers get the updated asset URLs.
Cache-Control: no-cache

# What "no-cache" actually means (confusingly): "always revalidate before using".
# It does NOT mean "don't cache at all". The browser caches the file but sends a
# conditional GET (If-None-Match or If-Modified-Since) on every use.
# If the server says "304 Not Modified", the browser uses its cached copy.
Cache-Control: no-cache

Put together, the cache-busting pattern works like this. The HTML page (index.html or the server-rendered entry point) must always be fresh, so it is served with no-cache, because it is the manifest of which versioned asset URLs to load. The assets themselves can have year-long TTLs because an asset's URL changes whenever its content changes. "Invalidation" for browser-cached assets means deploying a new URL, not sending a purge signal.
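For intuition, the renaming step a build tool performs can be approximated as follows. This is a simplified Python sketch: hashed_name and cache_control_for are hypothetical helpers, and real bundlers choose their own hash function, digest length, and naming scheme.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def cache_control_for(path: str, hashed_assets: set) -> str:
    """Hashed assets are safe to cache for a year; everything else revalidates."""
    if path in hashed_assets:
        return "max-age=31536000, immutable"
    return "no-cache"


v1 = hashed_name("static/app.js", b"console.log('v1');")
v2 = hashed_name("static/app.js", b"console.log('v2');")

print(v1 != v2)                            # new content yields a new URL
print(cache_control_for(v1, {v1, v2}))     # long-lived, immutable
print(cache_control_for("index.html", {v1, v2}))  # always revalidated
```

Because the URL is derived from the bytes, an unchanged file keeps its URL across deploys and stays cached, while a changed file gets a URL no browser has seen before.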

Cache-Control: no-cache does NOT mean "don't cache". This is one of the most common misconceptions in web development. no-cache means "cache this, but always revalidate with the server before serving it." If you actually want the browser to never cache something, use Cache-Control: no-store. The naming is backwards from what you'd expect, and it has caused production incidents in teams that set no-cache on sensitive data thinking they were disabling caching.

CDN invalidation is orders of magnitude faster than TTL expiry but is not instantaneous — propagation takes milliseconds to minutes depending on the CDN. Browser caches are nearly impossible to forcibly invalidate; the production pattern is content-hash URLs that change when content changes, paired with long TTLs on assets and short TTLs on the HTML entry point. Understanding the difference between Cache-Control: no-cache (always revalidate) and no-store (don't cache at all) prevents a class of common production mistakes.

Section 16

Bug Studies — When Invalidation Goes Wrong

Theory is useful. Production incidents are better teachers. Each of the following bug studies is a realistic scenario, a composite of patterns that recur at real engineering teams. The goal isn't to memorize the specific bugs, but to build the mental model that lets you recognize the class of failure before it happens in your own system.

Bug 1 — Dual-Write Race: Stale Flash-Sale Price for 4 Hours

Incident: An e-commerce platform launched a flash sale at 2 PM. The product team updated prices in the admin CMS. Within 60 seconds, angry customers and support tickets started arriving: the product detail pages still showed pre-sale prices. Investigation revealed that the cache had been correctly invalidated for 80% of products — but for a specific product category, prices were stale for over 4 hours.

What Went Wrong

The application used a dual-write pattern: on price update, write the new price to the database, then call cache.delete(productKey). There was a race: between the DB write and the cache delete, another application server was serving a request for that product. It got a cache miss (the old entry had just been evicted under LRU pressure), fetched the new price from the DB, and populated the cache. Then the original write's cache.delete executed and deleted the freshly written entry, even though it contained the correct price. A second request populated the cache again from the DB correctly. This interleaving cost one extra database read but left the price key correct. So far, so good.

The real problem came one layer up: the product detail page was also cached as a rendered HTML blob (not just the price), at a different cache key. The code that invalidated the rendered page had a subtle bug: it only invalidated the key for the primary locale. The product existed in 7 locale-specific cache entries. Six of them continued serving stale HTML for the full TTL — 4 hours. The price field was correct in cache; the rendered page was wrong, and nobody noticed because the small "Buy" widget on the page was rendered separately with fresh data.

# BUG: invalidation is scattered and incomplete.
# The cache key for the product price is generated in one place,
# but the invalidation code was written later in a different service
# and doesn't enumerate all the keys that need to be cleared.
def update_product_price(product_id: int, new_price: float):
    db.execute("UPDATE products SET price=%s WHERE id=%s", new_price, product_id)
    # Invalidate the price cache key (correct)
    cache.delete(f"product:{product_id}:price")
    # Invalidate the rendered page. BUG: only handles the default locale.
    # In reality there are 7 locale-specific keys.
    cache.delete(f"product:{product_id}:page:en-us")
    # The fr, de, ja, ko, pt, zh keys are never cleared. They stay stale for the full TTL.

# FIX 1: Enumerate all locale keys explicitly (defensive but fragile long-term).
SUPPORTED_LOCALES = ["en-us", "fr", "de", "ja", "ko", "pt", "zh"]

def update_product_price(product_id: int, new_price: float):
    db.execute("UPDATE products SET price=%s WHERE id=%s", new_price, product_id)
    cache.delete(f"product:{product_id}:price")
    for locale in SUPPORTED_LOCALES:
        cache.delete(f"product:{product_id}:page:{locale}")

# FIX 2 (better): use surrogate cache tags so a single tag clears all locale pages.
# On every product page response, add: Cache-Tag: product-{id}
# Then invalidation becomes one call regardless of how many locale variants exist.
def update_product_price(product_id: int, new_price: float):
    db.execute("UPDATE products SET price=%s WHERE id=%s", new_price, product_id)
    cache.delete(f"product:{product_id}:price")
    # Purge ALL cached representations for this product by tag.
    cache.purge_by_tag(f"product-{product_id}")  # clears all locale keys at once

Lesson: Invalidation logic that is physically separate from cache-key generation will drift over time. The fix is co-location: either generate all cache keys in one place so invalidation can enumerate them, or use surrogate tags that group related keys so you invalidate a concept (all representations of product 1234) rather than individual keys.

How to Spot: If your code contains cache.delete(f"{id}:something:en-us") — a delete that hardcodes one variant of a multi-variant key — that is almost certainly a bug. Search your codebase for all cache key prefixes and verify that invalidation covers every variant.
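One way to get the co-location the lesson asks for is a small class that owns every key derived from a product, so readers and invalidation cannot drift apart. The sketch below is illustrative: ProductKeys and invalidate_product are hypothetical names, and cache is assumed to be any client exposing delete(key).

```python
from typing import List

SUPPORTED_LOCALES = ("en-us", "fr", "de", "ja", "ko", "pt", "zh")


class ProductKeys:
    """Single source of truth for every cache key derived from one product."""
    def __init__(self, product_id: int):
        self.product_id = product_id

    def price(self) -> str:
        return f"product:{self.product_id}:price"

    def page(self, locale: str) -> str:
        if locale not in SUPPORTED_LOCALES:
            raise ValueError(f"unsupported locale: {locale}")
        return f"product:{self.product_id}:page:{locale}"

    def all(self) -> List[str]:
        """Every key that must be cleared when the product changes."""
        return [self.price()] + [self.page(loc) for loc in SUPPORTED_LOCALES]


def invalidate_product(cache, product_id: int) -> int:
    """Delete all keys for the product; `cache` only needs a delete(key) method."""
    keys = ProductKeys(product_id).all()
    for key in keys:
        cache.delete(key)
    return len(keys)


print(ProductKeys(1234).all())
```

Read paths call ProductKeys(...).page(locale) to build their key, so adding a locale or a new representation in this one class automatically extends what invalidate_product clears.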

Bug 2 — Cache Stampede: Celebrity Profile Expires During Peak Traffic

Incident: A social media platform set a 5-minute TTL on user profile data. A high-profile user posted at the exact second their profile cache entry expired. In the subsequent 200 ms, 8,000 concurrent requests for that profile all saw a cache miss and fired database queries. The user's follower count query was expensive (not indexed correctly). The database connection pool was exhausted in under 100 ms. Downstream services that depended on the same DB started timing out, and the incident cascaded.

What Went Wrong

This is the canonical thundering herd scenario described in Section 14, but with a real cascading failure pattern layered on top. The root causes: (1) a single TTL for all profiles regardless of their traffic level — a profile with 50 million followers is not the same as a profile with 50 followers; (2) no singleflight protection on the profile fetch path; (3) an unindexed count query that was cheap under normal load but became the chokepoint under stampede load.

// BUG: flat TTL for all profiles. A profile with 50M followers gets the same
// TTL as a profile with 50 followers. No stampede protection.
func GetProfile(ctx context.Context, userID string) (*Profile, error) {
    key := "profile:" + userID
    if val, err := cache.Get(ctx, key); err == nil {
        return deserialize(val), nil
    }
    // All 8K goroutines land here simultaneously when the hot entry expires.
    p, err := db.QueryProfile(ctx, userID) // expensive COUNT(*) inside
    if err != nil {
        return nil, err
    }
    cache.Set(ctx, key, serialize(p), 5*time.Minute) // flat TTL for everyone
    return p, nil
}

var profileGroup singleflight.Group

// FIX: singleflight + traffic-proportional TTL (popular profiles stay cached longer)
// + jitter to prevent synchronized expiry across popular profiles.
func GetProfile(ctx context.Context, userID string) (*Profile, error) {
    key := "profile:" + userID
    if val, err := cache.Get(ctx, key); err == nil {
        return deserialize(val), nil
    }
    // singleflight: collapse all concurrent misses into exactly 1 DB query.
    v, err, _ := profileGroup.Do(key, func() (interface{}, error) {
        p, dbErr := db.QueryProfile(ctx, userID)
        if dbErr != nil {
            return nil, dbErr
        }
        // Traffic-proportional TTL: popular profiles cache longer
        // (5min up to 10K followers, 15min up to 1M, 30min above 1M).
        ttl := baseTTL(p.FollowerCount)
        // Jitter only ever lengthens the TTL, by 0 to 10% of its base value.
        jitter := time.Duration(rand.Int63n(int64(ttl / 10)))
        cache.Set(ctx, key, serialize(p), ttl+jitter)
        return p, nil
    })
    if err != nil {
        return nil, err
    }
    return v.(*Profile), nil
}

func baseTTL(followers int64) time.Duration {
    switch {
    case followers > 1_000_000:
        return 30 * time.Minute
    case followers > 10_000:
        return 15 * time.Minute
    default:
        return 5 * time.Minute
    }
}

Lesson: Flat TTLs are a design smell. Hot keys need longer TTLs (or refreshAfterWrite), singleflight protection, and their underlying queries must be indexed for the load a stampede would generate. A stampede reveals latent query performance issues that were invisible at normal load.

How to Spot: If your cache fetch path has a db.Query() call without a singleflight guard, it is stampede-vulnerable for any key that gets popular. Add singleflight to every high-traffic cache miss path — it costs nothing when there's no stampede and saves your DB when there is.

Bug 3 — Versioned-Key Memory Leak: Old Versions Never Evicted

Incident: A team adopted versioned cache keys (user:1234:v{N}) to eliminate dual-write races. Six months later, Redis memory usage was 15× higher than expected and growing monotonically. The culprit: every time a user profile was updated, a new versioned key was written. The old versioned keys had no TTL and were never explicitly deleted. Redis was holding every version of every profile ever updated.

What Went Wrong

Versioned keys are an elegant invalidation pattern: bump the version number, and old keys become orphaned (unreachable by any reader). But "unreachable" does not mean "deleted." Without an explicit TTL on every versioned key, Redis holds them forever. The version number came from a counter in the database that only incremented. After six months with millions of updates, there were hundreds of millions of orphaned cache entries consuming memory. The system never crashed, because Redis eventually began evicting keys under memory pressure using an LRU policy, but by then active cache entries were also being evicted, destroying the cache hit ratio.

# BUG: versioned keys are written but old versions are never cleaned up.
def get_user(user_id: int) -> dict:
    version = db.get_cache_version(user_id)  # e.g., version = 42
    key = f"user:{user_id}:v{version}"
    if (cached := cache.get(key)) is not None:
        return cached
    user = db.get_user(user_id)
    cache.set(key, user)  # NO TTL: this entry lives forever in Redis
    return user

def update_user(user_id: int, data: dict):
    db.update_user(user_id, data)
    db.increment_cache_version(user_id)  # version goes from 42 → 43
    # Old key "user:{id}:v42" is now unreachable, but still occupies memory.
    # It will never be cleaned up unless Redis runs LRU eviction.

# FIX: always set a TTL on versioned keys. The TTL acts as the GC mechanism:
# even if the old version is never explicitly deleted, it will expire.
# The TTL should be long enough that reads spanning a write don't miss,
# but short enough that orphaned versions don't accumulate.
def get_user(user_id: int) -> dict:
    version = db.get_cache_version(user_id)
    key = f"user:{user_id}:v{version}"
    if (cached := cache.get(key)) is not None:
        return cached
    user = db.get_user(user_id)
    cache.set(key, user, ex=3600)  # TTL = 1 hour: old versions expire within 1hr
    return user

def update_user(user_id: int, data: dict):
    db.update_user(user_id, data)
    db.increment_cache_version(user_id)
    # Optional: explicitly delete the old key for faster cleanup
    old_version = db.get_cache_version(user_id) - 1
    cache.delete(f"user:{user_id}:v{old_version}")

Lesson: Versioned keys need TTLs. The TTL is the garbage collector for orphaned versions. Without it, versioned caching leaks memory at a rate proportional to your write throughput. Rule: every key written to Redis must have a TTL, no exceptions.

How to Spot: Search your Redis client calls for set(key, value) without a TTL argument. Any call that writes a key without an expiry is a memory leak waiting to happen. Some teams enforce this with a linter rule on their Redis wrapper class.
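A wrapper that makes the TTL a required argument is one lightweight way to enforce that rule. The sketch below is an assumption-laden illustration: TTLRequiredCache is a hypothetical class, and the wrapped client is assumed to expose set(key, value, ex=seconds), get(key), and delete(key) in the style of redis-py.

```python
class TTLRequiredCache:
    """Wraps a Redis-like client and refuses any write that lacks a sane expiry."""

    def __init__(self, client, max_ttl_seconds: int = 86_400):
        self._client = client
        self._max_ttl = max_ttl_seconds

    def set(self, key: str, value, ttl_seconds: int):
        # ttl_seconds has no default, so forgetting it is a TypeError at call time.
        if isinstance(ttl_seconds, bool) or not isinstance(ttl_seconds, int):
            raise TypeError("ttl_seconds must be an int")
        if ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        if ttl_seconds > self._max_ttl:
            raise ValueError(f"ttl_seconds exceeds the {self._max_ttl}s ceiling")
        return self._client.set(key, value, ex=ttl_seconds)

    def get(self, key: str):
        return self._client.get(key)

    def delete(self, key: str):
        return self._client.delete(key)
```

If application code only ever receives the wrapper and never the raw client, a write without an expiry cannot be expressed, which removes the need to find such calls by search.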

Bug 4 — Silent CDN Purge Failure: Deprecated Feature Flag Cached at Edge for Days

Incident: A team used their CDN to cache a feature-flag JSON file (/flags.json) at the edge for performance. They deprecated a feature by setting a flag to false in the file and fired a CDN cache purge. The purge API returned HTTP 200. But for three days, some users saw the feature still enabled. Investigation revealed that the CDN's API had accepted the purge request, but the payload contained a typo in the file path (/flag.json instead of /flags.json). The typo meant the actual file at /flags.json was never purged. The CDN TTL was 7 days.

What Went Wrong

CDN purge APIs return success when the request was accepted, not when the purge actually propagated to all edge nodes. And a successful HTTP 200 from a purge API does not mean the path you purged matched anything: a typo produces a successful no-op purge. The team had no post-purge verification: they never fetched the resource from multiple edge locations to confirm the new version was being served. With a 7-day TTL, the stale file could have been served for up to a week; it was caught after 3 days only because someone manually tested from a different region.

# BUG: purge is fire-and-forget. No verification, no retry logic,
# and a subtle typo means the wrong URL is purged.
def deploy_feature_flags(flags: dict):
    write_flags_file(flags)  # update flags.json on origin
    response = cdn.purge_url("/flag.json")  # TYPO: should be /flags.json
    if response.status_code == 200:
        logging.info("CDN purge successful")  # deceptively succeeds
    # No post-purge check, no retry, no verification

import logging
import time

import requests

CORRECT_PATH = "/flags.json"
EDGE_VERIFY_URLS = [
    "https://edge-us-east.example.com/flags.json",
    "https://edge-eu-west.example.com/flags.json",
    "https://edge-ap-south.example.com/flags.json",
]

def deploy_feature_flags(flags: dict):
    new_version = flags.get("_version")  # embed a version in the JSON
    if new_version is None:
        raise ValueError("flags must carry a _version so the purge can be verified")
    write_flags_file(flags)  # update origin
    response = cdn.purge_url(CORRECT_PATH)  # correct path, using constant
    if response.status_code != 200:
        raise RuntimeError(f"CDN purge request failed: {response.status_code}")
    # Post-purge verification: poll edge nodes until they serve the new version.
    # Wait up to 30 seconds for global propagation.
    deadline = time.time() + 30
    while time.time() < deadline:
        confirmed = 0
        for edge_url in EDGE_VERIFY_URLS:
            try:
                r = requests.get(edge_url, timeout=3)
                data = r.json()
                if data.get("_version") == new_version:
                    confirmed += 1
            except (requests.RequestException, ValueError):
                pass  # unreachable edge or non-JSON body: counts as unconfirmed
        if confirmed == len(EDGE_VERIFY_URLS):
            logging.info("CDN purge verified on all edges")
            return
        time.sleep(2)
    raise RuntimeError("CDN purge did not propagate within 30s; manual intervention required")

Lesson: CDN purge APIs report whether your request was accepted — not whether the content was actually cleared. For critical invalidations (feature flags, security configurations, pricing rules), always follow up a purge with a verification step that fetches the resource from multiple edge locations and checks for the expected new version.

How to Spot: Any code that calls a CDN purge API and checks only the HTTP response code (not the actual served content) is unverified. Critical caches — anything related to access control, pricing, or feature flags — should have automated post-purge verification built into their deployment pipeline.

Four recurring invalidation failure patterns: partial invalidation from scattered cache-key logic (fix: surrogate tags); cache stampede from flat TTLs on hot keys (fix: singleflight + traffic-proportional TTL); versioned-key memory leaks from missing TTLs (fix: always set a TTL); and silent CDN purge failures from no post-purge verification (fix: poll edge nodes for the new version). Each failure points to a structural issue — co-location of key generation and invalidation logic, stampede protection on every high-traffic miss path, mandatory TTLs on all writes, and verified CDN deployments.

Section 17

Common Misconceptions About Cache Invalidation

Cache invalidation misconceptions are unusually dangerous because they're not obviously wrong — each one sounds plausible at first reading, and some are even technically partially true in narrow circumstances. Understanding why each one is false (not just knowing it's false) gives you the mental model to catch them in your own code reviews and design discussions.

The belief that setting a TTL means invalidation is handled is the most common cache misconception, and it's half-right: TTL does eventually invalidate stale data. But TTL is not an invalidation strategy; it's a staleness bound. If your TTL is 5 minutes, you're saying "I accept that for up to 5 minutes after a data change, readers may see the old value." For an e-commerce product description or a blog post, that's likely fine. For a product price during a flash sale, an inventory count, or a permission check, it's a production incident waiting to happen.

The confusion comes from conflating "the stale data will eventually be gone" with "invalidation is handled." They're different guarantees. Real invalidation strategies — explicit purge, write-through, CDC — eliminate the staleness window entirely or bound it to milliseconds rather than minutes. TTL is what you fall back to when you can't or don't want to implement a real invalidation strategy, with full awareness of the maximum staleness you're accepting.

Dual-write is explicitly not atomic. Writing to the database and then deleting the cache key are two separate operations with no transaction boundary between them, so other operations can interleave with them. A concurrent reader that missed the cache can read the old row just before your write commits, then populate the cache with that old value just after your cache.delete has run, leaving a stale entry that nothing removes until the TTL expires (the classic read-then-repopulate race). Or the cache delete can fail entirely (a network timeout, a Redis restart, a transient error) while the DB write succeeded, leaving the cache stale until the TTL expires.

The correct mental model: dual-write gives you best-effort cache invalidation with eventual consistency in the normal case. For true atomicity, you need either write-through with a distributed transaction (impractical in most systems) or CDC, which reads invalidation events directly from the database's durable change log rather than from application code.
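To make the lack of atomicity concrete, the following Python sketch replays one harmful interleaving step by step, using plain dicts as stand-ins for the database and the cache. It is a deterministic illustration under those assumptions, not a concurrency test; simulate_dual_write_race is a made-up name.

```python
def simulate_dual_write_race():
    """Replay one interleaving of a reader and a dual-write writer.

    Returns (value in database, value in cache) after both have finished.
    """
    db = {"cart:7": 1}   # source of truth: quantity 1
    cache = {}           # cache starts empty (entry expired or evicted)

    # Reader, step 1: cache miss, so it reads the current (soon to be old) row.
    reader_snapshot = cache.get("cart:7")
    if reader_snapshot is None:
        reader_snapshot = db["cart:7"]

    # Writer: commits the new value, then deletes the cache key.
    # The delete is a no-op because nothing is cached yet.
    db["cart:7"] = 2
    cache.pop("cart:7", None)

    # Reader, step 2: resumes after a delay and caches its stale snapshot.
    cache["cart:7"] = reader_snapshot

    return db["cart:7"], cache["cart:7"]


print(simulate_dual_write_race())
```

Both participants ran correct code, yet the database holds 2 while the cache holds 1, and the cache keeps serving 1 until its TTL expires or another write deletes the key.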

The belief that Cache-Control: no-cache disables caching comes from a naming disaster baked into the HTTP specification. Cache-Control: no-cache does NOT mean "do not cache." It means "cache this response, but you must revalidate it with the server (send a conditional GET) before using it on every subsequent request." The browser stores the response locally but always asks the server "is this still current?" before serving it. If the server responds with 304 Not Modified, the browser uses its cached copy. Only if the server sends a new response does the browser discard the old one.

If you actually want the browser to never cache a response and never store it locally, the correct directive is Cache-Control: no-store. The difference matters enormously for sensitive content: a page served with no-cache is stored on disk in the browser cache and viewable in developer tools even after the user navigates away. A page served with no-store is never written to disk.

CDN providers use the word "instant" to mean "much faster than TTL expiry" — not "zero latency globally." Even Cloudflare's fastest purge propagation involves the purge signal traveling to hundreds of edge locations around the world. In practice, Cloudflare propagates purges globally in milliseconds to a few seconds for most requests. Fastly is similarly fast. But "a few seconds" is not the same as "instant."

The gap matters in two scenarios: (1) time-sensitive content (a security advisory, a deprecated API response) where even a 2-second window of stale content at a heavily-trafficked edge node could be meaningful; (2) verification — if you fire a purge and immediately check whether the new content is live from a remote region, you may still see the old content for a few seconds. The correct pattern is to fire the purge and then poll until edge nodes confirm the new version is live, with a timeout and alerting for cases where propagation takes unexpectedly long.

Write-through gives strong consistency for a single-node, single-cache deployment: every write updates both the database and the cache before it is acknowledged, so any read from that single cache node sees the most recent write. But "strong consistency" breaks down the moment you have multiple cache replicas. If each Redis primary has read replicas, a write-through operation updates the database and the primary responsible for that key, but Redis replication is asynchronous, so the replicas may not have applied the change yet. A read served by a replica sees a stale value.

Additionally, write-through doesn't help if the two writes (DB + cache) are not truly atomic. If the DB write succeeds but the cache write fails (Redis timeout, network partition), you have strong consistency in neither direction: the DB has the new value but the cache has the old one. To handle this, you need either a retry with idempotency or a rollback of the DB write — both of which significantly complicate the write path.

Summary: write-through + a single cache node = strong consistency under normal conditions. Write-through + distributed cache = complex partial failure modes that look like strong consistency until they don't.

CDC reads from the database's write-ahead log (WAL) or binary log, which does record writes in order — for a single database node. The ordering breaks down when your database is sharded or replicated. If product ID 1234 and product ID 5678 live on different database shards, their change events enter different WAL streams, flow through different Kafka partitions, and may be processed by different consumer instances. The consumer for product 1234 might process its event before the consumer for product 5678 processes its event, or vice versa — there is no cross-shard ordering guarantee.

Within a single shard, ordering is preserved because all writes for that shard flow through one WAL. Across shards, you can only guarantee that events for the same partition key (same shard) are ordered relative to each other. If your cache invalidation logic assumes a global ordering of events, it will produce incorrect results under cross-shard workloads.
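The per-key guarantee can be illustrated with a small Python sketch that routes change events to partitions by key. The hash used here (CRC32) is an assumption for the sake of a stable example and is not what Kafka's default partitioner uses; partition_for, route, and per_key_order_preserved are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key-to-partition mapping: the same key always lands on one partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions: int):
    """Append (key, version) events to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for key, version in events:
        partitions[partition_for(key, num_partitions)].append((key, version))
    return dict(partitions)


def per_key_order_preserved(partitions) -> bool:
    """True if, inside every partition, each key's versions only ever increase."""
    for log in partitions.values():
        last_seen = {}
        for key, version in log:
            if version <= last_seen.get(key, 0):
                return False
            last_seen[key] = version
    return True


events = [
    ("product:1234", 1), ("product:5678", 1),
    ("product:1234", 2), ("product:5678", 2), ("product:1234", 3),
]
logs = route(events, num_partitions=4)
print(per_key_order_preserved(logs))
```

Each partition's log keeps a key's versions in write order, but nothing in this scheme says whether product:1234 version 3 is consumed before or after product:5678 version 2 when the two keys sit on different partitions. Invalidation handlers therefore need to be correct when applied per key, in any cross-key order.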

The belief that versioned keys inevitably waste memory is the mirror image of the memory leak described in Bug Study 3: some engineers avoid versioned keys entirely because they worry about orphaned versions filling up memory. The worry is valid, but the solution is TTLs, not abandoning the pattern.

Versioned keys with TTLs are actually quite memory-efficient in practice. For a key that is updated less often than once per TTL window, at most 2 versions are live at any moment: the current version (being actively read) and the previous version (recently orphaned, expiring within one TTL window). If you write version 42 at time T, the previous version 41 expires at most one TTL window later. After that, only version 42 occupies memory. The memory overhead is at most 2× per key, and only during the transition window. A key updated more often can briefly hold more versions, but never more than the number of versions cached within a single TTL window. Compare this to surrogate tags (which require the CDN or cache to maintain an index of all keys per tag) or full record locking (which holds locks for the duration of every write). Versioned keys with TTLs are among the leanest invalidation patterns when implemented correctly.

Seven recurring misconceptions, each with a structural root cause: TTL is a staleness bound, not an invalidation strategy; dual-write is not atomic; Cache-Control: no-cache means revalidate, not don't cache; CDN purge is fast but not instantaneous; write-through gives strong consistency only for a single cache node; CDC ordering is per-shard, not global; and versioned keys with TTLs are memory-efficient, not wasteful.

Section 18

Practice Exercises — Build Your Intuition

Reading about cache invalidation builds vocabulary. Actually designing invalidation systems — making trade-off decisions under constraints — builds the intuition you need to do it correctly under time pressure in an interview or in production. Work through each exercise before reading the answer.

An e-commerce platform wants to launch a flash sale at exactly 12:00:00 PM. Product prices will be updated in the database 60 seconds before launch (at 11:59:00). The product detail pages are cached in Redis with a 10-minute TTL and also served via a CDN with a 5-minute TTL. The business requirement: no customer should see the old price after 12:00:00 PM. Design an invalidation strategy that meets this requirement.

Think about the layered cache problem: you have TWO caches (Redis + CDN) that both need to be invalidated. A strategy that works for one may not work for the other. Also consider the timing: the prices are written 60 seconds before the launch time, not at launch time. Can you use that window?

Answer: A single strategy (TTL or explicit purge alone) won't work here because you have two cache layers and a hard deadline. Recommended approach — layered invalidation with a pre-warm:

At 11:59:00: Write new prices to the database with a goes_live_at = 12:00:00 timestamp. Do NOT yet update the cache.
At 11:59:30: Fire CDN cache purge for all product URLs involved in the sale. This gives the CDN 30 seconds to propagate before launch. For CloudFront, where invalidations can take a minute or more, 30 seconds is too tight; fire the invalidation earlier or consider Fastly for time-sensitive purges.
At 11:59:55: Fire Redis explicit deletes for all affected product keys. The 5-second window before launch ensures the cache is empty at 12:00:00, and the next read will fetch the new price from the database.
Application logic: The price query should check goes_live_at <= NOW(), so any cache-bypassing read after 12:00:00 returns the new price even if somehow a stale entry survives.

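Backup TTL: Set all flash-sale product pages to a 60-second TTL starting at 11:58:00. Even if a purge fails, the cache expires within 60 seconds of launch. This is the backstop, not the primary mechanism. Note that an entry repopulated between a purge and 12:00:00 still carries the old price, so to meet the requirement strictly, cap the TTL of anything cached before launch so that it expires no later than goes_live_at (in both Redis and the CDN's max-age). Why not TTL-only? With a 10-minute TTL, an entry written at 11:50 expires at exactly 12:00, right at the launch boundary, and any entry written after 11:50 expires after noon, so old prices could remain visible for up to 10 minutes after launch.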
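One way to guarantee that nothing cached before the launch outlives it is to compute each entry's TTL from the time remaining until goes_live_at. The Python sketch below is an illustration of that idea; ttl_capped_at_launch is a hypothetical helper, and the one-second minimum is an assumed threshold below which caching is simply skipped.

```python
from datetime import datetime, timedelta
from typing import Optional


def ttl_capped_at_launch(now: datetime, goes_live_at: datetime,
                         normal_ttl: timedelta,
                         min_ttl: timedelta = timedelta(seconds=1)
                         ) -> Optional[timedelta]:
    """TTL for an entry written at `now` so that it expires by the launch time.

    Returns None when the launch is so close that caching is not worthwhile;
    the caller should then serve the response without storing it.
    """
    if now >= goes_live_at:
        return normal_ttl              # launch has passed: normal caching resumes
    remaining = goes_live_at - now
    if remaining < min_ttl:
        return None                    # too close to launch: do not cache
    return min(normal_ttl, remaining)  # never outlive the launch


launch = datetime(2024, 1, 1, 12, 0, 0)
normal = timedelta(minutes=10)

print(ttl_capped_at_launch(datetime(2024, 1, 1, 11, 40, 0), launch, normal))
print(ttl_capped_at_launch(datetime(2024, 1, 1, 11, 55, 0), launch, normal))
print(ttl_capped_at_launch(datetime(2024, 1, 1, 12, 0, 1), launch, normal))
```

With this cap in place, the purges at 11:59:30 and 11:59:55 become an optimization rather than the only line of defense: even an entry repopulated at 11:59:58 expires by noon.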

You have a cache entry with a 60-second TTL. You also have an explicit purge mechanism, but it only delivers purge messages with 99% reliability (1% of purge messages are lost due to network failures). Calculate the expected maximum staleness window under normal operation, and under the failure case.

The "maximum staleness window" is determined by which mechanism catches a stale entry last. If both the TTL and the purge work correctly, staleness ends when the purge arrives. If the purge is lost, staleness ends when the TTL expires. Combine these with the 1% failure rate to compute expected staleness.

Answer: Normal operation (purge delivered, 99% of updates): The purge arrives within milliseconds to seconds of the write. Assume purge delivery takes at most 2 seconds at the 99th percentile. For 99% of updates, the staleness window is 0–2 seconds.

Failure case (purge lost, 1% of updates): When the purge is lost, the entry stays in cache until the 60-second TTL expires. The staleness window is 0–60 seconds.

Expected worst-case staleness per update: E[staleness] = 0.99 × 2s + 0.01 × 60s = 1.98s + 0.6s = 2.58s, or about 2.6 seconds.

What this means in practice: At 100 updates per second, you'll have roughly 1 update per second where the purge is lost, with a 60-second staleness window. At any given moment, roughly 60 × 1 = 60 cache entries are serving stale data from lost purges, each for up to 60 seconds.

The key insight: A 1% purge failure rate does not mean "1% staleness." It means "1% of updates have TTL-length staleness instead of purge-speed staleness." For a 60-second TTL, this is 30× worse than the normal case per affected update. Always pair best-effort purge with a backstop TTL short enough that purge failures are acceptable. If 60 seconds of staleness is unacceptable, reduce the TTL rather than improving purge reliability: better reliability shrinks the fraction of affected updates, but not the worst case each of them sees.
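The arithmetic above is easy to re-run for other TTLs and failure rates. This is a small sketch using the exercise's numbers; the function names are illustrative, and the in-flight estimate assumes each lost purge hits a distinct key and stays stale for the full TTL (an upper bound).

```python
def expected_worst_case_staleness(p_lost: float, purge_delay_s: float, ttl_s: float) -> float:
    """Weight the two worst cases (purge delivered vs. purge lost) by their probability."""
    return (1 - p_lost) * purge_delay_s + p_lost * ttl_s


def stale_entries_in_flight(updates_per_s: float, p_lost: float, ttl_s: float) -> float:
    """Little's law: (rate of lost purges) x (how long each entry stays stale)."""
    return updates_per_s * p_lost * ttl_s


staleness = expected_worst_case_staleness(p_lost=0.01, purge_delay_s=2, ttl_s=60)
in_flight = stale_entries_in_flight(updates_per_s=100, p_lost=0.01, ttl_s=60)
print(f"{staleness:.2f}s expected, ~{in_flight:.0f} stale entries at any moment")
```

Halving ttl_s halves both the failure-case term and the number of stale entries in flight, which is why the TTL is the lever to pull first.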

A teammate shows you this pseudocode for a cart update. Identify the dual-write race condition and explain under what timing it produces a stale cache.

def update_cart(user_id: int, item_id: int, quantity: int):
    db.execute("UPDATE cart_items SET qty=%s WHERE user_id=%s AND item_id=%s", quantity, user_id, item_id)
    cache_key = f"cart:{user_id}"
    cache.delete(cache_key)

Think about what another concurrent request could be doing between the db.execute and the cache.delete. What happens if a read request fires in that tiny window?

Answer — The Race Window: Here's the sequence of events that produces a stale cache:

Thread A executes db.execute — cart is updated in the database (qty = 5).
Thread B fires a cache read for the same user's cart. Gets a cache miss (either the entry expired or was already deleted).
Thread B reads from the database — correctly gets qty = 5. Writes {"item_id": X, "qty": 5} to the cache.
Thread A executes cache.delete — deletes the fresh entry Thread B just wrote.
Thread C fires a cache read. Gets a cache miss. Reads from the database — correctly gets qty = 5. Writes to cache. Now the cache is correct again.

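Wait, that actually recovers? Yes: in this interleaving the delete removes a fresh entry and costs only one extra database read. The stale outcome needs a different ordering, in which Thread B reads the old value and writes it to the cache after Thread A's delete. Thread B misses the cache and reads qty = 3 from the database (before Thread A's write commits, or afterwards from a replica with replication lag). Thread A then executes db.execute (qty = 5) and cache.delete, which finds nothing to remove. Finally Thread B writes qty = 3 to the cache. No later delete is coming, so the cache serves qty = 3 until the TTL expires or the next cart update triggers another delete. The window is narrow, because Thread B's read and cache write must straddle Thread A's write and delete, but it does occur under load, which is why a TTL backstop still matters. The more dangerous version is cache-write instead of cache-delete: write the new cart to the DB, then write the new cart to the cache. Now Thread B can overwrite Thread A's cache entry with a stale read-through, and there is no subsequent delete to clean it up. This is why delete (not update) on the invalidation side is the correct pattern for best-effort consistency.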
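To see which interleavings are harmless and which leave a stale entry behind, the race can be replayed deterministically. This is a minimal sketch with plain dicts standing in for the database and the cache; the step names and the run helper are hypothetical, and real threads are deliberately avoided so the ordering is explicit.

```python
def run(steps):
    """Replay one interleaving of Thread A (writer) and Thread B (reader)."""
    db = {"qty": 3}      # source of truth, old value
    cache = {}           # starts empty: Thread B's read is a cache miss
    b_local = {}         # value Thread B has read from the DB but not yet cached

    ops = {
        "A_db_write": lambda: db.update(qty=5),
        "A_cache_delete": lambda: cache.pop("cart:1", None),
        "B_db_read": lambda: b_local.update(qty=db["qty"]),
        "B_cache_write": lambda: cache.update({"cart:1": b_local["qty"]}),
    }
    for step in steps:
        ops[step]()
    return db["qty"], cache.get("cart:1")


# B reads after A's write: the delete removes a fresh entry, nothing stale remains.
benign = run(["A_db_write", "B_db_read", "B_cache_write", "A_cache_delete"])

# B reads before A's write and caches after A's delete: the old value sticks.
stale = run(["B_db_read", "A_db_write", "A_cache_delete", "B_cache_write"])

print("benign (db, cache):", benign)
print("stale  (db, cache):", stale)
```

In the second ordering the database holds 5 while the cache holds 3, and nothing except a TTL or a later write will correct it.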

Design a surrogate cache tag scheme for an e-commerce site with the following page types: product detail pages, category listing pages, search results pages, and a homepage "featured products" carousel. When a product's price changes, which tags need to be purged? When a product is added to a new category, which tags? When a product is discontinued (removed from catalog), which tags?

Think about what data each page type displays. A product detail page shows one product's data. A category page shows many products. A search result page shows products matching a query. The homepage carousel shows "featured" products. Each page should carry all the tags that relate to what it displays, so a purge by any one of those tags clears that page.

Answer — Tag Design: Tags to assign to each page type:

Product detail page for product P: tags = ["product-{P.id}"]
Category listing page for category C: tags = ["category-{C.id}"] (contains many products but you purge the whole page on any product change in that category)
Search results page for query Q: tags = per-product tags for all products in results (["product-{p1.id}", "product-{p2.id}", ...]) — this allows purging all search result pages that contain product P by issuing a single product-P.id tag purge. Alternatively, if the search index is rebuilt on product changes, use ["search-index"] as a coarser tag.
Homepage featured carousel: tags = ["featured"] + tags for each featured product. Purging featured clears the carousel when the featured set changes.

Purge operations:

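Price change on product P: purge tag product-{P.id} → clears the product detail page + all search result pages containing P + the homepage carousel if P is featured. Category listing pages carry only category-{C.id}, so if they display prices, also purge category-{C.id} for every category that contains P.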
Product P added to new category C: purge tags product-{P.id} + category-{C.id} → clears old pages for P and the category listing that now includes P.
Product P discontinued: purge tag product-{P.id} → clears all pages mentioning P. Additionally purge all category tags for categories that contained P (or use a broader tag like catalog if discontinuations are rare).

The key insight: assign tags based on what the page displays, not on the page's URL. This ensures that any data change triggers purges on all pages that display that data, regardless of how many there are or what their URLs look like.
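The tag scheme can be tried out with a small in-memory model of a tag-aware cache. This is only a sketch: TaggedCache, its methods, and the URLs are made up for illustration, and a real CDN maintains this tag-to-object index for you.

```python
from collections import defaultdict


class TaggedCache:
    """Toy page cache that indexes every cached URL by its surrogate tags."""

    def __init__(self):
        self.pages = {}                        # url -> cached body
        self.urls_by_tag = defaultdict(set)    # tag -> set of urls

    def put(self, url, body, tags):
        self.pages[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_tag(self, tag):
        """Drop every page carrying `tag`; return the URLs that were purged."""
        urls = self.urls_by_tag.pop(tag, set())
        for url in urls:
            self.pages.pop(url, None)
        return urls


cache = TaggedCache()
cache.put("/p/42", "<detail 42>", ["product-42"])
cache.put("/p/43", "<detail 43>", ["product-43"])
cache.put("/c/7", "<category 7>", ["category-7"])
cache.put("/search?q=mug", "<results>", ["product-42", "product-43"])
cache.put("/", "<home>", ["featured", "product-42"])

purged = cache.purge_tag("product-42")
print(sorted(purged))        # pages that displayed product 42
print(sorted(cache.pages))   # pages still cached
```

Note that /c/7 survives the product-42 purge because it carries only its category tag, which is exactly why a price change also needs a category purge if category pages show prices.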

Design a function GetOrFetch(key, fetchFn, ttl) that is safe against cache stampedes for hot keys, handles the case where fetchFn returns an error (should not cache errors), and uses jittered TTLs to prevent synchronized expiry across a batch of related keys. Write the implementation in Go or Python.

You need three things working together: (1) a singleflight.Group to deduplicate concurrent misses, (2) error handling that prevents error-caching, (3) jitter added to the TTL before writing to cache. Sketch the flow first: cache hit → return; cache miss → singleflight.Do → call fetchFn → if success, cache with jitter and return; if error, return error without caching.

Answer — Go implementation:

package cache

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"time"

	"golang.org/x/sync/singleflight"
)

type CacheClient interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Set(ctx context.Context, key string, val []byte, ttl time.Duration) error
}

var group singleflight.Group

// GetOrFetch is a stampede-resistant cache fetch.
//   - Checks cache first (fast path, no singleflight overhead on hits)
//   - On miss: uses singleflight to collapse concurrent callers into 1 fetchFn call
//   - Errors from fetchFn are returned to all waiters but NOT cached
//   - TTL is jittered ±10% to prevent synchronized expiry across related keys
//
// Note: an empty cached value is treated as a miss, and the fetch runs with
// the context of whichever caller started the flight.
func GetOrFetch(
	ctx context.Context,
	c CacheClient,
	key string,
	fetchFn func(ctx context.Context) ([]byte, error),
	baseTTL time.Duration,
) ([]byte, error) {
	// Fast path: cache hit, skip singleflight entirely (no lock contention).
	if val, err := c.Get(ctx, key); err == nil && len(val) > 0 {
		return val, nil
	}

	// Slow path: cache miss, singleflight collapses concurrent callers.
	v, err, _ := group.Do(key, func() (interface{}, error) {
		// Re-check the cache inside the singleflight: an earlier flight
		// may have populated it between our miss and this call.
		if val, err := c.Get(ctx, key); err == nil && len(val) > 0 {
			return val, nil
		}

		// Fetch from origin (DB, API, etc.).
		data, fetchErr := fetchFn(ctx)
		if fetchErr != nil {
			// CRITICAL: do NOT cache errors. A cached error would be served
			// to all readers for the whole TTL.
			return nil, fmt.Errorf("fetch failed: %w", fetchErr)
		}

		// Add ±10% jitter to the TTL to prevent synchronized expiry.
		// rand.Int63n panics for arguments <= 0, so skip jitter for tiny TTLs.
		ttl := baseTTL
		if jitterRange := int64(float64(baseTTL) * 0.1); jitterRange > 0 {
			ttl += time.Duration(rand.Int63n(jitterRange*2+1) - jitterRange)
		}

		if setErr := c.Set(ctx, key, data, ttl); setErr != nil {
			// Cache write failed: log and continue, the fetch succeeded.
			// The next miss will just fetch again. Don't fail the request.
			log.Printf("warning: cache set failed for key %s: %v", key, setErr)
		}
		return data, nil
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}

Key design decisions explained:

Double-check inside singleflight: re-check the cache at the start of the singleflight function. Between the outer miss and the singleflight execution, an earlier flight for the same key may have completed and populated the cache. Without this re-check, a caller that missed just before that flight finished would start a new flight and send a redundant query to the database.
Never cache errors: if fetchFn returns an error, return it to all waiting goroutines but don't store it. Caching an error means all readers get the error for TTL seconds even after the underlying problem is fixed.
Jitter is relative to base TTL: jittering by ±10% of the base TTL keeps the staleness properties predictable while spreading expiry events enough to eliminate synchronized expiry across a batch of keys.
Cache write failure is non-fatal: a Redis write failure during the slow path should degrade gracefully (log + continue), not fail the request. The origin response is still valid data for the caller.
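Since the exercise allows Go or Python, here is how the same flow might look in Python. Treat it as a minimal sketch, not a port of any library: threading.Event stands in for singleflight, an in-memory dict stands in for Redis, and the names StampedeSafeCache, _Flight, and get_or_fetch are made up for illustration.

```python
import random
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class _Flight:
    """One in-progress fetch that concurrent callers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class StampedeSafeCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._store: Dict[str, Tuple[bytes, float]] = {}  # key -> (value, expires_at)
        self._flights: Dict[str, _Flight] = {}

    def _get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        return None

    def get_or_fetch(self, key: str, fetch_fn: Callable[[], bytes], base_ttl: float) -> bytes:
        # The cache check and the flight lookup share one lock, so a caller
        # sees a cached value, joins an existing flight, or becomes the leader.
        with self._lock:
            cached = self._get(key)
            if cached is not None:
                return cached
            flight = self._flights.get(key)
            leader = flight is None
            if leader:
                flight = self._flights[key] = _Flight()

        if leader:
            try:
                value = fetch_fn()
                ttl = base_ttl * random.uniform(0.9, 1.1)  # +/-10% jitter
                with self._lock:
                    self._store[key] = (value, time.monotonic() + ttl)
                flight.value = value
            except BaseException as exc:
                flight.error = exc  # handed to every waiter, never cached
            finally:
                with self._lock:
                    del self._flights[key]
                flight.done.set()
        else:
            flight.done.wait()

        if flight.error is not None:
            raise flight.error
        return flight.value
```

Because a failed flight stores nothing, the first call after an error starts a fresh fetch and does not replay the failure.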

Five exercises covering the core invalidation design decisions: layered invalidation for time-critical updates, staleness window calculation under partial purge failure, identifying and reasoning about dual-write race windows, surrogate tag design for multi-page invalidation, and a full stampede-resistant cache layer with singleflight, error-non-caching, and jitter. Working through these builds the problem-solving muscle that transforms invalidation from theory into production-ready decisions.

Section 19

Tuning Invalidation in Production — A Decision Playbook

Theory is nice. Production is different. When you're staring at a new feature ticket that involves cached data, you don't have time to re-derive the whole cache invalidation decision tree. What you need is a repeatable five-step process you can run in your head — or better yet, document in your team's runbook — that takes you from "we have a caching need" to "this is the invalidation strategy, these are the metrics to watch, and this is our alert threshold." That's what this section gives you.

The five steps aren't academic categories. They're the actual sequence an experienced engineer runs: first understand how stale is too stale, then pick a strategy that satisfies that tolerance, then instrument the strategy so you can see when it's failing, then set alerts, then plan for the moment the business changes and your current strategy no longer fits. Let's walk through each step in depth.

The five steps above form a loop, not a one-time decision. The diagram shows each step feeding naturally into the next — and Step 5 explicitly loops back to Step 1, because business requirements are not stable. Let's go deep on each.

Step 1 — Measure the Business's Staleness Tolerance

Before you choose an invalidation strategy, you need a number: how stale is too stale? This is a business question, not a technical one, and engineers often skip it because it feels like someone else's job. It's not. If you don't get this number out of a product manager, you'll either over-engineer (spending engineering time on millisecond-freshness for data that nobody checks more than once a day) or under-engineer (using a 1-hour TTL on payment state).

Ask: "If this data is N minutes old when a user reads it, does anything bad happen?" Iterate N from very large to very small until you find the threshold where "yes, something bad happens." For a blog post, N can be 60 minutes and nothing bad happens. For product pricing during a flash sale, N is 30 seconds. For available seat count on a flight, N is 5 seconds. For a bank balance, N is effectively zero — you cannot serve a stale balance to a customer making a payment decision.

Document this number as your staleness SLA for this data type. It is the ceiling on your TTL and the maximum allowed staleness lag for any event-driven strategy. Everything downstream of Step 1 uses this number.

Step 2 — Choose a Strategy Using the Workload Matrix

With your staleness SLA in hand, you can now use the workload-to-strategy matrix to pick an approach. The matrix has two dimensions: tolerance (how much staleness the business accepts) and workload shape (is this data read far more than it's written, or written heavily?).

The matrix gives you the first-order answer. Top-left (low writes, high tolerance) is the easy case: set a TTL equal to or less than your staleness SLA and move on. Bottom-right (high writes, low tolerance) is the hard case: you need CDC because write-through has too many moving parts at high write volume. The two remaining cells are the nuanced cases where you layer strategies: TTL as the backstop, purge as the fast path, CDC as the consistent guarantee.
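The matrix lookup can be written down as a tiny function so the decision is reviewable in code. This is a sketch under stated assumptions: the 60-second boundary between "low" and "high" tolerance is arbitrary, and the strategies returned for the two mixed cells are one plausible layering, not something the matrix prescribes.

```python
def choose_strategy(staleness_sla_s: float, write_heavy: bool,
                    low_tolerance_below_s: float = 60.0) -> str:
    """First-order strategy pick from the staleness SLA and the workload shape."""
    low_tolerance = staleness_sla_s < low_tolerance_below_s  # assumed threshold

    if not low_tolerance and not write_heavy:
        return "ttl"                                  # easy case: TTL <= SLA
    if low_tolerance and write_heavy:
        return "cdc + ttl backstop"                   # hard case
    if low_tolerance:
        return "explicit purge + ttl backstop"        # assumed: read-heavy, low tolerance
    return "ttl + best-effort purge"                  # assumed: write-heavy, high tolerance


print(choose_strategy(staleness_sla_s=3600, write_heavy=False))  # blog post
print(choose_strategy(staleness_sla_s=5, write_heavy=True))      # seat inventory
```

Whatever the function returns, the TTL it implies must still be at or below the staleness SLA from Step 1.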

Step 3 — Instrument: The Four Metrics That Matter

A cache invalidation strategy you can't measure is a strategy you can't trust. There are exactly four metrics you need to emit on every cache invalidation path:

Hit ratio — the percentage of reads served from cache vs. falling through to the database. A sudden drop in hit ratio often means your invalidation is too aggressive (keys are being deleted before they'd naturally expire). A suspiciously high hit ratio on a volatile data type might mean your purge pipeline is broken and nothing is being invalidated at all.
Staleness lag — the time delta between when the database row changes and when the corresponding cache key is deleted. This is your most direct measure of whether your invalidation is meeting its SLA. For a TTL strategy, staleness lag can be up to TTL seconds. For CDC-driven invalidation, it should be the end-to-end latency of your Kafka/Debezium pipeline — typically milliseconds to low seconds.
Purge success rate — for explicit purge and CDC-driven strategies, what percentage of intended invalidation operations actually succeed? A Redis `DEL` or `UNLINK` can fail if the Redis node is partitioned, if your application server crashes between DB write and cache delete, or if a deployment is in progress. Track this as a percentage and alert if it drops below 99.9%.
Stampede rate — how many concurrent origin fetches are happening on a single cache key at the same moment? This is the thundering herd signal. If you're seeing five or more concurrent database reads for the same cache key, your expiration is causing stampede events and you need to implement a lease or stale-while-revalidate approach.
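A minimal in-memory sketch of these four metrics is shown here. The class and method names are hypothetical; in a real service the counters would be emitted to your metrics system, not held in a Python object.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InvalidationMetrics:
    hits: int = 0
    misses: int = 0
    purges_attempted: int = 0
    purges_succeeded: int = 0
    staleness_lags_s: List[float] = field(default_factory=list)
    inflight: Counter = field(default_factory=Counter)   # key -> concurrent origin fetches
    max_concurrent_fetches: int = 0

    def record_read(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def record_purge(self, db_commit_ts: float, cache_delete_ts: Optional[float]) -> None:
        """cache_delete_ts is None when the purge failed or was lost."""
        self.purges_attempted += 1
        if cache_delete_ts is not None:
            self.purges_succeeded += 1
            self.staleness_lags_s.append(cache_delete_ts - db_commit_ts)

    def fetch_started(self, key: str) -> None:
        self.inflight[key] += 1
        self.max_concurrent_fetches = max(self.max_concurrent_fetches, self.inflight[key])

    def fetch_finished(self, key: str) -> None:
        self.inflight[key] -= 1

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def purge_success_rate(self) -> float:
        if not self.purges_attempted:
            return 1.0
        return self.purges_succeeded / self.purges_attempted
```

Recording the database commit timestamp alongside each purge is what makes staleness lag measurable at all, so it is worth carrying that timestamp through the invalidation path.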

Step 4 — Monitor and Alert

Metrics without alerts are decoration. For each of the four metrics above, set alert thresholds tied to your staleness SLA:

Staleness lag exceeds your SLA threshold → page the on-call engineer immediately. This is a consistency breach in progress.
Hit ratio drops more than five percentage points in five minutes → investigate. Either traffic pattern changed or invalidation is broken.
Purge success rate falls below 99.9% → alert. Silent purge failures mean your cache is silently drifting away from truth.
Stampede rate (concurrent DB reads per key) exceeds five → warning. Not yet a crisis but you need to add lease or stale-while-revalidate before it becomes one.
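The four rules above can be expressed as a single check that runs on each metrics scrape. This is a sketch with a hypothetical function name and severity labels; the thresholds are the ones listed above.

```python
from typing import List, Tuple


def evaluate_alerts(
    staleness_lag_s: float,
    staleness_sla_s: float,
    hit_ratio_now: float,
    hit_ratio_5m_ago: float,
    purge_success_rate: float,
    max_concurrent_fetches: int,
) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for every threshold that is breached."""
    alerts = []
    if staleness_lag_s > staleness_sla_s:
        alerts.append(("page", "staleness lag exceeds SLA"))
    # Hit ratios are fractions in [0, 1]; compare the drop in percentage points.
    if (hit_ratio_5m_ago - hit_ratio_now) * 100 > 5:
        alerts.append(("investigate", "hit ratio dropped more than 5 points in 5 minutes"))
    if purge_success_rate < 0.999:
        alerts.append(("alert", "purge success rate below 99.9%"))
    if max_concurrent_fetches > 5:
        alerts.append(("warning", "more than 5 concurrent origin fetches on one key"))
    return alerts


for severity, message in evaluate_alerts(12, 5, 0.85, 0.92, 0.998, 7):
    print(f"[{severity}] {message}")
```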

Step 5 — Iterate When Business Needs Change

Your first invalidation design is never your last. The most common triggers for re-running this playbook from Step 1: (a) a regulatory or compliance requirement tightens your freshness SLA from minutes to seconds; (b) traffic grows by an order of magnitude and write-through latency is now noticeable to users; (c) you expand to multiple geographic regions and your staleness lag number now includes cross-region propagation time; (d) a new product feature changes the write pattern of a data type from infrequent to high-frequency. When any of these happen, go back to Step 1 with fresh eyes — don't just patch the existing strategy.

Production invalidation tuning is a five-step repeatable process: measure the business's staleness tolerance, choose a strategy that satisfies it, instrument the four key metrics (hit ratio, staleness lag, purge success rate, stampede rate), set alerts tied to your SLA, and revisit the whole playbook whenever business requirements change. The workload-to-strategy matrix gives you the first-order answer; the metrics tell you whether it's working.

Section 20

Real-World Architectures — How Big Companies Actually Invalidate

Reading about strategies in the abstract is useful. Seeing how real engineering teams implemented those strategies — with all the real-world constraints of existing databases, traffic patterns, and team size — is where the deep understanding happens. Each of the systems below made a specific, documented trade-off. Understanding why they made that choice tells you more about invalidation than any number of theoretical comparisons.

A note on numbers: specific scale figures for internal systems change constantly and are often not publicly precise. We'll name the architectural pattern and the trade-off clearly; we'll stay soft on numbers that aren't from published engineering sources.

Facebook TAO — Write-Through + Async Cross-Region Invalidation

Facebook's TAO (The Associations and Objects) is the caching layer that sits in front of Facebook's social graph database. It was described in a USENIX ATC 2013 paper (Bronson et al., "TAO: Facebook's Distributed Data Store for the Social Graph") and is one of the most cited real-world cache designs. TAO stores objects (users, posts, photos) and associations (friendships, likes, comments) in a distributed cache tier of dedicated servers that sits in front of MySQL.

TAO uses write-through within a single data center: a write to an object or association updates the database and the TAO cache in the same logical operation, so readers in the same region see strongly consistent data immediately after the write. The hard part is multi-region: Facebook operates data centers on multiple continents, and each has its own TAO cluster. When a write happens in one region, the other regions' TAO caches hold stale copies.

The solution TAO uses is asynchronous invalidation via replication. Writes flow through the primary region, which asynchronously replicates to secondary regions. Each secondary TAO cluster receives the replication event and fires cache invalidation messages to its own cache nodes. The trade-off is explicit: secondary-region reads can be transiently stale by the replication lag, which is typically in the milliseconds-to-low-seconds range under normal network conditions. For a social graph (timeline, friend list, notifications) this level of eventual consistency is acceptable — you might not see a like for a few seconds, but that's fine.

The key lesson from TAO: even at enormous scale, the answer isn't "strong consistency everywhere" — it's "strong consistency where it matters most (within a region) and bounded eventual consistency where the trade-off is acceptable (cross-region)."

Shopify — Rails Fragment Caching with Versioned Keys + Generation Bumps

Shopify runs one of the largest multi-tenant e-commerce platforms in the world, built on Ruby on Rails. Rails has a built-in fragment caching system, and Shopify's use of it is a textbook example of versioned key invalidation.

In Rails fragment caching, every cached HTML fragment or JSON object has a cache key that incorporates the model's updated_at timestamp and a version counter. When a product is updated in the database, Rails automatically computes a new cache key for that product's fragment because the updated_at timestamp changed. The old key is simply never read again — it becomes orphaned and will eventually be evicted by LRU. There's no explicit purge call; the key itself changes, making the old cached version unreachable. This is the generation bump pattern: increment the version, and all old cache entries instantly become dead entries.

For mass invalidation (for example, a merchant deploys a new theme that changes how all their product pages render), Shopify uses a global generation counter scoped to a merchant's store. Bumping this counter invalidates every cached fragment for that merchant in a single atomic operation — no matter how many individual product, collection, or page fragments exist. Without this pattern, invalidating all fragments for a large merchant's store on a theme update would require enumerating and deleting thousands of individual keys, which is both slow and prone to races.
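Both ideas, the per-record version in the key and the per-store generation counter, fit in a few lines. The sketch below is illustrative only: the key layout and the VersionedKeys class are hypothetical and are not the format Rails or Shopify actually use, and in production the generation counter would live in the shared cache or database, not in process memory.

```python
class VersionedKeys:
    """Builds cache keys that change whenever the record or the store generation changes."""

    def __init__(self):
        self.generation = {}  # shop_id -> current generation number

    def fragment_key(self, shop_id: int, kind: str, record_id: int, updated_at: int) -> str:
        gen = self.generation.get(shop_id, 0)
        # updated_at is the record's last-modified time (epoch seconds here).
        return f"shop:{shop_id}:g{gen}:{kind}:{record_id}:{updated_at}"

    def bump_generation(self, shop_id: int) -> None:
        """Make every existing fragment key for this shop unreachable at once."""
        self.generation[shop_id] = self.generation.get(shop_id, 0) + 1


keys = VersionedKeys()
before_edit = keys.fragment_key(1, "product", 42, 1700000000)
after_edit = keys.fragment_key(1, "product", 42, 1700000500)   # product was updated
keys.bump_generation(1)                                         # merchant changed theme
after_bump = keys.fragment_key(1, "product", 42, 1700000500)
print(before_edit, after_edit, after_bump, sep="\n")
```

Nothing is ever deleted here: old keys simply stop being requested, which is why this pattern depends on the cache evicting unused entries.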

The lesson: versioned keys turn "how do I delete everything?" into "how do I make the old keys unreachable?" — an elegant shift that eliminates whole classes of purge race conditions.

Fastly + GitHub — Surrogate Keys for HTTP Edge Cache Invalidation

Fastly is a CDN that supports a feature called surrogate keys (also called cache tags). GitHub uses Fastly as its CDN layer and uses surrogate keys to manage invalidation of rendered repository pages, commit listings, and file views.

Here's the problem that surrogate keys solve: a single GitHub repository page might be cached at hundreds of Fastly edge nodes globally. If someone pushes a new commit to that repository, GitHub needs to invalidate all the cached versions of the repository's home page, the commit list page, the default branch file tree, and possibly any pages that show the latest commit message. That's many distinct URLs spread across many edge nodes. Without surrogate keys, GitHub would have to enumerate every URL and issue a purge call for each — a maintenance nightmare as the product evolves and new pages are added.

With surrogate keys, GitHub's application server includes a response header like Surrogate-Key: repo:123456 user:789 org:42 when it serves a response. Fastly stores this metadata alongside the cached response. When a push event happens, GitHub issues a single Fastly API call: "purge all edges that carry the tag repo:123456." Fastly propagates this purge to all its edge nodes in seconds. All pages tagged with that repository ID — regardless of their URL structure — are invalidated in one operation.

The lesson: surrogate keys shift the invalidation model from "which URLs need to change?" to "which logical entity changed?" — matching how humans think about data dependencies rather than how HTTP caches store data.

Netflix EVCache — TTL + Best-Effort Delete, Multi-Region Replication

Netflix's EVCache is an open-source distributed caching library (available at github.com/Netflix/EVCache) built on top of Memcached. It was designed for Netflix's scale and multi-region requirements. EVCache takes a pragmatic position on invalidation: use TTL as the primary expiry mechanism, and layer best-effort explicit deletes on top of it on every write.

Every EVCache entry has a TTL — typically set to match the data's business staleness tolerance. When Netflix's backend services write to the database, they also asynchronously fire a delete call to EVCache for the relevant key. The word "asynchronously" is critical: the delete is fire-and-forget. If it fails (because a Memcached node is temporarily unavailable, or the application server crashes after the DB write but before the delete call completes), the TTL is the backstop — the entry expires on its own schedule. This design accepts that a small fraction of delete calls will fail, and relies on TTL to bound the maximum staleness window.

For multi-region deployments, EVCache replicates cache writes and deletes across regions using a batched replication protocol. This means that a delete issued in the US-East region is eventually propagated to EU-West and AP-Southeast. The replication is asynchronous, so cross-region reads can transiently serve stale data during the propagation window — a trade-off Netflix accepts for the latency benefit of serving from a local region cache.

The lesson: combining TTL (the reliable backstop) with best-effort explicit deletes (the fast path) gives you a system that's tolerant of transient failures while still providing near-real-time invalidation in the common case.

Twitter Manhattan — Write-Through with Multi-Tier Cache

Manhattan is Twitter's distributed key-value store, described in engineering blog posts as the storage layer behind timelines, direct messages, and other core features. Twitter's cache layer in front of Manhattan uses a write-through model: writes go to both Manhattan and the cache tier simultaneously, so reads always see the latest written value without having to wait for a TTL expiry or an explicit purge event.

The interesting engineering challenge at Twitter's scale is maintaining this write-through guarantee across a multi-tier cache: Twitter uses both an in-process application-level cache (small, very fast, tied to a single host) and a distributed remote cache (Redis/Memcached cluster, shared across many hosts). A write that updates the distributed cache does not automatically update every application host's in-process cache. Twitter handles this by giving the in-process cache a very short TTL — measured in seconds, not minutes — so it acts as a micro-cache that's always nearly fresh, with the distributed cache as the authoritative second tier. The write-through guarantee holds at the distributed tier; the in-process tier is just a performance optimization with a short staleness window.
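The two-tier behaviour can be modelled in a few lines. This is a generic sketch of the pattern, not Twitter's implementation: TwoTierCache is a hypothetical class, plain dicts stand in for the database and the distributed cache, and the clock is injectable so the staleness window can be observed without sleeping.

```python
import time


class TwoTierCache:
    """Per-host micro-cache (short TTL) in front of a shared write-through cache."""

    def __init__(self, shared: dict, local_ttl_s: float, clock=time.monotonic):
        self.shared = shared            # stands in for the distributed cache
        self.local = {}                 # key -> (value, expires_at), per host
        self.local_ttl_s = local_ttl_s
        self.clock = clock

    def write(self, db: dict, key: str, value) -> None:
        db[key] = value
        self.shared[key] = value        # write-through holds at the shared tier
        self.local[key] = (value, self.clock() + self.local_ttl_s)

    def read(self, db: dict, key: str):
        entry = self.local.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]             # may be stale for up to local_ttl_s
        value = self.shared.get(key)
        if value is None:
            value = db[key]
            self.shared[key] = value
        self.local[key] = (value, self.clock() + self.local_ttl_s)
        return value


now = [0.0]
db, shared = {"k": "v1"}, {}
host_a = TwoTierCache(shared, local_ttl_s=2.0, clock=lambda: now[0])
host_b = TwoTierCache(shared, local_ttl_s=2.0, clock=lambda: now[0])

first = host_b.read(db, "k")            # host B caches v1 locally
host_a.write(db, "k", "v2")             # host A writes through to the shared tier
during_window = host_b.read(db, "k")    # host B still serves its local copy
now[0] = 2.5                            # local TTL elapses
after_window = host_b.read(db, "k")     # host B picks up v2 from the shared tier
```

The local TTL is the only knob that bounds how long another host can keep serving the old value, which is why it is kept to seconds.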

The lesson: write-through is clean in theory but requires careful thought in a multi-tier cache hierarchy. The solution is to apply write-through at the tier that matters most (the shared distributed cache) and accept short TTL staleness at the performance-optimization tier (in-process).

Architecture Comparison at a Glance

The comparison table makes one thing obvious: the primary strategy varies from system to system, but every system also layers a fallback. No production system relies on a single invalidation mechanism alone. The pattern is: "fast path for freshness in the common case, TTL or versioned keys as the backstop for when the fast path fails."

Real production invalidation architectures share a common pattern: they pick a primary strategy matched to their consistency requirement, add a fallback for failure cases, and accept bounded staleness across regions because strong cross-region consistency is too expensive in latency and complexity. Facebook TAO, Shopify Rails, Fastly/GitHub, Netflix EVCache, and Twitter Manhattan each illustrate a distinct point in the strategy design space.

Section 21

Cache Coherence Across Replicas & Multi-Region

So far we've mostly talked about a single cache server and a single application server talking to a single database. That's a useful abstraction for understanding the strategies. But production systems almost never look like that. They have multiple application server instances (horizontal scaling), multiple cache nodes (a Redis cluster or a fleet of Memcached servers), and multiple geographic regions (for latency and fault tolerance). When you add any of these dimensions, the cache invalidation problem multiplies. Let's understand why — and what the solutions look like.

The Multi-Node Problem: Which Cache Node Gets the Delete?

Suppose you have a Redis cluster with six shards. Cache keys are spread across those shards using a math trick: hash the key, and the hash value tells you which shard owns it, so the same key always lands on the same shard. Redis Cluster does this with hash slots: CRC16 of the key modulo 16384 picks a slot, and each shard owns a set of slots, so adding or removing a shard moves only the slots that are reassigned. (Client-sharded Memcached fleets typically get the same property from consistent hashing.) When your application issues a DEL user:profile:999 command, the Redis client library routes that command to the correct shard automatically. This is the best case: the delete goes exactly where the key lives.

The problem arises when you have multiple application server instances that each maintain their own in-process cache (a local HashMap or Caffeine cache). When server instance A receives a write for user #999 and updates the database, it can easily delete the key from its own in-process cache. But server instance B through Z still have the stale entry in their local caches. The write on instance A didn't trigger any notification to the other instances.

The solution is a pub/sub fan-out: when instance A performs the invalidation, it also publishes a message to a shared channel that all other instances subscribe to. Each subscriber receives the invalidation message and deletes the key from its local in-process cache. Redis pub/sub makes this straightforward to implement.

The diagram shows the three-step fan-out: the writing server updates the database (step 1), publishes an invalidation message to a Redis pub/sub channel (step 2), and all other application servers — which are subscribed to that channel — receive the message and delete the key from their local in-process caches (step 3). The Redis distributed cache doesn't need a fan-out — it's already shared — but every in-process cache on every host gets notified.

Redis Pub/Sub in Practice

# After writing to the database, publish an invalidation event.
# PUBLISH channel message
PUBLISH invalidate:user:999 "deleted"

# For range-based patterns (e.g., any product in category 7):
PUBLISH invalidate:product:category:7 "changed"

The publisher doesn't need to know who is subscribed — it just fires the event. Redis delivers it to every active subscriber. If no subscribers are connected at the moment (e.g., during a deployment), the message is lost — pub/sub in Redis is fire-and-forget, not durable. For durability, use Redis Streams or Kafka instead.

# Subscribe to a pattern: PSUBSCRIBE uses glob patterns
PSUBSCRIBE invalidate:*

# On receiving a message:
#   1. Parse the channel name to identify what changed
#   2. Delete the corresponding key from in-process cache
#   3. Optionally also delete from the shared Redis cluster
#      (Redis cluster is typically the source-of-truth cache;
#      in-process cache is a performance tier on top of it)

Pattern subscribe (PSUBSCRIBE) lets each app server subscribe to all invalidation events with one command, even as new entity types are added. Each server parses the channel name — "invalidate:user:999" — extracts the entity type and ID, and deletes the relevant local cache entry. This requires zero coordination between app servers: they all receive the same message independently.
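The subscriber side can be sketched without a running Redis server. In this illustration InMemoryBus is a hypothetical stand-in for Redis pub/sub that mimics glob-pattern delivery, and the rule that the local cache key equals the channel name minus the invalidate: prefix is an assumed naming convention, not something Redis imposes.

```python
from fnmatch import fnmatchcase

PREFIX = "invalidate:"


def handle_invalidation(channel: str, local_cache: dict) -> bool:
    """Delete the local entry named by the channel; return True if one was removed."""
    if not channel.startswith(PREFIX):
        return False
    key = channel[len(PREFIX):]          # "invalidate:user:999" -> "user:999"
    return local_cache.pop(key, None) is not None


class InMemoryBus:
    """Toy fire-and-forget bus: delivers only to subscribers connected right now."""

    def __init__(self):
        self.subscribers = []            # (glob pattern, callback)

    def psubscribe(self, pattern, callback):
        self.subscribers.append((pattern, callback))

    def publish(self, channel, message) -> int:
        delivered = 0
        for pattern, callback in self.subscribers:
            if fnmatchcase(channel, pattern):
                callback(channel, message)
                delivered += 1
        return delivered                 # like PUBLISH: number of receivers


bus = InMemoryBus()
lost = bus.publish("invalidate:user:999", "deleted")    # nobody subscribed yet

local_caches = [{"user:999": "old", "user:1": "ok"} for _ in range(3)]
for cache in local_caches:
    bus.psubscribe("invalidate:*", lambda ch, msg, c=cache: handle_invalidation(ch, c))

delivered = bus.publish("invalidate:user:999", "deleted")
```

The first publish reaches zero subscribers and is simply gone, which is the fire-and-forget behaviour that makes a TTL backstop on the in-process cache a sensible companion.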

Multi-Region: The Killer Complexity

Within a single data center, pub/sub fan-out is fast — Redis pub/sub round-trips in sub-millisecond time. Across regions, the picture changes. Each region has its own Redis cluster (because cross-region Redis replication is expensive in latency). Invalidation events must be replicated from the primary region to every secondary region, adding the inter-region network latency to your staleness lag.

The diagram illustrates why cross-region cache coherence is inherently eventual, not strong. An invalidation event fires in US-East the moment the database is written. EU-West receives it after the replication lag — typically tens to low hundreds of milliseconds under normal conditions. AP-Southeast receives it later still. Until the event arrives, any app server in those regions serves the old cached value. The replication lag is your cross-region staleness window, and you cannot eliminate it without abandoning local-region caching entirely (which defeats the purpose of multi-region deployment).

Consistent-Hashing Routing as an Alternative to Fan-Out

An elegant alternative to fan-out is to design your cache key routing so that the same key always maps to the same cache node — and always maps to the same application server as well. This is called consistent-hashing routing. If key user:profile:999 always goes to app server #3, and that app server has an in-process cache, then you only need to invalidate the cache on server #3. No fan-out needed. The trade-off: this design requires sticky routing (consistent hashing at the load balancer level), which complicates deployment rolling restarts and reduces flexibility. Most teams only adopt it for hot-key scenarios where fan-out overhead is measurably expensive.
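For reference, a consistent-hash ring that maps keys to app servers looks roughly like this. It is a minimal sketch: HashRing is a hypothetical class, MD5 is used only as a convenient stable hash, and 100 virtual nodes per server is an arbitrary choice to even out the distribution.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []                  # sorted list of (hash, node)
        self._vnodes = vnodes
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Each server appears at many points on the ring (virtual nodes).
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["app-1", "app-2", "app-3"])
keys = [f"user:profile:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("app-4")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding app-4")
```

Every key that moves lands on the new server and the rest stay put, so only the new server starts with a cold in-process cache.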

Kafka + Debezium as a durable alternative to Redis pub/sub: Redis pub/sub is fast but not durable; messages published while a subscriber is offline are lost. For invalidation events that must survive application restarts or deployments, use Kafka as the event bus. Debezium captures database changes and publishes them to a Kafka topic; each application server consumes that topic under its own stable consumer group (so every server sees every event instead of sharing partitions with its peers) and deletes keys from its in-process cache. If a server is offline for two minutes during a deployment, it catches up by replaying the topic from its last committed offset when it comes back online.

Cache coherence across multiple application instances requires a pub/sub fan-out so every instance's in-process cache receives invalidation events. Redis pub/sub with PSUBSCRIBE makes this straightforward within a single region. Across regions, invalidation is inherently eventual — the cross-region replication lag creates a staleness window that can only be bounded, not eliminated, without sacrificing the latency benefits of regional caching.

Section 22

Tooling & Libraries — What Actually Exists

Every invalidation strategy we've discussed needs actual tools to implement it. The good news: most of the heavy lifting already exists in open-source libraries and managed services. The bad news: there are so many options that picking the right one is its own challenge. This section maps the tool ecosystem by layer (in-process cache, distributed cache, CDC pipeline, managed database cache, and CDN/HTTP) and gives you, for each tool, what it does, which invalidation use case it serves, a quick syntax example, and, importantly, when not to use it.

The ecosystem map groups tools into five layers. Each layer is the right tool for a specific part of the invalidation problem. In-process caches are fastest but need fan-out. Distributed caches are the shared authority. CDC pipelines are the event source. Managed DB caches bundle write-through as a service. CDNs and HTTP headers handle the edge layer. Now let's go through each with enough detail to actually use them.

Redis — DEL, UNLINK, Pub/Sub, and Keyspace Notifications

What it is: Redis is the most widely used distributed cache. For cache invalidation, Redis gives you three mechanisms: explicit key deletion, pub/sub for fan-out, and keyspace notifications.

Explicit delete: DEL user:profile:999 removes the key and frees its memory synchronously on the Redis server's main thread, which can stall other commands when the value is large. UNLINK user:profile:999 removes the key from the keyspace immediately but reclaims the memory in a background thread, so prefer UNLINK for large values. Use either one for explicit purge strategies.

Pub/sub fan-out: PUBLISH invalidate:user:999 "deleted" notifies all subscribers. Combine with PSUBSCRIBE invalidate:* on each app server for in-process cache invalidation. Not durable — messages sent to disconnected subscribers are lost.

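Keyspace notifications (enabled via the notify-keyspace-events config, which is off by default): Redis can publish a message on its own pub/sub channels whenever a key expires, is set, or is deleted. Subscribe to __keyevent@0__:expired (the 0 is the database index) to react to TTL expiry events. Two caveats: the expired event fires when Redis actually removes the key, which can be later than the moment the TTL reaches zero, and delivery is fire-and-forget like any other pub/sub message. Useful for triggering downstream refresh logic that can tolerate a missed event.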

When NOT to use Redis pub/sub for invalidation: when you need durable delivery (application restarts or deployments will lose messages). Use Kafka or Redis Streams (XADD/XREAD) instead.

Debezium + Kafka — CDC-Driven Invalidation Pipeline

What it is: Debezium is an open-source CDC (Change Data Capture) framework that reads the database's change log (the binary log in MySQL, the WAL in PostgreSQL, change streams in MongoDB, and the equivalents in other databases) and publishes row-level change events to Kafka topics.

Invalidation use case: Any time a row changes in the database, Debezium publishes an event like {"op": "u", "after": {"id": 999, "name": "Alice"}, "source": {...}} to a Kafka topic. A consumer service subscribes to that topic and issues cache invalidation commands based on the event. This solves the dual-write problem entirely — the application only writes to the database; the cache invalidation happens as a downstream reaction to the change log, not as a second write from the application.

# Register a Debezium connector for PostgreSQL (via the Kafka Connect REST API)
# "<debezium-password>" is a placeholder; supply the real credential for the debezium user.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "products-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "debezium",
      "database.password": "<debezium-password>",
      "database.dbname": "shop",
      "table.include.list": "public.products",
      "topic.prefix": "dbz"
    }
  }'
# Debezium will now stream every INSERT/UPDATE/DELETE on public.products
# to the Kafka topic dbz.public.products
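On the consuming side, the core of the invalidation service is a function that turns a change event into the cache key to delete. The sketch below assumes the primary key column is named id and that cache keys look like product:<id>; both are illustrative choices, as is the function name. It accepts the event either bare or wrapped in a "payload" envelope, and it skips tombstone records (null values) that can follow a delete.

```python
import json
from typing import Optional


def cache_key_for_event(raw_value: Optional[bytes], key_prefix: str = "product") -> Optional[str]:
    """Map one Debezium change event to the cache key that should be deleted."""
    if raw_value is None:
        return None  # tombstone record: nothing to invalidate
    event = json.loads(raw_value)
    if not isinstance(event, dict):
        return None
    payload = event.get("payload", event)  # unwrap the envelope if present
    if not isinstance(payload, dict) or payload.get("op") not in ("c", "u", "d"):
        return None  # ignore snapshot reads and other non-change events
    # Deletes carry the old row in "before"; creates and updates use "after"
    row = payload.get("after") or payload.get("before")
    if not row or "id" not in row:
        return None
    return f"{key_prefix}:{row['id']}"
```

A Kafka consumer loop would call this for each record value and issue DEL or UNLINK for any non-None result. Deleting a key is idempotent, so redelivered events are harmless.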

When NOT to use Debezium: when your database doesn't support CDC (e.g., some older MySQL setups without binlog replication enabled), or when operational complexity of running Kafka + Kafka Connect is too high for your team size. For small systems, an explicit purge on write is far simpler and nearly as fast.

Caffeine — In-Process Cache with refreshAfterWrite

What it is: Caffeine is the dominant in-process cache library for the JVM (Java/Kotlin/Scala). It uses a smart eviction algorithm called W-TinyLFU (which combines "how recently was this used" with "how often" to pick what to drop), and has first-class support for both TTL expiry and asynchronous cache refresh.

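Invalidation use case: Caffeine's refreshAfterWrite policy implements the stale-while-revalidate pattern: when a key has been in the cache for longer than the refresh duration, the next access returns the stale value immediately (no added latency) and triggers an asynchronous background refresh. Within a single JVM this removes refresh stampedes for keys that are already cached: readers never wait on the reload, and Caffeine runs at most one reload per key at a time. Each JVM still refreshes independently, so the database sees up to one reload per key per instance.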

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.Duration;

LoadingCache<String, Product> productCache = Caffeine.newBuilder()
    // Entries older than 60s become eligible for refresh: the next read returns
    // the current value and triggers an asynchronous reload
    .refreshAfterWrite(Duration.ofSeconds(60))
    // Hard expiry: the entry is removed 5 minutes after its last write, even if refreshes keep failing
    .expireAfterWrite(Duration.ofMinutes(5))
    .maximumSize(10_000)
    .build(key -> productRepository.findById(key)); // called on miss and on refresh

// For explicit invalidation (on write events):
productCache.invalidate("product:999");

// For bulk invalidation (e.g., category price change):
productCache.invalidateAll(keysInCategory(categoryId));

When NOT to use Caffeine: when you have multiple JVM instances and need cross-instance coherence. Caffeine is per-JVM; without a pub/sub invalidation fan-out, each JVM's cache is an isolated island. Combine Caffeine with Redis pub/sub for multi-instance deployments.

AWS DAX — Write-Through for DynamoDB

What it is: Amazon DynamoDB Accelerator (DAX) is a fully managed, in-memory cache that sits in front of DynamoDB and speaks the DynamoDB API. It provides write-through caching out of the box — your application makes a standard DynamoDB PutItem or UpdateItem call, and DAX automatically updates its cache alongside the database write.

Invalidation use case: You don't have to write any invalidation code at all. DAX handles it. A PutItem on an item updates both DynamoDB and the DAX cache. A DeleteItem removes the item from both. The DAX cache is always consistent with DynamoDB for items that have been written through it. Reads that were cached by a prior read and not yet overwritten by a write will expire by TTL (default 5 minutes for both the item cache and the query/scan cache, both configurable at cluster-creation time).

from amazondax import AmazonDaxClient

# Connect with the DAX data-plane client from the amazon-dax-client package.
# Note: boto3.client('dax') is the cluster-management API and cannot read or write items.
dax_client = AmazonDaxClient(endpoint_url='daxs://your-cluster.dax.amazonaws.com')

# Write-through: DAX writes the item to DynamoDB first, then updates its item cache.
dax_client.put_item(
    TableName='Products',
    Item={'productId': {'S': '999'}, 'price': {'N': '1299'}}
)
# The next GetItem for product 999 through DAX returns the updated price from the item cache.

When NOT to use DAX: DAX is DynamoDB-only. It doesn't work with any other database. Also, DAX's query/scan cache does not automatically invalidate when the underlying items change — if you update an item, DAX's item cache updates, but a query that previously returned that item may still cache the old result until the query-cache TTL expires (5 minutes by default). This is a common source of subtle consistency bugs.

HTTP Cache-Control, ETag, and CDN Purge APIs

What it is: HTTP itself has a sophisticated cache invalidation model, defined in RFC 9111 (the 2022 HTTP Caching spec that obsoleted RFC 7234) for Cache-Control / Expires, and RFC 5861 for stale-while-revalidate / stale-if-error. Browsers, CDNs (Cloudflare, Fastly, CloudFront), and reverse proxies (nginx, Varnish) all implement these headers natively.

Invalidation via Cache-Control: set Cache-Control: max-age=300 to give the response a 5-minute TTL in any HTTP cache. Set Cache-Control: no-store to prevent caching entirely. Set Cache-Control: max-age=31536000, immutable for assets that never change (hashed filenames like app.v3f8a9.js).

Invalidation via ETag / If-None-Match: the server includes an ETag header (a hash or version of the response content). The client caches the ETag alongside the response. On the next request, it sends If-None-Match: "abc123". If the server's current ETag matches, it replies with 304 Not Modified (no body, very fast). If it doesn't match, the server sends the full new response. This is conditional GET — the client asks "has this changed?" before re-downloading.
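The conditional GET exchange can be sketched as a pure function on the server side. This is an illustration only: the function names are made up for the example, the ETag is derived from a truncated SHA-256 of the body (one of several reasonable choices), and a real server would let its framework handle the header parsing.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body (quoted, per HTTP syntax)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored
        normalized = [c[2:] if c.startswith("W/") else c for c in candidates]
        if "*" in normalized or etag in normalized:
            return 304, headers, b""  # client copy is current: send no body
    return 200, headers, body
```

The first request returns 200 with an ETag; replaying that ETag in If-None-Match yields 304 until the body changes, at which point the hash differs and the full response is sent again.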

stale-while-revalidate: Cache-Control: max-age=60, stale-while-revalidate=120 tells HTTP caches: "serve from cache for 60 seconds without any validation, then for the next 120 seconds serve the stale copy while asynchronously fetching a fresh version in the background." This is the HTTP-native version of stale-while-revalidate at the edge layer.
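The two windows are easy to get wrong by one, so here is the decision a cache makes for a response of a given age. The function and the state labels are hypothetical names for this sketch; the thresholds follow the max-age and stale-while-revalidate semantics described above.

```python
def cache_state(age_seconds: float, max_age: int, stale_while_revalidate: int) -> str:
    """Classify a cached response by its age under max-age + stale-while-revalidate."""
    if age_seconds < max_age:
        return "fresh"  # serve from cache, no validation
    if age_seconds < max_age + stale_while_revalidate:
        return "serve-stale-and-revalidate"  # serve now, refresh in the background
    return "expired"  # must fetch or revalidate before serving
```

With max-age=60 and stale-while-revalidate=120, a response aged 30 seconds is fresh, one aged 90 seconds is served stale while a background fetch runs, and one aged 180 seconds or more blocks on the origin.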

# Cloudflare cache purge by URL (requires zone ID + API token)
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{"files": ["https://example.com/products/999"]}'

# Fastly surrogate key purge (purge all content tagged with a key)
curl -X POST "https://api.fastly.com/service/{service_id}/purge/repo:123456" \
  -H "Fastly-Key: {api_key}"

# CloudFront invalidation
aws cloudfront create-invalidation \
  --distribution-id E1XXXXXXXXXXX \
  --paths "/products/999" "/products/category/laptops"

When NOT to use CDN-level invalidation: CDN purge APIs are asynchronous. A purge request is accepted and then propagated to edge nodes, and propagation time varies by provider, from well under a second to several minutes. A purge also does nothing for copies already held in browser caches. For data that needs immediate consistency (inventory, pricing under legal pricing rules), CDN-level caching is not appropriate at all. CDNs are the right layer for content that changes infrequently and where seconds-to-minutes staleness is acceptable.

Memcached — flush_all, CAS, and cas-based Expiry

What it is: Memcached is the older and simpler distributed cache (compared to Redis). It supports three invalidation primitives: delete key (explicit delete), flush_all (wipe every key on the server that receives it; clearing a whole cluster means sending it to every node, so use with extreme caution), and gets / cas (compare-and-swap, for atomic updates).
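The gets / cas pair is the least obvious of the three, so a sketch of the retry loop may help. To keep it runnable without a server, the example uses a tiny in-memory stand-in (TinyCasStore, a hypothetical class written for this illustration) that mimics the semantics: gets returns a value plus a token, and cas succeeds only if nobody has written the key since that token was issued.

```python
class TinyCasStore:
    """In-memory stand-in for Memcached's gets/cas semantics (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (value, cas_token)
        self._next_token = 0

    def set(self, key, value):
        self._next_token += 1  # every write issues a new token
        self._data[key] = (value, self._next_token)

    def gets(self, key):
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False  # key vanished or was modified since gets()
        self.set(key, value)
        return True


def increment_with_cas(store, key, max_retries=5):
    """Read-modify-write that never overwrites a concurrent update."""
    for _ in range(max_retries):
        value, token = store.gets(key)
        if token is None:
            return None  # key is not cached; the caller decides what to do
        if store.cas(key, value + 1, token):
            return value + 1
    raise RuntimeError("too much contention on " + key)
```

Real clients expose the same shape (a gets call that returns a token and a cas call that takes it back), though return values differ by library.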

flush_all: one of the most dangerous commands in a production cache. It takes an optional delay parameter — flush_all 300 schedules a full cache wipe in 5 minutes. Used for emergency resets when the cache state is known to be corrupt. Never call it in application code paths; it belongs in a break-glass runbook only.

When NOT to use Memcached for modern invalidation: Memcached has no pub/sub, no keyspace notifications, no persistence, and no built-in atomic data structures. For anything beyond basic TTL + explicit delete, Redis is a better choice today. Memcached's main advantage is simplicity and slightly higher throughput for pure key-value GET/SET workloads.

The cache invalidation tool ecosystem spans five layers: in-process (Caffeine, Ehcache), distributed cache (Redis, Memcached, ElastiCache), CDC pipeline (Debezium + Kafka), managed DB cache (AWS DAX), and CDN/HTTP (Cloudflare, Fastly, CloudFront, Cache-Control headers). Choosing the right tool means matching the layer to the invalidation pattern — Redis pub/sub for in-process fan-out, Debezium + Kafka for CDC-driven invalidation, AWS DAX for DynamoDB write-through, and HTTP Cache-Control for edge caching. Each tool has documented failure modes; knowing when NOT to use it is as important as knowing how to use it.

Section 23

Cheat Sheet & Glossary — The 30-Second Recap

Use this section as a quick reference when you need to recall a pattern or a term without re-reading the whole page. The cheat sheet gives you the one-sentence essence of each strategy. The glossary gives you precise definitions of every term used on this page.

Strategy Cheat Sheet

Glossary

Staleness: The condition where a cached copy of data no longer matches the current value in the source of truth (the database). A cached entry is stale the moment its source data changes. Staleness is measured as a duration: the time between when the source data changed and when the cached entry was invalidated.
Dual-Write Race: The consistency problem that arises when an application must write to two systems (e.g., a database and a cache) and there's no transaction spanning both. If the first write succeeds and the second fails, the two systems diverge silently. The transactional outbox pattern is the standard solution.
Transactional Outbox: A pattern where, instead of writing to the cache directly, the application writes a "pending invalidation event" to an outbox table in the same database transaction as the main write. A separate process reads the outbox table and fires the cache invalidation. Since both the main write and the outbox row are committed in one transaction, they're always consistent.
Write-Through: A cache write policy where every write updates the cache and the database simultaneously. The cache is always current. Contrast with write-around (skip the cache on write, let TTL refresh it) and write-back / write-behind (write to the cache first, flush to the database asynchronously).
Write-Around: A cache write policy where writes go directly to the database and bypass the cache. The cache entry for the written data is either left to expire via TTL or explicitly purged. Reduces write load on the cache at the cost of a cold cache miss for the next read.
Write-Back (Write-Behind): A cache write policy where writes land in the cache first and are asynchronously flushed to the database later. Reduces write latency (the application doesn't wait for the DB). Risk: if the cache node fails before the flush, the write is lost. Rarely used for caches that store user-facing data due to durability concerns.
CDC (Change Data Capture): A technique for capturing every row-level INSERT, UPDATE, and DELETE in a database by reading the database's internal change log (binary log in MySQL, WAL in PostgreSQL). CDC enables downstream systems (caches, search indexes, analytics pipelines) to react to data changes without polling the database and without requiring application code changes.
Generation Bump: A cache invalidation technique where a version counter scoped to a logical group (e.g., a merchant, a category, a tenant) is incremented atomically. All cache keys for that group incorporate the generation number. Incrementing the counter makes all old keys unreachable in a single O(1) operation, regardless of how many individual entries exist.
Surrogate Key: A metadata tag attached to a cached HTTP response (or CDN cache entry) that identifies the logical entities the response depends on. When an entity changes, all cached responses tagged with that entity's surrogate key are invalidated together with one API call. Implemented by Fastly (the Surrogate-Key header), Cloudflare (the Cache-Tag header), and Varnish (the xkey module).
Stampede (Thundering Herd): When a popular cache key expires (or is deleted) and many concurrent requests simultaneously discover the cache miss, they all hit the database at once, creating a burst of DB load. The database can be overwhelmed. Solutions: probabilistic early expiration (XFetch), leases, or stale-while-revalidate.
Singleflight: A concurrency pattern where, when multiple goroutines (or threads) request the same missing cache key simultaneously, only one of them actually fetches from the database. The rest wait for and share the result of that single fetch. Eliminates per-key stampedes within a single process. Go's golang.org/x/sync/singleflight is the canonical implementation.
XFetch (Probabilistic Early Expiration): An algorithm that proactively refreshes a cache entry before it expires by computing a probability that increases as the TTL deadline approaches, weighted by the expected fetch time. The key insight: it's better to occasionally fetch fresh data slightly early (during low-traffic time) than to guarantee a stampede at exactly the TTL deadline.
stale-while-revalidate: A cache freshness model (defined in RFC 5861 for HTTP) where a stale cached response is served immediately to the current request while a background process fetches a fresh version. The current request sees zero added latency; the next request (after the refresh completes) sees fresh data. Supported natively by modern browsers and CDNs via Cache-Control: stale-while-revalidate=N.
Immutable Assets: Static files (JavaScript, CSS, images) that are given content-addressed filenames (e.g., app.v3f8a9c1.js) so their URL changes whenever their content changes. They can be cached with a year-long TTL (Cache-Control: max-age=31536000, immutable) because a changed file gets a new URL, making the old cached entry unreachable by definition. The "cache invalidation" here is handled entirely by URL versioning — the cleanest possible invalidation strategy.
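The XFetch entry above compresses an algorithm into one sentence, so here is a sketch of the decision it describes. The function name and parameters are choices made for this example: delta is the expected time to recompute the value, beta tunes how eagerly to refresh (1.0 is the usual default), and rng is injectable only to make the behavior testable.

```python
import math
import random


def should_refresh_early(now, expiry, delta, beta=1.0, rng=random.random):
    """Return True if this request should refresh the entry before it expires."""
    # 1.0 - rng() lies in (0, 1], so the log is finite and never positive.
    # Subtracting it pushes "now" forward by a random amount scaled by delta,
    # which makes an early refresh more likely as the expiry approaches.
    return now - delta * beta * math.log(1.0 - rng()) >= expiry
```

Each request draws its own random number, so under load a small fraction of requests refresh early and the rest keep reading the cached value, which spreads recomputation out instead of concentrating it at the TTL deadline.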

The cheat sheet gives a one-sentence essence of each strategy — TTL for bounded eventual freshness, purge for explicit immediacy, write-through for strong consistency at write time, CDC for decoupled near-real-time invalidation, versioned keys for immutable past entries, surrogate keys for many-to-one logical entity invalidation, and stale-while-revalidate for stampede-free latency. The glossary provides precise definitions of every term used on this page, from staleness and dual-write race through CDC, generation bump, and XFetch.

Cache Invalidation — The Genuinely Hard Half of Caching

TL;DR — Cache Invalidation in Plain English

Why You Need This — When Stale Data Becomes a Bug

The Production Story: A Price Update and $85,000 in Refunds

The Math: At 99% Hit Ratio, 99% of Your Reads Are Stale

The Core Question Invalidation Answers

Mental Model — The Source-of-Truth Pyramid

The Pyramid: Distance from Source = Lag + Harder Invalidation

The Contract: Maximum Acceptable Lag

Core Concepts — The Vocabulary of Invalidation

The Twelve Terms You Must Know

The 4 Canonical Invalidation Strategies — Overview

Strategy at a Glance

How to Read the Sections Ahead

TTL Deep Dive — Eventual Consistency by Wall-Clock

The Mechanics: How TTL Works in Redis

Why TTL Works: The Bounded Staleness Contract

The Hidden Killer #1 — The TTL Stampede

The Jitter Fix

The Hidden Killer #2 — Tail Latency on Cache Miss

The Hidden Killer #3 — Fixed-Period Scheduling Clumping

When TTL-Only Is Sufficient (and When It's Not)

Explicit Purge — "Delete It When the Data Changes"

The Basic Pattern: Write DB → Delete Cache

The Dual-Write Race: Why "Simple" Purge Fails Under Concurrency

Fix 1: Delete Before Write (Cache-Aside with Pre-Delete)

Fix 2: Retry Queues for Failed Deletes

Fix 3: Transactional Outbox — Atomicity Without Distributed Transactions

Fix 4: Distributed Lock on Key Repopulation

Redis Commands: DEL, UNLINK, and SCAN

When Explicit Purge Is the Right Choice

Write-Through — Synchronous Co-Updates

The Hidden Cost: Doubled Write Latency

The Failure Mode Tree: Cache as a Hard Dependency

The Cache-Pollution Problem

When Write-Through Is the Right Tool

CDC-Driven Invalidation — Event-Sourced Truth

The Architecture: Tail the Log, Publish to a Stream

Debezium Connector Config — Real Syntax

The CDC Tooling Landscape

Debezium + Kafka Connect

Postgres Logical Replication (direct)

MySQL Binlog (direct or via Debezium)

AWS Database Migration Service (DMS)

MongoDB Change Streams

The Hard Parts: Ordering Across Shards

Exactly-Once Delivery and Schema Evolution

Versioned Keys & Generational Caching — Make Invalidation Free

How Version Tracking Works

Versioned Key vs. Purge Timeline

Global Generation Bumps — Invalidate Whole Categories at Once

Surrogate Keys & Cache Tags — Many-to-Many Invalidation

The Idea: Tag Every Cached Entry at Write Time

Implementation: How Tags Are Stored in Redis

Fastly Surrogate Keys: Native CDN Tag Invalidation

Lease & Time-Bounded Consistency — Hybrid Approaches

The Math: Hybrid Staleness Guarantee

Stale-While-Revalidate: Background Refresh

HTTP stale-while-revalidate in Practice

Application-Level Lease in Redis

The Worst-Case Staleness Tree

The Production Decision Matrix

High write frequency + High staleness cost

Low write frequency + High staleness cost

High write frequency + Low staleness cost

Low write frequency + Low staleness cost

The Non-Negotiable Rules

Consistency Models for Caches — From Strong to Eventual

Strong Consistency

Read-Your-Writes Consistency

Monotonic Reads

Eventual Consistency

Choosing a Model: The Decision Framework

The Thundering Herd & Stampede Mitigation

The Math Behind the Pain

Fix 1: Per-Key Locking with singleflight

Fix 2: Probabilistic Early Expiration (XFetch)

Fix 3: Jittered TTLs

Fix 4: Request Coalescing at the Proxy Layer

Edge Invalidation — CDN & Browser Caches