TL;DR: Cache Invalidation in Plain English
- Why cache invalidation is the hard half of caching, and exactly what goes wrong when you skip it or get it wrong (wrong prices, phantom inventory, double-charged payments)
- The 4 canonical invalidation strategies (TTL, explicit purge, write-through, and CDC/event-driven): what each does, its consistency guarantee, and its killer weakness
- Why TTL-only is dangerous for anything with financial or inventory consequences, and how to reason about the right maximum staleness for any business requirement
- How Change Data Capture (CDC) works as an event-driven invalidation pipeline and why it solves the dual-write problem that kills write-through in distributed systems
- The production patterns (versioned keys, surrogate cache tags, generational caching) that real engineering teams use to keep invalidation sane at scale
Phil Karlton, a principal engineer at Netscape in the mid-1990s, once quipped: "There are only two hard things in computer science: cache invalidation and naming things." The joke has survived three decades because it's accurate. Reading data from a cache is trivial. Knowing when the cached copy no longer reflects reality, and discarding it before a user acts on the wrong data, is where caching systems fall apart in production.
A cache stores a copy of data from a source of truth (usually a database). The moment that source-of-truth data changes, the cached copy becomes stale: it's a snapshot that no longer matches reality. If a user reads the stale copy and acts on it (buys at an old price, orders out-of-stock inventory, transfers money twice), you have a real production incident. Cache invalidation is the set of strategies that decide: "when should we throw away the cached copy so the next read fetches the fresh version?" Every strategy is a trade-off between consistency (how fresh the data is), performance (how often we hit the database), and operational complexity (how hard it is to build and maintain).
- TTL (Time-To-Live) puts a timer on every cached entry; when the timer expires the entry is automatically deleted and the next read fetches a fresh copy. Simple and low-ops, but stale for up to TTL seconds after every update.
- Explicit Purge means the application or an operator directly deletes a cached entry the moment the source data changes. Staleness is near zero, but the writer has to know which cache keys to invalidate (a hard problem at scale).
- Write-Through updates both the cache and the database in the same write operation. There is no staleness when both updates succeed, but every write pays for a cache round trip as well as a database round trip, and in distributed systems it creates the dual-write consistency trap.
- CDC (Change Data Capture) listens to the database's change stream (the binary log or WAL) and fires invalidation events whenever a row changes. It decouples the cache from the writer, is near-real-time, and avoids dual-write, but it requires a streaming pipeline (Debezium, Kafka Connect) and introduces operational complexity.
Each strategy has a home; the art is picking the right one for each data type in your system.
The difficulty comes from three compounding factors. First, a single logical piece of data can live in many cache entries at once: the price of product #1234 might be in a product-detail cache entry, a search-results cache entry, a "you might also like" entry, and a cart subtotal entry. Changing the price in the database means invalidating all four, and you have to know all four exist. Second, writer and cache are often different services: the service that writes the price and the service that reads from the cache may not share code or even an RPC boundary. Third, invalidation in a distributed system (multi-region, multi-node) must propagate to every cache replica, and network partitions mean some replicas might not receive the invalidation. This page goes deep on the strategies, the failure modes, and the patterns that production teams actually use to tame all three dimensions of the problem.
Why You Need This: When Stale Data Becomes a Bug
Most engineers learn caching from the happy path: add a cache, hit ratio goes from 0% to 95%, database stops sweating, latency drops from 80 ms to 2 ms. That story is real and it's great. The part that doesn't make it into blog posts is what happens six months later, when a price change lands in the database but the cached version of the product page is still serving the old price while 40,000 customers are actively browsing.
The Production Story: A Price Update and $85,000 in Refunds
Here's a scenario that plays out at e-commerce companies regularly. A product manager updates the price of a laptop from $1,499 to $1,299 for a flash sale. The database update lands in under a millisecond. But the product detail page is cached in Redis with a 24-hour TTL set six hours ago. That cached page still shows $1,499. Meanwhile, a promotional email goes out advertising the new $1,299 price. Customers click the email link, which loads the cached product page at $1,499. Some close the tab. Some, confused, try a different URL, which also hits the cache. A fraction just trust the email and call support. And some go to checkout, which hits a different cache layer that did receive the update, and pay $1,299, but their order confirmation email (generated from the cached product object) says $1,499. Now the support queue is full, legal is nervous, and someone is manually issuing refunds.
That's the staleness bug in its friendliest form: a price discrepancy. The same failure pattern produces consequences that range from embarrassing to catastrophic:
- Phantom inventory: an item is sold out in the database, but the cached product page still shows "In Stock." Customers add it to their cart, check out, and you have to cancel orders.
- Double payments: a payment state is cached as "pending." A retry API call reads the cached state and submits a second payment. The database has the correct idempotency key but the cached view doesn't.
- Discontinued items: a product is removed from the catalog. The product page is gone but a cached "you might also like" carousel still links to it. Customers click a dead link on your own site.
- Authorization bypass: a user's permission token is cached. The user is demoted or banned. The cached token still grants access for up to TTL seconds. In security-sensitive systems, this is an incident.
The Math: At a 99% Hit Ratio, 99% of Reads After an Update Are Stale
Here's the number that hits engineers hard when they first see it. Suppose your product catalog has 100,000 items. You've set a 5-minute TTL on every product cache entry, so at any given moment each entry is between 0 and 5 minutes old. Your cache hit ratio is 99%, meaning 99 out of every 100 product-page reads serve the cached copy. If you update one product's price, 99% of reads for that product see the old price until its entry expires, which can take up to 5 minutes. Now suppose that one hot product receives 10,000 reads per second: roughly 9,900 reads per second serve the wrong price until the TTL expires. With a 5-minute TTL (300 seconds), that's up to 9,900 × 300 = 2.97 million stale reads before the cache clears. Whether that matters depends entirely on what the data is. For a blog post, 5 minutes of staleness is fine. For a financial instrument price, it can be illegal.
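The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes every cached read during the window is stale, which is the worst case the paragraph describes.

```python
def worst_case_stale_reads(reads_per_second, hit_ratio, ttl_seconds):
    """Upper bound on reads that see the old value after a single update.

    Assumes the entry was cached just before the update, so it stays
    stale for the full TTL (the worst case).
    """
    stale_reads_per_second = reads_per_second * hit_ratio
    return stale_reads_per_second * ttl_seconds


if __name__ == "__main__":
    # Figures from the text: 10,000 reads/s, 99% hit ratio, 5-minute TTL.
    total = worst_case_stale_reads(10_000, 0.99, 5 * 60)
    print(f"{total:,.0f} stale reads in the worst case")
```

Halving the TTL halves the bound; the hit ratio barely moves it, which is why the TTL is the lever the business requirement should set.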
The diagram above makes the gap visible. The database reflects the correct price from the moment the write lands (T=0). The cache continues serving the old price for the entire staleness window, which can last up to the full TTL duration. Every read that hits the cache during that window sees wrong data. The width of that window is bounded by the TTL. The fraction of reads that see wrong data is the cache hit ratio. Both numbers together determine how much business risk your TTL choice carries.
The Core Question Invalidation Answers
Every invalidation strategy is an answer to one question: "What is the maximum amount of time that can pass between a source-of-truth update and the moment the cache starts serving the new value, and what mechanism enforces that bound?" For a product description, the answer might be "24 hours, enforced by TTL." For a product price, it might be "5 seconds, enforced by explicit purge on update." For a bank balance, it might be "0 seconds, enforced by bypassing the cache entirely on reads or write-through on writes." The business requirement drives the acceptable staleness window; the staleness window drives the strategy choice.
Mental Model: The Source-of-Truth Pyramid
Before diving into specific invalidation strategies, it helps to have a single mental model that explains why the problem exists at all. Here it is: data in a modern web system lives in layers, arranged like a pyramid. The database at the bottom is the source of truth. Caches sit above it: faster, closer to the user, but derived. CDN edges sit at the top: fastest, most numerous, but furthest from the source and therefore the most likely to be stale.
The Pyramid: Distance from Source = Lag + Harder Invalidation
Think of the pyramid this way: the further a layer is from the database, the faster reads are from that layer (because it's closer to the user), and the harder invalidation becomes (because there are more copies to invalidate, farther away, with less reliable delivery). A database update takes one write. Invalidating a Redis cluster takes one delete per key across the cluster. Invalidating a CDN edge cache may require an API call to dozens of edge nodes in different regions, some of which may be temporarily unreachable. The further up the pyramid you go, the larger the staleness window tends to be, and the more that staleness costs when it's wrong.
The pyramid diagram shows the fundamental structure of the problem. At the bottom is the database: slow to read but always correct by definition. At every layer above it, you trade some correctness (you might read stale data) for speed (the read is faster). As you move up the pyramid, the lag between "source updated" and "this layer serves the new value" grows. And critically, the number of copies that need to be invalidated grows too: one database row becomes one Redis key, which becomes a key cached in 64 CDN edge nodes in 20 countries, which becomes a value cached in browser local storage on millions of client devices. The further up the pyramid the data is, the harder it is to invalidate across all copies quickly.
The Contract: Maximum Acceptable Lag
The way to use this mental model practically: for every type of data in your system, define a maximum acceptable lag, meaning the longest time that can pass between a database update and the moment every reader starts seeing the new value. This is a business decision, not a technical one. "Prices must be current within 10 seconds" is a business decision. "Blog post content can be stale for up to an hour" is a business decision. Once you have the number, you choose the invalidation strategy that can enforce it at each pyramid layer. TTL enforces lag = TTL duration. Explicit purge enforces lag ≈ 0 at the app-cache layer but not at the CDN layer unless you also issue a CDN purge. CDC enforces near-real-time lag at the app-cache layer. There is no single strategy that works at all pyramid layers simultaneously. That's why real systems mix strategies: short TTLs at the CDN edge combined with event-driven invalidation at the Redis layer, for example.
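One way to make the "contract" concrete is to write the maximum acceptable lag down next to the mechanism chosen for each layer. The sketch below is illustrative only: the class, the data-type names, and the strategy labels are assumptions, and the two lag figures are the examples from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalenessPolicy:
    max_lag_seconds: float  # business-defined bound on staleness
    app_cache: str          # mechanism at the app-cache (e.g. Redis) layer
    cdn: str                # mechanism at the CDN edge


# Hypothetical policy table; the numbers come from the business, not from engineering.
POLICIES = {
    "blog_post": StalenessPolicy(3600, app_cache="ttl", cdn="ttl"),
    "product_price": StalenessPolicy(10, app_cache="event_driven", cdn="short_ttl"),
}


def ttl_satisfies(policy: StalenessPolicy, ttl_seconds: float) -> bool:
    """A TTL alone enforces lag <= TTL, so it must not exceed the bound."""
    return ttl_seconds <= policy.max_lag_seconds


if __name__ == "__main__":
    print(ttl_satisfies(POLICIES["blog_post"], 1800))      # 30-minute TTL vs 1-hour bound
    print(ttl_satisfies(POLICIES["product_price"], 300))   # 5-minute TTL vs 10-second bound
```

Keeping the bound in one reviewed place makes it obvious when a proposed TTL quietly violates a business requirement.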
Core Concepts: The Vocabulary of Invalidation
Before we can talk about why write-through fails in a distributed system or why CDC solves the dual-write problem, we need a shared vocabulary. Twelve terms appear constantly in any serious discussion of cache invalidation. For each one, the plain English meaning comes first, then the precise technical term you'll see in papers, documentation, and production code reviews.
The Twelve Terms You Must Know
Stale data: When the cached copy of a value no longer matches what's in the database. Think of it like a printout of a live Google Sheet: the moment someone edits the Sheet, your printout is stale. In caching, the technical term is staleness. The word "stale" has a specific meaning here: it's not corrupted data. It was correct when written. It's just an old snapshot.
Time-To-Live (TTL): Every cached entry can be given a lifetime in seconds. When that countdown hits zero, the entry is automatically deleted from the cache. The next read for that key misses, fetches from the database, and repopulates the cache with a fresh copy. TTL is the simplest invalidation mechanism: you don't have to detect changes or coordinate between services. The downside is that within the TTL window, reads can be stale, and you don't control exactly when the staleness window ends.
Staleness window: The period between a database update and the moment the cache stops serving the old value. With TTL-based invalidation, the worst-case staleness window equals the TTL. With explicit purge, the staleness window can be close to zero (you purge the moment you write). Understanding the staleness window for each data type is the first step in designing a correct invalidation strategy.
Consistency: In cache invalidation, "consistent" means every reader sees the same value and that value matches the source of truth. There are two levels: strong consistency (every read always sees the latest value, which requires bypassing or immediately invalidating the cache on every write) and eventual consistency (all readers will see the latest value eventually, but there's a window when they might still see old data; that window is the staleness window). Most caching systems are eventually consistent by design.
Explicit purge: A direct command to delete a cache entry, issued by the application at the moment the source data changes. "The price changed, so delete cache:product:1234." Purge can enforce a near-zero staleness window, but it requires the writer to know every cache key that holds a derived view of the data it's changing, which is the hard part.
Write-through: A pattern where every write updates both the cache and the database in the same operation. If the database write succeeds, the cache is also updated (or the old cache entry is deleted) before the write returns. This means the cache is always fresh, but it adds latency to every write, because you can't return until both the database write and the cache operation have succeeded.
Write-around: The opposite approach. Writes go directly to the database, bypassing the cache entirely. The cache is only populated on reads (cache-aside pattern). This keeps freshly written data that may never be read from filling the cache, but any existing cache entry for that key must still be invalidated or left to expire, and a read that immediately follows a write to a key not yet cached will always miss, because the write didn't populate it. Useful when data is written often but rarely read right after the write; a poor fit when a write is typically followed by an immediate read (like a user's own profile after they edit it).
Write-back (write-behind): Writes land in the cache first and are written to the database asynchronously. Very fast writes, but the cache now holds data the database hasn't persisted yet, which opens a window where a crash loses data. Write-back trades durability for write speed and is used in situations where you can tolerate a small window of data loss (like game leaderboards or analytics counters).
Dual-write problem: When a system tries to update two separate stores (cache + database) as two separate operations, without a distributed transaction. If the database write succeeds but the cache update fails (or vice versa), the two stores are now inconsistent. The dual-write problem is the reason write-through is fragile in distributed systems, and the reason CDC (Change Data Capture) is the preferred approach for systems that need near-zero staleness without the dual-write risk.
Change Data Capture (CDC): A technique for observing and reacting to every change in a database by reading the database's own internal change log (the binary log in MySQL, the WAL in PostgreSQL). Instead of the application explicitly updating the cache on write, a CDC agent (like Debezium) reads every database mutation as an event and can trigger cache invalidation automatically. Because the events come from the database itself, there's no dual-write: the database is the single writer, and cache invalidation is a derived reaction. CDC adds operational complexity (a streaming pipeline, typically Kafka + Debezium) but is the gold standard for high-consistency, low-staleness cache invalidation without coupling writers and cache layers.
Surrogate key / cache tag: Instead of invalidating one cache entry at a time, a surrogate key (or cache tag) is a label attached to a group of cache entries. Invalidating the surrogate key invalidates all entries that carry that label. For example, all product-page cache entries for category "laptops" might carry the tag category:laptops. When a laptop's price changes, you invalidate the tag, which clears all related cache entries without needing to enumerate them individually. Surrogate keys solve the enumeration problem: the "I know the price changed but I don't know every cache key that displays this price" problem that makes explicit purge hard to scale.
Generational (versioned) key: Instead of deleting a cache entry, you increment a version number embedded in the cache key. All new writes use the new key; all old cache entries (which have the old version in their key) are now "orphaned": they still live in the cache but will never be read again because no code generates the old key anymore. Generational keys avoid the need for explicit deletes, which is useful in CDN contexts where purge APIs are rate-limited or expensive. The trade-off is that stale entries waste cache space until their TTL expires.
The vocabulary map above organizes the 12 concepts into three groups: the problems that make invalidation hard (stale data, staleness windows, the dual-write trap), the strategies that solve those problems (TTL, purge, write-through, CDC, surrogate keys), and the consistency models that describe the guarantees each strategy provides. When you read about cache invalidation in production postmortems or design docs, these are the terms you'll see, and now you have the plain-English grounding for each.
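Two of the terms above, surrogate keys and generational keys, are easier to see in code than in prose. The following is a minimal in-memory sketch, not any particular CDN's or library's API: `TaggedCache`, `GenerationalKeys`, and the key layout `namespace:vN:suffix` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache where each entry can carry tags (surrogate keys)."""

    def __init__(self):
        self._data = {}
        self._keys_by_tag = defaultdict(set)

    def set(self, key, value, tags=()):
        self._data[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        return self._data.get(key)

    def purge_tag(self, tag):
        """Delete every entry labelled with `tag`; the caller never lists keys."""
        keys = self._keys_by_tag.pop(tag, set())
        for key in keys:
            self._data.pop(key, None)
        return len(keys)  # number of keys the tag pointed at


class GenerationalKeys:
    """Builds versioned keys; bumping the version orphans the old entries."""

    def __init__(self):
        self._generation = defaultdict(int)

    def key(self, namespace, suffix):
        return f"{namespace}:v{self._generation[namespace]}:{suffix}"

    def bump(self, namespace):
        # Nothing is deleted: readers simply stop generating the old key.
        self._generation[namespace] += 1
```

With tags, a price change becomes `cache.purge_tag("category:laptops")`. With generations, it becomes `keys.bump("product:42")`, after which the old entries sit unread until their TTL removes them.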
The 4 Canonical Invalidation Strategies: Overview
There are four canonical ways to keep a cache in sync with its source of truth, and nearly every invalidation system you encounter in production is either one of these four strategies or a hybrid of them. This section gives you the overview: the big picture of each strategy before the deep dives in Sections 6 through 9. Read this section first so you have a map; the later sections fill in the territory.
Strategy at a Glance
The four strategies are ordered from least consistent to most consistent, and from least operationally complex to most. They're also ordered from "every team accidentally uses this" to "teams build this intentionally when they've been burned by the others."
The four-panel diagram above shows each strategy's core mechanism at a glance, and the comparison table below maps each to the dimensions that matter for choosing between them. Notice the trade-off pattern: strategies with lower staleness require more write-side coordination (write-through) or more operational infrastructure (CDC). TTL requires nothing from the writer but accepts staleness up to the full TTL duration. This is why real systems mix strategies: you don't pick one strategy for the whole system, you pick one per data type based on how much staleness that data type can tolerate and who owns the writes.
How to Read the Sections Ahead
Sections 6 through 9 go deep on each strategy in turn: the mechanics, the math, the failure modes, and the patterns that production teams use to make each one robust. Section 6 covers TTL, by far the most commonly used strategy and one with more failure modes than most engineers realize. Section 7 covers explicit purge, which is deceptively simple until you need to invalidate composite cache keys. Section 8 covers write-through and its sibling write-back: why they feel right on paper and why they cause pain in distributed systems. Section 9 covers CDC, the most powerful strategy and the one that requires the most infrastructure. After Section 9, you'll have enough detail to evaluate any invalidation system you encounter in the wild and know exactly which strategy it uses, why, and what failure modes to watch for.
TTL Deep Dive: Eventual Consistency by Wall-Clock
TTL is the first caching strategy every engineer learns and the one almost every system ships with by default. It's a beautiful idea in its simplicity: every cached entry gets a countdown timer. When the timer hits zero, the cache automatically deletes the entry. The next request for that key misses the cache, fetches from the database, and repopulates the cache with a fresh copy and a new TTL. The writer never has to touch the cache. The cache operator never has to think about which keys to invalidate. The system just... works, up to the point that it doesn't.
The Mechanics: How TTL Works in Redis
In Redis, setting a TTL is a two-step operation you can make atomic. When you write a key, you also set an expiry. Redis stores the expiry as an absolute Unix timestamp in milliseconds and tracks it separately from the value itself. It deletes expired keys in two ways: lazily (when you next try to access the key, Redis checks whether it has expired and deletes it before returning a miss) and actively (a background task runs about 10 times per second by default, tests a random sample of keys that have TTLs, and deletes the ones that have expired). Because the active pass only samples, an expired key that nobody reads can stay in memory for a while past its expiry time before the background scan catches it. That affects memory usage rather than correctness: the lazy check means a read never returns an expired key.
The preferred form is SET key value EX seconds, which sets the value and the TTL atomically in one operation, rather than a SET followed by a separate EXPIRE. Why does atomicity matter here? Because if you SET then crash before EXPIRE, you've written a key with no TTL, and it will live forever. In production, always use the atomic form.
For HTTP responses served through a CDN or browser cache, TTL is expressed in the Cache-Control header, for example with the directive max-age=60.
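The two expiry paths described above (lazy on access, active by sampling) can be modelled in a few lines. This is a toy in-memory model written for illustration, not how Redis is implemented: the class name, the explicit `now` argument, and the default sample size are assumptions chosen to keep the example deterministic.

```python
import random


class ExpiringStore:
    """Toy model of TTL expiry. Time is passed in explicitly as `now` (seconds)."""

    def __init__(self):
        self._values = {}
        self._expires_at = {}  # key -> absolute expiry time

    def set(self, key, value, now, ex=None):
        # Value and expiry are written in one step, like SET key value EX seconds.
        self._values[key] = value
        if ex is None:
            self._expires_at.pop(key, None)  # a plain SET leaves the key without a TTL
        else:
            self._expires_at[key] = now + ex

    def get(self, key, now):
        # Lazy expiry: check on access and delete before reporting a miss.
        expires_at = self._expires_at.get(key)
        if expires_at is not None and now >= expires_at:
            self._delete(key)
            return None
        return self._values.get(key)

    def active_expire_cycle(self, now, sample_size=20, rng=random):
        # Active expiry: test a random sample of keys that have a TTL.
        candidates = list(self._expires_at)
        sample = rng.sample(candidates, min(sample_size, len(candidates)))
        removed = 0
        for key in sample:
            if now >= self._expires_at[key]:
                self._delete(key)
                removed += 1
        return removed

    def resident_keys(self):
        return len(self._values)

    def _delete(self, key):
        self._values.pop(key, None)
        self._expires_at.pop(key, None)
```

The model makes one point visible: between active cycles an expired key still occupies memory (`resident_keys()` counts it), yet `get` never returns it.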
max-age=60 is HTTP's equivalent of Redis's EX 60: the browser or CDN will serve the cached copy for up to 60 seconds from when it was first cached. After 60 seconds, it either fetches a fresh copy unconditionally or sends a conditional GET with an If-None-Match header (containing the ETag of the cached copy). The server can then return a 304 Not Modified if the data hasn't changed, saving bandwidth even on a cache miss.
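The conditional-GET exchange can be sketched as a pure function that decides between 200 and 304. This is a simplified illustration, not a web framework's API: `make_etag` and `respond` are hypothetical helpers, the ETag is just a truncated content hash, and the comparison handles only a single exact ETag (real If-None-Match headers can carry lists and weak validators).

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # Assumption: derive the validator from the content; ETags are quoted strings.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None, max_age: int = 60):
    """Return (status, headers, payload) for a cacheable resource."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        # The client's copy is still current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b'{"price": 1299}')
    print(status, headers)
    # Revalidation after max-age has elapsed, using the ETag from the first response.
    print(respond(b'{"price": 1299}', if_none_match=headers["ETag"])[0])
```

Note that a 304 refreshes the cached copy's freshness lifetime, so an unchanged resource costs one small round trip per max-age period rather than a full download.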
Why TTL Works: The Bounded Staleness Contract
TTL is not "wrong"; it's a deliberate trade-off. You're saying: "I accept up to N seconds of staleness in exchange for zero write-side coordination complexity." For a huge fraction of data, this trade-off is correct. Think about:
- Blog post content: updates maybe once a day. A 1-hour TTL means at worst readers see content that's 1 hour old. Acceptable for an editorial site.
- User profile data: changes infrequently (display name, bio, avatar). A 5-minute TTL means profile pages are at most 5 minutes stale. Fine for social media.
- Feature flags / configuration: typically change rarely. A 30-second TTL means the worst case is that a feature flag change takes 30 seconds to propagate. Acceptable for most deploys.
- Search index metadata: new items appear in search results within TTL seconds of being indexed. A 60-second TTL is typical for e-commerce search.
The math for choosing a TTL is: TTL = maximum acceptable staleness in seconds. If the business says "product descriptions can be up to 30 minutes stale," set TTL = 1800. If it says "prices must be current within 5 seconds," you need a better strategy than TTL alone (or a 5-second TTL, which dramatically increases database load). The business requirement drives the number; the number drives the strategy choice.
The Hidden Killer #1: The TTL Stampede
Here's the failure mode that bites every team at some point. Imagine you have 10,000 product pages, all cached with a 60-second TTL. You deploy your application at 14:00:00. All 10,000 cache entries are written in the first few seconds of the deploy, so they all expire at approximately 14:01:00. At that moment, requests for all 10,000 product pages miss the cache at the same time and hit the database simultaneously. The database receives 10,000 concurrent queries when it was handling 500. It falls over. This is the thundering herd, also called a TTL stampede. It's especially vicious on cold deployments, after outages (when the cache is empty), and on applications that batch-populate the cache at startup.
The diagram shows the difference visually. Without jitter, all 10,000 expiries land at exactly T=60 seconds: one brutal spike of database queries. With a jitter of ±10 seconds, the expiries are spread across a 20-second window (T=50 to T=70), so the load is distributed smoothly across time. The total number of cache misses is the same; the difference is whether they arrive all at once (catastrophic) or spread out (harmless).
The Jitter Fix
The solution to TTL stampedes is to add random jitter to every TTL. Instead of setting all entries to exactly 60 seconds, you set each entry to a random duration in a range around your target TTL. A common approach: TTL = base_ttl + random(0, base_ttl * 0.2). So for a 60-second TTL, each entry gets a lifetime between 60 and 72 seconds. The entries expire at different times, and the database load is smoothed out across the jitter window. Why 20% and not, say, 200%? Because you want enough spread to break synchronization but not so much that some entries live far longer than your staleness SLA allows; 10 to 20% is a reasonable starting point for most workloads. Note that additive jitter only lengthens lifetimes, so if the SLA is a hard bound, pick base_ttl so that base_ttl plus the maximum jitter still fits inside it.
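A minimal sketch of the jitter formula follows. It is pieced together from the names used in the surrounding text (`cache_set_jittered`, `base_ttl`, `jitter_fraction`); the `client` parameter is an assumption, taken here to be any object exposing a redis-py style `set(key, value, ex=seconds)` method.

```python
import random


def cache_set_jittered(client, key, value, base_ttl=60, jitter_fraction=0.2):
    """Write `value` with a TTL of base_ttl plus a random extra of up to jitter_fraction."""
    # Maximum extra seconds to add (12 for a 60-second TTL with 20% jitter).
    jitter = int(base_ttl * jitter_fraction)
    # Each call draws its own offset, so entries written together expire apart.
    ttl = base_ttl + random.randint(0, jitter)
    # Value and expiry are set in a single call (the atomic SET ... EX form).
    client.set(key, value, ex=ttl)
    return ttl
```

Returning the chosen TTL is a convenience for logging and tests. A symmetric variant (`base_ttl - jitter/2` to `base_ttl + jitter/2`) keeps the average lifetime at `base_ttl` if that matters more than a strict minimum.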
Line-by-line: jitter = int(base_ttl * jitter_fraction) computes the maximum extra seconds we'll add β for a 60-second TTL with 20% jitter, this is 12 seconds. random.randint(0, jitter) picks a random offset between 0 and 12 seconds. ttl = base_ttl + random.randint(0, jitter) gives each entry a unique lifetime between 60 and 72 seconds. Every caller to cache_set_jittered gets a slightly different TTL, so expiries are naturally spread across time.
The Hidden Killer #2: Tail Latency on Cache Miss
Even with jitter, TTL has a second failure mode that's subtler: when a popular cache entry expires and the next request has to refetch from the database and repopulate the cache, every subsequent request for that key arrives during the repopulation window. If repopulation takes 50 ms (a database roundtrip), and 200 requests per second target this key, roughly 10 requests (200 × 0.05) arrive during that 50 ms window, and all of them miss the cache and hit the database. Only the first request was needed to repopulate the cache; the other 9 had nothing to wait for, so each fired its own database query. The effect scales with traffic: at 20,000 requests per second, the same window holds about 1,000 redundant queries.
This is sometimes called a cache stampede or dog-pile effect. One fix is a pattern called probabilistic early expiration (closely related to cache warming and to HTTP's stale-while-revalidate): instead of waiting for the TTL to hit zero before refetching, a small fraction of requests "probe" the cache status early and trigger a refresh before the TTL expires. The cache stays warm, and requests for a popular key rarely see a miss.
The Cache-Control: stale-while-revalidate=30 directive tells CDNs and browsers to serve the stale cached copy for up to 30 seconds past its max-age expiry while asynchronously fetching a fresh copy in the background. The user gets an instant response (stale but fast); the fresh copy arrives and replaces it for the next request. This is the HTTP-level counterpart of probabilistic early expiration and is supported by most modern CDNs and browsers.
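One well-known formulation of probabilistic early expiration (often referred to as XFetch) decides per request whether to refresh, with a probability that rises as the expiry approaches. The sketch below shows only that decision; the function and parameter names are illustrative, and `recompute_seconds` is assumed to be a measured estimate of how long a refetch takes.

```python
import math
import random


def should_refresh_early(now, expires_at, recompute_seconds, beta=1.0, rng=random.random):
    """Return True if this request should refresh the entry before it expires.

    `beta` > 1 refreshes earlier on average; `beta` < 1 refreshes later.
    """
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and log() is defined and <= 0.
    early_by = -recompute_seconds * beta * math.log(1.0 - rng())
    # Most draws give a small head start; a few give a large one.
    return now + early_by >= expires_at
```

Because each request draws independently, a hot key is very likely to be refreshed by some request shortly before expiry, while a cold key simply expires as usual. Once `now` passes `expires_at` the function always returns True, so it degrades to ordinary TTL behaviour.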
The Hidden Killer #3: Fixed-Period Scheduling Clumping
A more subtle stampede variant: your application runs a background job every 60 seconds that batch-refreshes a set of cache entries. The job writes all entries with TTL=60. The entries all expire at the same time the job next runs. If the job itself is slow or fails, all entries expire before the refresh completes, and real user traffic catches a mass miss. The fix is the same: add jitter to the TTL, and decouple the refresh job's schedule from the TTL duration so they don't align.
The lifecycle diagram shows the three phases of every TTL-managed cache entry: live (cache serves reads), expired and fetching (the miss window where the database gets hit), and repopulated (cache is live again with a fresh TTL). The miss window is typically just one database roundtrip, around 5 to 50 ms. But during that window, every concurrent read for the same key also misses and fires its own database query. For high-traffic keys, this brief window can translate to hundreds of simultaneous database queries. Jitter prevents multiple entries from entering the miss window at the same time; probabilistic early expiry keeps popular entries from fully entering the miss window in the first place.
When TTL-Only Is Sufficient (and When It's Not)
TTL-only invalidation is sufficient when: (a) the data changes infrequently relative to the TTL, (b) the business can tolerate staleness up to the TTL duration, and (c) the data is not financially or legally sensitive. It is not sufficient when: (d) data changes are frequent and unpredictable (e.g., a live inventory count that can hit zero at any time), (e) the consequences of stale data are customer-facing financial errors (e.g., prices, discount codes, payment state), or (f) the data is a security credential (e.g., session tokens, API keys, permission sets) where the window between revocation and cache expiry represents an active security window. For these cases, explicit purge, write-through, or CDC must be layered on top of or in place of TTL.
Explicit Purge: "Delete It When the Data Changes"
The simplest idea in cache invalidation: when you change data in the database, immediately delete the corresponding cache entry. No timer. No lag. The next read that comes in finds nothing in the cache, goes to the database, fetches the fresh value, and repopulates the cache. You are in direct control: you decide exactly when the cached copy is discarded.
The appeal is obvious. You're not waiting for a TTL to tick down. The moment the product price changes, the cache is cleared, and the very next read gets the new price. Compare this to TTL: with a 5-minute TTL you might serve the old price to millions of reads before the entry expires. With explicit purge, the stale window collapses to milliseconds, just the time between the database write and the cache delete completing.
The Basic Pattern: Write DB → Delete Cache
The classic implementation looks simple in code. When your application updates a row, it fires two operations: the database write, then the cache delete.
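A minimal sketch of the write-then-delete pattern, using plain dictionaries as stand-ins for the database and the cache so the example runs on its own. The key names returned by `keys_for_product` are hypothetical; in a real system this list is exactly the part that is hard to keep complete.

```python
def keys_for_product(product_id):
    # Hypothetical key names: every cache entry that embeds this product's price.
    return [
        f"product:{product_id}",
        f"product:{product_id}:summary",
        f"recommendations:{product_id}",
    ]


def update_price(db, cache, product_id, new_price):
    # Step 1: write the new price to the source of truth.
    db[product_id] = new_price
    # Step 2: delete every cache key that might hold the old price.
    for key in keys_for_product(product_id):
        cache.pop(key, None)


def read_price(db, cache, product_id):
    # Cache-aside read: serve from cache, otherwise load and repopulate.
    key = f"product:{product_id}"
    if key in cache:
        return cache[key]
    price = db[product_id]
    cache[key] = price
    return price


if __name__ == "__main__":
    db, cache = {42: 1499}, {}
    read_price(db, cache, 42)        # warms the cache with 1499
    update_price(db, cache, 42, 1299)
    print(read_price(db, cache, 42)) # misses, reloads from the database
```

The purge deletes rather than overwrites: repopulation is left to the next reader, which keeps the write path from having to know how each cached view is rendered.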
The walkthrough: the application first writes the new price to the database, then deletes every cache key that might contain that product's price. The next read for product:42 misses the cache, hits the database, and gets the current price. Clean, immediate, correct, as long as you remembered every key.
That last caveat is where the strategy begins to crack. In a real system, a single product's price might appear in a product-detail cache entry, a category-listing entry, a search-results entry, a recommendations carousel entry, and a cart subtotal entry. All of them need to be purged atomically when the price changes. And the set of keys grows every time a new feature is built, silently, without anyone updating the purge logic.
The Dual-Write Race: Why "Simple" Purge Fails Under Concurrency
Even if you know every cache key, there is a subtler bug lurking in the write-then-delete ordering. Picture two concurrent requests: Writer A is updating the price. Reader B is doing a cache miss, reading from the database, and about to repopulate the cache. If these requests interleave in the wrong order, you end up with a stale cache entry that nothing will ever clear unless you have set a TTL as a fallback.
The diagram traces the race step by step. Reader B gets a cache miss and queries the database, but does so a hair before Writer A's transaction commits. So Reader B reads the old price. Then Writer A commits its write and fires its DELETE, leaving the cache empty. Then Reader B, blissfully unaware, writes the old price back into the cache. Now the cache holds the stale value indefinitely: the DELETE already fired, and there's nothing left to trigger another one. If there's no TTL as a fallback, this stale entry lives forever. At high request rates this race is not theoretical; it happens regularly.
Fix 1: Delete Before Write (Cache-Aside with Pre-Delete)
One approach: reverse the order. Delete the cache entry before writing to the database. On its own this does not close the race: a Reader B that comes in during the write finds a cache miss, queries the database, and, because the write hasn't committed yet, reads the old value and caches it. That stale entry then survives until its TTL (if any) or another purge cleans it up. For that reason the pre-delete is usually paired with a second delete issued after the commit, often after a short delay. The combination reduces the race window significantly, though it does not eliminate it entirely.
Fix 2: Retry Queues for Failed Deletes
Sometimes the cache delete itself fails: the Redis node is temporarily unreachable, a network blip drops the command, or the application crashes between the DB commit and the cache delete. The result: the database has the new value but the cache still holds the old one. The fix is a retry queue. After every successful database write, publish an invalidation event to a durable queue. A separate worker consumes the queue and fires the cache delete. If the delete fails, the event stays in the queue and is retried with exponential backoff.
The key insight: by moving the delete to an async worker backed by a durable queue, you decouple the write path from the cache's availability. A Redis outage no longer breaks the write path; it just causes a brief delay in invalidation. The cache will be corrected as soon as the queue worker succeeds.
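The retry worker can be sketched as follows, assuming a `deque` as a stand-in for the durable queue and a hypothetical `FlakyCache` whose delete fails a set number of times. The `sleep` function is injected so the backoff schedule is visible without actually waiting.

```python
from collections import deque

class FlakyCache:
    """Dict-backed cache whose delete fails a configurable number of times."""
    def __init__(self, failures):
        self.data = {}
        self.failures = failures

    def delete(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache unreachable")
        self.data.pop(key, None)

def drain(queue, cache, sleep, base_delay=0.1, max_attempts=5):
    """Process (key, attempt) events; failed deletes are re-queued with backoff."""
    dead_letters = []
    while queue:
        key, attempt = queue.popleft()
        try:
            cache.delete(key)
        except ConnectionError:
            if attempt + 1 >= max_attempts:
                dead_letters.append(key)      # give up and surface for alerting
                continue
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.1s, 0.2s, 0.4s, ...
            queue.append((key, attempt + 1))
    return dead_letters
```

In production the queue would be a durable broker and the worker would run in its own process; the retry and dead-letter logic is the part this sketch is meant to show.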
Fix 3: Transactional Outbox (Atomicity Without Distributed Transactions)
The retry queue approach still has a gap: if the application crashes between the database commit and the queue publish, the invalidation event is never sent. The fix is to make the "queue this event" step part of the same database transaction as the data change itself, so they either both commit or both roll back. When you write the invalidation event to a regular table in your database, inside the same transaction as the actual data update, that's called the Transactional Outbox pattern. It closes the gap because there's no longer a moment when the data is committed but the invalidation event isn't: they share the same atomic commit boundary.
The outbox pattern is elegant because it borrows atomicity from the database itself. The outbox table is in the same database as the data. The same transaction that writes the new price also writes the invalidation event. If the transaction commits, both are durable. If it rolls back, neither exists. A separate poller (running on a schedule or a CDC stream of the outbox table itself) reads pending rows and fires the cache deletes. Because the poller runs independently, a temporary Redis outage just delays processing; it doesn't lose events.
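A small sketch of the outbox idea, assuming SQLite as the database and a dict as the cache. The table name `cache_outbox` and its columns are hypothetical; the point is that the price update and the outbox row share one transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price_cents INTEGER);
    CREATE TABLE cache_outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        cache_key TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO products VALUES (42, 1999);
""")

def update_price(product_id, new_price_cents):
    # "with db" wraps both statements in one transaction:
    # they commit together or roll back together.
    with db:
        db.execute("UPDATE products SET price_cents = ? WHERE id = ?",
                   (new_price_cents, product_id))
        db.execute("INSERT INTO cache_outbox (cache_key) VALUES (?)",
                   (f"product:{product_id}",))

def poll_outbox(cache):
    """Apply pending invalidations; returns how many rows were processed."""
    rows = db.execute(
        "SELECT id, cache_key FROM cache_outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, key in rows:
        cache.pop(key, None)  # if this raised, the row would stay pending for the next poll
        with db:
            db.execute("UPDATE cache_outbox SET processed = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Marking a row processed only after the delete succeeds means a crash between the two steps causes a repeated delete, which is harmless, rather than a lost one.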
Fix 4: Distributed Lock on Key Repopulation
For systems where even a millisecond of stale data after a delete is unacceptable, a distributed lock can prevent the concurrent-reader race. When a cache miss occurs, the reader acquires a lock on the key before querying the database. Other readers waiting for the same key get a cache-miss response and briefly spin or return a fallback value. Once the lock holder repopulates the cache and releases the lock, all subsequent readers hit the cache. This is complex to implement correctly and adds latency to cache misses, so it's only used when the dual-write race is genuinely causing incidents.
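The repopulation lock can be illustrated in a single process, assuming `threading.Lock` objects as a stand-in for a distributed lock (a real deployment might use something like a Redis `SET` with the `NX` option and an expiry instead). All names here are hypothetical.

```python
import threading

cache = {}
db_calls = []                       # records every simulated database hit
_locks = {}
_locks_guard = threading.Lock()

def lock_for(key):
    # One lock per cache key; the guard makes lock creation itself thread-safe.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def load_from_db(key):
    db_calls.append(key)
    return f"value-for-{key}"

def get(key):
    if key in cache:                # fast path: no locking on a hit
        return cache[key]
    with lock_for(key):
        # Re-check after acquiring the lock: another reader may have
        # repopulated the key while this one was waiting.
        if key not in cache:
            cache[key] = load_from_db(key)
        return cache[key]
```

With several threads calling `get("product:42")` at once, only the lock holder queries the database; the others find the repopulated entry when they acquire the lock.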
Redis Commands: DEL, UNLINK, and SCAN
Redis provides three commands you'll use in explicit-purge implementations. Each does something slightly different, and picking the right one matters at scale.
The critical lesson here: never use KEYS in production. The KEYS command scans the entire Redis keyspace in a single blocking operation. On a Redis instance with millions of keys, this can freeze the server for hundreds of milliseconds, effectively causing a brief outage for every application that uses that Redis instance. SCAN does the same job incrementally (it returns a cursor and a small batch of keys per call), so it spreads the scanning work across many small, non-blocking steps.
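A sketch of pattern-based purging with SCAN and UNLINK. `purge_pattern` assumes a client exposing `scan(cursor=, match=, count=)` returning `(next_cursor, keys)` and `unlink(*keys)`, which is the shape of the redis-py client; `InMemoryClient` is a hypothetical stand-in so the loop can run without a server.

```python
import fnmatch

def purge_pattern(client, pattern, batch_size=100):
    """Delete every key matching `pattern` using incremental SCAN + UNLINK."""
    cursor, removed = 0, 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_size)
        if keys:
            removed += client.unlink(*keys)
        if cursor == 0:          # a returned cursor of 0 means the scan is complete
            return removed

class InMemoryClient:
    """Stand-in exposing only the two calls used above."""
    def __init__(self, data):
        self.data = dict(data)
        self._snapshot = []

    def scan(self, cursor=0, match="*", count=10):
        if cursor == 0:          # start of a new iteration: snapshot the key names
            self._snapshot = sorted(self.data)
        page = self._snapshot[cursor:cursor + count]
        nxt = cursor + count
        next_cursor = 0 if nxt >= len(self._snapshot) else nxt
        hits = [k for k in page if k in self.data and fnmatch.fnmatch(k, match)]
        return next_cursor, hits

    def unlink(self, *keys):
        count = 0
        for k in keys:
            if k in self.data:
                del self.data[k]
                count += 1
        return count
```

The loop keeps calling SCAN until the server hands back cursor 0, deleting each small batch as it goes, so no single call blocks the keyspace.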
When Explicit Purge Is the Right Choice
Explicit purge is the right default for data that changes infrequently but is read many times per second, where any staleness is costly. User account settings, product catalog prices, permission configs, feature flags β these change rarely but are read constantly, and the consequences of serving a stale value are either a confusing user experience or a business error. For these types, the complexity of maintaining purge logic is worth the consistency guarantee.
It becomes the wrong choice when: the set of cache keys that reference any given piece of data is hard to enumerate (fan-out), when the write frequency is so high that the cache is being purged faster than it can be repopulated (cache thrashing), or when the writer service doesn't know which cache keys exist (service isolation). For those cases, CDC-driven invalidation (Section 9) or surrogate keys (Section 11) are better fits.
UNLINK is preferred over DEL for large keys; SCAN is mandatory over KEYS for pattern-based invalidation.
Write-Through: Synchronous Co-Updates
Write-through is a different philosophy from explicit purge. Instead of deleting the cache on a write, you update both the cache and the database in the same write operation. Every write goes through the cache: the application writes the new value to the cache first (or alongside), then writes to the database. When the next read comes in, the cache already has the correct value: no repopulation, no cache miss, no staleness at all.
The intuition: you treat the cache as a synchronous write target, not just a read shortcut. Reads from the cache are always fresh because every write also hit the cache. The cache is never a "stale copy"; it's a simultaneous copy.
The diagram shows the write-through flow. Both the cache and the database receive the write. Reads always hit the cache and always get the current value, because the last write already updated it. There's no staleness window because there's no "write to DB, then separately update cache later" gap. They're updated together.
The Hidden Cost: Doubled Write Latency
Write-through sounds perfect on paper, but it comes with a mandatory tax: every write now takes at least as long as the slowest of the two writes (typically the database write, which might be 5-20 ms). But you're also now waiting for the cache write to complete, and in a naive synchronous implementation these two writes happen sequentially: write cache → write DB → return to caller. That's two round trips in the write path where before there was one.
You can parallelize them (issue both writes concurrently), but now you have a consistency problem: what if the cache write succeeds but the database write fails? The cache holds the "new" value that was never durably committed. The right answer is to write to the database first, then update the cache; if the cache write fails, the cache is just stale for a moment (a miss on next read will fix it). But that ordering collapses back into the same race conditions as explicit purge.
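The database-first ordering can be sketched like this, assuming a dict as the database and a hypothetical `Cache` class that can be switched to an unavailable state. The return values are illustrative labels, not part of any library.

```python
class Cache:
    """Dict-backed cache that can simulate an outage."""
    def __init__(self):
        self.data = {}
        self.available = True

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache unreachable")
        self.data[key] = value

def write_through(db, cache, key, value):
    db[key] = value                 # 1. durable write first; a failure here aborts the call
    try:
        cache.set(key, value)       # 2. then co-update the cache
        return "cache-updated"
    except ConnectionError:
        # The database is correct but the cache may hold the old value;
        # the caller needs a purge retry or a TTL to bound the staleness.
        return "cache-stale"
```

The second branch is the case the text warns about: once the cache write can fail independently, write-through needs the same fallback machinery as explicit purge.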
The Failure Mode Tree: Cache as a Hard Dependency
The failure mode tree illustrates the dilemma cleanly. When the cache is healthy, write-through works perfectly. When the cache goes down, you face an impossible choice: either you block all writes (making the cache a hard dependency that can take your entire write path offline), or you fall back to DB-only writes (accepting that the cache is now stale, which means write-through's consistency promise is broken and you're effectively back to needing purge logic or TTLs). Neither option is graceful. This is why write-through is not the universal answer it might initially appear to be.
The Cache-Pollution Problem
There's a second, quieter problem: write-through caches every write, whether or not the written item will ever be read from the cache. A batch import of 500,000 product records that nobody will ever browse individually fills the cache with cold data, evicting warm data that was actually being served to users. Write-through works best when the write rate is modest and when you can reasonably expect each written item to be read soon afterward. User profiles, shopping carts, session data: these are good candidates. Large bulk imports, log writes, event streams: bad candidates.
When Write-Through Is the Right Tool
Write-through earns its place when you have a workload where every write will genuinely be read soon afterward, and where the read latency guarantee is strict enough that you can't afford even a single cache miss. The canonical examples are user profile updates (the user will immediately see their own profile after updating it), shopping cart mutations (the user is about to view the cart), and feature flag updates (every server will read the new value on the next request cycle).
CDC-Driven Invalidation β Event-Sourced Truth
Both explicit purge and write-through require the writer to be responsible for cache management. The application that changes the data also has to know which cache keys to delete or update. This coupling is the root cause of most invalidation bugs in production: a new feature adds a cache layer, the writer isn't updated to purge it, and now you have stale data with no mechanism to fix it.
There's a clever way out: instead of making the writer responsible for cache management, you tap into the database's own internal change log and treat every row change as an event you can react to. The application just writes to the database like normal; a separate listener notices each change and fires the matching cache invalidations. When you do this (capturing every insert, update, and delete as a stream of events derived from the database's own journal), that's called Change Data Capture (CDC). The database already records every change it makes to disk (this is how it survives crashes; it's called the Write-Ahead Log in Postgres, or the binary log in MySQL). CDC tools tail this log and publish a stream of every row change to a message bus like Kafka. A cache-invalidation subscriber reads the stream and fires deletes for any key affected by each change.
The Architecture: Tail the Log, Publish to a Stream
The architecture diagram shows the key insight. The application only ever writes to the database; it has no cache logic at all. Debezium (a Kafka Connect plugin) tails the Postgres WAL and publishes every row change to a Kafka topic. A separate cache-invalidation service subscribes to that topic and fires cache.delete() for every affected key. If you add a new cache layer next month, you just add a new subscriber to the Kafka topic. The writer service never changes. This is the decoupling win that makes CDC the right choice for large systems where many services cache the same underlying data.
Debezium Connector Config: Real Syntax
Debezium is the most widely deployed CDC tool for relational databases. It runs as a Kafka Connect plugin and is configured with a JSON connector definition. Here's a real Postgres connector config:
Walking through the key fields: table.include.list tells Debezium which tables to monitor, so only changes to products and inventory will be published (you don't want to publish changes from every table in the DB). slot.name is the Postgres replication slot; Postgres uses this to track how far Debezium has read the WAL, so no events are missed even if Debezium restarts. plugin.name: pgoutput selects the built-in Postgres logical decoding plugin that translates raw WAL entries into clean change events; Postgres has shipped it since version 10, and "pgoutput" is just its internal name. The ExtractNewRecordState transform flattens the event envelope so your consumer sees a simple record of the row's new state rather than the raw Debezium envelope with its before/after sections. The resulting Kafka topic name will be db.public.products.
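The fields discussed above might fit together as follows. This is a sketch expressed as a Python dict that serializes to the JSON body Kafka Connect expects; the connector name, host, credentials, database name, and slot name are all placeholder assumptions, and `topic.prefix` is the Debezium 2.x property name (1.x releases used `database.server.name`).

```python
import json

# Hypothetical values throughout: connector name, host, credentials,
# database name, and slot name are placeholders.
connector = {
    "name": "products-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db.internal.example",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.dbname": "shop",
        "topic.prefix": "db",  # Debezium 2.x name; 1.x used database.server.name
        "table.include.list": "public.products,public.inventory",
        "slot.name": "cache_invalidation_slot",
        "plugin.name": "pgoutput",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    },
}

def topic_for(table):
    """Topic name Debezium derives as <topic.prefix>.<schema>.<table>."""
    return f"{connector['config']['topic.prefix']}.{table}"

# JSON body for a POST to /connectors on the Kafka Connect REST API.
payload = json.dumps(connector, indent=2)
```

With the prefix set to `db`, `topic_for("public.products")` yields the `db.public.products` topic named in the walkthrough.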
The CDC Tooling Landscape
Debezium + Kafka Connect
The most battle-tested open-source CDC stack. Debezium connectors exist for Postgres (via logical replication / pgoutput), MySQL (via binlog), MongoDB (via change streams), SQL Server, Oracle, and others. Kafka Connect handles connector lifecycle, offset tracking, and fault tolerance. The Kafka topic gives you a replayable, ordered stream of every change; consumers can replay from the beginning if they fall behind or are added later.
Best for: self-managed infrastructure, mixed-database environments (Postgres + MySQL + MongoDB side by side), teams already running Kafka.
Postgres Logical Replication (direct)
Postgres has native logical replication built in since version 10. You can subscribe to a publication directly in application code using pg_logical or pgoutput, without Debezium. This removes Kafka from the stack, which simplifies operations but means you lose Kafka's replay and fan-out capabilities. A good choice when you have a single cache consumer and want to minimize infrastructure.
Best for: single-consumer CDC, teams that want to avoid Kafka complexity, Postgres-only setups.
MySQL Binlog (direct or via Debezium)
MySQL's binary log is the equivalent of Postgres's WAL for replication purposes. You can read it directly with libraries like python-mysql-replication or via Debezium's MySQL connector. binlog_format=ROW must be set. ROW has been the default since MySQL 5.7.7, but older versions defaulted to STATEMENT, which logs SQL statements rather than row-level changes, and CDC requires ROW format to see exact before/after values.
Best for: MySQL/MariaDB systems needing row-level change events.
AWS Database Migration Service (DMS)
AWS DMS can run in CDC mode, streaming changes from RDS, Aurora, or on-premise databases to Kinesis, SQS, S3, or another database. It abstracts away the replication slot management and connector configuration, at the cost of being AWS-specific and having less flexibility in event transformation. DMS CDC is a practical choice when you're already on AWS and want managed infrastructure.
Best for: AWS-native stacks, teams that want managed CDC without running Kafka.
MongoDB Change Streams
MongoDB has native change streams built in since version 3.6, backed by the oplog (operations log). collection.watch() returns an async cursor that yields every insert, update, delete, and replace in real time. Unlike SQL CDC, you can filter change streams server-side, reducing network traffic. Debezium also has a MongoDB connector if you want the Kafka fan-out.
Best for: MongoDB-native stacks needing real-time invalidation without external CDC tooling.
The Hard Parts: Ordering Across Shards
CDC sounds like it solves everything, but it introduces its own hard problems. The most challenging is ordering. In a single-shard database, the WAL is a totally ordered sequence: every change has a position, and changes are replayed in that order. But in a sharded database (Vitess, CockroachDB, Citus, or a manually sharded MySQL setup), each shard has its own WAL. Changes to different shards arrive at your Kafka consumer independently, with no global ordering guarantee.
The ordering diagram shows the problem. A cross-shard operation (moving a product from one shard to another, for example) produces a DELETE on shard 1 and an INSERT on shard 2, both at the same logical time. These events land in separate Kafka partitions, which have no ordering guarantee between them. The cache invalidator may process the INSERT (repopulating the key) before the DELETE (clearing it), leaving the cache with the old entry briefly. For most applications this sub-second window is acceptable. For applications that require strict consistency (financial ledgers, inventory counts), cross-shard CDC requires explicit transaction markers or a distributed transaction coordinator, both of which add significant complexity.
Exactly-Once Delivery and Schema Evolution
Two more operational challenges in CDC pipelines. First: when you'd like every change event to be processed exactly one time (not skipped, not duplicated), that's called exactly-once delivery. Kafka can give you this, but only with careful configuration: idempotent producers, transactions, and consumers that read only committed messages. The default Kafka configuration is at-least-once (events may be replayed on consumer restart), so your invalidation handler should be safe to run twice. Calling cache.delete(key) twice is harmless, but calling cache.set(key, value) twice can be dangerous if the second call uses a stale value.
Second: schema evolution. When you add a column to the products table, the Debezium event schema changes. Consumers that parse event fields by position (rather than name) will break the moment the new column lands. Schema Registry (part of the Confluent Platform) manages schema versions and validates compatibility before publishing, so downstream consumers fail loudly at deploy time rather than silently dropping events in production.
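To make the at-least-once point concrete, here is a sketch of an invalidation handler that is safe to replay. It assumes flattened change records (a dict of column values) and a hypothetical mapping from table to cache key; fields are read by name rather than position, so an added column does not break it.

```python
def keys_for_event(topic, record):
    """Map a change event to the cache keys it affects (hypothetical key layout)."""
    table = topic.rsplit(".", 1)[-1]       # e.g. "db.public.products" -> "products"
    if table == "products":
        return [f"product:{record['id']}"]
    if table == "inventory":
        return [f"inventory:{record['product_id']}"]
    return []                              # tables we do not cache are ignored

def handle_event(cache, topic, record):
    # Deleting is idempotent: replaying the same event leaves the cache unchanged.
    for key in keys_for_event(topic, record):
        cache.pop(key, None)
```

Because the handler only deletes, a duplicate delivery after a consumer restart costs at most one extra cache miss.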
Versioned Keys & Generational Caching: Make Invalidation Free
Here's a question: what if you never had to delete a cache entry at all? What if "invalidation" was just a side-effect of naming cache keys differently after each update, and the old entries quietly aged out via TTL on their own?
That's the idea behind versioned keys. Instead of storing data at a stable key like product:42, you embed a version number in the key: product:42:v7. When the product changes, you bump the version. The new data lives at product:42:v8. The old entry at product:42:v7 is now orphaned: nobody will ever request it again (because the version counter has moved on), so it ages out naturally when its TTL expires. You never had to fire a DELETE. The old key just becomes invisible.
How Version Tracking Works
The version number has to live somewhere, typically in the database alongside the entity, or in a lightweight metadata key in Redis. Here's the pattern:
The walkthrough: on a write, the database atomically increments the version and returns the new value. The application writes the new data at the new versioned key. The old key is implicitly abandoned: nobody holds a reference to it, so it will expire via TTL. On a read, the application first looks up the current version (from DB or a short-lived Redis key), constructs the versioned cache key, and reads from it. If the version was just bumped by a write, this will be a cache miss and the fresh value will be fetched from the database.
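A minimal sketch of the versioned-key flow, assuming dicts for both the database and the cache. The dict has no TTL, so the orphaned entry simply stays put here; in Redis it would expire on its own. Names and data shapes are hypothetical.

```python
# The version lives alongside the entity in the "database".
db = {"product:42": {"version": 7, "data": {"price_cents": 1999}}}
cache = {}

def versioned_key(entity_id):
    return f"{entity_id}:v{db[entity_id]['version']}"

def write(entity_id, data):
    row = db[entity_id]
    row["version"] += 1                      # bump the version with the data change
    row["data"] = data
    cache[versioned_key(entity_id)] = data   # store under the new key; the old key is never deleted

def read(entity_id):
    key = versioned_key(entity_id)           # look up the current version first
    if key not in cache:                     # miss right after a bump: fetch fresh data
        cache[key] = db[entity_id]["data"]
    return cache[key]
```

After a write, both `product:42:v7` and `product:42:v8` exist in the cache, but readers only ever construct the `v8` key.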
Versioned Key vs. Purge Timeline
The timeline makes the pattern clear. Before the update, all readers look up product:42:v6 and find it in cache. When the update fires at T=10s, a new product:42:v7 entry is written. From that moment, all readers look up v7. The v6 entry stays in memory until its TTL expires at T=3610s: silent eviction, zero developer effort, zero race conditions. No DELETE was ever fired.
Global Generation Bumps: Invalidate Whole Categories at Once
Per-entity versioning solves the "one item changed" case. But sometimes you need to invalidate an entire category of cache entries at once: a global price override affects every product, a CSS bundle update affects every page, a configuration change affects every API response that includes config data. Per-entity versioning requires bumping the version on every item individually, which is expensive if you have millions.
The solution: a global generation counter. Every cache key for a given category includes the current generation: cache:gen42:product:123. When the category changes, you increment the generation counter from 42 to 43 in a single atomic operation. Every key in that category instantly becomes unreachable: they all have gen42 in them, but the generation pointer now says 43. Old entries expire via TTL. Rails' key-based cache expiration, with its cache_version mechanism, follows a similar idea.
The generation bump is elegant. A single INCR generation_key command in Redis atomically invalidates the entire category. Old generation keys are still in memory and will expire via TTL; there's a temporary memory bump while both generations coexist, but this is usually acceptable. The trade-off: you must always read the generation counter before constructing a cache key, adding a Redis round-trip to every read. For read-heavy systems, the generation counter itself can be cached locally in application memory with a very short TTL (1-5 seconds), making it near-zero cost.
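The generation counter can be sketched with two dicts, one standing in for the Redis counter and one for the cache. The counter name `gen:products` is a hypothetical choice; the increment is where a real system would issue INCR.

```python
counters = {"gen:products": 42}   # stand-in for a Redis counter key
cache = {}

def gen_key(category, suffix):
    # Every key embeds the category's current generation.
    return f"cache:gen{counters[f'gen:{category}']}:{suffix}"

def bump_generation(category):
    # One increment makes every key of the old generation unreachable.
    counters[f"gen:{category}"] += 1   # in Redis this would be a single INCR
    return counters[f"gen:{category}"]
```

Entries written under `cache:gen42:...` are still in the dict after the bump, but no reader will construct those keys again.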
Surrogate Keys & Cache Tags: Many-to-Many Invalidation
All the strategies so far assume you can enumerate the cache keys that reference a given piece of data. But what happens when one piece of data (a product, a user, a category) appears in dozens or hundreds of different cached responses, and you can't predict in advance which ones those are?
Imagine a product with ID 42. It appears in: the product detail page cache, the category listing for "laptops," the search results for "ultrabook," the "you might also like" recommendations on several other product pages, the sitemap cache, and the homepage featured-products section. When the product's price changes, all of them need to be invalidated. Calling cache.delete() one by one requires knowing every key in advance, and that knowledge goes stale every time a new feature adds a new cache layer.
The Idea: Tag Every Cached Entry at Write Time
Surrogate keys (called "cache tags" in many frameworks) invert the problem. Instead of trying to enumerate cache keys at invalidation time, you annotate each cached response at write time with the set of "things it depends on." When one of those things changes, you say "invalidate everything tagged with this thing," and the cache system handles the fan-out.
Here's the concrete form: when you cache the laptop category page, you tag it with category:laptops, product:42, product:99, and product:104, because those are the products on that page. When product 42's price changes, you call cache.purge_tag("product:42"). The cache automatically finds and deletes every entry tagged with product:42, however many there are.
The fan-out diagram shows the power of the pattern. A single purge_tag("product:42") call reaches across every cache entry that included product 42, regardless of the cache key names or how many there are. The sitemap entry, not tagged with product:42, is untouched. Only entries that actually depend on this product are cleared. This is precise, automatic, and scales to thousands of entries.
Implementation: How Tags Are Stored in Redis
The cache system needs a data structure that maps from tag to the set of cache keys bearing that tag. In Redis, this is typically a Set per tag: tag:product:42 is a Redis Set whose members are the cache keys tagged with that product. On write, you add the cache key to each tag set. On purge, you read the tag set and delete every listed key, then clear the set itself.
The pipeline approach matters for performance: all the SADD operations to the tag sets, plus the main SETEX, happen in a single round-trip to Redis. On purge, reading the tag set and deleting all tagged keys also happens in a pipeline. For very large tag sets (thousands of keys), the tag set and all tagged keys should be on the same Redis shard (using hash tags in Redis Cluster: tag:{product:42}) to avoid cross-shard round-trips.
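The tag-to-keys mapping can be sketched in memory, assuming a dict for the cache and a dict of sets for the tag index. In Redis the `add` would be an SADD on `tag:<name>` and the purge would read the set and delete its members; the function names here are hypothetical.

```python
from collections import defaultdict

cache = {}
tag_index = defaultdict(set)      # tag -> set of cache keys carrying that tag

def set_with_tags(key, value, tags):
    cache[key] = value
    for tag in tags:
        tag_index[tag].add(key)   # record the dependency at write time

def purge_tag(tag):
    """Delete every entry tagged with `tag`; returns how many keys were listed."""
    keys = tag_index.pop(tag, set())   # read the tag set, then clear it
    for key in keys:
        cache.pop(key, None)
    return len(keys)
```

Purging one tag leaves the purged keys listed under their other tags; that is harmless because deleting an absent key is a no-op, though a real implementation may want to prune them.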
Fastly Surrogate Keys: Native CDN Tag Invalidation
This pattern is so valuable that CDN providers have built it natively. Fastly calls them Surrogate Keys, implemented via a response header:
Your origin server adds the Surrogate-Key header to the response. Fastly strips it before sending to the browser (users never see it), but stores the tag-to-object mapping internally at every edge node. When product 42 changes, you fire a purge via the Fastly API:
Fastly propagates the purge to all edge nodes globally in roughly 150 milliseconds. This is how large e-commerce platforms can cache HTML pages at the CDN edge with long TTLs (for high cache efficiency) while still invalidating instantly when a price changes. Varnish, the open-source HTTP cache that Fastly is based on, implements the same pattern with its ban mechanism: ban obj.http.X-Tag ~ "product-42" invalidates all cached objects whose tag header matches that pattern.
The lifecycle diagram shows why this is powerful at the CDN layer. You can set a 24-hour TTL on HTML pages, meaning extremely high cache efficiency and almost no origin traffic, while still being able to purge specific pages within 150 milliseconds when their content changes. You get the performance of a long TTL with the consistency of an explicit purge. That combination is hard to achieve without surrogate keys, because wildcard CDN purges by URL pattern are slow and imprecise.
Fastly exposes this pattern through a Surrogate-Key response header and a Purge API, propagating invalidations globally in ~150ms. Redis-based implementations store a Set of cache keys per tag and delete all members on purge. The pattern is essential for any system where one data change can affect many cached responses at once.
Lease & Time-Bounded Consistency: Hybrid Approaches
Every strategy so far is a pure form: pure TTL, pure purge, pure write-through, pure CDC, pure versioning, pure tags. Real production systems rarely use any strategy in isolation. The most robust caching systems combine two or more strategies in layers, so that a failure in one layer is caught by the other.
The most common hybrid is deceptively simple: TTL as fallback + explicit purge as fast path. The purge handles the normal case: an update fires, the cache is immediately cleared. The TTL handles the abnormal case: the purge failed silently, the application crashed, the network dropped the delete command. In the normal case, staleness is near-zero. In the worst case, staleness is bounded by the TTL. You get the consistency guarantee of explicit purge with the safety net of TTL.
The Math: Hybrid Staleness Guarantee
Let's quantify it. Suppose you set a 60-second TTL on every cache entry and also fire explicit purges on every write. In the normal case (purge succeeds): staleness ≈ 0 seconds, just the time the delete takes to propagate, typically under a millisecond within the same region. In the worst case (purge fails for any reason): staleness = at most 60 seconds. Compare this to pure TTL (always up to 60 seconds of staleness) or pure purge (potentially infinite staleness if a purge is silently dropped and no TTL is set). The hybrid gives you the best of both worlds. The 60-second fallback TTL costs you nothing in the normal path; it's a safety valve that fires only when something goes wrong.
Stale-While-Revalidate: Background Refresh
The second hybrid pattern is stale-while-revalidate, defined in RFC 5861. The idea: when a cached entry's TTL expires, instead of blocking the current request on a database fetch, serve the stale entry immediately and trigger a background refresh in parallel. The next request will get the fresh value.
This is particularly valuable for high-read endpoints where simultaneous cache misses are expensive. Without stale-while-revalidate, the moment a popular cache entry expires, every concurrent request races to the database. This is the thundering herd problem. Stale-while-revalidate collapses this to one background query while all concurrent requests are served the stale value.
HTTP stale-while-revalidate in Practice
Walking through the directive: for the first 60 seconds, every request gets the cached response instantly, with no database hit. From 60 to 360 seconds, requests still get a fast response (the stale value), but the cache asynchronously fetches a fresh copy from the origin in the background. The first request that arrives after 60s triggers the background fetch; subsequent requests get the stale value while the fetch is in flight (usually milliseconds); once the fetch completes, the fresh value replaces the stale one. After 360 seconds, if the background fetch never succeeded, the entry is fully expired and the next request blocks on a synchronous re-fetch.
The timeline makes the value clear. Without stale-while-revalidate, every request that arrives just after T=60s blocks on the database: if 500 requests arrive simultaneously at T=61s, all 500 hit the database in parallel. With stale-while-revalidate, all 500 get the stale value immediately and one background fetch runs. The database sees one query instead of 500. The 500 users see a response that's at most a few seconds past its freshness lifetime, well within the 300-second SWR window.
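The three windows in the walkthrough above correspond to a header of the form `Cache-Control: max-age=60, stale-while-revalidate=300`. The sketch below builds that header value and classifies a cached response by its age; the function names and state labels are hypothetical, chosen only to mirror the description.

```python
def cache_control(max_age=60, stale_while_revalidate=300):
    """Header value for the directive described above."""
    return f"max-age={max_age}, stale-while-revalidate={stale_while_revalidate}"

def swr_state(age_seconds, max_age=60, stale_while_revalidate=300):
    """Classify a cached response by age into the three windows."""
    if age_seconds < max_age:
        return "fresh"                       # serve from cache, no origin traffic
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-revalidate"            # serve stale, refresh in the background
    return "expired"                         # block on a synchronous re-fetch
```

A request arriving at T=61s falls in the middle window, which is why the 500 simultaneous requests in the timeline are all served immediately.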
Application-Level Lease in Redis
NGINX, Varnish, and CDNs have stale-while-revalidate built in. In application-level caches (Redis), you implement the equivalent pattern yourself using a "must-revalidate-after" timestamp embedded in the cache value alongside the data.
The walkthrough: on a cache hit where the revalidate_after timestamp hasn't passed (the hot path), return the data immediately, sub-millisecond. If the timestamp has passed but the Redis key still exists (hard TTL hasn't fired): serve the stale data immediately and fire a background coroutine to refresh it. The current request doesn't wait. If the Redis key is gone (hard TTL expired or cold start): fetch synchronously. The hard TTL is the safety net: even if the background refresh never ran, stale data will be refetched from the database within HARD_TTL_SECONDS.
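The three branches of that walkthrough can be sketched with asyncio, assuming a dict in place of Redis (so the hard TTL is modeled as an `expires_at` field rather than a real key expiry). `SOFT_TTL_SECONDS`, `fetch_from_db`, and the entry layout are assumptions; only `revalidate_after` and `HARD_TTL_SECONDS` come from the text.

```python
import asyncio
import time

SOFT_TTL_SECONDS = 60     # assumed name: when an entry becomes due for revalidation
HARD_TTL_SECONDS = 360
cache = {}                # key -> {"data", "revalidate_after", "expires_at"}
background_tasks = set()  # keeps references so refresh tasks are not garbage collected

async def fetch_from_db(key):
    await asyncio.sleep(0)            # placeholder for a real query
    return f"fresh-{key}"

async def refresh(key, now):
    data = await fetch_from_db(key)
    cache[key] = {"data": data,
                  "revalidate_after": now + SOFT_TTL_SECONDS,
                  "expires_at": now + HARD_TTL_SECONDS}
    return data

async def get(key, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None or now >= entry["expires_at"]:
        return await refresh(key, now)                 # cold or hard-expired: block
    if now >= entry["revalidate_after"]:
        task = asyncio.create_task(refresh(key, now))  # stale: refresh in the background
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)
    return entry["data"]                               # hot path or stale-but-served

async def demo():
    cache["k"] = {"data": "old", "revalidate_after": 10, "expires_at": 100}
    first = await get("k", now=5)      # fresh hit
    second = await get("k", now=50)    # stale hit, triggers background refresh
    await asyncio.gather(*background_tasks)
    third = await get("k", now=51)     # sees the refreshed value
    return first, second, third
```

This sketch starts one refresh per stale request; a production version would also deduplicate concurrent refreshes for the same key, for example with a short-lived lock.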
The Worst-Case Staleness Tree
The staleness tree makes the lesson unavoidable: a TTL is not optional even when you're using explicit purge. If the purge succeeds, which it will in the vast majority of cases, staleness is near-zero and the TTL never fires. But if the purge fails for any reason, the TTL is the only mechanism that prevents stale data from persisting indefinitely. The TTL is cheap (just a number stored alongside the cache entry) and has zero cost in the normal path. Skipping it to save code complexity is a mistake that will eventually produce a production incident.
The Production Decision Matrix
After working through all six strategies, here's the practical decision framework. Think of your data along two axes: how often it changes (write frequency) and how costly staleness is (business impact). Most data falls into one of four quadrants:
High write frequency + High staleness cost
Examples: real-time inventory counts, live prices on a trading platform, financial balances.
Strategy: CDC-driven invalidation (the writer can't enumerate all the keys; decouple invalidation from writes entirely) + short TTL as fallback. For the most sensitive data, skip the cache on reads and hit the database directly; some data is simply too volatile to cache safely.
Low write frequency + High staleness cost
Examples: product prices (change during sales, not constantly), permission configurations, feature flags.
Strategy: Explicit purge (you can enumerate the keys) + TTL as fallback (60-300 seconds). Add surrogate keys if the data fans out to many cache entries.
High write frequency + Low staleness cost
Examples: article view counts, non-critical recommendation scores, aggregated analytics.
Strategy: TTL-only with stale-while-revalidate. The writes are too frequent for explicit purge to keep up, and the staleness window is acceptable. Use SWR to prevent thundering herds on expiry.
Low write frequency + Low staleness cost
Examples: blog post content, documentation pages, static configuration data.
Strategy: Long TTL at CDN edge (hours or days) + surrogate key purge for when updates happen. Or versioned keys for content that changes rarely but must be instantly consistent when it does change.
The Non-Negotiable Rules
Consistency Models for Caches: From Strong to Eventual
When you add a cache to your system, you are not just adding speed; you are also silently choosing a consistency model. A consistency model answers the question: "after a write happens, how quickly and how reliably can a reader see the new value?" Most engineers pick a caching strategy without consciously thinking about this, and then they're surprised when users see stale data in ways they didn't expect.
There are four consistency models that matter most for caches, ranging from the strictest guarantee to the most relaxed:
Strong Consistency
This means every read, from any node, always sees the most recently committed write. If you write a new price to the database, the very next read of that price (from any user, on any server) returns the new value. There is no window of staleness at all.
In a cached system, this is only achievable if every write synchronously updates both the database and every cache replica before returning success. In practice, this requires write-through caching with a lock that prevents reads from being served while the update propagates. The cost is high: write latency goes up, the system can't tolerate cache node failures gracefully, and multi-region deployments make synchronous propagation impractically slow. Strong consistency in a cache is rare in production; it only makes sense for financial ledger reads or authentication tokens where any staleness is unacceptable.
Read-Your-Writes Consistency
A softer guarantee: after you write something, you will always see your own write when you read. Other users might still see stale data, but the user who performed the write will never see an older version of the thing they just changed. This is the consistency model most users implicitly expect: "I just updated my profile picture, surely I see the new one." It's achievable without making writes globally synchronous. The two common techniques are sticky session routing and per-user cache bypass: for N seconds after a write, that user's reads skip the cache and go straight to the database. After the window passes, the cache has caught up and reads can resume normally.
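To make the per-user bypass concrete, here is a minimal Python sketch. It is illustrative only: the class name, the plain dicts standing in for the database and Redis, the `invalidate` hook, and the 5-second default window are all assumptions made for the example.

```python
import time


class ReadYourWritesCache:
    """Per-user cache bypass for a short window after that user's write."""

    def __init__(self, db, bypass_seconds=5.0, clock=time.monotonic):
        self.db = db                  # dict standing in for the database
        self.cache = {}               # dict standing in for Redis
        self.bypass_seconds = bypass_seconds
        self.clock = clock
        self._last_write = {}         # user_id -> time of that user's last write

    def write(self, user_id, key, value):
        self.db[key] = value
        self._last_write[user_id] = self.clock()
        # The cache is deliberately left alone here, to model invalidation
        # that arrives later (see invalidate()).

    def invalidate(self, key):
        """Models the delayed invalidation eventually reaching the cache."""
        self.cache.pop(key, None)

    def read(self, user_id, key):
        wrote_at = self._last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.bypass_seconds:
            return self.db[key]       # recent writer: skip the cache
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]
```

The writer sees the new value immediately; everyone else keeps reading the cached copy until the invalidation lands. The bypass window only needs to be longer than your worst expected invalidation lag.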
Monotonic Reads
This means if you've seen version N of a value, you will never see a version older than N. You might not see version N+1 immediately, but you'll never go backward: no flip-flopping between old and new values. Flip-flopping happens in distributed caches when one replica has been updated but another hasn't, and your requests are load-balanced between them: request 1 hits the updated replica (sees the new value), request 2 hits the stale replica (sees the old value), and so on. Monotonic reads prevent that regression. It's achievable by either routing all reads for a given user to the same cache shard, or by attaching a read token (a version or timestamp) that the cache respects.
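The read-token variant can be sketched in a few lines of Python. This is a hypothetical illustration: replicas are modeled as plain dicts mapping a key to a `(version, value)` pair, and `monotonic_read` is a name invented for the example.

```python
def monotonic_read(replicas, key, min_version):
    """Return (version, value) from the first replica that is not older than
    min_version, or None if every replica is behind the caller's token."""
    for replica in replicas:
        entry = replica.get(key)      # each replica: dict of key -> (version, value)
        if entry is not None and entry[0] >= min_version:
            return entry
    return None                       # caller should fall back to the source of truth
```

The client keeps the version it last saw and passes it back as `min_version` on the next read, so a lagging replica is skipped instead of dragging the user backward.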
Eventual Consistency
The most common model in practice: the cache will converge to the current truth within a bounded time window, but there are no guarantees about any individual read during that window. TTL-based caching is the purest form: the cache is stale for up to TTL seconds, then automatically refreshes. Event-driven invalidation (CDC + Kafka) is also eventual: the invalidation message will arrive, but it might be delayed by milliseconds to seconds depending on lag.
Eventual consistency is fine for a huge range of use cases (blog post content, product descriptions, user preferences, analytics dashboards) and terrible for a different range (live inventory, payment states, access control lists). The key word is "bounded": you should always be able to say "our cache is eventually consistent with a maximum staleness of X seconds under normal conditions." If you can't put a number on X, you don't really have a consistency model; you just have a cache.
The diagram above shows the four consistency levels from strictest at top (most guarantees, highest cost) to most relaxed at bottom (fewest guarantees, lowest cost). Most production systems actually use a mix: eventual consistency for product content, read-your-writes for user profiles, and a cache bypass for payment states. The art is matching each data type to the consistency level the business actually needs.
The key insight from this comparison: explicit purge without a fallback TTL can produce infinite staleness if the purge request fails and is never retried. In practice, every invalidation strategy except write-through should have a backstop TTL, not as the primary mechanism, but as insurance against bugs, network failures, or missed events in the invalidation pipeline.
Choosing a Model: The Decision Framework
The Thundering Herd & Stampede Mitigation
Here's a failure pattern that catches every team off guard the first time they experience it in production. You have a popular product page (maybe a celebrity's concert tickets) and its cache entry expires at 11:59:58 AM. At noon, a promotional email lands in 500,000 inboxes. Within two seconds, 40,000 people click the link. All 40,000 requests arrive at your application layer. All 40,000 do a cache lookup. All 40,000 see a miss. All 40,000 fire a database query. Your database, which comfortably handles 2,000 queries per second, receives 40,000 queries in about two seconds. It falls over. This is the thundering herd problem, also called a cache stampede.
The Math Behind the Pain
The burst is predictable once you do the arithmetic. If you have 10,000 requests per second hitting a cache entry for a popular item, and that entry has a 60-second TTL, then every 60 seconds you have a potential stampede. The severity depends on how long a single database query takes.
If each DB query takes 50 ms, then in the 50 ms window between the first miss and the first response being cached, all requests that arrive will also miss. At 10,000 req/s and 50 ms latency, that's 500 simultaneous DB queries before the first one returns. For a single hot key.
And the worst part: the DB is now under load, so those queries now take 200 ms, meaning 2,000 simultaneous queries, so the load goes even higher. This is a positive feedback loop that melts the database. The stampede doesn't just spike load; it makes the spike worse by making each query slower.
Notice the positive feedback loop: the stampede increases DB load, which increases query latency, which means more requests pile up before the first response is cached, which increases load further. The only escape is external: the DB falls over and stops accepting connections, at which point the stampede also stops (because every request errors out immediately). That is not a recovery plan.
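The arithmetic above is just arrival rate multiplied by query latency. A small Python sketch reproduces the two figures from the text (the function name is made up for the example):

```python
def concurrent_misses(requests_per_second: int, query_latency_ms: int) -> float:
    """Requests that arrive (and miss) while the first DB query is still in flight."""
    return requests_per_second * query_latency_ms / 1000


healthy = concurrent_misses(10_000, 50)    # 500.0 queries at 50 ms latency
degraded = concurrent_misses(10_000, 200)  # 2000.0 once latency climbs to 200 ms
```

Quadrupling the latency quadruples the pile-up, which is exactly the feedback loop: more in-flight queries slow the database further, and the multiplier keeps growing.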
Fix 1: Per-Key Locking with singleflight
The most elegant fix is singleflight. The idea is beautifully simple: instead of letting every concurrent miss fire its own DB query, you let only the first one through, and every other request blocks and waits for that first request to finish. When the DB result returns and is stored in the cache, all waiting requests get the same result simultaneously. The database sees one query instead of 40,000.
Here's how singleflight looks in Go, where it ships in the golang.org/x/sync/singleflight package (maintained by the Go team, but part of the extended x/ repositories rather than the standard library):
The third return value from group.Do is a shared boolean: it's true if the result was shared with other callers. You can use this for metrics: if 99% of calls have shared=true, your stampede protection is working hard. If it's always false, your cache hit ratio is high and you're never stampeding. Good news either way.
Java's Caffeine library attacks the same problem with refreshAfterWrite: once an entry is older than the refresh interval, the next read triggers an asynchronous reload while the existing value keeps being served, so a hot key is refreshed before it expires and never takes a cold miss:
The key difference from singleflight: refreshAfterWrite is proactive. It refreshes before expiry rather than reacting to a miss. This is better for ultra-hot keys. The trade-off is that entries may be slightly stale for up to the refresh interval, which is acceptable for the use cases where you'd have stampede risk (high-traffic, non-financial data).
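For readers who want to see the coalescing mechanism itself, here is a rough Python sketch of the same idea. It is not the Go package's implementation or API; `SingleFlight`, `_Call`, and the `(value, shared)` return shape are assumptions chosen to mirror the behavior described above.

```python
import threading


class _Call:
    """Bookkeeping for one in-flight fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None
        self.waiters = 0


class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}              # key -> _Call currently in flight

    def do(self, key, fn):
        """Run fn() once per key among concurrent callers. Returns (value, shared)."""
        with self._lock:
            call = self._calls.get(key)
            if call is not None:
                call.waiters += 1     # join the fetch that is already running
                leader = False
            else:
                call = self._calls[key] = _Call()
                leader = True
        if not leader:
            call.done.wait()
            if call.error is not None:
                raise call.error      # followers see the leader's failure too
            return call.value, True
        try:
            call.value = fn()
        except Exception as exc:
            call.error = exc
            raise
        finally:
            with self._lock:
                del self._calls[key]  # later callers start a fresh fetch
            call.done.set()
        return call.value, call.waiters > 0
```

A miss handler would call something like `sf.do("product:42", load_from_db)` and then write the value to the cache. Because the entry is removed from `_calls` as soon as the fetch finishes, this only deduplicates concurrent misses; it is not a cache by itself.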
Fix 2: Probabilistic Early Expiration (XFetch)
What if you don't control the cache library and can't add singleflight? The XFetch algorithm (Vattani, Chierichetti & Lowenstein, "Optimal Probabilistic Cache Stampede Prevention," VLDB 2015) solves the stampede problem statistically: instead of refreshing exactly when TTL expires, each request has a small random probability of triggering an early refresh, with that probability increasing as the entry approaches expiry. The math ensures that on average, exactly one refresh happens per expiry cycle, spread across the population of requesters.
The formula for deciding whether to proactively refresh on a given read:
In the XFetch formula, β is a tuning parameter (larger = more aggressive early refresh), δ is the time the last recomputation took (so expensive DB queries trigger earlier refreshes), and rand() is a uniform random number in (0, 1]. The result: as remaining TTL shrinks, the probability of triggering a refresh on any given read grows, distributing the refresh work across multiple clients and avoiding the hard expiry cliff.
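As published, the XFetch check is: refresh early when now − δ · β · ln(rand()) ≥ expiry. Since ln(rand()) is zero or negative, the left side is "now plus a random head start" that scales with δ and β. A minimal Python sketch of that check follows; the function name and the injectable `now` and `rng` parameters are assumptions added so the behavior can be exercised deterministically.

```python
import math
import random
import time


def should_refresh_early(expiry, delta, beta=1.0, now=None, rng=random.random):
    """XFetch check: True when this read should recompute the value early.

    expiry: absolute time at which the cached entry expires
    delta:  how long the last recomputation took, in the same time unit
    beta:   tuning knob; values above 1.0 refresh more eagerly
    """
    if now is None:
        now = time.time()
    r = 1.0 - rng()                   # random() is in [0, 1); this maps it to (0, 1]
    return now - delta * beta * math.log(r) >= expiry
```

On a cache hit you would call this with the entry's stored expiry and last recompute time, and refresh in the calling request when it returns True. Storing δ alongside the value is part of the scheme, so the cache entry needs room for it.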
Fix 3: Jittered TTLs
The simplest fix that costs almost nothing: add a small random value to every TTL. Instead of setting all product pages to exactly 300 seconds, set each one to 300 + rand(0, 30) seconds. This staggers expiry times across the key space so that at any given second, only a small fraction of entries expire. No single second sees a mass-expiry event. The downside is that it adds up to 10% more staleness (about 5% on average). For most content, this is completely acceptable.
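In Python the whole technique is one line. The sketch below uses the 300 + rand(0, 30) numbers from the text; the function name and the injectable `rng` parameter are illustrative assumptions.

```python
import random


def jittered_ttl(base_seconds=300, max_jitter_seconds=30, rng=random.uniform):
    """Base TTL plus a uniform random offset, so related keys expire at different times."""
    return base_seconds + rng(0, max_jitter_seconds)


# Ten product pages written in the same instant now expire across a 30-second spread.
ttls = [jittered_ttl() for _ in range(10)]
```

Pass the result wherever you currently pass a fixed TTL when writing the cache entry.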
Fix 4: Request Coalescing at the Proxy Layer
Request coalescing (sometimes called request collapsing) is a feature built into reverse proxies like Varnish and many CDNs. When a cache miss occurs on a popular resource, instead of immediately passing all waiting requests to the origin, Varnish sends exactly one request upstream and serves the other clients from the stale copy (if one is available within its grace period) or queues them. When the single upstream response returns, all queued clients receive it. This is singleflight implemented at the HTTP proxy layer, with no code changes in your application. In Varnish the stale-serving side is controlled by grace settings in VCL, and the bereq.is_bgfetch variable tells you whether a given backend fetch is one of these background refreshes.
Edge Invalidation β CDN & Browser Caches
Everything we've discussed so far deals with caches close to your application: Redis, Memcached, in-process caches. But there are two more cache layers that are further from your control and genuinely harder to invalidate: CDN edge caches and browser caches. The further you are from the origin, the harder invalidation becomes, and browser caches are almost impossible to forcibly clear once content is in them.
CDN Invalidation: Fast But Not Instant
CDNs like Cloudflare, Fastly, and AWS CloudFront distribute your content to edge nodes around the world. There might be 200+ edge locations, each holding its own copy of your cached content. When you change a piece of content, you need that change to reach all 200+ nodes. That propagation takes time:
- Cloudflare: purge API typically propagates globally in about 150 ms to 2 seconds for most requests. The CDN documentation describes this as near-instant, and for practical purposes, it usually is.
- Fastly: offers instant purge (sub-second) via surrogate keys; their architecture is specifically designed for high-frequency programmatic purging. This is why companies like The New York Times and GitHub use Fastly.
- AWS CloudFront: invalidations typically complete in under 2 minutes (well over 90% of edge locations within seconds), but can occasionally take longer for all global locations to clear. CloudFront also charges per invalidation path (the first 1,000 paths per month are free, then $0.005 per path).
The practical implication: if you push a critical bug fix or a pricing error correction, you need to account for the CDN propagation window. If CloudFront takes a couple of minutes and your flash sale price went live incorrectly at noon, users served by some edge locations may see the wrong price for up to that window even after you fire the invalidation API call.
The takeaway: no CDN provides truly instant global invalidation. "Instant purge" means "faster than TTL expiry," not "zero propagation time." For content where any staleness after a purge is unacceptable, you need cache-busting URLs instead of relying on purge propagation.
The Cloudflare Purge API
Here's how to call the Cloudflare Cache Purge API from your application after a content update. The key point: purging by URL is simple but expensive at scale; purging by cache tag (which maps to Fastly's surrogate keys) is the production pattern for large catalogs.
To make tag-based purging work, you need to add the Cache-Tag header to every response your origin sends. Cloudflare reads this header and builds an internal index of which cache entries have which tags, so a single API call can purge thousands of cached entries at once: all product pages, all category pages, all search results that include a specific product.
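As a rough sketch of what the purge call looks like from Python, the helper below builds (but does not send) a request to Cloudflare's `purge_cache` zone endpoint using only the standard library. The helper name is invented, and the zone ID, token, and tag values in the usage line are placeholders; check which purge types your Cloudflare plan supports before relying on tag purges.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id, api_token, *, files=None, tags=None):
    """Build a purge request for either a list of URLs or a list of cache tags."""
    if bool(files) == bool(tags):
        raise ValueError("pass exactly one of files= or tags=")
    body = {"files": list(files)} if files else {"tags": list(tags)}
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; sending it would be urllib.request.urlopen(req).
req = build_purge_request("ZONE_ID", "API_TOKEN", tags=["product-42"])
```

On the origin side, the matching response header for that product page would be `Cache-Tag: product-42`. Keeping request construction separate from sending also makes the payload easy to log and assert on, which helps catch path typos before they become silent no-op purges.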
Browser Caches: The Uninvalidatable Problem
Here is the hardest reality of edge caching: you cannot forcibly clear content from a user's browser cache. Once a user has downloaded app.js and their browser has cached it according to the Cache-Control headers you sent, you have no mechanism to reach into their browser and delete that file. HTTP does not have a "push invalidation" for browsers. The only way to make them download a new version is to change the URL.
This is why the production pattern for static assets (JavaScript, CSS, images) is content-hash URLs. Your build tool (webpack, Vite, esbuild) automatically renames every file to include a hash of its contents: app.a3f9c2.js. You then serve it with the longest possible TTL and a special marker that tells browsers "this file at this URL will never change, don't even bother checking back." That marker is called immutable, and it lives in the Cache-Control header:
This diagram captures the full cache-busting pattern. The HTML page (index.html or the server-rendered entry point) must always be fresh (served with no-cache) because it is the manifest of which versioned asset URLs to load. The assets themselves can have year-long TTLs because the URL changes on every deploy. "Invalidation" for browser-cached assets means deploying a new URL, not sending a purge signal.
no-cache means "cache this, but always revalidate with the server before serving it." If you actually want the browser to never cache something, use Cache-Control: no-store. The naming is backwards from what you'd expect, and it has caused production incidents in teams that set no-cache on sensitive data thinking they were disabling caching.
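The three header policies discussed in this section can be summarized in a small Python sketch. The function name, the filename pattern, and the one-year max-age are assumptions for illustration; adapt the regex to whatever naming scheme your build tool produces.

```python
import re

# Matches content-hashed filenames such as app.a3f9c2.js (pattern is an assumption).
HASHED_ASSET = re.compile(r"\.[0-9a-f]{6,}\.(?:js|css|png|jpg|svg|woff2)$")


def cache_control_for(path, sensitive=False):
    """Pick a Cache-Control value for a response path."""
    if sensitive:
        return "no-store"                             # never written to any cache
    if HASHED_ASSET.search(path):
        return "public, max-age=31536000, immutable"  # URL changes on every deploy
    return "no-cache"                                 # stored, but revalidated on every use
```

The HTML entry point falls through to `no-cache`, so a deploy becomes visible on the next page load while hashed assets stay cached for a year.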
Bug Studies: When Invalidation Goes Wrong
Theory is useful. Production incidents are better teachers. Each of the following bug studies is a realistic scenario, a composite of patterns that recur at real engineering teams. The goal isn't to memorize the specific bugs, but to build the mental model that lets you recognize the class of failure before it happens in your own system.
What Went Wrong
The application used a dual-write pattern: on price update, write the new price to the database, then call cache.delete(productKey). The race condition: between the DB write and the cache delete, another application server was serving a request for that product. It got a cache miss (the old entry had just been evicted under LRU pressure), fetched the new price from the DB, and populated the cache. Then the original write's cache.delete executed, and it deleted the freshly written cache entry containing the correct price. A second request populated the cache again from the DB correctly. So far, so good.
The real problem came one layer up: the product detail page was also cached as a rendered HTML blob (not just the price), at a different cache key. The code that invalidated the rendered page had a subtle bug: it only invalidated the key for the primary locale. The product existed in 7 locale-specific cache entries. Six of them continued serving stale HTML for the full TTL of 4 hours. The price field was correct in cache; the rendered page was wrong, and nobody noticed because the small "Buy" widget on the page was rendered separately with fresh data.
A call like cache.delete(f"{id}:something:en-us"), a delete that hardcodes one variant of a multi-variant key, is almost certainly a bug. Search your codebase for all cache key prefixes and verify that invalidation covers every variant.
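One way to avoid the hardcoded-variant bug is to generate every variant key from a single function that both the write path and the invalidation path use. The sketch below is hypothetical: the `{id}:page:{locale}` key format and the helper names are invented for illustration, and a plain dict stands in for the cache.

```python
def rendered_page_keys(product_id, locales):
    """Single source of truth for every locale variant of a rendered-page key."""
    return [f"{product_id}:page:{locale}" for locale in locales]


def invalidate_rendered_pages(cache, product_id, locales):
    """Delete every locale variant; returns the keys it attempted to delete."""
    keys = rendered_page_keys(product_id, locales)
    for key in keys:
        cache.pop(key, None)
    return keys
```

If adding a locale means editing one list that feeds both caching and invalidation, the two can't drift apart.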
What Went Wrong
This is the canonical thundering herd scenario described in Section 14, but with a real cascading failure pattern layered on top. The root causes: (1) a single TTL for all profiles regardless of their traffic level (a profile with 50 million followers is not the same as a profile with 50 followers); (2) no singleflight protection on the profile fetch path; (3) an unindexed count query that was cheap under normal load but became the chokepoint under stampede load.
If a cache-miss path reaches a db.Query() call without a singleflight guard, it is stampede-vulnerable for any key that gets popular. Add singleflight to every high-traffic cache miss path: it costs nothing when there's no stampede and saves your DB when there is.
A team adopted versioned cache keys (user:1234:v{N}) to eliminate dual-write races. Six months later, Redis memory usage was 15× higher than expected and growing monotonically. The culprit: every time a user profile was updated, a new versioned key was written. The old versioned keys had no TTL and were never explicitly deleted. Redis was holding every version of every profile ever updated.
What Went Wrong
Versioned keys are an elegant invalidation pattern: bump the version number, and old keys become orphaned (unreachable by any reader). But "unreachable" does not mean "deleted." Without an explicit TTL on every versioned key, Redis holds them forever. The version number came from a counter in the database that only incremented. After six months with millions of updates, there were hundreds of millions of orphaned cache entries consuming memory. The system never crashed, because Redis eventually began evicting keys under memory pressure using an LRU policy, but by then active cache entries were also being evicted, destroying the cache hit ratio.
Watch for set(key, value) without a TTL argument. Any call that writes a key without an expiry is a memory leak waiting to happen. Some teams enforce this with a linter rule on their Redis wrapper class.
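One way to enforce "no write without a TTL" is to make the TTL a required keyword argument on the wrapper itself. The following Python sketch uses an in-memory dict instead of a real Redis client; the class name, `live_keys` helper, and `versioned_key` function are assumptions for illustration.

```python
import time


class TTLRequiredCache:
    """Cache wrapper whose set() cannot be called without a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}               # key -> (value, expires_at)

    def set(self, key, value, *, ttl_seconds):
        if ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]       # lazily drop expired entries
            return None
        return value

    def live_keys(self):
        now = self._clock()
        return sorted(k for k, (_, exp) in self._data.items() if exp > now)


def versioned_key(user_id, version):
    return f"user:{user_id}:v{version}"
```

Calling `cache.set(key, value)` without `ttl_seconds` raises a `TypeError` at the call site, so an orphaned version can outlive its replacement by at most one TTL.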
A team cached a feature-flag file (/flags.json) at the edge for performance. They deprecated a feature by setting a flag to false in the file and fired a CDN cache purge. The purge API returned HTTP 200. But for three days, some users saw the feature still enabled. Investigation revealed that the purge API call had been accepted by the API gateway, but the payload contained a typo in the file path (/flag.json instead of /flags.json). The typo meant the actual file at /flags.json was never purged. The CDN TTL was 7 days.
What Went Wrong
CDN purge APIs return success when the request was accepted, not when the purge actually propagated to all edge nodes. And a successful HTTP 200 from a purge API does not mean the path you purged was correctly matched: a typo produces a successful no-op purge. The team had no post-purge verification: they never fetched the resource from multiple edge locations to confirm the new version was being served. The 7-day TTL meant the error went undetected for 3 days before someone manually tested from a different region.
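Post-purge verification can be as simple as polling each edge until it serves the expected version, with a timeout. The Python sketch below is a hypothetical outline: `fetch_version` is a caller-supplied function (for example, one that requests the resource through a specific edge and reads a version field or ETag), and all names and defaults are assumptions.

```python
import time


def wait_for_version(fetch_version, expected, edges, timeout_s=30.0,
                     interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll edges until all serve `expected`.

    Returns the set of edges still stale when the timeout hits (empty on success).
    """
    deadline = clock() + timeout_s
    pending = set(edges)
    while True:
        pending = {edge for edge in pending if fetch_version(edge) != expected}
        if not pending:
            return set()
        if clock() >= deadline:
            return pending            # caller should alert on a non-empty result
        sleep(interval_s)
```

A non-empty result is the signal that would have caught the /flag.json typo within seconds instead of days.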
Common Misconceptions About Cache Invalidation
Cache invalidation misconceptions are unusually dangerous because they're not obviously wrong: each one sounds plausible at first reading, and some are even technically partially true in narrow circumstances. Understanding why each one is false (not just knowing it's false) gives you the mental model to catch them in your own code reviews and design discussions.
This is the most common cache misconception and it's half-right: TTL does eventually invalidate stale data. But TTL is not an invalidation strategy; it's a staleness bound. If your TTL is 5 minutes, you're saying "I accept that for up to 5 minutes after a data change, readers may see the old value." For an e-commerce product description or a blog post, that's likely fine. For a product price during a flash sale, an inventory count, or a permission check, it's a production incident waiting to happen.
The confusion comes from conflating "the stale data will eventually be gone" with "invalidation is handled." They're different guarantees. Real invalidation strategies (explicit purge, write-through, CDC) eliminate the staleness window entirely or bound it to milliseconds rather than minutes. TTL is what you fall back to when you can't or don't want to implement a real invalidation strategy, with full awareness of the maximum staleness you're accepting.
Dual-write is explicitly not atomic. Writing to the database and then deleting the cache key are two separate operations with no transaction boundary between them. Between the DB write and the cache delete, any number of other operations can happen: another server can read the old cached value, populate a new cached entry from the database, and have that fresh entry deleted by your arriving cache.delete call (the classic delete-then-repopulate race). Or the cache delete can fail entirely (a network timeout, a Redis restart, a transient error) while the DB write succeeded, leaving the cache stale until the TTL expires.
The correct mental model: dual-write gives you best-effort cache invalidation with eventual consistency in the normal case. For true atomicity, you need either write-through with a distributed transaction (impractical in most systems) or CDC, which reads invalidation events directly from the database's durable change log rather than from application code.
This is a naming disaster baked into the HTTP specification. Cache-Control: no-cache does NOT mean "do not cache." It means "cache this response, but you must revalidate it with the server (send a conditional GET) before using it on every subsequent request." The browser stores the response locally but always asks the server "is this still current?" before serving it. If the server responds with 304 Not Modified, the browser uses its cached copy. Only if the server sends a new response does the browser discard the old one.
If you actually want the browser to never cache a response and never store it locally, the correct directive is Cache-Control: no-store. The difference matters enormously for sensitive content: a page served with no-cache is stored on disk in the browser cache and viewable in developer tools even after the user navigates away. A page served with no-store is never written to disk.
CDN providers use the word "instant" to mean "much faster than TTL expiry," not "zero latency globally." Even Cloudflare's fastest purge propagation involves the purge signal traveling to hundreds of edge locations around the world. In practice, Cloudflare propagates purges globally in milliseconds to a few seconds for most requests. Fastly is similarly fast. But "a few seconds" is not the same as "instant."
The gap matters in two scenarios: (1) time-sensitive content (a security advisory, a deprecated API response) where even a 2-second window of stale content at a heavily-trafficked edge node could be meaningful; (2) verification: if you fire a purge and immediately check whether the new content is live from a remote region, you may still see the old content for a few seconds. The correct pattern is to fire the purge and then poll until edge nodes confirm the new version is live, with a timeout and alerting for cases where propagation takes unexpectedly long.
Write-through gives strong consistency for a single-node, single-cache deployment: every write updates both the database and the cache, so any read from that single cache node sees the most recent write. But "strong consistency" breaks down the moment you have multiple cache replicas. In a Redis Cluster where each primary has replicas, a write-through operation updates the database and the primary node that owns the key, but that primary's replicas are updated asynchronously. A read served by one of those replicas can see a stale value.
Additionally, write-through doesn't help if the two writes (DB + cache) are not truly atomic. If the DB write succeeds but the cache write fails (Redis timeout, network partition), you have strong consistency in neither direction: the DB has the new value but the cache has the old one. To handle this, you need either a retry with idempotency or a rollback of the DB write β both of which significantly complicate the write path.
Summary: write-through + a single cache node = strong consistency under normal conditions. Write-through + distributed cache = complex partial failure modes that look like strong consistency until they don't.
CDC reads from the database's write-ahead log (WAL) or binary log, which does record writes in order, for a single database node. The ordering breaks down when your database is sharded or replicated. If product ID 1234 and product ID 5678 live on different database shards, their change events enter different WAL streams, flow through different Kafka partitions, and may be processed by different consumer instances. The consumer for product 1234 might process its event before the consumer for product 5678 processes its event, or vice versa; there is no cross-shard ordering guarantee.
Within a single shard, ordering is preserved because all writes for that shard flow through one WAL. Across shards, you can only guarantee that events for the same partition key (same shard) are ordered relative to each other. If your cache invalidation logic assumes a global ordering of events, it will produce incorrect results under cross-shard workloads.
This is the opposite problem from the memory leak described in Bug Study 3: some engineers avoid versioned keys entirely because they worry about orphaned versions filling up memory. The worry is valid, but the solution is TTLs, not abandoning the pattern.
Versioned keys with TTLs are actually quite memory-efficient in practice. For a key that is updated at most once per TTL window, the number of live versions at any moment is at most 2: the current version (being actively read) and the previous version (recently orphaned, expiring within one TTL window). If you write version 42 at time T, the previous version 41 expires at most one TTL window later. After that, only version 42 occupies memory. The memory overhead is then at most 2× per key, and only during the transition window; a key updated k times within one TTL window can briefly hold up to k + 1 versions, so size the TTL against your hottest write rate. Compare this to surrogate tags (which require the CDN or cache to maintain an index of all keys per tag) or full record locking (which holds locks for the duration of every write). Versioned keys with TTLs are among the leanest invalidation patterns when implemented correctly.
Practice Exercises: Build Your Intuition
Reading about cache invalidation builds vocabulary. Actually designing invalidation systems (making trade-off decisions under constraints) builds the intuition you need to do it correctly under time pressure in an interview or in production. Work through each exercise before reading the answer.
An e-commerce platform wants to launch a flash sale at exactly 12:00:00 PM. Product prices will be updated in the database 60 seconds before launch (at 11:59:00). The product detail pages are cached in Redis with a 10-minute TTL and also served via a CDN with a 5-minute TTL. The business requirement: no customer should see the old price after 12:00:00 PM. Design an invalidation strategy that meets this requirement.
- At 11:59:00: Write new prices to the database with a goes_live_at = 12:00:00 timestamp. Do NOT yet update the cache.
- At 11:59:30: Fire CDN cache purge for all product URLs involved in the sale. This gives the CDN 30 seconds to propagate before launch. For CloudFront, where invalidations can take a minute or more to complete, 30 seconds is too tight; fire the purge earlier or consider Fastly for time-sensitive purges.
- At 11:59:55: Fire Redis explicit deletes for all affected product keys. The goal is an empty cache at 12:00:00 so that the next read fetches the new price from the database. Any read that lands inside the 5-second window repopulates the cache with the old price, so entries written before launch should carry a TTL that ends by 12:00:00 (or the deletes should be repeated immediately after launch).
- Application logic: The price query should check goes_live_at <= NOW(), so any cache-bypassing read after 12:00:00 returns the new price even if somehow a stale entry survives.
You have a cache entry with a 60-second TTL. You also have an explicit purge mechanism, but it only delivers purge messages with 99% reliability (1% of purge messages are lost due to network failures). Calculate the expected maximum staleness window under normal operation, and under the failure case.
E[staleness] = 0.99 × 2s + 0.01 × 60s = 1.98s + 0.6s ≈ 2.6 seconds of expected staleness per update (taking about 2 seconds as the staleness when a purge is delivered, and the full 60-second TTL when it is lost).
What this means in practice: At 100 updates per second, you'll have roughly 1 update per second where the purge is lost, with a 60-second staleness window. At any given moment, roughly 60 × 1 = 60 cache entries are serving stale data from lost purges, each for up to 60 seconds.
The key insight: A 1% purge failure rate does not mean "1% staleness." It means "1% of updates have TTL-length staleness instead of purge-speed staleness." For a 60-second TTL, this is 30× worse than the normal case per affected update. Always pair best-effort purge with a backstop TTL short enough that purge failures are acceptable. If 60 seconds of staleness is unacceptable, reduce the TTL rather than relying only on improved purge reliability.
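The same numbers can be reproduced with two short Python functions, which is handy when you want to try other TTLs or failure rates. The function names are made up; the inputs are the figures used in the answer above.

```python
def expected_staleness(purge_success_rate, purge_latency_s, ttl_s):
    """Average staleness per update: fast when the purge lands, TTL-long when it is lost."""
    return purge_success_rate * purge_latency_s + (1 - purge_success_rate) * ttl_s


def stale_entries_in_flight(updates_per_second, purge_success_rate, ttl_s):
    """Steady-state count of entries still serving stale data because their purge was lost."""
    return updates_per_second * (1 - purge_success_rate) * ttl_s


per_update = expected_staleness(0.99, 2, 60)       # about 2.58 seconds
in_flight = stale_entries_in_flight(100, 0.99, 60) # about 60 entries
```

Dropping the TTL to 10 seconds cuts the lost-purge term from 0.6 s to 0.1 s and the in-flight stale entries from about 60 to about 10, which shows why shortening the backstop is such an effective lever.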
A teammate shows you this pseudocode for a cart update. Identify the dual-write race condition and explain under what timing it produces a stale cache.
Look at the gap between the db.execute and the cache.delete. What happens if a read request fires in that tiny window?
- Thread A executes db.execute: the cart is updated in the database (qty = 5).
- Thread B fires a cache read for the same user's cart. Gets a cache miss (either the entry expired or was already deleted).
- Thread B reads from the database and correctly gets qty = 5. Writes {"item_id": X, "qty": 5} to the cache.
- Thread A executes cache.delete, which deletes the fresh entry Thread B just wrote.
- Thread C fires a cache read. Gets a cache miss. Reads from the database and correctly gets qty = 5. Writes to cache. Now the cache is correct again.
The stale case: after Thread A's db.execute, Thread B reads and caches qty = 3 (the old value, from a replica with replication lag), so the cache briefly holds qty = 3 until Thread A's delete fires. The next read after Thread A's delete re-fetches from the DB and gets 5. Net result: a brief window of stale data, not permanent corruption. If Thread B's cache write lands after the delete instead, the stale value stays until the TTL expires, which is one more reason to keep a backstop TTL.
The more dangerous version: if you do cache-write instead of cache-delete (write new cart to DB, then write new cart to cache), Thread B can overwrite Thread A's cache entry with a stale read-through, and there's no subsequent delete to clean it up. This is why delete (not update) on the invalidation side is the correct pattern for best-effort consistency.
Design a surrogate cache tag scheme for an e-commerce site with the following page types: product detail pages, category listing pages, search results pages, and a homepage "featured products" carousel. When a product's price changes, which tags need to be purged? When a product is added to a new category, which tags? When a product is discontinued (removed from catalog), which tags?
- Product detail page for product P: tags = ["product-{P.id}"]
- Category listing page for category C: tags = ["category-{C.id}"] (contains many products but you purge the whole page on any product change in that category)
- Search results page for query Q: tags = per-product tags for all products in results (["product-{p1.id}", "product-{p2.id}", ...]). This allows purging all search result pages that contain product P by issuing a single product-{P.id} tag purge. Alternatively, if the search index is rebuilt on product changes, use ["search-index"] as a coarser tag.
- Homepage featured carousel: tags = ["featured"] + tags for each featured product. Purging featured clears the carousel when the featured set changes.
- Price change on product P: purge tag
product-{P.id}β clears product detail page + all search result pages containing P + homepage carousel if P is featured. - Product P added to new category C: purge tags
product-{P.id}+category-{C.id}β clears old pages for P and the category listing that now includes P. - Product P discontinued: purge tag
product-{P.id}β clears all pages mentioning P. Additionally purge all category tags for categories that contained P (or use a broader tag likecatalogif discontinuations are rare).
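The tag scheme above can be exercised with a small in-memory model. This is only a sketch of the idea, not a CDN client: `TagIndex`, `store`, `purge_tag`, and the example URLs are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TagIndex:
    """In-memory stand-in for a CDN that supports surrogate tags (hypothetical)."""

    def __init__(self):
        self.pages = {}                      # url -> cached body
        self.urls_by_tag = defaultdict(set)  # tag -> urls carrying that tag

    def store(self, url, body, tags):
        self.pages[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_tag(self, tag):
        """Drop every cached page carrying `tag` and return the purged URLs."""
        purged = {u for u in self.urls_by_tag.pop(tag, set()) if u in self.pages}
        for url in purged:
            del self.pages[url]
        return purged


cdn = TagIndex()
cdn.store("/products/42", "<detail>", ["product-42"])
cdn.store("/categories/7", "<listing>", ["category-7"])
cdn.store("/search?q=mug", "<results>", ["product-42", "product-43"])
cdn.store("/", "<home>", ["featured", "product-42"])

# Price change on product 42: one purge clears detail, search, and homepage.
purged_by_product = cdn.purge_tag("product-42")
# The category listing carries only its category tag, so it needs its own purge.
purged_by_category = cdn.purge_tag("category-7")
```

The point of the model is that the caller never enumerates URLs: it names the entity that changed and the index resolves the affected pages.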
Design a function GetOrFetch(key, fetchFn, ttl) that is safe against cache stampedes for hot keys, handles the case where fetchFn returns an error (should not cache errors), and uses jittered TTLs to prevent synchronized expiry across a batch of related keys. Write the implementation in Go or Python.
- Double-check inside singleflight: after acquiring the group lock, re-check the cache. Between the outer miss and the singleflight execution, another goroutine may have already populated the cache. Without this, you'd fire an extra DB query for every group of concurrent misses, not just one.
- Never cache errors: if fetchFn returns an error, return it to all waiting goroutines but don't store it. Caching an error means all readers get the error for TTL seconds even after the underlying problem is fixed.
- Jitter is relative to base TTL: jittering by ±10% of the base TTL keeps the staleness properties predictable while spreading expiry events enough to eliminate synchronized expiry across a batch of keys.
- Cache write failure is non-fatal: a Redis write failure during the slow path should degrade gracefully (log + continue), not fail the request. The origin response is still valid data for the caller.
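One way to put the four points above together is sketched here in Python, using threads in place of goroutines and a hand-rolled single-flight in place of Go's `singleflight` package. The class name `StampedeSafeCache`, the helper `_Call`, and the in-memory dict that stands in for Redis are all assumptions made for illustration.

```python
import random
import threading
import time


class _Call:
    """One in-flight fetch that concurrent callers for the same key share."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class StampedeSafeCache:
    def __init__(self, jitter=0.10):
        self._data = {}      # key -> (value, expires_at); stands in for Redis
        self._inflight = {}  # key -> _Call
        self._lock = threading.Lock()
        self._jitter = jitter

    def _cache_get(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return True, entry[0]
        return False, None

    def _cache_set(self, key, value, ttl):
        # Jitter is relative to the base TTL (+/- 10% by default).
        ttl *= random.uniform(1 - self._jitter, 1 + self._jitter)
        self._data[key] = (value, time.monotonic() + ttl)

    def get_or_fetch(self, key, fetch_fn, ttl):
        hit, value = self._cache_get(key)
        if hit:
            return value
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = self._inflight[key] = _Call()
        if not leader:
            call.done.wait()  # follower: reuse the leader's outcome
            if call.error is not None:
                raise call.error
            return call.value
        try:
            # Double-check: the cache may have been filled since the outer miss.
            hit, value = self._cache_get(key)
            if not hit:
                value = fetch_fn()
                try:
                    self._cache_set(key, value, ttl)
                except Exception:
                    pass  # non-fatal: a real system would log and continue
            call.value = value
            return value
        except Exception as exc:
            call.error = exc  # handed to waiters, never stored in the cache
            raise
        finally:
            with self._lock:
                del self._inflight[key]
            call.done.set()


fetch_calls = []


def slow_fetch():
    fetch_calls.append(1)
    time.sleep(0.05)  # simulate a slow origin read
    return {"qty": 5}


cache = StampedeSafeCache()
results = []


def worker():
    results.append(cache.get_or_fetch("item:X", slow_fetch, 60))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent misses on the same key produce a single origin fetch; the other seven callers either wait on the leader or find the cache already filled by the double-check.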
Tuning Invalidation in Production: A Decision Playbook
Theory is nice. Production is different. When you're staring at a new feature ticket that involves cached data, you don't have time to re-derive the whole cache invalidation decision tree. What you need is a repeatable five-step process you can run in your head (or better yet, document in your team's runbook) that takes you from "we have a caching need" to "this is the invalidation strategy, these are the metrics to watch, and this is our alert threshold." That's what this section gives you.
The five steps aren't academic categories. They're the actual sequence an experienced engineer runs: first understand how stale is too stale, then pick a strategy that satisfies that tolerance, then instrument the strategy so you can see when it's failing, then set alerts, then plan for the moment the business changes and your current strategy no longer fits. Let's walk through each step in depth.
The five steps above form a loop, not a one-time decision. The diagram shows each step feeding naturally into the next, and Step 5 explicitly loops back to Step 1, because business requirements are not stable. Let's go deep on each.
Step 1: Measure the Business's Staleness Tolerance
Before you choose an invalidation strategy, you need a number: how stale is too stale? This is a business question, not a technical one, and engineers often skip it because it feels like someone else's job. It's not. If you don't get this number out of a product manager, you'll either over-engineer (spending engineering time on millisecond-freshness for data that nobody checks more than once a day) or under-engineer (using a 1-hour TTL on payment state).
Ask: "If this data is N minutes old when a user reads it, does anything bad happen?" Iterate N from very large to very small until you find the threshold where "yes, something bad happens." For a blog post, N can be 60 minutes and nothing bad happens. For product pricing during a flash sale, N is 30 seconds. For available seat count on a flight, N is 5 seconds. For a bank balance, N is effectively zero: you cannot serve a stale balance to a customer making a payment decision.
Document this number as your staleness SLA for this data type. It is the ceiling on your TTL and the maximum allowed staleness lag for any event-driven strategy. Everything downstream of Step 1 uses this number.
Step 2: Choose a Strategy Using the Workload Matrix
With your staleness SLA in hand, you can now use the workload-to-strategy matrix to pick an approach. The matrix has two dimensions: tolerance (how much staleness the business accepts) and workload shape (is this data read far more than it's written, or written heavily?).
The matrix gives you the first-order answer. Top-left (low writes, high tolerance) is the easy case: set a TTL equal to or less than your staleness SLA and move on. Bottom-right (high writes, low tolerance) is the hard case: you need CDC because write-through has too many moving parts at high write volume. The remaining cells are the nuanced cases where you layer strategies: TTL as the backstop, purge as the fast path, CDC as the consistent guarantee.
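As a rough illustration, the corners of the matrix can be written down as a lookup. Everything numeric here is an assumption: the 60-second tolerance boundary, the 100 writes/second boundary, the function name `choose_strategy`, and the choice to return "do not cache" for a zero staleness SLA are illustrative defaults, not values taken from the matrix.

```python
def choose_strategy(staleness_sla_s, writes_per_s,
                    tolerant_above_s=60.0, heavy_writes_above=100.0):
    """Return (strategy, ttl_ceiling_s). Thresholds are illustrative assumptions."""
    if staleness_sla_s <= 0:
        # A zero staleness SLA (e.g. a bank balance) leaves no room for a cached copy.
        return "do not cache", 0
    tolerant = staleness_sla_s >= tolerant_above_s
    write_heavy = writes_per_s >= heavy_writes_above
    if tolerant and not write_heavy:
        strategy = "ttl"                            # easy corner
    elif not tolerant and write_heavy:
        strategy = "cdc + ttl backstop"             # hard corner
    else:
        strategy = "explicit purge + ttl backstop"  # layered middle ground
    # The staleness SLA from Step 1 is the ceiling on any TTL.
    return strategy, staleness_sla_s


blog_post = choose_strategy(staleness_sla_s=3600, writes_per_s=1)
seat_count = choose_strategy(staleness_sla_s=5, writes_per_s=500)
flash_sale_price = choose_strategy(staleness_sla_s=30, writes_per_s=2)
bank_balance = choose_strategy(staleness_sla_s=0, writes_per_s=10)
```

Treat the output as a starting point for discussion; the real boundaries depend on your own traffic and on how expensive a stale read is.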
Step 3: Instrument the Four Metrics That Matter
A cache invalidation strategy you can't measure is a strategy you can't trust. There are exactly four metrics you need to emit on every cache invalidation path:
- Hit ratio: the percentage of reads served from cache vs. falling through to the database. A sudden drop in hit ratio often means your invalidation is too aggressive (keys are being deleted before they'd naturally expire). A suspiciously high hit ratio on a volatile data type might mean your purge pipeline is broken and nothing is being invalidated at all.
- Staleness lag: the time delta between when the database row changes and when the corresponding cache key is deleted. This is your most direct measure of whether your invalidation is meeting its SLA. For a TTL strategy, staleness lag can be up to TTL seconds. For CDC-driven invalidation, it should be the end-to-end latency of your Kafka/Debezium pipeline, typically milliseconds to low seconds.
- Purge success rate: for explicit purge and CDC-driven strategies, what percentage of intended invalidation operations actually succeed? A Redis `DEL` or `UNLINK` can fail if the Redis node is partitioned, if your application server crashes between DB write and cache delete, or if a deployment is in progress. Track this as a percentage and alert if it drops below 99.9%.
- Stampede rate: how many concurrent origin fetches are happening on a single cache key at the same moment? This is the thundering herd signal. If you're seeing five or more concurrent database reads for the same cache key, your expiration is causing stampede events and you need to implement a lease or stale-while-revalidate approach.
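The four metrics can be tracked with a handful of counters. The sketch below is a single-threaded illustration only; `InvalidationMetrics` and its method names are hypothetical, and a real service would emit these through its metrics library rather than hold them in a dataclass.

```python
from dataclasses import dataclass, field


@dataclass
class InvalidationMetrics:
    hits: int = 0
    misses: int = 0
    purges_ok: int = 0
    purges_failed: int = 0
    lag_samples_s: list = field(default_factory=list)
    inflight: dict = field(default_factory=dict)  # key -> concurrent origin fetches
    peak_concurrent_fetches: int = 0

    def record_read(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def record_purge(self, row_changed_at, key_deleted_at, ok):
        if ok:
            self.purges_ok += 1
            # Staleness lag: row change time to cache delete time.
            self.lag_samples_s.append(key_deleted_at - row_changed_at)
        else:
            self.purges_failed += 1

    def fetch_started(self, key):
        self.inflight[key] = self.inflight.get(key, 0) + 1
        self.peak_concurrent_fetches = max(
            self.peak_concurrent_fetches, self.inflight[key]
        )

    def fetch_finished(self, key):
        self.inflight[key] -= 1

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def purge_success_rate(self):
        total = self.purges_ok + self.purges_failed
        return self.purges_ok / total if total else 1.0

    def max_staleness_lag_s(self):
        return max(self.lag_samples_s, default=0.0)


m = InvalidationMetrics()
for hit in [True] * 9 + [False]:
    m.record_read(hit)
m.record_purge(row_changed_at=100.0, key_deleted_at=100.25, ok=True)
m.record_purge(row_changed_at=200.0, key_deleted_at=200.0, ok=False)
for _ in range(3):
    m.fetch_started("user:profile:999")
```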
Step 4: Monitor and Alert
Metrics without alerts are decoration. For each of the four metrics above, set alert thresholds tied to your staleness SLA:
- Staleness lag exceeds your SLA threshold: page the on-call engineer immediately. This is a consistency breach in progress.
- Hit ratio drops more than five percentage points in five minutes: investigate. Either the traffic pattern changed or invalidation is broken.
- Purge success rate falls below 99.9%: alert. Silent purge failures mean your cache is drifting away from truth.
- Stampede rate (concurrent DB reads per key) exceeds five: warning. Not yet a crisis, but you need to add a lease or stale-while-revalidate before it becomes one.
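The four thresholds above translate directly into a rule check. This is a minimal sketch; `evaluate_alerts` and its parameter names are hypothetical, and in practice these rules would normally live in your alerting system's own rule language rather than in application code.

```python
def evaluate_alerts(staleness_lag_s, sla_s, hit_ratio_now, hit_ratio_5m_ago,
                    purge_success_rate, peak_concurrent_fetches):
    """Return a list of (severity, reason) pairs for the thresholds listed above."""
    alerts = []
    if staleness_lag_s > sla_s:
        alerts.append(("page", "staleness lag exceeds SLA"))
    if (hit_ratio_5m_ago - hit_ratio_now) * 100 > 5:
        alerts.append(("investigate", "hit ratio dropped more than 5 points in 5 minutes"))
    if purge_success_rate < 0.999:
        alerts.append(("alert", "purge success rate below 99.9%"))
    if peak_concurrent_fetches > 5:
        alerts.append(("warning", "more than 5 concurrent origin fetches on one key"))
    return alerts


healthy = evaluate_alerts(2.0, 30.0, 0.94, 0.95, 0.9995, 2)
breach = evaluate_alerts(45.0, 30.0, 0.80, 0.95, 0.98, 9)
```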
Step 5: Iterate When Business Needs Change
Your first invalidation design is never your last. The most common triggers for re-running this playbook from Step 1: (a) a regulatory or compliance requirement tightens your freshness SLA from minutes to seconds; (b) traffic grows by an order of magnitude and write-through latency is now noticeable to users; (c) you expand to multiple geographic regions and your staleness lag number now includes cross-region propagation time; (d) a new product feature changes the write pattern of a data type from infrequent to high-frequency. When any of these happen, go back to Step 1 with fresh eyes; don't just patch the existing strategy.
Real-World Architectures: How Big Companies Actually Invalidate
Reading about strategies in the abstract is useful. Seeing how real engineering teams implemented those strategies, with all the real-world constraints of existing databases, traffic patterns, and team size, is where the deep understanding happens. Each of the systems below made a specific, documented trade-off. Understanding why they made that choice tells you more about invalidation than any number of theoretical comparisons.
A note on numbers: specific scale figures for internal systems change constantly and are often not publicly precise. We'll name the architectural pattern and the trade-off clearly; we'll stay soft on numbers that aren't from published engineering sources.
Facebook's TAO (The Associations and Objects) is the caching layer that sits in front of Facebook's social graph database. It was described in a USENIX ATC 2013 paper (Bronson et al., "TAO: Facebook's Distributed Data Store for the Social Graph") and is one of the most cited real-world cache designs. TAO stores objects (users, posts, photos) and associations (friendships, likes, comments) in a distributed in-memory cache tier.
TAO uses write-through within a single data center: a write to an object or association updates the database and the TAO cache in the same logical operation, so readers in the same region see strongly consistent data immediately after the write. The hard part is multi-region: Facebook operates data centers on multiple continents, and each has its own TAO cluster. When a write happens in one region, the other regions' TAO caches hold stale copies.
The solution TAO uses is asynchronous invalidation via replication. Writes flow through the primary region, which asynchronously replicates to secondary regions. Each secondary TAO cluster receives the replication event and fires cache invalidation messages to its own cache nodes. The trade-off is explicit: secondary-region reads can be transiently stale by the replication lag, which is typically in the milliseconds-to-low-seconds range under normal network conditions. For a social graph (timeline, friend list, notifications) this level of eventual consistency is acceptable: you might not see a like for a few seconds, but that's fine.
The key lesson from TAO: even at enormous scale, the answer isn't "strong consistency everywhere"; it's "strong consistency where it matters most (within a region) and bounded eventual consistency where the trade-off is acceptable (cross-region)."
Shopify runs one of the largest multi-tenant e-commerce platforms in the world, built on Ruby on Rails. Rails has a built-in fragment caching system, and Shopify's use of it is a textbook example of versioned key invalidation.
In Rails fragment caching, every cached HTML fragment or JSON object has a cache key that incorporates the model's updated_at timestamp and a version counter. When a product is updated in the database, Rails automatically computes a new cache key for that product's fragment because the updated_at timestamp changed. The old key is simply never read again: it becomes orphaned and will eventually be evicted by LRU. There's no explicit purge call; the key itself changes, making the old cached version unreachable. This is the generation bump pattern: increment the version, and all old cache entries instantly become dead entries.
For mass invalidation (for example, a merchant deploys a new theme that changes how all their product pages render), Shopify uses a global generation counter scoped to a merchant's store. Bumping this counter invalidates every cached fragment for that merchant in a single atomic operation, no matter how many individual product, collection, or page fragments exist. Without this pattern, invalidating all fragments for a large merchant's store on a theme update would require enumerating and deleting thousands of individual keys, which is both slow and prone to races.
The lesson: versioned keys turn "how do I delete everything?" into "how do I make the old keys unreachable?", an elegant shift that eliminates whole classes of purge race conditions.
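The generation bump can be sketched in a few lines. This is an illustration of the pattern, not Rails or Shopify code: `GenerationalCache`, the `scope:gN:name` key layout, and the plain dicts standing in for the shared cache are assumptions. In a shared cache the counter itself would live in the cache (for example behind an atomic increment) so every app server sees the same generation.

```python
class GenerationalCache:
    def __init__(self):
        self.store = {}        # stands in for the shared cache
        self.generations = {}  # scope (e.g. a merchant) -> generation counter

    def _key(self, scope, name):
        generation = self.generations.get(scope, 0)
        return f"{scope}:g{generation}:{name}"

    def get(self, scope, name):
        return self.store.get(self._key(scope, name))

    def set(self, scope, name, value):
        self.store[self._key(scope, name)] = value

    def bump(self, scope):
        """Make every existing key for `scope` unreachable in one step."""
        self.generations[scope] = self.generations.get(scope, 0) + 1


cache = GenerationalCache()
cache.set("shop-1", "product:10", "<old theme html>")
cache.set("shop-1", "collection:3", "<old theme html>")
cache.set("shop-2", "product:77", "<other merchant html>")

cache.bump("shop-1")  # merchant shop-1 deploys a new theme

after_bump = cache.get("shop-1", "product:10")     # miss: key now has g1 in it
other_merchant = cache.get("shop-2", "product:77")  # untouched
orphaned_keys = [k for k in cache.store if k.startswith("shop-1:g0:")]
```

Nothing is deleted by the bump; the old entries stay in the store as orphans until eviction reclaims them, which is the cost this pattern pays for its O(1) invalidation.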
Fastly is a CDN that supports a feature called surrogate keys (also called cache tags). GitHub uses Fastly as its CDN layer and uses surrogate keys to manage invalidation of rendered repository pages, commit listings, and file views.
Here's the problem that surrogate keys solve: a single GitHub repository page might be cached at hundreds of Fastly edge nodes globally. If someone pushes a new commit to that repository, GitHub needs to invalidate all the cached versions of the repository's home page, the commit list page, the default branch file tree, and possibly any pages that show the latest commit message. That's many distinct URLs spread across many edge nodes. Without surrogate keys, GitHub would have to enumerate every URL and issue a purge call for each, a maintenance nightmare as the product evolves and new pages are added.
With surrogate keys, GitHub's application server includes a response header like Surrogate-Key: repo:123456 user:789 org:42 when it serves a response. Fastly stores this metadata alongside the cached response. When a push event happens, GitHub issues a single Fastly API call: "purge all edges that carry the tag repo:123456." Fastly propagates this purge to all its edge nodes in seconds. All pages tagged with that repository ID, regardless of their URL structure, are invalidated in one operation.
The lesson: surrogate keys shift the invalidation model from "which URLs need to change?" to "which logical entity changed?", matching how humans think about data dependencies rather than how HTTP caches store data.
Netflix's EVCache is an open-source distributed caching library (available at github.com/Netflix/EVCache) built on top of Memcached. It was designed for Netflix's scale and multi-region requirements. EVCache takes a pragmatic position on invalidation: use TTL as the primary expiry mechanism, and layer best-effort explicit deletes on top of it on every write.
Every EVCache entry has a TTL, typically set to match the data's business staleness tolerance. When Netflix's backend services write to the database, they also asynchronously fire a delete call to EVCache for the relevant key. The word "asynchronously" is critical: the delete is fire-and-forget. If it fails (because a Memcached node is temporarily unavailable, or the application server crashes after the DB write but before the delete call completes), the TTL is the backstop: the entry expires on its own schedule. This design accepts that a small fraction of delete calls will fail, and relies on TTL to bound the maximum staleness window.
For multi-region deployments, EVCache replicates cache writes and deletes across regions using a batched replication protocol. This means that a delete issued in the US-East region is eventually propagated to EU-West and AP-Southeast. The replication is asynchronous, so cross-region reads can transiently serve stale data during the propagation window, a trade-off Netflix accepts for the latency benefit of serving from a local region cache.
The lesson: combining TTL (the reliable backstop) with best-effort explicit deletes (the fast path) gives you a system that's tolerant of transient failures while still providing near-real-time invalidation in the common case.
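The TTL-plus-best-effort-delete combination can be shown with a toy cache and an injectable clock. This is not the EVCache API: `TTLCache`, `write_then_invalidate`, and the key names are hypothetical, and the delete here is synchronous for simplicity where the text describes an asynchronous fire-and-forget call.

```python
import time


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_s):
        self._data[key] = (value, self._clock() + ttl_s)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired entries are dropped on read
            return None
        return entry[0]

    def delete(self, key):
        self._data.pop(key, None)


def write_then_invalidate(db, key, value, delete_fn):
    """Write the source of truth, then attempt a best-effort cache delete."""
    db[key] = value
    try:
        delete_fn(key)
        return True
    except Exception:
        return False  # swallowed on purpose: the TTL bounds the staleness


now = [0.0]  # fake clock so the example is deterministic
cache = TTLCache(clock=lambda: now[0])
db = {"price:1": 10}
cache.set("price:1", 10, ttl_s=30)


def failing_delete(key):
    raise ConnectionError("cache node unavailable")


delete_ok = write_then_invalidate(db, "price:1", 12, failing_delete)
stale_read = cache.get("price:1")  # delete failed, old value still cached
now[0] = 31.0                      # 31 seconds later
read_after_ttl = cache.get("price:1")
```

The failed delete leaves a stale entry, but only until the 30-second TTL runs out; that bound is the whole point of keeping the TTL in place.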
Twitter's Manhattan is Twitter's distributed key-value store, described in engineering blog posts as the storage layer behind timelines, direct messages, and other core features. Twitter's cache layer in front of Manhattan uses a write-through model: writes go to both Manhattan and the cache tier simultaneously, so reads always see the latest written value without having to wait for a TTL expiry or an explicit purge event.
The interesting engineering challenge at Twitter's scale is maintaining this write-through guarantee across a multi-tier cache: Twitter uses both an in-process application-level cache (small, very fast, tied to a single host) and a distributed remote cache (Redis/Memcached cluster, shared across many hosts). A write that updates the distributed cache does not automatically update every application host's in-process cache. Twitter handles this by giving the in-process cache a very short TTL (measured in seconds, not minutes) so it acts as a micro-cache that's always nearly fresh, with the distributed cache as the authoritative second tier. The write-through guarantee holds at the distributed tier; the in-process tier is just a performance optimization with a short staleness window.
The lesson: write-through is clean in theory but requires careful thought in a multi-tier cache hierarchy. The solution is to apply write-through at the tier that matters most (the shared distributed cache) and accept short TTL staleness at the performance-optimization tier (in-process).
Architecture Comparison at a Glance
The comparison table makes one thing obvious: every system uses a different primary strategy, but every system also layers a fallback. No production system relies on a single invalidation mechanism alone. The pattern is: "fast path for freshness in the common case, TTL or versioned keys as the backstop for when the fast path fails."
Cache Coherence Across Replicas & Multi-Region
So far we've mostly talked about a single cache server and a single application server talking to a single database. That's a useful abstraction for understanding the strategies. But production systems almost never look like that. They have multiple application server instances (horizontal scaling), multiple cache nodes (a Redis cluster or a fleet of Memcached servers), and multiple geographic regions (for latency and fault tolerance). When you add any of these dimensions, the cache invalidation problem multiplies. Let's understand why, and what the solutions look like.
The Multi-Node Problem: Which Cache Node Gets the Delete?
Suppose you have a Redis cluster with six shards. Cache keys are spread across those shards using a math trick: hash the key, and the hash value tells you which shard owns it, so the same key always lands on the same shard. Redis Cluster does this with a fixed set of 16384 hash slots that are divided among the shards. Client-side sharding schemes often use a related technique called consistent hashing, where adding or removing a shard only re-shuffles a small fraction of the keys. When your application issues a DEL user:profile:999 command, the Redis client library routes that command to the correct shard automatically. This is the best case: the delete goes exactly where the key lives.
The problem arises when you have multiple application server instances that each maintain their own in-process cache (a local HashMap or Caffeine cache). When server instance A receives a write for user #999 and updates the database, it can easily delete the key from its own in-process cache. But server instance B through Z still have the stale entry in their local caches. The write on instance A didn't trigger any notification to the other instances.
The solution is a pub/sub fan-out: when instance A performs the invalidation, it also publishes a message to a shared channel that all other instances subscribe to. Each subscriber receives the invalidation message and deletes the key from its local in-process cache. Redis pub/sub makes this straightforward to implement.
The diagram shows the three-step fan-out: the writing server updates the database (step 1), publishes an invalidation message to a Redis pub/sub channel (step 2), and all other application servers, which are subscribed to that channel, receive the message and delete the key from their local in-process caches (step 3). The Redis distributed cache doesn't need a fan-out because it's already shared, but every in-process cache on every host gets notified.
Redis Pub/Sub in Practice
The publisher doesn't need to know who is subscribed; it just fires the event. Redis delivers it to every active subscriber. If no subscribers are connected at the moment (e.g., during a deployment), the message is lost: pub/sub in Redis is fire-and-forget, not durable. For durability, use Redis Streams or Kafka instead.
Pattern subscribe (PSUBSCRIBE) lets each app server subscribe to all invalidation events with one command, even as new entity types are added. Each server parses the channel name (for example "invalidate:user:999"), extracts the entity type and ID, and deletes the relevant local cache entry. This requires zero coordination between app servers: they all receive the same message independently.
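The fan-out and channel parsing described above can be modelled without a running Redis. In this sketch `LocalBus` is an in-process stand-in for Redis pub/sub (glob-style pattern matching, no durability), and `AppServer` plus the local key format `entity:id` are hypothetical. With a real Redis client the `psubscribe` and `publish` calls would go to the server instead.

```python
from fnmatch import fnmatchcase


class LocalBus:
    """In-process stand-in for Redis pub/sub: pattern match, no replay."""

    def __init__(self):
        self.subscribers = []  # (pattern, callback)

    def psubscribe(self, pattern, callback):
        self.subscribers.append((pattern, callback))

    def publish(self, channel, message):
        delivered = 0
        for pattern, callback in self.subscribers:
            if fnmatchcase(channel, pattern):
                callback(channel, message)
                delivered += 1
        return delivered  # number of subscribers that received the message


class AppServer:
    def __init__(self, name, bus):
        self.name = name
        self.local_cache = {}  # the in-process cache on this host
        bus.psubscribe("invalidate:*", self.on_invalidate)

    def on_invalidate(self, channel, _message):
        # "invalidate:user:999" -> entity type "user", id "999"
        _, entity, entity_id = channel.split(":", 2)
        self.local_cache.pop(f"{entity}:{entity_id}", None)


bus = LocalBus()
servers = [AppServer(name, bus) for name in ("A", "B", "C")]
for server in servers:
    server.local_cache.update({"user:999": {"name": "Alice"}, "user:1": {"name": "Bob"}})

# Server A commits the database write (not shown), then fans out the invalidation.
delivered = bus.publish("invalidate:user:999", "deleted")
```

A server that subscribes after the publish call receives nothing, which mirrors the non-durable behaviour called out above.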
Multi-Region: The Killer Complexity
Within a single data center, pub/sub fan-out is fast: Redis pub/sub round-trips in sub-millisecond time. Across regions, the picture changes. Each region has its own Redis cluster (because cross-region Redis replication is expensive in latency). Invalidation events must be replicated from the primary region to every secondary region, adding the inter-region network latency to your staleness lag.
The diagram illustrates why cross-region cache coherence is inherently eventual, not strong. An invalidation event fires in US-East the moment the database is written. EU-West receives it after the replication lag, typically tens to low hundreds of milliseconds under normal conditions. AP-Southeast receives it later still. Until the event arrives, any app server in those regions serves the old cached value. The replication lag is your cross-region staleness window, and you cannot eliminate it without abandoning local-region caching entirely (which defeats the purpose of multi-region deployment).
Consistent-Hashing Routing as an Alternative to Fan-Out
An elegant alternative to fan-out is to design your cache key routing so that the same key always maps to the same cache node, and always maps to the same application server as well. This is called consistent-hashing routing. If key user:profile:999 always goes to app server #3, and that app server has an in-process cache, then you only need to invalidate the cache on server #3. No fan-out needed. The trade-off: this design requires sticky routing (consistent hashing at the load balancer level), which complicates deployment rolling restarts and reduces flexibility. Most teams only adopt it for hot-key scenarios where fan-out overhead is measurably expensive.
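A minimal consistent-hash ring shows why this routing is stable. The sketch assumes SHA-256 as the hash, 100 virtual nodes per server, and hypothetical server names; a load balancer's actual implementation will differ in the details.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node, vnodes=100):
        # Each server owns many points on the ring to even out the load.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["app-1", "app-2", "app-3"])
keys = [f"user:profile:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("app-3")  # e.g. a rolling restart takes app-3 out of rotation
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that lived on the removed server move; every other key keeps its owner, so the in-process caches on the surviving servers stay valid.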
Tooling & Libraries: What Actually Exists
Every invalidation strategy we've discussed needs actual tools to implement it. The good news: most of the heavy lifting already exists in open-source libraries and managed services. The bad news: there are so many options that picking the right one is its own challenge. This section maps the tool ecosystem by layer (in-process cache, distributed cache, CDC pipeline, managed database cache, and CDN) and gives you a concise picture of what each tool does, what invalidation use case it serves, a quick syntax example, and importantly, when not to use it.
The ecosystem map groups tools into five layers. Each layer is the right tool for a specific part of the invalidation problem. In-process caches are fastest but need fan-out. Distributed caches are the shared authority. CDC pipelines are the event source. Managed DB caches bundle write-through as a service. CDNs and HTTP headers handle the edge layer. Now let's go through each with enough detail to actually use them.
What it is: Redis is the most widely used distributed cache. For cache invalidation, Redis gives you three mechanisms: explicit key deletion, pub/sub for fan-out, and keyspace notifications.
Explicit delete: DEL user:profile:999 (synchronous; the server frees the value's memory before it replies) or UNLINK user:profile:999 (removes the key from the keyspace immediately and reclaims the memory in a background thread; prefer UNLINK for large values). Use this for explicit purge strategies.
Pub/sub fan-out: PUBLISH invalidate:user:999 "deleted" notifies all subscribers. Combine with PSUBSCRIBE invalidate:* on each app server for in-process cache invalidation. Not durable: messages sent to disconnected subscribers are lost.
Keyspace notifications (via notify-keyspace-events config): Redis can publish a message on its own internal pub/sub whenever a key expires, is set, or is deleted. Subscribe to __keyevent@0__:expired to react to TTL expiry events. Useful for triggering downstream refresh logic.
When NOT to use Redis pub/sub for invalidation: when you need durable delivery (application restarts or deployments will lose messages). Use Kafka or Redis Streams (XADD/XREAD) instead.
What it is: Debezium is an open-source CDC (Change Data Capture) framework that reads a database's change log (the MySQL binary log, the PostgreSQL WAL, MongoDB change streams, and others) and publishes row-level change events to Kafka topics.
Invalidation use case: Any time a row changes in the database, Debezium publishes an event like {"op": "u", "after": {"id": 999, "name": "Alice"}, "source": {...}} to a Kafka topic. A consumer service subscribes to that topic and issues cache invalidation commands based on the event. This solves the dual-write problem entirely: the application only writes to the database, and the cache invalidation happens as a downstream reaction to the change log, not as a second write from the application.
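The consumer's core job is mapping a change event to the cache keys it should delete. The sketch below covers only that mapping and leaves out the Kafka consumer loop. It assumes the event has the `op`, `before`, `after`, and `source.table` fields, that the table's primary key column is named `id`, and that cache keys follow a hypothetical `table:id` layout.

```python
def keys_to_invalidate(event, key_template="{table}:{id}"):
    """Map one Debezium-style change event to the cache keys to delete."""
    # Depending on converter settings the event body may sit under "payload".
    payload = event.get("payload", event)
    if payload.get("op") == "r":
        return set()  # snapshot reads are not changes
    table = payload["source"]["table"]
    keys = set()
    # Check both row images: deletes carry only "before", inserts only "after".
    for image in (payload.get("before"), payload.get("after")):
        if image and "id" in image:
            keys.add(key_template.format(table=table, id=image["id"]))
    return keys


update_event = {
    "op": "u",
    "before": {"id": 999, "name": "Al"},
    "after": {"id": 999, "name": "Alice"},
    "source": {"table": "users"},
}
delete_event = {
    "payload": {"op": "d", "before": {"id": 7}, "after": None, "source": {"table": "users"}}
}
snapshot_event = {"op": "r", "after": {"id": 1}, "source": {"table": "users"}}

update_keys = keys_to_invalidate(update_event)
delete_keys = keys_to_invalidate(delete_event)
snapshot_keys = keys_to_invalidate(snapshot_event)
```

Because a cache delete is idempotent, the consumer can safely reprocess an event after a restart, which fits the at-least-once delivery that Kafka consumers typically run with.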
When NOT to use Debezium: when your database doesn't support CDC (e.g., some older MySQL setups without binlog replication enabled), or when operational complexity of running Kafka + Kafka Connect is too high for your team size. For small systems, an explicit purge on write is far simpler and nearly as fast.
What it is: Caffeine is the dominant in-process cache library for the JVM (Java/Kotlin/Scala). It uses a smart eviction algorithm called W-TinyLFU (which combines "how recently was this used" with "how often" to pick what to drop), and has first-class support for both TTL expiry and asynchronous cache refresh.
Invalidation use case: Caffeine's refreshAfterWrite policy implements the stale-while-revalidate pattern: when a key has been in the cache for longer than the refresh duration, the next access returns the stale value immediately (no latency) and triggers an asynchronous background refresh. This removes the stampede on refresh: readers keep getting the old value instead of piling up behind a reload.
When NOT to use Caffeine: when you have multiple JVM instances and need cross-instance coherence. Caffeine is per-JVM; without a pub/sub invalidation fan-out, each JVM's cache is an isolated island. Combine Caffeine with Redis pub/sub for multi-instance deployments.
What it is: Amazon DynamoDB Accelerator (DAX) is a fully managed, in-memory cache that sits in front of DynamoDB and speaks the DynamoDB API. It provides write-through caching out of the box: your application makes a standard DynamoDB PutItem or UpdateItem call, and DAX automatically updates its cache alongside the database write.
Invalidation use case: You don't have to write any invalidation code at all. DAX handles it. A PutItem on an item updates both DynamoDB and the DAX cache. A DeleteItem removes the item from both. The DAX cache is always consistent with DynamoDB for items that have been written through it. Reads that were cached by a prior read and not yet overwritten by a write will expire by TTL (default 5 minutes for both the item cache and the query/scan cache, both configurable at cluster-creation time).
When NOT to use DAX: DAX is DynamoDB-only. It doesn't work with any other database. Also, DAX's query/scan cache does not automatically invalidate when the underlying items change: if you update an item, DAX's item cache updates, but a query that previously returned that item may still cache the old result until the query-cache TTL expires (5 minutes by default). This is a common source of subtle consistency bugs.
What it is: HTTP itself has a sophisticated cache invalidation model, defined in RFC 9111 (the 2022 HTTP Caching spec that obsoleted RFC 7234) for Cache-Control / Expires, and RFC 5861 for stale-while-revalidate / stale-if-error. Browsers, CDNs (Cloudflare, Fastly, CloudFront), and reverse proxies (nginx, Varnish) all implement these headers natively.
Invalidation via Cache-Control: set Cache-Control: max-age=300 to give the response a 5-minute TTL in any HTTP cache. Set Cache-Control: no-store to prevent caching entirely. Set Cache-Control: max-age=31536000, immutable for assets that never change (hashed filenames like app.v3f8a9.js).
Invalidation via ETag / If-None-Match: the server includes an ETag header (a hash or version of the response content). The client caches the ETag alongside the response. On the next request, it sends If-None-Match: "abc123". If the server's current ETag matches, it replies with 304 Not Modified (no body, very fast). If it doesn't match, the server sends the full new response. This is conditional GET: the client asks "has this changed?" before re-downloading.
stale-while-revalidate: Cache-Control: max-age=60, stale-while-revalidate=120 tells HTTP caches: "serve from cache for 60 seconds without any validation, then for the next 120 seconds serve the stale copy while asynchronously fetching a fresh version in the background." This is the HTTP-native version of stale-while-revalidate at the edge layer.
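The three phases in that header can be made concrete with a small classifier. This is a simplified sketch: `parse_cache_control` and `freshness_state` are hypothetical helpers that look only at max-age and stale-while-revalidate, and they ignore other directives (no-store, s-maxage, must-revalidate) that a real HTTP cache must honor.

```python
def parse_cache_control(header):
    """Parse 'max-age=60, stale-while-revalidate=120' into {name: int or None}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = int(value) if value.isdigit() else None
    return directives


def freshness_state(age_s, header):
    """Classify a cached response of the given age under its Cache-Control header."""
    directives = parse_cache_control(header)
    max_age = directives.get("max-age") or 0
    swr = directives.get("stale-while-revalidate") or 0
    if age_s < max_age:
        return "fresh"  # serve from cache without validation
    if age_s < max_age + swr:
        return "serve-stale-and-revalidate"  # serve now, refresh in the background
    return "must-fetch"  # too old: go to the origin before responding


header = "max-age=60, stale-while-revalidate=120"
states = [freshness_state(age, header) for age in (30, 90, 179, 180, 200)]
```

With this header, a response aged 90 seconds is served immediately while a background fetch refreshes it, and the stale window closes at 180 seconds (60 + 120).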
When NOT to use CDN-level invalidation: CDN purge APIs are asynchronous. A purge request is queued and propagated to edge nodes, which takes anywhere from well under a second on some CDNs to minutes on others, and you cannot observe the exact moment every edge has dropped its copy. For data that needs immediate consistency (inventory, pricing under legal pricing rules), CDN-level caching is not appropriate at all. CDNs are the right layer for content that changes infrequently and where seconds-to-minutes staleness is acceptable.
What it is: Memcached is the older and simpler distributed cache (compared to Redis). It supports three invalidation primitives: delete key (explicit delete), flush_all (wipe every key on the server that receives it; Memcached servers are independent, so clearing a whole fleet means sending it to every server, and it should be used with extreme caution), and gets / cas (compare-and-swap, for atomic updates).
flush_all: one of the most dangerous commands in a production cache. It takes an optional delay parameter: flush_all 300 schedules a full cache wipe in 5 minutes (300 seconds). Used for emergency resets when the cache state is known to be corrupt. Never call it in application code paths; it belongs in a break-glass runbook only.
When NOT to use Memcached for modern invalidation: Memcached has no pub/sub, no keyspace notifications, no persistence, and no built-in atomic data structures. For anything beyond basic TTL + explicit delete, Redis is a better choice today. Memcached's main advantage is simplicity and slightly higher throughput for pure key-value GET/SET workloads.
Cheat Sheet & Glossary: The 30-Second Recap
Use this section as a quick reference when you need to recall a pattern or a term without re-reading the whole page. The cheat sheet gives you the one-sentence essence of each strategy. The glossary gives you precise definitions of every term used on this page.
Strategy Cheat Sheet
Glossary
- Staleness
- The condition where a cached copy of data no longer matches the current value in the source of truth (the database). A cached entry is stale the moment its source data changes. Staleness is measured as a duration: the time between when the source data changed and when the cached entry was invalidated.
- Dual-Write Race
- The consistency problem that arises when an application must write to two systems (e.g., a database and a cache) and there's no transaction spanning both. If the first write succeeds and the second fails, the two systems diverge silently. The transactional outbox pattern is the standard solution.
- Transactional Outbox
- A pattern where, instead of writing to the cache directly, the application writes a "pending invalidation event" to an outbox table in the same database transaction as the main write. A separate process reads the outbox table and fires the cache invalidation. Since both the main write and the outbox row are committed in one transaction, they're always consistent.
- Write-Through
- A cache write policy where every write updates the cache and the database simultaneously. The cache is always current. Contrast with write-around (skip the cache on write, let TTL refresh it) and write-back / write-behind (write to the cache first, flush to the database asynchronously).
- Write-Around
- A cache write policy where writes go directly to the database and bypass the cache. The cache entry for the written data is either left to expire via TTL or explicitly purged. Reduces write load on the cache at the cost of a cold cache miss for the next read.
- Write-Back (Write-Behind)
- A cache write policy where writes land in the cache first and are asynchronously flushed to the database later. Reduces write latency (the application doesn't wait for the DB). Risk: if the cache node fails before the flush, the write is lost. Rarely used for caches that store user-facing data due to durability concerns.
- CDC (Change Data Capture)
- A technique for capturing every row-level INSERT, UPDATE, and DELETE in a database by reading the database's internal change log (binary log in MySQL, WAL in PostgreSQL). CDC enables downstream systems (caches, search indexes, analytics pipelines) to react to data changes without polling the database and without requiring application code changes.
- Generation Bump
- A cache invalidation technique where a version counter scoped to a logical group (e.g., a merchant, a category, a tenant) is incremented atomically. All cache keys for that group incorporate the generation number. Incrementing the counter makes all old keys unreachable in a single O(1) operation, regardless of how many individual entries exist.
- Surrogate Key
- A metadata tag attached to a cached HTTP response (or CDN cache entry) that identifies the logical entities the response depends on. When an entity changes, all cached responses tagged with that entity's surrogate key are invalidated together with one API call. Implemented by Fastly, Cloudflare, and Varnish.
- Stampede (Thundering Herd)
- When a popular cache key expires (or is deleted) and many concurrent requests simultaneously discover the cache miss, they all hit the database at once, creating a burst of DB load. The database can be overwhelmed. Solutions: probabilistic early expiration (XFetch), leases, or stale-while-revalidate.
- Singleflight
- A concurrency pattern where, when multiple goroutines (or threads) request the same missing cache key simultaneously, only one of them actually fetches from the database. The rest wait for and share the result of that single fetch. Eliminates per-key stampedes within a single process. Go's golang.org/x/sync/singleflight is the canonical implementation.
- XFetch (Probabilistic Early Expiration)
- An algorithm that proactively refreshes a cache entry before it expires by computing a probability that increases as the TTL deadline approaches, weighted by the expected fetch time. The key insight: it's better for a single request to occasionally fetch fresh data slightly early than to guarantee a stampede at exactly the TTL deadline.
- stale-while-revalidate
- A cache freshness model (defined in RFC 5861 for HTTP) where a stale cached response is served immediately to the current request while a background process fetches a fresh version. The current request sees zero added latency; the next request (after the refresh completes) sees fresh data. Supported natively by modern browsers and CDNs via Cache-Control: stale-while-revalidate=N.
- Immutable Assets
- Static files (JavaScript, CSS, images) that are given content-addressed filenames (e.g., app.v3f8a9c1.js) so their URL changes whenever their content changes. They can be cached with a year-long TTL (Cache-Control: max-age=31536000, immutable) because a changed file gets a new URL, making the old cached entry unreachable by definition. The "cache invalidation" here is handled entirely by URL versioning, which is the cleanest possible invalidation strategy.
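The Generation Bump entry in the glossary can be illustrated with a small in-memory sketch. The class name GenerationalCache, the key layout, and the dict used as the backing store are assumptions for illustration only; with a real cache server the counter would live in the cache itself and be incremented atomically (for example with Redis INCR), and the orphaned keys would age out through TTL or eviction.

```python
class GenerationalCache:
    def __init__(self):
        # Stand-in for a real cache server; holds both counters and entries.
        self._store = {}

    def _generation(self, group):
        # Current generation for the group, starting at 0.
        return self._store.setdefault(f"gen:{group}", 0)

    def _key(self, group, name):
        # Every key embeds the group's current generation number.
        return f"{group}:v{self._generation(group)}:{name}"

    def get(self, group, name):
        return self._store.get(self._key(group, name))

    def set(self, group, name, value):
        self._store[self._key(group, name)] = value

    def bump(self, group):
        # One write makes every old key for the group unreachable.
        # Old entries are not deleted here; they are simply never read again.
        self._store[f"gen:{group}"] = self._generation(group) + 1
```

A usage example: after `cache.set("merchant:42", "menu", ...)` and `cache.set("merchant:42", "hours", ...)`, a single `cache.bump("merchant:42")` makes both lookups miss, while entries for other merchants are unaffected.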
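The XFetch entry in the glossary can be sketched as follows. This uses the commonly cited form of the rule: refresh when now - delta * beta * ln(u) >= expiry, where delta is the measured fetch time, beta is a tuning knob (1.0 by default), and u is uniform on (0, 1]. The function names, the tuple layout of a cache entry, and the injectable clock and random source are assumptions made so the sketch is testable; treat it as an illustration, not a reference implementation.

```python
import math
import random
import time


def should_refresh_early(now, expiry, delta, beta=1.0, rng=random.random):
    # random.random() is in [0, 1), so 1 - rng() is in (0, 1] and log() is safe.
    u = 1.0 - rng()
    # -log(u) is >= 0, so the entry is treated as expiring a random amount
    # of time early, scaled by how long a refetch takes.
    return now - delta * beta * math.log(u) >= expiry


def get_with_xfetch(cache, key, recompute, ttl, now=time.monotonic, rng=random.random):
    # Each cache entry is a tuple: (value, delta, expiry).
    entry = cache.get(key)
    if entry is None or should_refresh_early(now(), entry[2], entry[1], rng=rng):
        start = now()
        value = recompute()
        delta = now() - start  # measured cost of the fetch
        cache[key] = (value, delta, now() + ttl)
        return value
    return entry[0]
```

Under this rule the chance that any one request refreshes early is exp(-(expiry - now) / (delta * beta)), so it is negligible while the entry is young and rises toward 1 as the deadline approaches. Slow fetches (large delta) start refreshing earlier than fast ones.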