TL;DR β Eviction in Plain English
- Why cache eviction matters: the direct link between picking the wrong eviction victim and a hit-ratio collapse that melts your database
- The 7 canonical eviction policies (LRU, LFU, FIFO, Random, MRU, ARC, TinyLFU) β the idea behind each, when each shines, and when each fails spectacularly
- Why LRU is O(1): the HashMap + doubly-linked-list trick that makes list reordering constant-time
- The failure modes β scan pollution, one-hit wonders, sequential flooding β that bring even battle-tested policies to their knees
- How to pick the right policy for any workload: temporal locality vs. frequency locality, and why most caches in the wild use TinyLFU under the hood today
When a cache fills up, every new entry needs to kick something out. The question is: which entry? Pick the wrong one β the one your traffic actually needs β and your hit ratio quietly collapses. Pick the right one β the entry least likely to ever be requested again β and a 1 GB cache can serve 99% of traffic for a 100 GB dataset. Eviction policy is the algorithm that makes that decision, and understanding why each policy works (or breaks) is what separates engineers who configure Redis from engineers who design caching systems.
A cache is finite. Your data isn't. So the moment the cache fills up, you have to make a choice: when a new key arrives, which old key gets thrown out to make room? This is the eviction policy. Different policies use different signals β "when was this last touched?" (recency), "how often has this been requested?" (frequency), "when did this arrive?" (insertion order) β to guess which entry is least valuable to keep. Get the guess right and the working set stays warm. Get it wrong and you spend CPU and DB time constantly refetching entries you just evicted.
LRU (Least Recently Used) throws out the entry you haven't touched the longest β great for bursty, recency-driven traffic, terrible when a full-table scan blasts through the cache. LFU (Least Frequently Used) throws out the entry with the fewest total accesses β great for stable hot keys, terrible when a new viral item can't survive infancy. FIFO throws out the oldest-inserted entry regardless of access β simple and wrong for most real workloads. Random throws out a random entry β surprisingly competitive with LRU in practice because cache misses are cheap compared to tracking recency. MRU (Most Recently Used) throws out the most recently used entry β bizarre but optimal for repeated sequential scans. ARC (Adaptive Replacement Cache) runs both an LRU queue and an LFU queue, dynamically shifting weight between them β adapts automatically to workload character. TinyLFU approximates frequency using a sketch data structure and filters entries through a recency-biased admission gate β the modern industry default because it gets nearly optimal hit ratios at very low memory overhead.
The tricky part is that the best eviction policy depends entirely on your workload's access pattern β and access patterns change. An e-commerce site in the morning is mostly browsing (broad reads across many products). At checkout time it hammers a handful of product and inventory keys. A CDN node sees mostly one-time reads for unique URLs. A social feed sees a power-law distribution where 1% of posts get 99% of reads. LRU handles the e-commerce case well. TinyLFU handles the CDN case. ARC adapts between them. The page that follows goes deep on the mechanics of each policy so you can reason about workloads you've never seen before β not just memorize a decision table.
Why You Need This β When Your Hit Ratio Quietly Collapses
Most engineers treat eviction as a checkbox. Redis has maxmemory-policy set to allkeys-lru by default. Job done. But there's a specific failure mode that hits production teams every week and takes hours to diagnose, because it doesn't announce itself loudly β it just makes everything slightly slower until suddenly everything is very slow.
The Production Story: One Nightly Job, One Database Fire
Picture a social platform. The cache holds the 2 million most recently accessed user profile objects. At peak hours, 95% of requests hit the cache β the app is snappy, the database idles at 8% CPU, life is good. Then the data team runs a nightly analytics job. The job reads every user row in the database β all 50 million β sequentially. Each read comes back to the application layer, which (because the cache-aside pattern is in place) writes the result to Redis.
Within minutes, those 50 million rows have completely replaced the 2 million warm entries. The analytics job accessed each row exactly once, so none of those 50 million rows will ever be requested again. But LRU doesn't know that β they were all "recently used" as of the analytics scan. Morning traffic arrives. Every user profile request is a cache miss. The database CPU spikes to 85%. The on-call engineer pages at 3 AM wondering why a healthy system is suddenly melting.
This failure has a name: scan pollution (sometimes called "sequential flooding"). The analytics scan polluted the cache with entries that have zero future value, evicting the warm working set that actually served real traffic. It is the canonical LRU failure mode, and it happens any time a large sequential access pattern shares a cache with a hot-key workload.
The Math That Makes Eviction Decisions Life or Death
Let's make this concrete with numbers. Suppose your cache can hold 10,000 entries and your hot working set is 8,000 entries. At steady state with LRU, those 8,000 hot entries live in the cache and hit ratio sits at 95%. Now suppose the analytics job writes 50,000 new entries. LRU evicts the 40,000 least-recently-seen entries β which, because the hot entries haven't been touched during the scan, includes most of your working set. After the scan, your hot entries are mostly gone. Hit ratio drops toward 60%. Every miss costs a database roundtrip: call it 15 ms. At 5,000 requests per second, a drop from 95% to 60% hit ratio means 1,750 more database reads per second instead of 250. That's a 7Γ increase in database load from a single eviction policy mismatch.
The diagram above shows what scan pollution looks like in the cache's memory. Before the analytics scan, 8,000 of 10,000 slots hold hot working-set entries that real traffic re-reads constantly. After the scan, almost all of those slots are occupied by one-time-read analytics rows that will never be requested again. The cache is still full β but it's full of garbage from the perspective of your production traffic.
The Question Eviction Policy Answers
Every eviction policy is essentially an answer to one question: "Of all the entries currently in the cache, which one is least likely to be requested in the near future?" You can't know the future, so every policy uses a proxy signal β past recency, past frequency, insertion order, or a combination. The gap between the proxy signal and the true future access likelihood is what determines whether you evict the right entry or the wrong one. Scan pollution is what happens when recency (the LRU signal) is a terrible proxy because the most recent accesses came from a one-time scan, not from real traffic.
Mental Model β The Memory-Pressure Loop
Before we get into specific algorithms, let's build the mental model of what happens inside a cache every single time a new entry needs to go in and the cache is already full. This five-step loop runs billions of times per day in any serious production system, and understanding it in sequence is the foundation for understanding why each policy makes the choices it does.
The Five Steps Every Eviction Follows
Think of the cache as a coat check room at a restaurant β limited hooks, first-come policy, but with a bouncer at the door who decides who has to leave when the room fills up. Here's what happens every time a new coat arrives and the room is full:
- New entry arrives β a write request comes in: "cache this key-value pair."
- Overflow detected β the cache checks its current size against its configured maximum. It's at capacity.
- Policy fires β the eviction algorithm runs and selects a victim: the entry that the policy considers least valuable to keep.
- Victim deleted β the victim entry is removed from the cache, freeing one slot.
- New entry inserted β the new entry takes the freed slot. The cache is full again, at steady state.
Steps 3 and 4 are where policies differ. FIFO's "policy fires" step is trivial: look at the queue head, that's the victim. LRU's step is more expensive conceptually but still O(1) in a good implementation: look at the tail of the recency-ordered doubly-linked list, that's the victim. TinyLFU's step is the most sophisticated: consult a frequency sketch, compare the incoming entry's estimated frequency to the victim candidate's, and admit the new entry only if it's likely to be accessed more. We'll unpack each of these in later sections.
The diagram above shows the loop as a cycle β it repeats forever as long as traffic writes to a full cache. Notice that step β’ (policy fires) is the only step that varies between eviction algorithms. Every other step is identical regardless of which policy you use. This means that when we compare LRU to LFU to TinyLFU, we're really only comparing one thing: which victim gets selected in step β’ and why.
What "Good" Looks Like
A good eviction policy selects victims that will not be requested again (or will be requested the least) in the near future. The ideal victim is an entry that the system will have to fetch from the database at some point anyway β but that point is so far in the future that paying a miss penalty then is much cheaper than paying it for entries that will be needed imminently. The classic way to measure this is the Miss Ratio Curve (MRC) β but for now, the intuition is: a perfect policy minimizes the number of times you evict something you end up fetching again shortly after.
One more concept worth planting here: ghost entries. Some advanced policies (ARC, TinyLFU) keep a lightweight record of recently evicted keys β just the key, no value. If a ghost entry gets re-requested, it tells the algorithm "you made a bad eviction decision β consider adjusting your strategy." Ghost entries are essentially the cache's mechanism for learning from its mistakes.
Core Concepts β The Vocabulary of Eviction
Before we can talk about why LRU beats FIFO for certain workloads, or why TinyLFU uses a "window" area, we need a shared vocabulary. Eight terms appear constantly in any discussion of eviction. For each one, here's the plain-English meaning first β then the precise technical term you'll see in papers and documentation.
The Eight Terms You Must Know
Working set β The group of cache entries that real traffic actually reads repeatedly. Think of it as "the stuff that matters." If your cache is large enough to hold the entire working set, hit ratio will be high. If it's smaller, some working-set entries keep getting evicted and re-fetched β that's cache churn, and it's where hit-ratio collapses come from.
Recency β How recently an entry was last accessed. An entry that was read 1 millisecond ago has high recency. An entry that was last read 2 hours ago has low recency. Recency is the signal that LRU (Least Recently Used) uses to decide victims: the entry with the lowest recency (longest time since last access) gets evicted first. Recency is a good proxy for future use when traffic has temporal locality β but a terrible proxy when scans blast through the cache.
Frequency β How many times an entry has been accessed in total (or in a recent time window). An entry that's been read 10,000 times is a hot key β even if it hasn't been touched in the last 5 minutes. Frequency is the signal that LFU (Least Frequently Used) uses: the entry with the lowest total access count gets evicted first. Frequency is great for stable hot-key workloads, but it has an "aging" problem: an entry that was hot last week but is irrelevant now keeps a high count and resists eviction even though it's no longer valuable.
Age β How long an entry has been in the cache since it was first inserted. FIFO (First In, First Out) uses age as its signal: the oldest-inserted entry gets evicted regardless of how often it's been accessed since. Age is almost never a good proxy for future access likelihood, which is why FIFO performs poorly on most real workloads.
Churn β The rate at which entries are evicted and then re-fetched. Low churn means the policy is making good eviction decisions (kicking out things that won't be needed). High churn means the policy keeps kicking out things that are immediately re-fetched β every eviction triggers a database read shortly after, which defeats the purpose of caching. The hit ratio is the aggregate measure of churn: low hit ratio = high churn.
Miss penalty β The cost of a single cache miss. When an entry isn't in the cache, the application has to fetch it from the database (or origin server, or S3, etc.) and typically write it back to the cache. The miss penalty is the extra latency and CPU cost of that roundtrip. Miss penalties of 5β50 ms are typical for database reads. In a CDN context, a miss penalty might be a full origin fetch costing hundreds of milliseconds. Eviction policy matters more when the miss penalty is high β because every wrong eviction decision directly costs that many milliseconds of latency for real user requests.
Ghost entries β A lightweight record (just the key, no value) of recently evicted entries. Advanced algorithms like ARC and TinyLFU keep ghost queues. If a ghost entry is re-requested β meaning the cache evicted it but traffic immediately wanted it again β that's evidence of a bad eviction decision. Ghost queues let algorithms self-correct without requiring external tuning.
Miss Ratio Curve (MRC) β A graph where the x-axis is cache capacity and the y-axis is miss ratio. For most realistic workloads, the MRC has a steep drop-off at the working-set size: caches smaller than the working set miss a lot; caches larger than the working set miss very little. The MRC tells you how much each additional megabyte of cache capacity is worth in terms of hit ratio β useful for sizing decisions and for comparing how well two different policies exploit the same capacity.
The diagram above shows how these terms relate. The top group (recency, frequency, age) are input signals β what the eviction policy reads from the cache's metadata to decide which entry to evict. The left side (working set, ghost entries) provide context that helps the policy make better decisions. The bottom group (churn, hit ratio, MRC) are output metrics β how you measure whether the policy's decisions were any good. If your churn is high and hit ratio is low, your policy is selecting bad victims and you should try a different signal.
The 7 Canonical Eviction Policies β Overview
Seven eviction policies cover the vast majority of what you'll encounter in production systems, academic papers, and system design interviews. Each policy is a different answer to "which entry is least valuable to keep?" β and each one makes that judgment using a different signal and a different data structure. This section gives you the overview; later blocks go deep on each one.
A useful framing before we start: think of each policy as a different kind of "bouncer" at the coat-check door. The bouncer's job is to pick who has to give up their coat hook. Different bouncers use different criteria. One (FIFO) asks "who's been here longest?" Another (LRU) asks "who came in most recently β because that person will probably leave soon and won't need their coat back." Another (TinyLFU) asks "who uses their coat the least across their entire history here?" Different workloads reward different bouncers.
The seven-panel diagram above gives you the quick-reference view. Notice a few patterns. First, the simpler policies (FIFO, Random) have obvious weaknesses because they ignore relevant signals. Second, the "complex" policies (ARC, TinyLFU) exist specifically to fix the failure modes of the simpler ones β they're more sophisticated precisely because real workloads have edge cases that simpler policies can't handle. Third, complexity has a cost: ARC requires tracking twice as many entries (real + ghost), and TinyLFU requires a frequency sketch that needs periodic decay to stay accurate.
The Reference Table
Here's the same information in table form β useful as a quick-reference when you're deciding which policy to reach for:
| Policy | Eviction Signal | Best Workload | Killer Weakness | Data Structure |
|---|---|---|---|---|
| LRU | Recency (last access time) | Bursty, hot-key, temporal locality | Scan pollution β one sequential sweep destroys working set | HashMap + Doubly-Linked List |
| LFU | Frequency (total accesses) | Stable hot keys, power-law distribution | New-entry starvation β viral content can't survive infancy | Min-heap or frequency-bucket list |
| FIFO | Age (insertion time) | Streaming data, event queues | Belady's anomaly β evicts items regardless of access count | Circular queue / ring buffer |
| Random | None (random slot) | Uniform access, hardware TLBs | Non-deterministic; can evict the hottest entry by chance | Array index selection |
| MRU | Most-recent access (inverted) | Repeated sequential scans (DBs) | Catastrophic for any non-sequential workload | Stack / reversed LRU list |
| ARC | Recency + frequency (adaptive) | Mixed or shifting workloads | Patent issues (IBM); doubles entry tracking overhead | 2 LRU queues + 2 ghost queues |
| TinyLFU | Frequency sketch (probabilistic) | General purpose (industry default) | Count-Min Sketch needs decay; higher implementation complexity | Count-Min Sketch + SLRU queues |
LRU Deep Dive β Why HashMap + Doubly-Linked List is O(1)
LRU is the most widely deployed eviction policy in production systems. Redis's default is allkeys-lru. Memcached uses LRU. Countless in-process caches default to it. It's not because LRU is the best policy for all workloads β it isn't, as we saw in S2. It's because LRU is a very good policy for most workloads AND it can be implemented with O(1) operations for every cache operation: get, set, and evict. That combination β good behavior, constant-time guarantees β is hard to beat for a general-purpose default.
The Analogy: A Stack of Parking Permits
Imagine a parking lot with 5 spaces and a single attendant. When a new car arrives and the lot is full, the attendant picks the car whose permit was stamped the longest ago β that's the car least recently "used." But here's the constraint: the lot is busy, and the attendant needs to do this in constant time, not by scanning the whole lot. So instead of checking timestamps on every car, the attendant keeps a physical stack of permits ordered by recency: the most recently arrived car's permit is on top, the least recently arrived car's permit is on the bottom. When a car needs to be evicted, just pull the bottom permit. When a car is accessed (re-entered or moved), pull its permit from wherever it is and put it on top. Every operation is O(1) if you can pull a permit from any position instantly.
That "pull from any position instantly" is the key insight. A regular stack or array can't do that β removing from the middle is O(n). You need a data structure where removing an element from any position is O(1) regardless of where it is. That data structure is the doubly-linked list.
The Data Structure: HashMap + Doubly-Linked List
The classic LRU cache uses exactly two data structures working together:
- A doubly-linked list ordered by recency β most recently used at the head, least recently used at the tail. The tail node is always the eviction victim.
- A HashMap from cache key to the node's pointer (or reference) in the linked list.
Why do you need the HashMap? Because when a get request arrives for key K, you need to find the node for K in the linked list in O(1) time β without scanning the list. The HashMap gives you direct pointer access to any node, bypassing the list traversal entirely. Once you have the pointer, you can remove the node from its current position (O(1) doubly-linked-list splice) and move it to the head (O(1) insert). Total cost: O(1).
Here's the contract every operation follows:
- GET(key): look up node in HashMap β O(1). If found (cache hit), move the node to the head of the list β O(1). Return the value. Total: O(1).
- SET(key, value) β no eviction needed: create a new node, insert at head of list β O(1). Insert keyβnode into HashMap β O(1). Total: O(1).
- SET(key, value) β eviction needed: read the tail node (the LRU victim) β O(1). Remove it from the list β O(1). Delete its key from HashMap β O(1). Then do the normal insert for the new entry β O(1). Total: O(1).
The diagram shows a cache with capacity 4 holding keys A, D, B, C β ordered from most recently used (A, at the head) to least recently used (C, at the tail). C is the current eviction victim. The HashMap at the bottom holds a direct pointer to each node, so any get request can jump to the right node in O(1) without walking the list. When key B is accessed, the HashMap immediately locates nodeB, the list splices it out of its current position and re-inserts it at the head β two pointer updates β and the list order becomes AβBβDβC. No list traversal was needed.
A 6-Step Worked Example
Let's walk through a cache with capacity 3 and trace six operations. This is the standard interview demonstration of LRU correctness.
Step β€ is the crucial one to understand. When GET A comes in, A is at the tail β it would have been the next victim. But because GET A is a cache hit, LRU promotes A to the head of the list. This single operation changes who gets evicted in step β₯: B (which hasn't been touched since step β’) becomes the tail and gets evicted instead of A. LRU is essentially a "save yourself by being useful" policy: every time you're accessed, you move to safety. Only entries that nobody reads for the longest time end up at the tail and get evicted.
The O(1) Proof
Let's be precise about why every operation is O(1). The claim rests on three facts about the data structures:
- HashMap lookup: average O(1) per standard hash table analysis. In the worst case (all keys hash to the same bucket), it degrades to O(n), but with a good hash function and reasonable load factor this is negligible in practice.
- Doubly-linked list removal from an arbitrary position: O(1) given a pointer to the node. You update exactly 4 pointers:
node.prev.next = node.nextandnode.next.prev = node.prev. Done. No traversal needed. - Doubly-linked list insertion at head: O(1). Update 2 pointers:
node.next = head; head.prev = node. Point the list's head reference at node. Done.
The HashMap gives you the node pointer in O(1), then the doubly-linked list lets you remove and reinsert in O(1). Put them together and every cache operation β get (hit), set (no eviction), set (with eviction) β is O(1) end-to-end. This is why LRU is so widely deployed: it gives you recency-ordered eviction at the cost of two data structures and zero algorithmic overhead per operation.
The Pseudocode
Here's the core LRU implementation logic β clean pseudocode first so the concept is clear before you see it in a real language:
class LRUCache:
def __init__(self, capacity):
self.cap = capacity
self.map = {} # key β Node
# sentinel head/tail nodes β simplify edge cases (no null checks)
self.head = Node() # dummy MRU sentinel
self.tail = Node() # dummy LRU sentinel / victim pointer
self.head.next = self.tail
self.tail.prev = self.head
def get(self, key):
if key not in self.map:
return -1 # cache miss
node = self.map[key]
self._remove(node) # unlink from current position β O(1)
self._insert_at_head(node) # move to MRU position β O(1)
return node.value
def set(self, key, value):
if key in self.map:
self._remove(self.map[key]) # update existing entry
node = Node(key, value)
self.map[key] = node
self._insert_at_head(node) # O(1)
if len(self.map) > self.cap:
# evict: the node just before the tail sentinel is the LRU victim
victim = self.tail.prev
self._remove(victim) # O(1) β 4 pointer updates
del self.map[victim.key] # O(1) β HashMap delete
def _remove(self, node):
# splice node out of the doubly-linked list β 4 pointer updates
node.prev.next = node.next
node.next.prev = node.prev
def _insert_at_head(self, node):
# insert node immediately after the head sentinel
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node
The sentinel head and tail nodes (dummy nodes with no data) are a common implementation trick. They eliminate the special-casing you'd need when the list is empty or when you're removing the first or last real node. With sentinels, every _remove call looks identical: four pointer updates, no if-statements. The LRU victim is always self.tail.prev β the node just before the tail sentinel. No scanning, no comparison, no computation: just one pointer dereference.
The Killer Weakness: Scan Pollution (Preview)
We previewed this in S2, but now you can see it at the data-structure level. When an analytics scan reads 50 million rows and writes them to the cache, each write calls set(), which calls _insert_at_head(). Every scan entry becomes the new MRU. The previously warm working-set entries get pushed toward the tail with every scan insert. After 10,000 scan entries fill a 10,000-slot cache, every working-set entry has been evicted β not because it was cold, but because the scan's volume overwhelmed the capacity. LRU has no way to distinguish "scan entry I wrote once" from "hot entry re-read 10,000 times" β they both look like "recently written" from the list's perspective.
The fix? Policies that use frequency (TinyLFU) can see that the scan entries have been read exactly once, while hot entries have been read thousands of times, and refuse to admit low-frequency scan entries into the main cache. That's the architectural reason why TinyLFU outperforms plain LRU on mixed workloads. We'll cover that in full in a later block.
LFU Deep Dive β Why Frequency Wins for Long-Tail Data
LRU asks: "when did you last show up?" LFU asks a different question entirely: "how often do you show up?" Those two questions produce dramatically different eviction victims, and understanding when each question is the right one to ask is the core skill this section builds.
The Plain-English Intuition: Most-Played vs. Most-Recently-Played
Imagine you're managing a music player's offline cache β you can only keep 100 songs on the device. LRU would keep the 100 songs you most recently played. If you went on a nostalgia trip last night and played 80 obscure 90s songs once each, LRU would evict your top 20 all-time favorites to make room β because those favorites were last played two days ago. That's wrong. LFU would keep the 100 songs you've played the most total times. Your top 20 favorites played 500 times each? They stay. Last night's one-time plays? Gone first. For a stable, predictable "hot songs" workload, LFU's frequency signal is a far better proxy for future access than LRU's recency signal.
This is exactly what happens in databases with long-tail access distributions. A product catalog might have 10 million SKUs, but 200 of them β the bestsellers β account for 80% of all reads. LFU naturally keeps those 200 in cache permanently. LRU might evict them if a broad promotional email causes a burst of one-time lookups across thousands of different product pages.
The Algorithm: Frequency Buckets in O(1)
The naive LFU implementation scans all entries to find the one with the lowest frequency counter β that's O(n) per eviction, which is too slow. The production-grade version, described by Shah, Mitra, and Matani in 2010, achieves O(1) insert, O(1) access update, and O(1) eviction using a clever two-level data structure:
- A HashMap from key β node (for O(1) lookups, same as LRU)
- A doubly-linked list of frequency buckets β bucket 1 holds all entries accessed exactly once, bucket 2 holds entries accessed twice, etc.
- Within each bucket, a doubly-linked list of entries β these are ordered by insertion time within the bucket, so when two entries have the same frequency, the older one (inserted earlier into that bucket) is evicted first
- A pointer to the minimum-frequency bucket β always points to the bucket with the fewest accesses, so finding the eviction victim is O(1): just look at the tail of the min-bucket's list
When a key is accessed, its node moves from bucket f to bucket f+1. If bucket f+1 doesn't exist yet, create it. If bucket f is now empty, delete it. If f was the minimum-frequency bucket, the minimum pointer advances to f+1. All of this is O(1) β pointer manipulations on doubly-linked lists, no scanning.
The diagram above shows the bucket chain in action. The min-freq pointer always points at the leftmost (lowest-frequency) bucket β that's where evictions happen. High-frequency hot keys accumulate in high-numbered buckets and are immune from eviction until something less-accessed takes the min-freq slot. Notice that within the min-freq bucket, entries are ordered by insertion time: the oldest entry in that bucket (leftmost) is evicted first, breaking frequency ties fairly.
The trace above shows the key insight: key A was accessed 3 times early on, so even when newer keys B, C, D, and E arrive, A is protected. B gets evicted first (same frequency as C, but inserted into that bucket earlier). C gets evicted next, despite being in the cache longer than D, because D accumulated frequency faster. This is LFU working correctly β it protects long-term hot keys and sacrifices cold ones.
When LFU Outperforms LRU β and When It Fails
LFU wins when your working set is stable over time. Product bestsellers, API endpoints, configuration values, user session tokens for active users β these get accessed repeatedly over hours or days. LFU accumulates enough history to recognize and protect them even under scan-like write bursts.
LFU fails in two situations. First: cache pollution from stale popularity. If a news article went viral yesterday and got accessed 10,000 times, it has a very high frequency count today. But nobody reads yesterday's viral articles. LFU will keep it in cache for a long time because of historical frequency, even though its future access probability has collapsed to near zero. This is sometimes called the "stale popularity" problem β frequency counts are a lagging indicator.
Second: new hot keys can't survive infancy. A brand-new viral article starts with frequency=1. Every existing key in cache has higher frequency. The new hot key will be evicted before it ever gets a chance to accumulate frequency. This is the cold-start problem for LFU, and it's why pure LFU is rarely used in production without modification. TinyLFU (next section) solves both problems.
TinyLFU & W-TinyLFU β How Caffeine Crushes LRU
Here's an uncomfortable truth about real-world cache traffic: most items are one-hit wonders. Studies on web caches, CDN logs, and database access traces consistently show that 60β90% of all unique keys are accessed exactly once, then never again. If your cache admits every key on first access (as LRU and LFU both do), you're constantly evicting genuinely hot keys to make room for items that will never be requested again.
TinyLFU is built on one core insight: before you admit a new key into the cache, check whether it's likely to be accessed again. If it's probably a one-hit wonder, don't evict a warm key for it. This admission filter is the key innovation, and the data structure that makes it memory-efficient is called a Count-Min Sketch.
The Count-Min Sketch: Approximate Frequency in Tiny Space
Think of the Count-Min Sketch as a cheat sheet for frequency counting. Instead of storing exact hit counts for every possible key (which would require memory proportional to your total keyspace β potentially billions of entries), you use a fixed-size 2D array with multiple hash functions. When a key is accessed, you increment one cell per row using a different hash function for each row. To estimate a key's frequency, you read all the cells it maps to and return the minimum β hence "Count-Min." The sketch can over-estimate (due to hash collisions between different keys sharing cells) but never under-estimates, which is the safe direction for an admission filter.
The critical property: a sketch with 8 rows and 4096 columns uses just 32,768 counters β typically 4 bits each β which is about 16 KB of memory. That 16 KB of sketch lets you track approximate frequency for an effectively unlimited keyspace, and TinyLFU's admission decision costs O(1) per access (8 hash lookups and 8 increments).
The diagram shows how one key's frequency is estimated: each of 4 hash functions maps the key to a different column in its row. The estimate is the minimum across those four cells β the min takes care of hash collisions, since a colliding key can only inflate a cell, not deflate it. The true count is always at most the minimum (other keys may have inflated some cells, but not all of them simultaneously, probabilistically speaking).
W-TinyLFU: The Window That Fixes Cold-Start
Plain TinyLFU still has the cold-start problem: a brand-new viral item has frequency=0 in the sketch (it hasn't been seen yet), so the admission filter would reject it from cache. W-TinyLFU β the variant used by Caffeine β solves this with a two-region design:
- Window cache (small LRU, ~1% of capacity) β every new item is admitted here unconditionally. This is the incubation zone. New hot items survive here long enough to accumulate frequency in the sketch.
- Main cache (TinyLFU-protected, ~99% of capacity) β split into a small LRU segment ("probation") and a large LRU segment ("protected"). Items graduate from probation to protected on their second access within the main cache.
The admission gate sits between the window and the main cache. When an item is evicted from the window, TinyLFU compares its frequency to the frequency of the candidate eviction victim in the main cache. If the window item has higher estimated frequency, it replaces the main-cache victim. If not, it's discarded. This way, one-hit wonders die in the window (harmlessly), while genuinely popular new items build enough frequency during their window stay to win the admission comparison.
The flow above shows why W-TinyLFU handles both failure modes of plain LFU. New viral items enter the window unconditionally β no cold-start problem. If they're accessed again before the window evicts them, their frequency in the sketch climbs above zero and they can win the admission filter comparison. One-hit wonders die quietly in the window and are discarded by the filter rather than evicting genuinely hot main-cache entries. The 1%/20%/79% split is configurable but is the Caffeine default, tuned from benchmarks across many real workloads.
ARC β The Adaptive Replacement Cache
LRU is a pure recency policy. LFU is a pure frequency policy. Both are rigid: LRU performs badly when a scan pollutes the cache, and LFU performs badly when access patterns shift. What if the cache could observe its own performance and dynamically shift between recency-biased and frequency-biased eviction? That's exactly what ARC does.
The Four-List Design
ARC maintains four lists, which sounds complex but follows a clean logic:
- T1 (recent) β entries that have been accessed exactly once recently. This is the recency arm β behaves like a standard LRU for recent single-access items.
- T2 (frequent) β entries that have been accessed at least twice. This is the frequency arm β a key moves here after its second access and stays here as long as it keeps getting hits.
- B1 (ghost for T1) β a list of ghost entries from T1. When an item is evicted from T1, its key (but not its value) is kept in B1. No memory is consumed for the value.
- B2 (ghost for T2) β ghost entries from T2. Same concept: evicted T2 keys are remembered in B2 without their values.
The total capacity of T1 + T2 = cache size (the real entries). B1 + B2 can grow larger β they're just key lists. A single parameter p controls the balance: p is the target size for T1. When p is large, ARC is biased toward recency (more T1 slots). When p is small, it's biased toward frequency (more T2 slots).
How ARC Adapts β Ghost Hits
Here's the genius: when a cache miss occurs and the missing key is found in B1 (a recent ghost hit), ARC infers that T1 is too small β it's evicting recently-used keys too quickly. So ARC increases p (grows T1, shrinks T2). When a miss occurs and the key is found in B2, ARC infers that T2 is too large β it's evicting frequently-used keys while keeping recency noise. So ARC decreases p (shrinks T1, grows T2). Ghost hits are free feedback signals β they cost no memory for the value, but they tell ARC exactly where it made a mistake.
The diagram shows all four ARC lists at once. The real memory cost is T1 + T2 (the actual cached values). B1 and B2 are metadata-only lists of keys β they track which items were recently evicted, enabling the adaptation signal. When traffic shifts from scan-like to hot-key, B2 ghost hits grow (T2 is too small, hot keys are getting evicted), so p decreases and T2 expands to protect frequency. This self-tuning is why ARC is used in ZFS and IBM DS8000 storage, where workloads vary enormously across different users and time of day.
The adaptation trace shows ARC silently shifting from a 50/50 T1/T2 split to a 20/80 split as the workload transitions from mixed to hot-key. No configuration change, no operator intervention β the ghost-hit signal does all the work. This makes ARC particularly valuable in systems where you can't predict or control the access pattern in advance: storage arrays, database buffer pools, OS page caches.
Random & FIFO β Simple Policies, Surprisingly Resilient
Every engineer's first instinct is to reach for LRU. But there are two policies that look naive on paper and turn out to be surprisingly competitive β or outright optimal β in specific scenarios: Random eviction and FIFO (First In, First Out). Understanding when simplicity beats sophistication is a genuine system design skill.
Random Eviction: The Underrated Contender
Random eviction does exactly what it sounds like: when the cache fills up, pick a key at random and evict it. No tracking of access times, no frequency counters, no data structures beyond the cache itself. The overhead is essentially zero β no bookkeeping on every access, no linked-list pointer updates.
Intuition says random should be terrible. But intuition is wrong for workloads where access locality is weak. Here's why: LRU's advantage over random comes entirely from its ability to exploit temporal locality. When that assumption holds (most workloads), LRU wins comfortably. When it's weak β broadly distributed access with no hot spots β random eviction converges to nearly the same hit ratio as LRU, because LRU's recency signal is no better than a coin flip.
Random eviction genuinely shines in three situations:
- Metadata overhead matters β embedded systems and hardware TLBs use random or pseudo-random eviction because the hardware to track recency order would cost more than the hit-ratio gain.
- Scan resistance by design β random eviction cannot be scan-polluted. A sequential scan writes many new entries, each evicting a random existing entry. The hot working set survives with probability proportional to its size, not to when it was last accessed.
- Fairness across keys β in multi-tenant caches where no key should be privileged over others, random eviction treats all keys equally regardless of access history.
FIFO: Ordered Simplicity
FIFO is insertion-order eviction: the entry that has been in the cache the longest gets evicted first, regardless of how recently or frequently it was accessed. Think of a queue β new entries join at the back, evictions happen at the front. A ring buffer or linked queue implements this with O(1) insert and O(1) evict, no per-access bookkeeping.
FIFO is the right choice when your data has a natural freshness expiry that correlates with insertion time. If your data becomes stale after a fixed window (weather data, stock quotes, rate-limit windows), FIFO naturally evicts the oldest β and therefore most likely stale β entries. It's also extremely predictable: the eviction order is deterministic and auditable, which matters in regulated environments.
FIFO fails when popular items keep getting evicted just because they've been in cache for a long time. A configuration value inserted at startup will be evicted after enough time, even if it's accessed millions of times per minute. For this reason, FIFO is almost always combined with either TTL-based expiry or a second-chance mechanism β the CLOCK algorithm used in Linux's page cache.
Under high temporal locality (a small hot set dominates traffic), LRU's recency tracking gives it a substantial hit-ratio edge over both FIFO and Random. As locality weakens β more unique keys, broader access distribution β all three policies converge toward the same hit ratio. In the convergence zone, Random wins on simplicity: zero per-access bookkeeping, no linked-list updates, no memory overhead beyond the cache entries themselves.
- Hardware or embedded context (no LRU metadata cost)
- Scan resistance matters more than optimizing hit ratio
- Multi-tenant fairness required across keys
- Access pattern is known to be broadly uniform
- Data freshness correlates with insertion time
- TTL-based expiry already handles most eviction
- Deterministic, auditable eviction order is required
- Combined with second-chance (CLOCK algorithm)
Redis maxmemory & The 8 Eviction Policies
Redis is the cache most engineers reach for first, and it exposes eviction policy as a first-class configuration option. The challenge is that Redis ships with eight eviction policies, and the names alone don't tell you when to use which. This section walks through all eight, explains what "approximated LRU" means (and why it matters), and gives you a decision framework.
Setting maxmemory and maxmemory-policy
First, nothing happens until you set a memory limit. By default, Redis has no maxmemory limit β it will consume all available RAM and let the OS invoke the OOM (Out of Memory) killer, which is a production incident waiting to happen. Always set a limit explicitly:
# In redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
# Or at runtime via redis-cli
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lfu
Once memory hits the ceiling, every write operation β SET, LPUSH, HSET, and so on β triggers the eviction policy to free space before the write proceeds. The maxmemory line sets the hard ceiling; the maxmemory-policy line chooses the algorithm.
The 8 Policies Explained
The eight policies break down along two axes: scope (allkeys = evict any key; volatile = evict only keys that have a TTL set) and algorithm (lru, lfu, random, ttl). The naming pattern makes them easy to remember once you see the grid:
| Policy | What it evicts | Best for |
|---|---|---|
| noeviction | Nothing β returns an error on writes when full | When data loss is unacceptable; Redis used as a primary store, not a cache |
| allkeys-lru | Least recently used key across all keys | General-purpose cache; hot/warm access distribution; most common default |
| allkeys-lfu | Least frequently used key across all keys | Stable access patterns; long-tail distribution; power-law skew in reads |
| allkeys-random | A random key across all keys | Uniform access patterns; scan-resistance; when simplicity is valued |
| volatile-lru | Least recently used key among keys with TTL set | Mixed store: some persistent keys (no TTL), some cache keys (with TTL) |
| volatile-lfu | Least frequently used key among keys with TTL set | Same mixed-store scenario; frequency-biased eviction among expirable keys |
| volatile-random | Random key among keys with TTL set | Mixed store; uniform cache-key access; simplest volatile option |
| volatile-ttl | Key with the shortest remaining TTL among TTL keys | When keys expiring soonest are least valuable; TTL encodes business priority |
Why Redis's LRU Is "Approximated" β Not Exact
Here's a common interview gotcha: Redis does not maintain a true LRU doubly-linked list across all keys. Instead, when eviction is needed, Redis samples a small set of random keys (default: 5, configurable with maxmemory-samples), compares their last-access timestamps, and evicts the one with the oldest timestamp. This is approximate LRU, and it matters for three reasons:
- O(1) per eviction β True LRU requires a linked-list pointer update on every cache read, which is costly in Redis's single-threaded model. Approximated LRU only runs during eviction events, sampling a fixed number of keys β constant time regardless of database size.
- Close enough in practice β Redis's own benchmarks show that a sample of 10 produces hit ratios nearly identical to true LRU across diverse workloads. The default of 5 is slightly less accurate but still excellent for most use cases.
- LFU is also approximated β Redis implements LFU using a per-key 8-bit counter with a logarithmic increment (Morris counter) that decays over time via the
lfu-decay-timesetting. This approximates true frequency using just 1 extra byte per key in the key's metadata.
The sampling diagram makes the approximation concrete. Redis picks 5 random keys, finds the oldest last-access timestamp (key:B at 120 seconds), and evicts it. There may be keys elsewhere with timestamps older than 120s β approximate LRU accepts this imprecision in exchange for O(1) cost per eviction event. The decision guide on the right is the fastest path to the right policy for a new service.
allkeys-lru. It handles most general caching workloads well. Switch to allkeys-lfu if you observe hot keys getting evicted during bursts of diverse reads β that's a signal your workload has stable frequency patterns LFU can exploit. Monitor keyspace_hits and keyspace_misses in INFO stats output to track hit ratio changes after any policy switch.
Memcached Slab Allocator & Per-Slab LRU
Memcached takes a fundamentally different approach to memory management than Redis, and that difference produces a specific, real-world failure mode that can quietly collapse your cache's effective capacity without any obvious cause. To understand the failure β called slab calcification β you first need to understand why Memcached uses slabs at all.
Why Slab Allocation Exists: Defeating Fragmentation
The problem slab allocation solves is memory fragmentation. A general-purpose allocator (like malloc) storing cache entries of wildly varying sizes β some 64 bytes, some 64 KB β ends up with a Swiss-cheese pattern over time: lots of small gaps that are too small to hold new entries, even though total free space is large.
Memcached sidesteps fragmentation entirely by pre-allocating memory in fixed-size chunks called slabs. At startup, Memcached creates a series of slab classes β typically 40β50 classes growing by a factor of roughly 1.25 from ~96 bytes to ~1 MB. When a new cache entry arrives, Memcached rounds it up to the nearest slab class and allocates one chunk from that class. O(1), zero fragmentation within a class.
Per-Slab LRU: The Isolation That Creates the Problem
Here's the critical design consequence: each slab class has its own independent LRU list. Eviction in Memcached does not choose from the globally least-recently-used entry across all entries. It chooses from the least-recently-used entry within the specific slab class that needs a new chunk. Eviction is local to the class that's full.
When this works well: you have a mix of small user-session objects (fitting in slab class 4) and large HTML fragments (fitting in slab class 18). Each class evicts its own LRU entries independently. Small objects don't compete with large objects for eviction slots. Clean separation.
When this creates a catastrophe: enter slab calcification.
The isolation diagram shows the core problem clearly. Slab Class 3 has free chunks that will never be used by Class 6, because slab pages are not transferable between classes at runtime. Class 6 must evict from its own LRU β even if all its entries are stale and would be freely giveable if global eviction existed. The cache is wasting potentially large amounts of memory on data it will never read again, while the slab class that actually needs space keeps evicting its own genuinely warm entries.
The timeline shows the failure progression and the fix. At T0, slab classes are proportionally filled β everything is healthy. At T1, the workload shifts to only small objects. Class 3 fills up and starts aggressively evicting its own warm entries, while Classes 6 and 9 sit full of stale data from the old workload and refuse to give up their memory. The cache's effective capacity collapses to just the capacity of Class 3. The slab automover (T2) detects the mismatch via per-class eviction rate metrics and reassigns whole slab pages from the idle classes to the overloaded one.
-o slab_reassign,slab_automove on the Memcached command line, available since 1.4.11) and consider whether your object-size distribution has shifted since the server started.
CPU Cache & TLB Eviction β Hardware Lessons at Nanosecond Speed
Before Redis, before Memcached, before any software cache you'll ever configure β there is a tiny piece of hardware inside every CPU that has been solving the eviction problem since the 1960s. The L1, L2, and L3 caches inside your processor are real eviction systems, holding between 32 KB and 64 MB of data, making eviction decisions in single-digit nanoseconds. The eviction algorithm the CPU uses β and crucially, why it doesn't use "proper" LRU β teaches a lesson that applies all the way up to distributed caches.
Why the CPU Can't Afford True LRU
Think about what proper LRU requires: every time you access a cache line, you update a timestamp or move a node to the head of a linked list. In software, that costs a handful of CPU instructions β negligible for Redis operating on millisecond timescales. For a CPU cache operating on nanosecond timescales, those extra instructions would cost more than the cache hit saves. The whole point of L1 cache is to return data in 1β2 ns. If maintaining LRU metadata takes 3 ns, you've paid more than you gained.
So CPU designers invented Pseudo-LRU (PLRU) β an approximation that costs almost nothing. Here's the idea: instead of tracking the exact recency order of every cache line, you track one bit per set that says "recently used the left half or the right half?" It's a binary tree of yes/no bits, and eviction just walks the tree following the "least recently used" direction at each level. Four cache lines need only 3 bits. Sixteen cache lines need only 15 bits. The answer is approximate β you might occasionally evict something slightly hotter than the true LRU victim β but you're wrong by at most one level of the tree, and the decision takes a single clock cycle.
The diagram above shows the cache hierarchy that every program you run lives inside. The key insight: as you move outward from the CPU core, size grows but latency grows with it. Each level needs an eviction algorithm, and at L1 the algorithm must complete in a single clock cycle β which rules out any data structure that requires pointer chasing or sorting.
This diagram shows a 4-way set-associative cache using Pseudo-LRU. The root bit says "left half is less recently used." The left subtree bit says "Way 0 is less recently used than Way 1." So the eviction victim is Way 0 β found in two bit-checks, no pointer chasing, no list reordering. When Way 0 gets replaced and later accessed, both bits flip toward it. The whole algorithm runs in a single clock cycle. The lesson from hardware: approximate beats perfect when the cost of perfect exceeds the benefit. This is exactly why TinyLFU uses a Count-Min Sketch instead of exact counters β the same principle, sixty years later, in software.
TLB Eviction β The Same Problem, One Level Up
The TLB (Translation Lookaside Buffer) is another hardware cache with an eviction problem. It holds virtual-to-physical address mappings. When it fills up, it must evict an entry β and once evicted, the next access to that virtual address requires a full page-table walk, costing dozens of nanoseconds. TLBs typically hold 64 to 1,024 entries (tiny!) and use either Pseudo-LRU or fully-random eviction. Why random? At 64 entries, even Pseudo-LRU has measurable overhead per memory access. Random is truly free β and with only 64 slots, the difference in miss rate between random and LRU is small.
Browser & CDN Cache Eviction β Disk Limits and Edge Nodes
When you open a website, your browser is running its own eviction system on your hard drive. When a CDN like Cloudflare or Fastly serves your request from an edge node, that edge node is running another eviction system in RAM. Neither system works quite like Redis, and the reasons why matter for anyone designing a web-facing architecture.
Browser Cache: You Don't Control the Eviction Policy
Here's the surprise: Cache-Control: max-age=31536000 tells the browser "this asset is fresh for one year" β but it does not guarantee the asset survives in the disk cache for a year. Browsers cap their disk cache at roughly 5β10% of available disk space (Chrome's heuristic is complex but this is the ballpark). When the cache fills, the browser evicts entries based on its own policy β typically a combination of recency, access frequency, and resource size. A large video file might be evicted after two days even with a one-year max-age, simply because the disk is full and the browser decides the video is too expensive to keep.
What Cache-Control actually controls is freshness β whether the browser even bothers to check with the server before using a cached copy. It's a hint about validity, not a guarantee of retention. The eviction decision is always up to the browser's own heuristics.
This is the most common browser caching misconception in engineering: ETag and Last-Modified are revalidation headers, not invalidation. They let the browser ask "has this changed since I last fetched it?" β and if the server says "no" (304 Not Modified), the browser serves the cached copy. They have no power over whether the browser evicts the entry. If disk space is tight and the browser's eviction policy selects your asset, the ETag is meaningless β the asset is gone.
CDN Eviction: The Two-Tier Edge Model
CDN edge nodes sit geographically close to users and cache content in RAM. When RAM fills up, they need to evict β and the strategy differs from a single-node cache because CDNs have a natural second tier to fall back to. Fastly's architecture is a good example of a real production approach: instead of immediately deleting an eviction candidate, the edge node can demote it to a secondary RAM tier or a disk tier, where it survives at lower cost. Only when even the secondary tier is full does the entry truly disappear.
The two-tier model matters because it halves the effective miss rate. When a popular item gets evicted from L1 due to a sudden traffic spike, it lands in L2 rather than disappearing entirely. If traffic returns to that item within minutes, the L2 hit avoids an origin fetch. The practical implication: when you configure a CDN, cache sizes at L1 are more important than absolute cache size β keeping the working set warm in fast RAM matters more than having a huge disk cache that's slow to serve from.
Working Set Sizing & the 80/20 Curve β How Much Cache Do You Actually Need?
Here's the question every team eventually asks: "We're at 75% hit ratio. If we double the cache size, do we get to 90%? Or do we need to triple it?" Without the right tool, you're guessing. With a Miss Ratio Curve (MRC), you can answer that question precisely before spending a dollar on infrastructure.
The Working Set: The Entries That Actually Matter
Not all cached data is equally hot. In most real workloads, access patterns follow a power-law distribution β a small fraction of keys absorb a large fraction of requests. The classic approximation is 80% of requests touch 20% of keys, but the real ratio is often more extreme: 95% of requests touching 5% of keys is common in social feeds and e-commerce. The working set is the set of entries that, if kept warm in cache, will satisfy the vast majority of requests.
Why does this matter for sizing? Because once your cache is large enough to hold the entire working set, adding more cache gives diminishing returns. Going from a cache that holds 50% of the working set to one that holds 100% might move hit ratio from 70% to 95%. Going from 100% to 200% of the working set might move hit ratio from 95% to 96%. The first doubling was worth it. The second doubling was mostly wasted money.
The knee of this curve is exactly where you want to land. To the left of the knee, every extra gigabyte of cache pays off handsomely. To the right, you're paying cloud infrastructure costs for almost no improvement. The question is: where is the knee for your specific workload? This is what the Miss Ratio Curve tells you.
Mattson's Stack Distance Algorithm β One Trace, Every Cache Size
In 1970, Mattson, Gecsei, Slutz, and Traiger at IBM published one of the most elegant results in computer science caching theory. The idea: for each access to a key, count how many other distinct keys were touched since you last saw that key β that count is the "reuse distance" for this access. They called this measurement the stack distance, because conceptually you imagine an infinite LRU stack and the answer is how far down the stack the key currently sits. If stack distance is D, then key K would cause a miss in any LRU cache smaller than D entries, and a hit in any LRU cache of D or more entries.
The elegant payoff: by computing stack distances for the entire trace, you get the miss ratio for every possible cache size at once. You don't need to simulate the cache at size 1 GB, then again at 2 GB, then again at 4 GB. One pass through the trace gives you the full MRC curve. This was revolutionary in 1970 and remains the foundation of MRC analysis today.
An MRC is the decision tool that turns "how much cache should we buy?" from a gut-feel question into a data-driven answer. The curve tells you: at your current cache size, what's the miss ratio? And if you doubled the cache, what would it become? The gap between the LRU and TinyLFU lines on the same curve is the improvement you'd get from upgrading your eviction algorithm without buying any new hardware β often worth investigating before spending money.
Scan Pollution & Sequential Flooding β The Killer Failure Mode
You've already seen the story in Section 2: one analytics job, one nightly scan, cache hit ratio from 95% to 60%, 3 AM incident. Now let's go deep on exactly why LRU is defenceless against this, what each production database does about it, and how to protect your own application caches.
The Anatomy of a Scan Pollution Event
When a sequential scan happens, the entries being read are each accessed exactly once β and then never again. But LRU's entire logic is "most recently used is most likely to be used again." So it places every scanned entry at the head of the eviction queue, treating them as the most valuable entries in the cache. Meanwhile, the genuinely hot entries β the ones your real traffic needs β get pushed to the tail and evicted one by one. By the time the scan finishes, the cache is full of entries with zero future value and empty of everything that had value.
The cascade is silent and fast. Each new scan entry doesn't just waste one cache slot β it actively evicts one hot entry. A scan of 1 million rows through a cache of 100,000 slots evicts the entire working set ten times over. The hot entries that get evicted will be re-requested by real traffic, causing real database load. The scan entries that replaced them will never be requested again.
The Four Production Fixes
This failure mode is well-known enough that every major database and caching system has built in a countermeasure. Here are the four most important ones:
Fix 1 β SLRU's Protected/Probation Split
Segmented LRU (SLRU) divides the cache into two regions: a protected segment (typically 80% of cache) for entries that have been accessed more than once, and a probation segment (20%) for newly admitted entries. New entries start in probation. Only when they get a second access do they graduate to the protected segment. Eviction always takes victims from the probation segment first.
Why does this stop scan pollution? Because scan entries are accessed exactly once β they never graduate to protected. When the cache fills, the probation segment fills with scan entries, but the protected segment keeps all the hot working-set entries safe. The scan entries fight each other for probation slots and eventually evict each other. The hot data is untouched.
Fix 2 β TinyLFU's Admission Filter
TinyLFU doesn't just evict entries β it filters incoming entries. Before admitting a new entry to the main cache, TinyLFU checks whether its estimated frequency (from the Count-Min Sketch) is higher than the estimated frequency of the current eviction candidate. If the newcomer has a lower estimated frequency than the victim, it's rejected outright β the victim stays.
For scan entries, their estimated frequency is 0 or 1 β they've never been seen before. The entry they'd evict has been accessed many times and has a high frequency score. TinyLFU's admission filter rejects the scan entry. The scan produces zero impact on the working set. This is the most effective protection against scan pollution of any production cache algorithm.
Fix 3 β PostgreSQL's Ring Buffer for Sequential Scans
PostgreSQL detected this problem and built the fix directly into its buffer pool manager. When the executor begins a sequential scan on a relation larger than one-quarter of shared_buffers, it switches to a BufferAccessStrategy ring buffer β 256 KB by default for sequential reads β instead of allocating freely from the main buffer pool. The scan recycles its own pages within the small ring, so the bulk of the shared buffer pool stays untouched.
The shared buffer pool (configured via shared_buffers) is completely protected. Other queries' hot pages are not disturbed. The ring buffer is recycled at the end of the scan. This is why PostgreSQL's performance under mixed OLTP + analytics workloads is often better than naive expectations β the buffer ring is a first-class protection mechanism, not an afterthought.
There is no knob to turn here β Postgres handles it automatically whenever it identifies a sequential scan strategy. The key takeaway: if you see sequential scan activity in EXPLAIN ANALYZE, Postgres is already protecting your buffer pool from that scan. The danger is when your application layer is doing the scan (e.g., iterating rows and writing them to Redis) β Postgres protects its own cache, but nothing protects your application cache unless you build that protection yourself.
Fix 4 β MySQL InnoDB's Midpoint Insertion Strategy
MySQL's InnoDB buffer pool uses a clever variant of LRU called midpoint insertion. Instead of inserting newly read pages at the MRU head (which is what standard LRU does), InnoDB inserts them at a configurable point in the middle of the LRU list β by default 37% from the tail. Pages that survive long enough and get re-accessed eventually migrate to the "young" (head) part of the list.
Why does this protect against scan pollution? Pages inserted at the midpoint only survive if they get accessed again before they drift to the tail. Scan pages, accessed once and never again, drift to the tail and get evicted before they can displace the truly hot pages at the head. The hot working set is protected at the front of the list.
The innodb_old_blocks_time parameter is the second line of defence: even if a scan page is accessed twice within one second (e.g., two queries hitting the same row during the scan), it still won't get promoted to the hot region. Only pages that survive in the old region for at least 1 second before their second access get promoted β a threshold that eliminates nearly all scan-era accesses.
SLRU's diagram makes the protection crystal clear: scan entries fill the probation segment and evict each other, while the protected segment stays intact. The only way scan entries can damage the protected segment is if there are so many of them that they fill probation and start overflowing β at which point you have a sizing problem, not just a policy problem.
One-Hit Wonders & The Frequency Filter β The 50% Problem
Scan pollution is dramatic β it's a catastrophic failure that happens fast and is easy to diagnose. The one-hit wonder problem is more insidious: it's a quiet, constant drain on cache efficiency that's present in virtually every cache workload, all day, every day.
What Is a One-Hit Wonder?
A one-hit wonder is a cached entry that gets requested exactly once and then never again. You pay the cache miss cost to fetch it from the database, you pay the memory cost to store it, and then it gets evicted having never served a single cache hit. In many web workloads, around 40β60% of all cache entries fall into this category.
Why is this so common? Think about long-tail content. A news article goes viral for two hours and gets read a million times. The same news site has half a million articles that were published years ago and get one organic search hit per week. Each of those rare articles costs a cache miss every time β but if you cache them on that one hit, you're filling your cache with entries that won't be read again for a week, evicting the genuinely hot entries in the process.
The Two-Miss Math: Why One-Hit Wonders Are Doubly Expensive
Here's the counter-intuitive math. Suppose you have a cache of 100 slots. Your working set is 80 hot entries accessed frequently. The other 20 slots are constantly churned by one-hit wonders β each one-hit wonder enters, occupies a slot for a while, gets evicted to make room for the next one-hit wonder, and so on.
Now consider what happens when a one-hit wonder entry is evicted to make room for a new one. If any of your 80 hot working-set entries happened to be in those 20 churning slots, they just got evicted β and the next access to that hot entry is a miss. So the one-hit wonder cost you not just its own original miss (accessing it from the database the first time), but also the future miss it caused by displacing a hot entry. A single one-hit wonder can produce two database reads: one for itself, one for the entry it evicted. Hence "doubly expensive."
At scale: if 50% of your cache entries are one-hit wonders and your cache has 1 million slots, that's 500,000 slots actively displacing hot data. The effective working-set capacity of your cache is roughly halved.
The diagram captures the core benefit: with TinyLFU's admission filter, a one-hit wonder never makes it into the cache at all. The frequency check asks "is this new entry more likely to be accessed in the future than the current eviction candidate?" For a brand-new entry with zero prior accesses, the answer is almost always "no" β so the hot entry at the eviction candidate position stays put. The admission filter is what allows TinyLFU to achieve near-optimal hit ratios on workloads with heavy one-hit-wonder traffic, typically moving hit ratio from the 65-75% range that LRU achieves up to the 85-93% range.
Measurement β Hit Ratio, Miss Cost, MRC Curves & Belady's Optimal
Everything in this page so far has pointed at one skill: knowing when your cache is working and when it isn't. Without measurement, eviction policy tuning is superstition. With the right metrics, it becomes engineering. This section walks through the metrics that actually matter β and the one theoretical benchmark that tells you how close to perfect your cache actually is.
Hit Ratio: The Vanity Metric
Hit ratio β the fraction of cache lookups that return a hit β is the most commonly reported cache metric and often the least useful one on its own. Here's why: a cache hit for a key whose miss cost is 0.1 ms is worth nothing. A cache hit for a key whose miss cost is 800 ms (because the fallback requires a slow external API call) is worth everything. Two caches can have identical hit ratios but wildly different performance impact depending on what they're caching.
Hit ratio is useful as a trend indicator β if it drops from 92% to 78% over an hour, something changed and needs investigation. But it's not useful as a standalone success metric. "Our hit ratio is 95%" sounds great until you realise 90% of your traffic is hitting a tiny set of trivially cheap keys and the 5% of misses are all for your slowest database queries.
Miss Cost: The Metric That Actually Matters
The metric that actually predicts user experience is miss cost β the latency, compute, or dollar cost incurred when a cache miss forces a fallback to the source of truth. The formula for average request latency is:
Example: hit_ratio=0.92, cache_hit_latency=1ms, miss_cost=50ms
β avg latency = (0.92 Γ 1ms) + (0.08 Γ 50ms) = 0.92ms + 4ms = 4.92ms
Improve hit_ratio to 0.96 (same cache, better eviction policy):
β avg latency = (0.96 Γ 1ms) + (0.04 Γ 50ms) = 0.96ms + 2ms = 2.96ms
A 4-percentage-point improvement in hit ratio cut average latency by 40% β because the miss cost dominates.
This is why measuring P99 miss latency β the 99th percentile of requests that missed the cache β is often more valuable than measuring average hit ratio. If your P99 miss latency is 2 seconds (because it requires a complex join on a cold database), then your eviction policy should be optimised to keep the keys that serve that join result warm, even if those keys represent a tiny fraction of total traffic.
The Miss Ratio Curve: Your Capital Expenditure Tool
We introduced the MRC in Section 15. Here we use it for decision-making. The MRC answers three questions that determine caching infrastructure spending:
- "Is more cache worth it?" β Look at the slope of the MRC at your current cache size. If the slope is steep (miss ratio drops significantly with each additional GB), adding cache has high ROI. If the slope is flat (diminishing returns), you're past the knee and more hardware won't help much.
- "Should I change eviction policy first?" β Plot the MRC for your current policy and for TinyLFU on the same chart. If TinyLFU's curve is significantly lower at your current cache size, switching policies (a software change, free) might get you more hit-ratio improvement than doubling RAM (an infrastructure cost).
- "What's my working set size?" β The MRC knee is the working set. If your current cache is much smaller than the knee, you're memory-constrained. If you're well past the knee, you might be over-provisioned.
Belady's Optimal β The Unreachable Ceiling
In 1966, LΓ‘szlΓ³ BΓ©lΓ‘dy at IBM published a result that defined the theoretical upper bound for any cache eviction policy. His algorithm, now called Belady's MIN algorithm or just "Belady's optimal," is simple: when you must evict an entry, evict the entry whose next access is furthest in the future. If an entry will never be accessed again, evict it first. If an entry will be accessed in 1 second and another in 1 hour, keep the 1-second one.
Belady's algorithm produces the minimum possible miss ratio for any given cache size and access trace. It is provably optimal. There is no algorithm that can do better on the same trace and same cache size β this is mathematically proven, not just empirically observed.
The catch is obvious: Belady's algorithm is a clairvoyant oracle. It requires knowing the future β specifically, when each key will next be accessed. In any real system, you don't know the future. So Belady's optimal is unachievable in practice. It's used as a benchmark: you measure how close your real policy gets.
The contrast above is extreme by design, but it illustrates the real lesson: optimise for miss cost, not hit ratio. If you have one keyspace where misses cost 2,000ms and another where misses cost 0.5ms, your eviction policy should be biased toward keeping the 2,000ms keys warm, even at the cost of slightly lower overall hit ratio.
The Belady curve sits at the theoretical floor β no algorithm, no matter how clever, can have a lower miss ratio for a given cache size on a given workload. The LRU curve sits noticeably above it, especially at intermediate cache sizes where the gap between optimal and LRU is largest. TinyLFU's curve sits within 1β2 percentage points of Belady optimal on most production workloads β this is why it became the industry default. It's the closest any practical, online algorithm has come to matching a clairvoyant oracle.
A Working MRC Pseudocode Sketch
For completeness, here's the core idea of stack-distance computation β the algorithm behind MRC generation:
Walk through the logic: stack is a conceptual "infinite LRU cache" β it holds every key ever seen, ordered by recency. When we see key K, its position in the stack is its stack distance β the number of distinct keys accessed since we last saw K. If K is at position 5, it would be a hit in any LRU cache of size 6 or larger, and a miss in any LRU cache of size 5 or smaller. Counting up all the stack distances for the entire trace gives us the full miss ratio curve in one pass. The bottleneck is finding the position, which naive implementations solve in O(N) time β production implementations use an augmented balanced BST (Fenwick tree) for O(log N) per lookup.
Putting It Together: The Measurement Checklist
- Overall hit ratio β baseline, trend over time, not an absolute target
- Hit ratio per keyspace β split by key prefix. Often one keyspace has 40% hit ratio dragging the whole system down.
- P99 miss latency β the cost of a miss for each keyspace. This tells you which misses to prioritise eliminating.
- MRC curve β even an approximated one. Tells you whether the fix is "more cache" or "better policy" before you spend any money.
Tuning Eviction in Production β A Decision Playbook
Theory is useful in interviews. Production is where you actually pay for bad decisions. This section is the five-step playbook you run when you need to answer the question: "Which eviction policy should I pick for this specific system, and how big should the cache be?" The answer is never "just use LRU" β it always starts with measuring what you actually have.
The top row shows the five steps in order. The bottom table summarises step 4 β matching workload character to the right policy. Steps 1-3 are measurement; step 4 is the decision; step 5 is the feedback loop that keeps the decision honest over time.
Step 1 β Measure Hit Ratio Before Changing Anything
The single most common eviction-tuning mistake is changing the policy before knowing what the current policy is doing. Run INFO stats in Redis and look at keyspace_hits and keyspace_misses. Calculate hit_ratio = hits / (hits + misses). If it's already above 95% and latency is fine, you don't have an eviction problem β you might have a capacity problem, a TTL problem, or no problem at all. Measure first. Diagnose second. Change last.
Step 2 β Sample a Workload Trace and Build an MRC
A Miss Ratio Curve (MRC) tells you what your actual working set looks like. You don't need to run the full MRC algorithm on production traffic (that's expensive). Sample 1% of requests for an hour using a log drain or the Redis MONITOR command (with extreme caution β MONITOR halves Redis throughput). Feed that trace into a tool like the SHARDS algorithm or even a simple Python simulation. The curve will show you whether your working set fits in memory, or whether you're fighting a fundamentally oversized problem.
Step 3 β Estimate the Marginal Cost per GB of Hit Ratio Gain
Not every hit-ratio improvement is worth the RAM it costs. Use the MRC to find the "knee" β the point where adding more cache gives rapidly diminishing returns. Then compute the economics: if adding 1 GB of cache RAM costs some amount per month, and it buys you a 2% hit ratio gain, ask what 2% fewer cache misses saves in DB compute cost and latency SLA. Sometimes the math is clear (a busy DB cluster charges a lot per additional query, so more cache pays for itself quickly). Sometimes you're past the knee and adding 10x the RAM only buys a 0.3% improvement β in which case, optimise something else.
Step 4 β Pick the Policy Matched to Your Workload
Now you can use the MRC data and what you know about your access pattern to pick a policy. Here are the four most common production scenarios:
- Long-tail content (Zipfian distribution, stable hot keys): A small fraction of keys gets the vast majority of hits β product images, top articles, popular user profiles. LFU or TinyLFU is the right choice because frequency data accurately identifies the hot keys and protects them from eviction even if they weren't accessed in the last few seconds.
- Recency-dominated (social feeds, session data, real-time events): The most recently created items are the most valuable, and popularity shifts quickly. Standard LRU or Redis's
volatile-lruhandles this well. The hot set changes constantly, so frequency history becomes stale quickly β recency is a better proxy for future access than historical counts. - Analytics / broad uniform reads: Every key gets accessed roughly once as queries scan across many records. LRU actually performs badly here (every scan entry looks equally "recent"). Random eviction is surprisingly competitive because there's no hot set to protect β you're just keeping a random sample of the data universe.
- Unknown or mixed workloads: If you genuinely don't know the access pattern, or it shifts significantly across the day, use ARC or TinyLFU. Both adapt automatically. TinyLFU in particular has become the default for new general-purpose caches (Caffeine, Ristretto) precisely because it degrades gracefully across workload types without manual tuning.
Step 5 β Monitor Continuously and Adjust
Access patterns drift. A product launch changes what keys are hot. A new reporting feature adds a scan workload you didn't have before. Export keyspace_hits, keyspace_misses, and evicted_keys to your metrics platform (Prometheus, Datadog, CloudWatch β whatever you have). Alert when hit ratio drops more than 5% from baseline. Track eviction rate as a percentage of total requests β if you're evicting more than 5-10% of requests, the cache is too small or the workload has changed. Review the policy choice quarterly if the product changes frequently.
keyspace_hit_ratio metric in your monitoring dashboard today, before you need it. Teams that don't have this metric spend hours during incidents guessing whether the cache is the problem. Teams that do have it diagnose eviction issues in minutes.
Bug Studies β When Eviction Goes Wrong in Production
The best way to understand why eviction policy matters is to see it break. These four incidents follow the shape of real production failures β the exact numbers have been softened, but the failure modes are documented and recur regularly. Read each one like a post-mortem: what was the system expecting, what did the eviction policy actually do, and what would have prevented it.
What Went Wrong
The application used a cache-aside pattern with Redis set to allkeys-lru. The reporting job read millions of rows sequentially β each one a unique key that would never be accessed again by real user traffic. Because the cache-aside layer wrote each result into Redis before discarding it, the LRU eviction saw all those rows as "recently used." By the end of the scan, the rows that real users needed β the hot working set β had been completely evicted to make room for one-time analytical reads.
The diagram shows the eviction damage in two snapshots. Before the scan, the cache is healthy β hot working-set entries fill most of the slots and 95% of requests hit. After the scan finishes, those same slots hold one-time analytical rows that real traffic will never request again, and the hit ratio is more than cut in half. The cache is still "full" but its useful content is gone.
The Fix
evicted_keys counter in Redis. If it spikes at a consistent time of day (when a batch job runs), and the hit ratio drops shortly after, scan pollution is almost certainly the culprit.
What Went Wrong
Memcached allocates memory in fixed-size slab classes. Before the launch, most objects were under 256 bytes, so the 256-byte slab class had accumulated the lion's share of memory. After the launch, new 4 KB objects tried to go into the 4096-byte slab class β which had very few pages allocated to it.
Memcached evicted aggressively from the 4096-byte class even while the 256-byte class sat nearly idle, because slab pages are not transferable between classes at runtime without the automover. The result: new, heavily-accessed 4 KB objects were constantly being evicted because their slab class was starved, while old, stale 256-byte objects happily occupied most of the cache's memory.
stats slabs and stats items in Memcached. Look for slab classes with high evictions and low total_pages relative to hit rates. A healthy cache has evictions spread proportionally to slab usage β a calcified cache has one hot slab class with massive evictions and others nearly empty.
allkeys-lru on a dataset of tens of millions of keys noticed that their observed hit ratio was significantly lower than what an ideal LRU should produce for their access pattern. Profiling showed many recently-accessed hot keys were being evicted while genuinely stale keys remained. The Redis version in use had maxmemory-samples set to the default of 5.
What Went Wrong
Redis LRU is not a true LRU. When it needs to evict, it samples a small number of random keys (5 by default) and evicts the one with the oldest LRU clock value. With only 5 samples across tens of millions of keys, the probability of picking a genuinely old key is low. The algorithm frequently evicts keys that just happened to have an older timestamp in the random sample, even if much older keys exist elsewhere. The result looks like LRU to the configuration, but behaves much closer to Random eviction at scale.
maxmemory-samples to 10, or switch to allkeys-lfu (added in Redis 4.0), which uses a frequency counter that doesn't require sampling at eviction time.
What Went Wrong
The cache was warmed at startup (or after a periodic flush) and all keys were written to Redis at roughly the same time, all with the same TTL of 300 seconds. When the TTL expired 5 minutes later, all keys expired simultaneously. Every incoming request found a cold cache. Every one of those requests went to the database at the same moment. This pattern is called a cache stampede (or thundering herd), and it's caused by synchronised TTL expiry β not eviction exactly, but it interacts with eviction policy because the eviction algorithm doesn't protect you from coordinated expiry.
Common Misconceptions β What Engineers Get Wrong
Eviction is one of those topics where a little knowledge confidently leads to bad decisions. Here are the six most common mental models that sound right until you test them in production.
The myth: When you set maxmemory-policy allkeys-lru in Redis, Redis maintains a true least-recently-used ordering of all keys and always evicts the single least-recently-used key.
The reality: Redis LRU is sampled. When Redis needs to evict, it picks a small random sample of keys (5 by default, tunable via maxmemory-samples) and evicts the one with the oldest lru_clock timestamp among that sample. It does not inspect all keys. This is a deliberate engineering trade-off: maintaining a true LRU doubly-linked list across every single access would add overhead to every read and write operation. The sampled approach keeps the eviction path fast but introduces approximation β especially on very large key spaces where the probability of sampling the globally-oldest key in 5 tries is low.
Why it matters: If you're relying on "LRU will definitely keep my recently-used hot keys" as a correctness guarantee, you need to understand it's a probabilistic approximation. The larger the key space and the smaller the sample, the further the actual behaviour drifts from true LRU. For large key spaces where accuracy matters more than approximation speed, use allkeys-lfu or increase maxmemory-samples to 10.
The myth: The goal of cache tuning is to maximise hit ratio. A 99% hit ratio is always better than a 95% hit ratio, so always add more cache until you hit 99%+.
The reality: Hit ratio improvements are subject to diminishing returns, and the value of a hit depends on the cost of a miss. Going from 80% to 90% hit ratio might require doubling the cache (the working set is large, you're near the knee of the MRC). Going from 98% to 99% might require 5x more memory because those last few percent of keys are accessed rarely but are still in the working set. The math you need to do: what does one DB miss cost in latency, compute, and money? If a DB read costs very little (fast read replica, cheap compute), a 95% hit ratio is probably fine. If DB reads are expensive (cross-region, heavy joins, shared DB with little headroom), 99% might be worth the extra RAM.
Why it matters: Blindly optimising hit ratio leads to over-provisioning cache RAM. Always compute the marginal benefit per GB added and compare to the marginal cost of that GB. The MRC gives you the inflection point β the knee beyond which additional cache investment yields rapidly diminishing returns.
The myth: LFU is objectively superior because it uses more information (how often a key was accessed, not just when it was last accessed). More data = better decisions. Always use LFU.
The reality: LFU has a well-known failure mode called the stale popularity problem (also called the "frequency decay" problem). A key that was extremely popular six months ago has accumulated a huge frequency count. If the content it represents is now irrelevant (old news article, retired product, last season's campaign page), LFU will protect it from eviction almost indefinitely β it has a high frequency count and will always win over newer, potentially more relevant keys with low counts. Meanwhile, a genuinely hot new key that launched today has a low frequency count and is at high risk of early eviction.
Why it matters: Plain LFU without frequency decay is inappropriate for workloads where popularity is time-sensitive. This is why TinyLFU uses a frequency sketch with periodic aging β it halves all frequency counts periodically so old popularity fades and new popularity can gain traction. If you use LFU in Redis, it has built-in decay via the lfu-decay-time config parameter (default: 1 minute). Set this to match how quickly popularity changes in your workload.
The myth: Random eviction is a lazy fallback with no intelligence. LRU must always beat it because LRU at least tracks recency β Random throws away a key that might be very hot.
The reality: For workloads where the access distribution is uniform (or close to uniform), random eviction performs nearly as well as LRU β and sometimes better. Here's the intuition: if every key has a roughly equal probability of being accessed next, then the probability of a randomly-chosen eviction victim being the next-needed key is about 1/(cache size). That's the same probability as any other eviction strategy, because no strategy can predict which of N equally-likely keys will be accessed next. The benefit of Random is that it carries zero metadata overhead β no linked list, no access timestamps, no frequency counters. This makes it extremely fast and cache-friendly. For analytics workloads or batch processing where access distribution is broad and uniform, Random is often the right choice.
Why it matters: Don't dismiss Random as "not serious." Several production databases and CDNs use random eviction because the workload is broad enough that recency and frequency metadata provides minimal benefit over the cost of maintaining it.
The myth: Setting a TTL on a cache key is a form of eviction. Expired keys are evicted by the cache.
The reality: Expiry (TTL) and eviction are two completely separate mechanisms with different triggers and different purposes. Expiry is invalidation β a key expires when it becomes stale (data freshness concern). Eviction is capacity management β a key is evicted when the cache is full and needs space (memory pressure concern). A key can be evicted before its TTL expires (if memory is under pressure). A key can persist well past its nominal TTL expiry if memory is not under pressure and the cache backend implements lazy expiry (Redis does this β expired keys are cleaned up on next access or by a background sweep, not immediately on the TTL tick).
Why it matters: If you're relying on TTLs to keep memory usage bounded ("all keys expire in 10 minutes, so memory can't fill up"), you're wrong. Between bulk writes and TTL expiry, memory can absolutely fill up, triggering eviction of entries that haven't yet expired. Set maxmemory in Redis to enforce a hard cap, and treat eviction policy and TTL as two independent, complementary tools β not the same tool.
The myth: Cache is fast, databases are slow, therefore more cache is always better. When in doubt, provision more Redis RAM.
The reality: The MRC curve has a characteristic shape β steep improvement at first as the most-popular items fit into cache, then a knee where improvement slows dramatically, then near-flat tail where adding large amounts of cache buys almost no additional hit ratio improvement. Past the knee, you're caching items that will be accessed so rarely that even if you cache them perfectly, the hit ratio won't meaningfully improve. Beyond the MRC argument, there are practical costs: more Redis RAM means bigger RDB snapshots, longer AOF replay times on restart, more memory pressure on the host, higher cloud costs, and more replication traffic if you run a replica. The right model is: cache the hot working set, measure where the MRC knee is, and stop there. Use the saved budget on database read replicas or query optimisation for the long-tail traffic that isn't worth caching.
Practice Exercises β Build Your Intuition
Understanding eviction policy on paper is one thing. Building intuition β the kind that fires instantly when you're staring at a production incident or a system design whiteboard β requires actually working through problems. These five exercises range from concrete implementation to open-ended design, covering the skills an interviewer would probe.
Implement an LRU cache that supports get(key) and put(key, value) in O(1) time. The cache has a fixed capacity and evicts the least recently used entry when full.
Constraints: O(1) for both operations. No language-specific library (no Java LinkedHashMap shortcuts).
You need two data structures: one for O(1) key lookup, and one for O(1) ordering. A HashMap gives you O(1) lookup. A doubly-linked list gives you O(1) insertion, deletion, and move-to-front β because you can splice any node in O(1) if you have a direct pointer to it. The HashMap stores pointers to list nodes, not the values directly. On every get, move the accessed node to the head of the list. On put when full, remove the tail (LRU entry).
Use sentinel (dummy) head and tail nodes to avoid null checks. The real data nodes live between them. The head's next is the most recently used; the tail's prev is the least recently used.
You are given an access trace of 1 million requests against a dataset of 100,000 unique keys. You simulate LRU caches of sizes 1K, 5K, 10K, 20K, 40K, 60K, 80K, and 100K keys and measure the following hit ratios: 31%, 58%, 73%, 85%, 92%, 94%, 95.5%, 96%.
Questions: (a) Where is the MRC knee? (b) If cache RAM costs X per 1000 keys and a DB miss costs 5X in compute, what is the economically optimal cache size? (c) If a new product launch doubles the working set size from 20K to 40K unique hot keys, what happens to your hit ratio at the current cache size of 20K?
The knee is where the marginal gain per additional cache unit drops sharply. Look at the incremental hit ratio gain per 1K keys added at each step. The steepest drop in marginal gain marks the knee. For part (b), compute: "revenue" of a 1% hit ratio increase (= 10K misses/million avoided Γ 5X cost per miss) vs. "cost" of the cache size jump needed for that 1% gain.
(a) The knee is between 20K and 40K keys. Going from 1K to 20K: +54 percentage points over 19K keys = ~2.8 pp/1K. Going from 20K to 40K: +7 pp over 20K = 0.35 pp/1K. Going from 40K onward: <0.15 pp/1K. The marginal gain drops roughly 8Γ between the pre-knee and post-knee regions β that's the inflection point at around 20-25K keys.
(b) Optimal size at 20K. At 20K keys, hit ratio is 85%. Adding 20K more (to 40K) buys +7 pp. That 7 pp prevents 70K misses per million requests. Cost of those 70K misses = 70K Γ 5X = 350,000X compute units saved. Cost of 20K more cache = 20 Γ X = 20X RAM cost. The ROI is 350,000/20 = 17,500Γ. Clearly worth it. Adding another 20K (to 60K) buys only +2 pp = 20K misses avoided = 100,000X saved, at 20X cost β still positive (5,000Γ ROI) but declining fast. At 80K the ROI drops to around 750Γ. The exact economic knee depends on your specific cost ratios, but the principle β check ROI, not raw hit ratio β is the answer.
(c) A doubled working set to 40K hot keys at 20K cache size means only half the working set fits in cache. Hit ratio will collapse from 85% toward the hit ratio of a 10K cache on the original 20K working set β roughly equivalent to a 50% cache-to-working-set ratio. From the MRC above, that's somewhere around 58-73% hit ratio, depending on the Zipfian distribution of the new keys. This is the "cache is now half the working set" scenario β the MRC analysis is mandatory after any product launch that significantly changes key space size.
You are designing the caching layer for three independent services. For each one, pick the Redis maxmemory-policy and justify why.
Service A β Session store: Stores user session tokens. Sessions are created on login, accessed frequently during active use, and should automatically expire when the user is inactive. TTLs are set on all keys. Memory must stay bounded.
Service B β Product catalogue: Stores enriched product metadata for an e-commerce site. A small fraction of products (popular / trending items) get the vast majority of views. Product data changes occasionally via editorial updates. No TTLs on keys β invalidation is done explicitly.
Service C β Feature flag store: Stores feature flag evaluations (key = flag name + user segment, value = true/false). There are many flags and segments, but flag data changes infrequently. All lookups should return a value; a cache miss is acceptable but should be rare. Memory is tightly constrained.
Think about what signal matters for each use case. Sessions: TTL is the natural eviction driver β prefer evicting expired entries first. Product catalogue: no TTLs, access is Zipfian (popularity matters more than recency). Feature flags: small working set relative to memory, uniform importance, no natural ordering.
Service A β volatile-lru: All keys have TTLs (session expiry). The right policy is one that evicts keys that have TTLs, prioritising the least recently used. volatile-lru does exactly this β it only evicts keys with TTLs, and among those, picks the LRU. This respects session expiry intent while managing memory pressure. Do NOT use allkeys-lru β you don't want to evict sessions that are still actively used just because a more recent session displaced them.
Service B β allkeys-lfu: No TTLs, Zipfian access distribution, stable popularity for popular products. LFU is the right choice because frequency is a stable and accurate proxy for future access likelihood in e-commerce product catalogues. TinyLFU (available via Caffeine or Ristretto if you front Redis) would be even better, but in Redis 4.0+, allkeys-lfu with a sensible lfu-decay-time is a solid approximation.
Service C β allkeys-lru (or allkeys-random): Feature flag evaluations have no TTLs, the access distribution is relatively uniform across flags and segments (many flags, accessed fairly evenly), and the working set is small relative to expected memory. LRU is a safe default. If memory constraints are severe and you need zero metadata overhead, Random is also defensible here because the uniform access distribution means recency is a weak signal anyway.
You're looking at a 24-hour hit ratio graph for a Redis cache. The graph shows a stable plateau at 91% from midnight to 2 AM, then a sharp drop to 54% at 2:15 AM, then a gradual climb back to 88% over the following 3 hours, then another plateau. This pattern repeats identically the following day. There are no traffic spikes in the incoming request rate. evicted_keys shows a spike at 2:15 AM matching the hit ratio drop.
Questions: (a) What is the most likely cause? (b) How do you confirm it? (c) What are three ways to fix it?
The pattern is too regular to be organic user traffic. The exact time (2:15 AM) and the spike in evictions both point to a scheduled background job. What kind of job runs at 2 AM that would write many new keys into a cache?
(a) Scan pollution from a scheduled batch job. A nightly job (reporting, ETL, full-table cache warm-up for a new feature) ran at 2:15 AM, wrote a large number of keys into the cache, and evicted the hot working set. LRU treated the batch keys as "recently used" because they were just written, even though they will never be accessed again by the production workload. The 3-hour recovery time is the warm-up time for the working set to re-populate via normal traffic.
(b) Confirm by: checking application logs or cron job schedules for a job running at 2:15 AM; looking at Redis INFO keyspace to see if the key count spikes at that time; checking the rate of new keys written per second using redis-cli --latency-history or monitoring; and, if possible, examining whether the evicted keys are from a specific key pattern (batch job keys often have a distinct prefix).
(c) Three fixes: (1) Route the batch job to a separate Redis instance (or a read replica/separate DB) so it doesn't share a cache with the production workload. (2) Modify the batch job to skip the cache-aside write β if each row is only accessed once, there's no reason to cache it. (3) Switch the production cache policy to TinyLFU β the admission filter will reject one-hit-wonder keys (the batch job writes) from entering the cache at all, protecting the hot working set.
You are building a caching layer for a social media feed. The access pattern is highly Zipfian: roughly the top 10% of posts by popularity get about 85% of requests. However, there is a strong "novelty bias" β new posts receive a burst of traffic immediately after posting, then their popularity stabilises or decays. Additionally, many posts are accessed exactly once (users open a post, never return to it). LRU is failing because new posts look "recently used" even when they will only ever be accessed once.
Design task: Design an admission filter for this cache that allows a post to enter the cache only when there is evidence it will be accessed more than once. Specify: (a) the data structure for the filter, (b) the admission rule, (c) how the filter handles memory, and (d) how you handle the novelty-bias problem (new posts that will become popular but start with low frequency counts).
TinyLFU is the canonical solution to exactly this problem. The frequency sketch (Count-Min Sketch) tracks approximate access counts. The admission rule is: only admit a new entry to the main cache if its frequency count exceeds the eviction candidate's count. Think about how you'd add a "new entry grace period" for the novelty-bias part.
(a) Data structure: a Count-Min Sketch (or equivalent frequency sketch). This is a probabilistic data structure using a 2D array of counters and multiple hash functions. It can approximate the access frequency of any key using very little memory β a sketch tracking billions of events can fit in tens of megabytes. Use it to count how many times each post ID has been requested, before it's admitted to the main LRU cache.
(b) Admission rule: When a new post needs to be cached and the cache is at capacity, compare the new post's frequency count (from the sketch) to the current eviction candidate's frequency count. Only evict the candidate and admit the new entry if freq(new) >= freq(candidate). If the new post has only been seen once, it won't pass this test unless the eviction candidate is also a near-zero-frequency entry.
(c) Memory management for the filter: The sketch has a fixed size (set at initialisation). To prevent stale counts from old posts dominating, periodically "age" all counters β halve every counter every N minutes (or every N access events). This is the same decay mechanism TinyLFU uses. The aging step ensures old popularity fades and new popular posts can accumulate counts quickly.
(d) Novelty bias β the W-TinyLFU approach: Maintain a small "window LRU" cache (perhaps 1% of total capacity) that admits all new entries without filtering. New posts enter the window cache immediately, which serves as the novelty buffer. After accumulating some access history in the window, entries are promoted to the main cache only if their frequency count qualifies them. If the window overflows, entries are evaluated against the main cache's eviction candidate. This is exactly the W-TinyLFU (Window-TinyLFU) design used by the Caffeine library β the small window absorbs novelty bursts while the main cache is protected by the frequency admission gate.
Cheat Sheet & Glossary β The 30-Second Recap
Everything in this page, compressed to flashcard size. Use this for pre-interview review or as a quick reference when configuring a new cache.
Policy Quick-Reference
Key Production Rules
maxmemory-samples to 10 for large key spaces. Or switch to allkeys-lfu.
evicted_keys / total requests. Alert if >5-10%. A rising eviction rate means the working set no longer fits β cache is too small or workload has grown.
Glossary
- Working set
- The collection of cache entries that are actively needed to serve the current workload. If the cache is large enough to hold the working set, hit ratio is high. If the working set exceeds the cache, hit ratio is limited by how well the eviction policy approximates Belady's OPT.
- Miss ratio curve (MRC)
- A plot of miss ratio vs. cache size for a given access trace. Shows exactly how much hit ratio you gain per additional unit of cache. The knee of the curve marks the economically optimal cache size.
- Scan pollution
- A failure mode where a sequential scan (full table scan, batch export, nightly analytics) writes a large number of one-time keys into the cache, evicting the hot working set. The LRU policy treats scan entries as "recently used" even though they will never be accessed again.
- Ghost entry (ghost list)
- In ARC, a ghost entry is the key (without its value) of a recently evicted item. Ghost lists track what was evicted so the algorithm can learn: if a ghost entry is accessed, the miss could have been prevented, and the policy shifts weight toward the queue that would have retained it.
- Admission filter
- A gate that decides whether a new item is worth admitting to the main cache at all. TinyLFU uses a frequency sketch as its admission filter: a new item is only admitted if its estimated frequency exceeds the eviction candidate's frequency. Prevents one-hit wonders from polluting the cache.
- Frequency sketch
- A probabilistic data structure (typically a Count-Min Sketch) that approximates how many times a key has been accessed, using a small fixed amount of memory. Used by TinyLFU and similar algorithms to approximate LFU frequency counters without the per-key memory cost of exact counters.
- One-hit wonder
- A cache entry that is accessed exactly once and never again. Caching one-hit wonders wastes capacity and causes pollution. CDN and feed workloads have very high one-hit-wonder rates. TinyLFU's admission filter is specifically designed to reject these entries before they enter the main cache.
- Cache stampede (thundering herd)
- When many cached keys expire simultaneously, causing a flood of concurrent cache misses that all hit the origin (database or upstream service) at the same time. Caused by uniform TTLs on bulk writes. Prevented by TTL jitter and probabilistic early refresh.
- LRU clock (Redis)
- A 24-bit timestamp stored per key in Redis, recording the last-access time relative to the server's internal LRU clock. Updated on access. Used during sampled LRU eviction to find the least-recently-used key among a random sample. Cheaper than a doubly-linked list but less accurate.
- Slab class (Memcached)
- Memcached divides memory into fixed-size slab classes, each serving a range of object sizes. Eviction is per-slab-class: a full slab evicts from within that class only, and cannot borrow free space from other classes. Object size changes can cause slab imbalance ("slab calcification").
- Temporal locality
- The tendency for recently-accessed data to be accessed again soon. LRU exploits temporal locality. Workloads with strong temporal locality are well-served by LRU; workloads with weak temporal locality (sequential scans) are not.
- Frequency locality
- The tendency for a small subset of data to be accessed far more frequently than the rest (Zipfian / power-law distribution). LFU and TinyLFU exploit frequency locality. Most real-world workloads have strong frequency locality β a small fraction of keys accounts for the vast majority of requests.