CDN — System Guide

Section 1

TL;DR — CDNs in Plain English

Why geography is a hard physics constraint — the speed of light in fiber means a Sydney user hitting a Virginia server waits 160 ms before a single byte arrives, and how a CDN cuts that to ~10 ms
How Anycast routing and GeoDNS routing each steer a user to the nearest edge node — Cloudflare and Fastly rely heavily on Anycast, while Akamai historically used GeoDNS (and most providers now mix both)
What a CDN cache key actually is, why a single rogue query parameter can crash your hit ratio from 90% to 5%, and how to fix it
The three jobs a CDN performs (static caching, dynamic acceleration, edge compute) and the right tool for each
How CDN pricing models differ — flat bandwidth vs per-request vs committed use — and why the wrong choice can be 10× more expensive

A CDN is a planet-scale distributed cache — thousands of servers placed physically close to users in every major city. Instead of every request traveling across the globe to your one origin server, it bounces off the nearest edge node. Latency drops from 150 ms to 10 ms. Your origin server handles 1% of the traffic it used to. But the magic is entirely in the details: how routing finds the nearest node, what the cache key includes, when to purge vs let TTLs expire, and where edge compute fits in.

A Content Delivery Network (CDN) is a globally distributed set of servers — called edge nodes or Points of Presence (PoPs) — that cache copies of your content close to the users who request it. When a user in Tokyo loads your website, instead of that request traveling all the way to your server in Virginia (150 ms round trip, minimum), it hits a CDN edge node in Tokyo (10 ms round trip). The CDN stores a cached copy of your HTML, images, JavaScript, fonts, and API responses; future requests for the same resource are served entirely from the edge without touching your origin. The result: dramatically lower latency for users everywhere, and your origin server handling only a fraction — sometimes under 1% — of the raw traffic.

CDNs do three distinct jobs, each with different configuration implications. Static asset caching is the classic use: immutable files (JS bundles named with a content hash, images, fonts) cached at the edge for days or weeks with zero invalidation concern. Dynamic acceleration caches personalized or frequently-changing responses (product pages, search results, API JSON) with short TTLs or surrogate-key invalidation — the CDN reduces latency even for content it can't cache long. Edge compute (Cloudflare Workers, Lambda@Edge, Fastly Compute) runs actual code at the edge — A/B testing logic, authentication, request rewriting — eliminating an entire origin round trip for logic that doesn't need the database. The dangerous mistake is treating all three as the same knob and applying the same TTL to everything.

A CDN is not a black box you enable and forget. Cache keys determine what counts as a unique cached resource: get them wrong and your hit ratio collapses (a single click-ID tracking parameter like `?gclid=...` creates a fresh cache entry per ad click). Routing determines how a user's request finds the nearest edge: Anycast (used by Cloudflare, Fastly) lets BGP do the work; GeoDNS (used historically by Akamai) gives more control but is slower to fail over. Pricing models differ sharply between providers and plans (flat-rate, per-GB bandwidth, per-request, committed use); AWS CloudFront, for example, charges per GB transferred plus per 10,000 HTTPS requests. And invalidation, meaning changing a cached resource before its TTL expires, requires a purge API call, surrogate keys, or cache-busting URLs. Each strategy has a cost and a latency implication. This page is the map.

A CDN places cached copies of your content on edge nodes near users, cutting latency from 150 ms to ~10 ms and offloading 90–99% of traffic from your origin. The three CDN jobs — static caching, dynamic acceleration, edge compute — each need different configuration. Cache keys, routing (Anycast vs GeoDNS), invalidation strategy, and pricing model are the four variables that determine whether a CDN deployment succeeds or quietly drains your budget.

Section 2

Why You Need This — The Speed-of-Light Tax

Most engineers know CDNs make websites faster. Fewer can explain why in physics terms — and that understanding matters, because it tells you exactly how much faster a CDN can make your site (and where its limits are). The answer starts with the most unforgiving constant in nature: the speed of light.

The Physics Problem You Can't Optimize Away

Light travels at roughly 300,000 km/s in a vacuum. In fiber-optic cable, photons travel at about two-thirds of that — around 200,000 km/s — because glass slows them down (the refractive index of glass is ~1.5, so light takes 1.5× longer to traverse the same distance). That sounds fast, but the internet is big. The straight-line distance from New York to Sydney is about 16,000 km. The actual fiber path (which follows undersea cables, not straight lines) is longer — closer to 18,000–20,000 km. At 200,000 km/s, a single one-way trip takes:

    one-way propagation delay = 16,000 km ÷ 200,000 km/s = 80 ms

    round-trip = 80 ms × 2 = 160 ms (just for photons — before any processing)

That 160 ms is the floor. No amount of server optimization can push it below that — it's physics. And that's the best case. Real network paths add more latency from routing hops, queueing, and handshakes. Now consider what a web browser actually does when it loads a page:

DNS lookup — ~50 ms (can be cached, but often isn't on first visit)
TCP connection — 1 round trip = 160 ms (SYN → SYN-ACK → ACK)
TLS 1.3 handshake — 1 additional round trip = 160 ms (TLS 1.2 was 2 round trips = 320 ms)
HTTP request + first byte — 1 more round trip = 160 ms

Before the user sees a single byte of your page, the Sydney browser has burned 160 + 160 + 160 = 480 ms in round trips alone — with TLS 1.3. With TLS 1.2, it was worse. On mobile with higher base latency, worse still. The content download comes after all of that. A 200 KB HTML page at 10 Mbps takes another 160 ms to transfer. Total: easily 700–1,500 ms just to render the first screenful.
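The arithmetic above can be reproduced with a short Python sketch. The function names are illustrative, and the inputs (16,000 km, 50 ms DNS, 200 KB at 10 Mbps) are simply the figures quoted in this section, not measurements.

```python
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s in fiber, expressed as km per millisecond


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, ignoring routing and queueing."""
    return 2 * distance_km / FIBER_KM_PER_MS


def transfer_ms(size_kb: float, link_mbps: float) -> float:
    """Time to move size_kb kilobytes over a link (1 KB = 1000 bytes here)."""
    return size_kb * 8 / link_mbps  # kilobits divided by kilobits-per-millisecond


def first_visit_ms(rtt_ms: float, dns_ms: float = 50.0, round_trips: int = 3,
                   size_kb: float = 200.0, link_mbps: float = 10.0) -> float:
    """DNS + (TCP, TLS 1.3, HTTP request: one round trip each) + body transfer."""
    return dns_ms + round_trips * rtt_ms + transfer_ms(size_kb, link_mbps)


origin_rtt = propagation_rtt_ms(16_000)  # New York to Sydney, straight line
print(origin_rtt)                        # 160.0
print(3 * origin_rtt)                    # 480.0 ms spent in round trips alone
print(3 * 10.0)                          # 30.0 ms for the same trips at a 10 ms edge RTT
print(first_visit_ms(origin_rtt))        # 690.0
```

Changing `round_trips` to 4 models the extra TLS 1.2 round trip mentioned above.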

The Startup Story: Sydney Users Stop Coming Back

Here's how this plays out in practice. A startup launches its SaaS product with servers in US-East (Virginia). The founding team is in New York, where pages load in 200 ms and feel snappy. They get their first customers in Australia via a Product Hunt launch. Those users experience 1.4-second page loads. The landing page bounce rate from Sydney is 68% vs 22% for US users. The team notices after three weeks when someone checks the analytics by country. They've lost hundreds of potential signups to a physics problem they didn't know existed.

A CDN with a Sydney edge node changes the math completely. The user's request hits the Sydney PoP, roughly 10 ms of round-trip time away instead of 160 ms. The TCP handshake, TLS handshake, and HTTP request each cost one local round trip (10 ms × 3 = 30 ms). The cached HTML is served in milliseconds. Total time to first byte: ~50 ms. Page load for Sydney users drops from 1.4 seconds to under 200 ms. Bounce rate normalizes.

The diagram makes it concrete. Without a CDN every user — regardless of where they live — waits for the full round trip to the origin. With a CDN, users hit the nearest edge and physics works in their favor instead of against them. The origin still exists and still handles cache misses (the first time any given resource is requested at an edge) but that's a rare event compared to the storm of cache hits.

The RTT Decomposition: Where Every Millisecond Goes

It's worth being precise about what "latency" means when a browser loads a page over HTTPS. Every component adds time, and most of them require round trips — which means every component is taxed by the propagation delay twice (out and back). Here's the breakdown for a first visit, NYC to Sydney, no CDN:

The key insight in that diagram: three of the five phases (TCP, TLS, HTTP request) are each a full round trip. Every round trip burns the propagation delay twice. A CDN doesn't speed up your server or your code — it shortens the round trip by moving the endpoint physically closer. That's why the CDN's impact is multiplicative: it cuts every round-trip-dependent cost by the ratio (edge RTT / origin RTT), not just one of them.

The rule of thumb: A CDN can only help with latency introduced by network distance. It cannot help with slow database queries, CPU-heavy rendering, or large uncompressed files. The 10–20× improvement is real — but only for the physics component. Optimize your server-side performance separately.

Latency from geographic distance is a physics constraint, not an optimization problem. NYC-to-Sydney propagation alone is 160 ms round trip; with DNS, TCP, TLS, and HTTP on top, a first visit takes 700–1,500 ms before the user sees anything. A CDN solves this by placing an edge node close to the user, cutting every round trip from 160 ms to roughly 10 ms, a 10–20× improvement for globally distributed users.

Section 3

Mental Model — The Three-Layer Cache Pyramid

To reason about CDNs clearly, you need one mental model that explains the entire architecture. Here it is: a CDN is a three-layer cache pyramid. Data lives at the bottom in your origin, propagates up through an optional middle tier (the origin shield), and sits at the top in hundreds of edge PoPs distributed worldwide. Users at the very top request content; the request flows down the pyramid until it hits a cache layer that has the content, then flows back up.

Layer 1 — The Origin (Bottom)

The origin server is your application — an EC2 instance, an S3 bucket, a Kubernetes service. It's the single source of truth: when the CDN doesn't have a cached copy of something, it goes here. The goal of a CDN is to make origin requests rare. In a well-tuned CDN deployment, the origin handles under 5% of total traffic — the other 95%+ is served from edge caches. This matters because your origin is expensive: it runs your application code, queries your database, and costs money proportional to traffic. CDN origin offload directly reduces compute costs.

Layer 2 — The Origin Shield (Middle, Optional)

The origin shield is where things get clever. Imagine your CDN has 300 edge PoPs worldwide. If a popular new blog post is published, every edge PoP that gets a first request for that post will simultaneously send a cache-miss request to your origin. That's a thundering herd — 300 origin requests arriving at once. The origin shield is a single designated CDN PoP that sits between all the edge PoPs and your origin. Every cache miss from every edge PoP routes through the shield first. The shield checks its own cache — if it has the content, it serves all 300 edge PoPs from there (1 origin request total). Only if the shield misses does the request reach your origin. Cloudflare calls this "Tiered Cache", Fastly calls it "shielding", AWS CloudFront calls it "Origin Shield." Same idea everywhere.

Layer 3 — Edge PoPs (Top)

The edge PoPs are the part of the CDN closest to users. Each PoP is a data center with dozens or hundreds of servers running the CDN software. These servers cache content, terminate TLS connections, run edge compute (Cloudflare Workers, Lambda@Edge), and handle the volume. A large CDN like Cloudflare operates in 300+ cities worldwide; a smaller enterprise CDN might have 50–80 locations. The PoP nearest a user depends on routing, which is covered in Section 5. Users never connect directly to the origin; they connect to the nearest PoP, and the PoP deals with everything else.

Reading the pyramid from top to bottom: a user request first hits the nearest edge PoP. If the PoP has a cached copy (a cache hit), it responds immediately — 5–50 ms. If not (a cache miss), the request routes to the origin shield. The shield checks its cache — if it has the content, all the edge PoPs that missed can be filled from here without ever reaching the origin. Only if the shield misses too does the request reach the actual origin server. In a well-tuned deployment, the shield absorbs most of the misses and the origin sees a tiny fraction of total traffic.
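The lookup path just described can be sketched as a chain of caches that each fall through to the next layer on a miss. This is a toy model under simplifying assumptions: the `CacheTier` class and the `origin` function are hypothetical, there are no TTLs, and requests arrive one at a time.

```python
from typing import Callable, Dict, Optional


class CacheTier:
    """One cache layer that falls through to the next layer on a miss."""

    def __init__(self, name: str, parent: Optional["CacheTier"] = None,
                 origin: Optional[Callable[[str], str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key in self.store:            # cache hit: answer from this layer
            return self.store[key]
        if self.parent is not None:      # miss: ask the next layer down
            body = self.parent.get(key)
        elif self.origin is not None:    # lowest cache layer: go to the origin
            body = self.origin(key)
        else:
            raise LookupError(f"{self.name} has no parent tier or origin")
        self.store[key] = body           # cache-fill on the way back up
        return body


origin_fetches = []


def origin(key: str) -> str:
    origin_fetches.append(key)
    return f"<html>{key}</html>"


shield = CacheTier("shield", origin=origin)
edges = [CacheTier(f"edge-{i}", parent=shield) for i in range(300)]

for edge in edges:                       # the first request at every PoP misses
    edge.get("/blog/new-post")

print(len(origin_fetches))               # 1
```

All 300 edge misses are filled from the shield, so the origin is asked exactly once.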

Why the Shield Matters: The Thundering Herd Problem

Without an origin shield, cache misses can create a "thundering herd": a sudden wave of requests all hitting the origin simultaneously. Imagine you publish a new blog post. Within seconds, CDN edge PoPs in 50 cities each get their first request for it. Each PoP misses its cache. Each one independently asks your origin for the page. Your origin gets 50 simultaneous requests for the same URL, all arriving within 100 ms of each other. If your origin can handle 20 req/s, that burst arrives at an instantaneous rate of 500 req/s, 25× what it can absorb. With a shield, those 50 edge misses all route through the shield, which makes a single request to the origin. Origin load: 1 request. The shield pattern is especially important for video content (a viral video can trigger a very large number of edge misses in a short window) and for systems with limited origin capacity.
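When the edge misses arrive concurrently, the shield has to merge them into one in-flight origin fetch. A minimal sketch of that request collapsing with asyncio follows; `CollapsingShield` and `slow_origin` are hypothetical names, and real CDNs implement this inside their cache engines rather than in application code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class CollapsingShield:
    """Serve concurrent misses for the same key from a single origin fetch."""

    def __init__(self, fetch_origin: Callable[[str], Awaitable[str]]) -> None:
        self._fetch_origin = fetch_origin
        self._cache: Dict[str, str] = {}
        self._in_flight: Dict[str, "asyncio.Task[str]"] = {}

    async def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        task = self._in_flight.get(key)
        if task is None:                 # first miss starts the origin fetch
            task = asyncio.create_task(self._fill(key))
            self._in_flight[key] = task
        return await task                # later misses wait on the same fetch

    async def _fill(self, key: str) -> str:
        try:
            body = await self._fetch_origin(key)
            self._cache[key] = body
            return body
        finally:
            self._in_flight.pop(key, None)


origin_calls: List[str] = []


async def slow_origin(key: str) -> str:
    origin_calls.append(key)
    await asyncio.sleep(0.01)            # pretend the origin takes a while
    return f"body of {key}"


async def main() -> List[str]:
    shield = CollapsingShield(slow_origin)
    edge_misses = (shield.get("/blog/new-post") for _ in range(50))
    return await asyncio.gather(*edge_misses)


responses = asyncio.run(main())
print(len(responses), len(origin_calls))  # 50 1
```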

Enable origin shielding. Every major CDN (Cloudflare Tiered Cache, CloudFront Origin Shield, Fastly Shielding) offers this as a configurable option. It's almost always worth enabling. The cost is one extra hop (edge to shield) on cache misses, which adds a little latency to those requests; the benefit is massive origin protection and often a significant improvement in global cache hit ratio (since the shield's cache is warmer than any single edge PoP's).

A CDN is a three-layer pyramid: origin (source of truth) → origin shield (optional middle tier that aggregates edge misses) → edge PoPs (hundreds of locations near users). Users hit the edge layer and get sub-50 ms responses on cache hits. Cache misses flow down to the shield, which protects the origin from thundering herd by collapsing many simultaneous edge misses into a single origin fetch. Understanding this pyramid is the foundation for every other CDN decision.

Section 4

Core Concepts — The CDN Vocabulary

CDN documentation is full of terms that sound similar but mean very different things. Getting these wrong leads to misconfigured deployments. This section pins each term to a plain-English idea before adding the jargon label — read it once and the rest of the page will be clear.

Cache Mechanics

Cache key — When the CDN receives a request, it needs to check "do I already have a copy of this?" The cache key is the lookup string. By default it's the full URL — scheme + host + path + query string. Two requests with the same cache key are considered identical and the CDN serves the same cached response to both. Two requests that differ in even one query parameter get separate cache entries.

Cache hit vs cache miss — A cache hit is when the CDN has a cached response and can serve it without calling the origin. A cache miss is when it doesn't and must fetch from origin. The ratio of hits to total requests is the cache hit ratio (also called hit rate). A hit ratio of 90% means 90% of requests are served from cache without touching your origin.

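TTL (Time-To-Live) — Every cached entry has a TTL, usually set by the Cache-Control: max-age=N response header (s-maxage=N overrides it for shared caches such as CDNs). After N seconds, the cached copy is stale and must be revalidated or re-fetched. A TTL of 86400 means entries are cached for 24 hours. A TTL of 0 means the entry is stale immediately, so it must be revalidated with the origin before every reuse; to forbid storing a response at all, use no-store.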

Purge — Purge is the act of immediately deleting a cached entry before its TTL expires. When you deploy new content, you purge the old cached copies so the next request fetches the fresh version. Most CDNs expose a purge API; some charge per purge request.

Cache-fill — When a cache miss occurs, the CDN fetches the content from the origin and stores it. That fetch-and-store is called a cache-fill. After the fill, subsequent requests for the same key are served from cache until TTL expires.

Routing Terms

Anycast — Imagine a thousand servers all over the world raising their hands at once and saying "I'm at this address." When you mail a packet there, the postal service hands it to whichever one is closest. That's Anycast: many physically separated servers advertise the same IP address, and the internet's routing protocol (BGP) automatically picks the nearest one for each user. Used by Cloudflare and Fastly. The user doesn't have to know which PoP they're talking to — they just send requests to the CDN's IP and the network silently routes to the nearest match.

GeoDNS — Instead of one shared IP, each PoP has its own unique address, and a smart DNS server hands out the right one based on where the asker is. A DNS resolver in Tokyo asks for example.com and gets back the Tokyo PoP's IP; a London resolver gets the London PoP's IP. That's GeoDNS — geography-aware DNS, used historically by Akamai (CloudFront also relies heavily on latency-based DNS for steering, in addition to Anycast). It gives more explicit per-region control than Anycast, but failover is slower because DNS records have to expire before traffic reroutes.

BGP (Border Gateway Protocol) — The routing protocol that Anycast relies on. BGP is how internet service providers agree on how to route traffic between networks. When Cloudflare announces the same IP from 300 PoPs via BGP, every ISP's router directs traffic to the closest Cloudflare PoP — automatically, without any DNS involvement.

Cache Control Terms

Cache-Control header — An HTTP response header your origin sends to control how caches (browsers, CDNs, proxies) should cache the response. Key directives: public (any cache can store it), private (only the user's browser, not CDNs), max-age=N (TTL in seconds), no-cache (must revalidate before serving), no-store (never cache at all), immutable (content will never change, skip revalidation).
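As a rough illustration of how a shared cache might turn these directives into a TTL, here is a simplified Python sketch. The function name is hypothetical, and the parsing ignores several details of the HTTP caching specification (quoted directive arguments such as private="field", heuristic freshness, the Expires header).

```python
from typing import Dict, Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Return the TTL in seconds a CDN may apply.

    Returns None when the header forbids shared caching or gives no explicit
    lifetime, and 0 when the response may be stored but must be revalidated
    before every reuse.
    """
    directives: Dict[str, str] = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    if "no-store" in directives or "private" in directives:
        return None
    if "no-cache" in directives:
        return 0
    for name in ("s-maxage", "max-age"):  # s-maxage takes precedence for shared caches
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                return None
    return None


print(shared_cache_ttl("public, max-age=86400"))      # 86400
print(shared_cache_ttl("max-age=60, s-maxage=3600"))  # 3600
print(shared_cache_ttl("private, max-age=600"))       # None
```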

ETag and conditional GET — An ETag is a fingerprint of a response's content. When a cached entry expires, instead of fetching the full content again, the CDN sends a conditional GET carrying the old ETag in an If-None-Match request header. If the content hasn't changed, the origin returns a 304 Not Modified response (headers only, no body), and the CDN refreshes the entry's TTL and keeps serving the cached content. This saves bandwidth when content changes rarely.
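The revalidation exchange can be sketched from the origin's side. The helper names are hypothetical, the ETag is derived from a truncated SHA-256 of the body purely for illustration, and the comparison handles only a single strong ETag (real If-None-Match headers can carry lists and weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def make_etag(body: bytes) -> str:
    """Fingerprint of the content: identical bytes give an identical tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes, if_none_match: Optional[str] = None) -> Response:
    """Answer a GET, returning 304 with no body when the caller's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


page_v1 = b"<h1>Widget 42</h1>"
status, headers, _ = origin_respond(page_v1)                  # initial cache-fill
cached_etag = headers["ETag"]

status, _, body = origin_respond(page_v1, cached_etag)        # TTL expired, unchanged
print(status, len(body))                                      # 304 0

status, _, body = origin_respond(b"<h1>Widget 43</h1>", cached_etag)  # content changed
print(status, len(body))                                      # 200 18
```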

Vary header — Tells the CDN "this response varies by this request header." For example, Vary: Accept-Encoding means the CDN should cache separate copies for gzip-compressed and brotli-compressed versions of the same URL. Vary: Accept-Language means separate copies per language. Overusing Vary fragments the cache and tanks hit ratios — a common misconfiguration.

Surrogate key (cache tag) — A CDN-specific extension where you tag responses with logical identifiers (e.g., product-1234) and then purge by tag instead of by URL. One tag can correspond to thousands of URLs. Fastly uses the Surrogate-Key header; Cloudflare uses Cache-Tag; Akamai uses Edge-Cache-Tag. This is the scalable way to invalidate: instead of enumerating every URL that contains product #1234's data, you purge the tag and every matching URL is invalidated.
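Conceptually, tag-based purging needs a reverse index from tag to cached URLs. The sketch below shows that idea with a hypothetical in-memory `TaggedCache`; it is not how any particular CDN stores its index.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TaggedCache:
    """URL-keyed cache with a reverse index from tag to URLs."""

    def __init__(self) -> None:
        self.entries: Dict[str, bytes] = {}
        self.urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self.entries[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every entry carrying the tag; return how many were removed."""
        removed = 0
        for url in self.urls_by_tag.pop(tag, set()):
            if self.entries.pop(url, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.store("/products/1234", b"...", ["product-1234"])
cache.store("/category/widgets", b"...", ["product-1234", "product-5678"])
cache.store("/search?q=widget", b"...", ["product-1234"])
cache.store("/products/5678", b"...", ["product-5678"])

print(cache.purge_tag("product-1234"))  # 3
print(sorted(cache.entries))            # ['/products/5678']
```

One purge call invalidates three URLs without the caller having to know which pages embed product 1234.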

Signed URL — A URL with a cryptographic signature appended as a query parameter. The CDN verifies the signature before serving the content, which allows you to grant time-limited or user-specific access to otherwise-private content without changing the underlying file. Used for video streaming, software download links, and gated content.
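A generic version of the signing scheme can be sketched with an HMAC over the path and expiry time. The secret, the `expires` and `sig` parameter names, and the message format are all assumptions for illustration; each CDN defines its own signed-URL format.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"example-shared-secret"  # hypothetical key known to the signer and the edge


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to the path."""
    message = f"{path}?expires={expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Check at the edge that the link is unexpired and untampered."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"{parts.path}?expires={expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)


link = sign_url("/videos/intro.mp4", expires_at=1_700_000_600)
print(verify_url(link, now=1_700_000_000))                            # True
print(verify_url(link, now=1_700_000_601))                            # False (expired)
print(verify_url(link.replace("intro", "other"), now=1_700_000_000))  # False (tampered)
```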

Performance Metrics

Request-hit ratio vs byte-hit ratio — Two different ways to measure cache effectiveness. The request-hit ratio measures what fraction of requests are served from cache — good for measuring how much you're offloading origin compute. The byte-hit ratio measures what fraction of bytes are served from cache — better for measuring bandwidth savings. A single uncached 4 GB video file can make your byte-hit ratio look terrible even if your request-hit ratio is 99%.
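The gap between the two ratios is easy to see with numbers. The sketch below assumes a hypothetical log of (served_from_cache, response_bytes) pairs: 99 small cached responses and one uncached 4 GB video.

```python
from typing import Iterable, Tuple


def hit_ratios(log: Iterable[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (request-hit ratio, byte-hit ratio) for a request log."""
    requests = hits = total_bytes = hit_bytes = 0
    for was_hit, size in log:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / requests, hit_bytes / total_bytes


log = [(True, 50_000)] * 99 + [(False, 4_000_000_000)]
request_ratio, byte_ratio = hit_ratios(log)
print(f"{request_ratio:.2%}")  # 99.00%
print(f"{byte_ratio:.2%}")     # 0.12%
```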

Edge compute — Running code at the CDN edge instead of at the origin. Edge compute platforms (Cloudflare Workers, Lambda@Edge) let you run JavaScript or WebAssembly at the edge. This means you can handle A/B testing, authentication, redirects, and request rewriting at the edge — responses in 5 ms, no origin round trip at all.

The diagram threads all the vocabulary terms into a single request flow: a request arrives at the edge PoP, the cache key is checked, a hit serves the response immediately (governed by TTL and ETag revalidation), a miss triggers cache-fill through the shield and origin, and a purge/surrogate-key deletion allows early invalidation before TTL expiry.

The CDN vocabulary breaks into four clusters: cache mechanics (cache key, hit/miss, TTL, purge, cache-fill), routing (Anycast, GeoDNS, BGP), cache control (Cache-Control, ETag, Vary, surrogate key, signed URL), and performance (request-hit ratio, byte-hit ratio, edge compute). Understanding these terms precisely, especially what goes into a cache key and what TTL controls, is the prerequisite for every configuration decision in the sections that follow.

Section 5

How a Request Finds the Edge — Anycast vs GeoDNS Routing

When a user's browser makes a request to your CDN-fronted domain, something has to steer that request to the nearest edge PoP. Two fundamentally different techniques exist for doing this: Anycast and GeoDNS. They look similar to the end user — both result in the request reaching a nearby edge — but the underlying mechanics, failure modes, and trade-offs are completely different.

Anycast: Let BGP Do the Work

Before the routing trick makes sense, here's the world it operates in. The internet isn't one network — it's a patchwork of thousands of separately-run networks (Cloudflare's, Comcast's, AT&T's, Amazon's, your university's) stitched together. Each separately-run network is called an Autonomous System (AS). The protocol they use to gossip among themselves about who can reach which addresses — basically a global "ask your neighbor for directions" system — is BGP (Border Gateway Protocol). Every AS broadcasts to its neighbors "I can reach these IP ranges," and packets travel from AS to AS following that chain of announcements until they arrive at their destination.

Anycast works by having the CDN announce the same IP address block from every single PoP via BGP. Cloudflare's edge nodes, whether in Tokyo, London, or São Paulo, all advertise the same IP range (e.g., 104.16.0.0/13). When a user sends a packet to 104.16.x.x, every router along the path makes a routing decision: "which of my neighbors can reach this IP?" Since multiple neighbors can reach the same IP (because all of Cloudflare's PoPs advertise it), the router picks the route BGP considers best, typically the shortest AS path subject to each network's routing policy. That is usually, though not always, the geographically nearest PoP. The result is that the packet naturally flows to a nearby Cloudflare PoP without any DNS trick or geographic lookup.

The key advantage: Anycast failover is fast and automatic. If a PoP goes down, it stops advertising the IP in BGP, and within seconds (the BGP convergence time) traffic shifts to the next nearest PoP. No DNS TTL to wait for. No geographic mapping to update.
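A toy model of that selection and failover, seen from a single ISP router, is sketched below. It is heavily simplified: real BGP best-path selection weighs local preference, MED, and other attributes before and after AS-path length. The PoP names and AS paths are made up, using AS numbers from the documentation and private-use ranges.

```python
from typing import Dict, List, Optional

# Hypothetical routes one router has learned for the same anycast prefix,
# keyed by the PoP that originated each announcement.
announcements: Dict[str, List[int]] = {
    "tokyo":     [64500, 64510],
    "london":    [64500, 64520, 64521, 64530],
    "sao-paulo": [64500, 64540, 64541],
}


def best_pop(routes: Dict[str, List[int]]) -> Optional[str]:
    """Pick the PoP reached via the shortest AS path (ties broken by name)."""
    if not routes:
        return None
    return min(routes, key=lambda pop: (len(routes[pop]), pop))


print(best_pop(announcements))  # tokyo
del announcements["tokyo"]      # the Tokyo PoP fails and withdraws its route
print(best_pop(announcements))  # sao-paulo
```

No DNS record changed between the two lookups; the withdrawal alone moved the traffic.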

GeoDNS: Return the Right IP Based on Location

GeoDNS takes a different approach. Instead of advertising the same IP everywhere, each PoP has its own unique IP. The CDN's DNS servers look at the location of the DNS resolver making the query and return the IP of the geographically nearest PoP.

When a user in Tokyo types example.com, their browser asks their ISP's DNS resolver to look up the address. The CDN's DNS authoritative server sees that the resolver is in Japan and returns the IP of the Tokyo PoP. A user in London gets the London PoP IP. The DNS layer is doing the geographic routing, not the network layer.

The complication: DNS resolvers are not always co-located with users. A company using Google's Public DNS (8.8.8.8, which resolves from Google's locations) might get routed to the wrong PoP because Google's resolver in South Carolina handles the query for a user in Germany. EDNS Client Subnet (ECS) is an extension that partially solves this by having resolvers pass along the user's IP prefix to the authoritative DNS, allowing more accurate geographic targeting.
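The resolver-location problem and the ECS fix can be illustrated with a nearest-PoP lookup. This sketch skips the geo-IP database step and takes coordinates directly; the PoP list, coordinates, and IPs (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical PoPs: name -> (latitude, longitude, IP address)
POPS: Dict[str, Tuple[float, float, str]] = {
    "tokyo":     (35.68, 139.69, "192.0.2.10"),
    "london":    (51.51, -0.13, "192.0.2.20"),
    "frankfurt": (50.11, 8.68, "192.0.2.30"),
    "ashburn":   (39.04, -77.49, "192.0.2.40"),
}


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(resolver_loc: LatLon, client_subnet_loc: Optional[LatLon] = None) -> str:
    """Return the IP of the PoP nearest the client subnet when ECS supplied one,
    otherwise the PoP nearest the resolver."""
    where = client_subnet_loc if client_subnet_loc is not None else resolver_loc
    nearest = min(POPS, key=lambda name: haversine_km(where, POPS[name][:2]))
    return POPS[nearest][2]


south_carolina_resolver = (33.0, -80.0)
berlin_user = (52.52, 13.40)

print(resolve(south_carolina_resolver))               # 192.0.2.40 (Ashburn: wrong continent)
print(resolve(south_carolina_resolver, berlin_user))  # 192.0.2.30 (Frankfurt, thanks to ECS)
```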

Anycast is elegant precisely because it offloads routing intelligence to BGP — the same protocol that's been routing the internet since 1994. The CDN doesn't need geographic databases, DNS tricks, or complex control planes. The internet itself handles load balancing and failover. A PoP failure is handled in seconds as BGP routes converge around the gap.

GeoDNS works at the DNS layer: the CDN's authoritative name server consults a geographic IP database, maps the resolver's location to the nearest PoP, and returns that PoP's unique IP address. The browser then connects directly to that IP. GeoDNS gives more fine-grained control (you can map specific countries to specific PoPs, implement traffic splitting by region, or route different user segments to different origins) but failover is slower: DNS TTLs mean it can take minutes for traffic to shift away from a failed PoP.

Side-by-Side Comparison

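Anycast vs GeoDNS, attribute by attribute:

Addressing: Anycast uses the same IP globally and BGP finds the nearest PoP; GeoDNS uses a unique IP per PoP and DNS resolves to the nearest.
Failover: Anycast fails over in seconds (BGP convergence); GeoDNS fails over in minutes (DNS TTL).
Geographic database: Anycast needs none; GeoDNS requires an accurate geo-IP database.
Routing control: Anycast gives less control over exact routing; GeoDNS gives fine-grained regional control.
Used by: Anycast is used by Cloudflare and Fastly; GeoDNS was used historically by Akamai, and CloudFront uses latency-based DNS + Anycast (now offers Anycast static IPs as of late 2024).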

In practice, modern CDNs often combine both: Anycast for routing packets to the nearest PoP quickly, with GeoDNS as a fallback or for specific routing policies. The distinction matters when you're choosing between CDN providers or debugging why traffic from a specific region isn't hitting the expected PoP.

Why Cloudflare's network is hard to replicate: Cloudflare's Anycast network is present at hundreds of internet exchange points (IXPs), the physical locations where ISPs interconnect. This means Cloudflare is often directly peered with your user's ISP, cutting the number of network hops to 1 or 2. Building this takes a decade and thousands of physical co-location agreements, which is why "just run your own CDN" isn't a real option for most companies.

Two techniques steer users to the nearest CDN edge: Anycast (same IP everywhere, BGP finds nearest; used by Cloudflare and Fastly; failover in seconds) and GeoDNS (unique IP per PoP, DNS resolves by geography; used historically by Akamai; finer control but slower failover). Anycast is simpler and more resilient; GeoDNS offers more routing policy control. Modern CDNs often combine both: CloudFront, for example, uses latency-based DNS routing and added Anycast static IPs in 2024.

Section 6

What's in a CDN Cache Key — and Why It Matters

The cache key is the single most important configuration decision you'll make for a CDN. Get it wrong and your hit ratio collapses — you can have a perfectly healthy origin, thousands of edge PoPs, and a completely broken CDN that serves every request from origin anyway. Most production CDN bugs trace back to cache key misconfiguration.

The Default Cache Key (and Why It's Often Wrong)

By default, most CDNs use the full URL as the cache key: scheme + host + path + query string. Two requests that differ in any part of the URL get separate cache entries. That sounds sensible — and for most things it is — but real-world URLs are full of noise that shouldn't create separate cache entries:

UTM tracking parameters — ?utm_source=newsletter&utm_campaign=sale. Added by marketers to every outbound link. Each unique combination creates a new cache entry. A campaign with 50 UTM variants means 50 separate cache misses for the same page.
Ad click IDs — ?gclid=... (Google Ads), ?fbclid=... (Facebook). These are unique per click — every single user who arrives via an ad gets a cache key that's never been seen before. Hit ratio for ad traffic: 0%.
Session tokens in query strings — ?session=abc123. If someone passes auth tokens in the URL (which they shouldn't, but it happens), every session gets a unique cache entry. Origin load scales with active users instead of with unique URLs.
Sort/filter parameters — ?sort=price&order=asc. These should create separate cache entries (different content), but ?sort=price&order=asc and ?order=asc&sort=price are the same content with parameters in different order — different cache keys by default.

The Random-Query-Parameter Disaster (with numbers)

Here's the concrete failure mode. Imagine a product page for /products/widget-42 that you've carefully cached for 1 hour. Cache hit ratio for that page is 95%, which is excellent. Then marketing launches an ad campaign. Every ad click lands on a URL like /products/widget-42?gclid=Cj0KCQiA4NWrBhD_ARIsAFCKg_... with a gclid that is unique to that click. The CDN treats every unique gclid as a new cache key. It fetches the page from origin for every single click. Your 95% hit ratio for that URL drops to 0% for all campaign traffic. If the campaign drives 100,000 clicks, your origin gets 100,000 requests for what is functionally the same page.

The diagram captures both sides of cache key design: accidental expansion (noise query params that create separate entries for what is functionally the same content) and intentional expansion (the Vary header, which tells the CDN to cache separate copies for genuinely different variants of the same URL).

How to Fix It: Cache Key Normalization

Every major CDN lets you customize which query parameters are included in the cache key. The fix for the gclid/utm disaster is to strip those parameters from the cache key — the CDN still receives the full URL (so analytics tracking still works), but it looks up the cache entry using only the meaningful parameters.

    # In Cloudflare Cache Rules (or via API), create a rule that:
    #   1. Matches the URL path you want to control
    #   2. Sets "Cache Key > Query String" to "Ignore specific parameters"
    #   3. Lists the noise parameters to strip
    #
    # Via Terraform (cloudflare_ruleset). The nested-block syntax below assumes
    # version 4 of the Cloudflare provider.
    resource "cloudflare_ruleset" "cache_rules" {
      zone_id = var.zone_id
      name    = "CDN Cache Key Rules"
      kind    = "zone"
      phase   = "http_request_cache_settings"

      rules {
        action = "set_cache_settings"
        action_parameters {
          cache = true
          cache_key {
            custom_key {
              query_string {
                # Strip these from the cache key (they're still forwarded to origin)
                exclude = [
                  "gclid", "fbclid", "msclkid",
                  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
                ]
              }
            }
          }
          edge_ttl {
            mode    = "override_origin"
            default = 3600 # 1 hour regardless of Cache-Control from origin
          }
        }
        expression = "(http.request.uri.path matches \"^/products/\")"
        enabled    = true
      }
    }

    # Fastly uses VCL (Varnish Configuration Language).
    # The vcl_hash subroutine builds the cache key, by default from req.url
    # and the Host header. Rewriting req.url in vcl_recv therefore changes
    # what gets hashed, so we strip noise query parameters there.
    sub vcl_recv {
    #FASTLY recv

      # Strip known tracking params using Fastly's querystring functions.
      # regfilter removes every parameter whose name matches the regex:
      #   gclid   (Google Ads click ID, unique per click)
      #   fbclid  (Facebook click ID)
      #   msclkid (Microsoft Ads click ID)
      #   utm_*   (all UTM parameters in one pass)
      set req.url = querystring.regfilter(req.url, "^(gclid|fbclid|msclkid|utm_.*)$");

      # Continue to the cache lookup. return(pass) here would bypass the
      # cache entirely and defeat the purpose of normalizing the key.
      return(lookup);
    }

    # AWS CloudFront uses "Cache Policies" to control the cache key.
    # A cache policy defines which query strings, headers, and cookies
    # are included in the cache key. The key insight: only include
    # parameters that genuinely affect the response content.
    resource "aws_cloudfront_cache_policy" "product_pages" {
      name        = "product-pages-policy"
      comment     = "Cache key for product pages: strips tracking params"
      default_ttl = 3600  # 1 hour default
      max_ttl     = 86400 # 24 hour max
      min_ttl     = 0

      parameters_in_cache_key_and_forwarded_to_origin {
        cookies_config {
          cookie_behavior = "none" # Don't include any cookies in cache key
        }
        headers_config {
          header_behavior = "none" # Don't include any headers in cache key
        }
        query_strings_config {
          # Only include the query params that actually change the response.
          # gclid, fbclid, utm_* are NOT in this list, so they're stripped.
          query_string_behavior = "whitelist"
          query_strings {
            items = ["color", "size", "variant"] # only meaningful params
          }
        }
        enable_accept_encoding_brotli = true # separate cache entry per encoding
        enable_accept_encoding_gzip   = true
      }
    }

    resource "aws_cloudfront_distribution" "main" {
      # ... other config ...
      ordered_cache_behavior {
        path_pattern    = "/products/*"
        cache_policy_id = aws_cloudfront_cache_policy.product_pages.id
        # ...
      }
    }

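All three snippets do the same thing conceptually: they tell the CDN "when checking whether you have a cached copy of this request, look up by the meaningful URL parts only and ignore these noise parameters." Whether the noise parameters still reach the origin on a cache miss depends on the mechanism. A rule that changes only the cache key leaves the forwarded URL intact, so analytics and attribution keep working. Rewriting req.url in VCL, or leaving a parameter out of a CloudFront cache policy, also removes it from the origin request unless you forward it separately (CloudFront uses an origin request policy for this). Either way, the parameters no longer create new cache entries.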
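The normalization these configurations perform can be sketched in a few lines of Python. This is an illustration of the idea, not any CDN's actual implementation; the `MEANINGFUL_PARAMS` allowlist and the `cache_key` function are hypothetical names chosen for this example.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical allowlist: only parameters that change the response body.
MEANINGFUL_PARAMS = {"color", "size", "variant"}

def cache_key(url: str) -> str:
    """Build a cache key from host, path, and allowlisted query parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in MEANINGFUL_PARAMS]
    kept.sort()  # parameter order should not create separate entries
    query = urlencode(kept)
    return f"{parts.netloc}{parts.path}" + (f"?{query}" if query else "")

# Two clicks from different campaigns land on the same cache entry.
a = cache_key("https://example.com/products/shoe?color=blue&gclid=abc123&utm_source=ad")
b = cache_key("https://example.com/products/shoe?utm_medium=email&color=blue&fbclid=zzz")
print(a)
print(a == b)
```

Sorting the surviving parameters is a design choice in this sketch: it keeps `?size=9&color=red` and `?color=red&size=9` from occupying two entries.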

The Vary Header: When Cache Key Expansion Is Intentional

Sometimes you do want separate cache entries for the same URL. The most common case is content encoding: a browser that supports Brotli compression should get the Brotli-compressed version of a file, while a browser that only supports gzip should get the gzip version. Both serve the same logical resource at the same URL, but the bytes are different. The Vary: Accept-Encoding response header tells the CDN "cache this URL separately per Accept-Encoding value."

But Vary needs careful use. Vary: User-Agent creates as many cache entries as there are distinct user-agent strings — thousands. Your cache fills with entries you'll never reuse. Vary: Cookie means every user gets their own cache entry — effectively disabling the cache for that URL. The rule: only Vary on headers whose values genuinely map to meaningfully different responses, and where the number of distinct values is small (2–10, not thousands).

The Accept-Language trap: Vary: Accept-Language sounds right for serving translated content at the same URL. In practice, browsers send dozens of slightly different Accept-Language values (en-US,en;q=0.9 vs en-GB,en;q=0.8,fr;q=0.7). This fragments the cache beyond usefulness. The better pattern: serve different translations from different URLs (/en/, /fr/, /de/) and let the CDN cache each URL independently without Vary.

When to Include Headers and Cookies in the Cache Key

By default, CDNs ignore request headers and cookies when building the cache key — because including them would mean every user with a different session cookie gets a unique cache entry (destroying the cache). But there are legitimate cases where headers should be in the key:

Authorization header for semi-private content — if you have content that's specific to a role (e.g., "admin" vs "user"), you can include a normalized role header in the cache key. One cache entry per role, not per user. Never include the raw auth token — that creates a unique entry per session.
Accept header for API responses — if your API returns JSON for Accept: application/json and HTML for Accept: text/html, include Accept in the cache key (Vary: Accept in the response is the right mechanism).
Device type for responsive CDN — if you serve genuinely different HTML for mobile vs desktop (not just CSS changes), you can add a normalized X-Device-Type: mobile|desktop header and include it in the cache key. Two entries per URL instead of thousands.
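The device-type case can be sketched as follows: collapse the User-Agent into one of two values before it enters the cache key. The regular expression is a deliberately crude heuristic and `device_type` and `cache_key` are hypothetical helper names; production setups usually rely on the CDN's own device detection.

```python
import re

# Crude heuristic for illustration only.
MOBILE_HINT = re.compile(r"Mobi|Android|iPhone|iPad", re.IGNORECASE)

def device_type(user_agent: str) -> str:
    """Collapse thousands of User-Agent strings into two values."""
    return "mobile" if MOBILE_HINT.search(user_agent or "") else "desktop"

def cache_key(host: str, path: str, user_agent: str) -> str:
    # The normalized value, not the raw header, goes into the key.
    return f"{host}{path}|device={device_type(user_agent)}"

iphone = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 Mobile/15E148"
pixel = "Mozilla/5.0 (Linux; Android 14; Pixel 8) Chrome/120.0 Mobile Safari/537.36"
windows = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"

keys = {cache_key("example.com", "/home", ua) for ua in (iphone, pixel, windows)}
print(len(keys))  # two entries for three distinct User-Agent strings
```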

The cache key is what the CDN uses to look up cached responses. The default key (full URL including query string) breaks under real-world traffic because UTM parameters and ad click IDs (gclid, fbclid) create a unique cache entry per user, collapsing hit ratios to near zero. Fix: strip noise params from the cache key using cache rules (Cloudflare), VCL (Fastly), or cache policies (CloudFront). The Vary header intentionally expands the cache key for genuinely different content variants — but must be used carefully or it fragments the cache. Never Vary on User-Agent or raw Cookie headers.

Section 7

The HTTP Caching Protocol — Cache-Control, ETag, Vary

Before a CDN can cache anything, it needs instructions. Those instructions arrive as HTTP headers that your origin server sends with every response. Think of them as a contract: your server tells every cache in the world — browser caches, CDN edges, proxy servers — exactly how long to keep the response, whether to share it with other users, and how to check if the content is still fresh. Get this contract right and your CDN edge hit ratio is 90%+. Get it wrong and the CDN becomes an expensive pass-through layer that adds latency instead of removing it.

Cache-Control: The Master Directive

The most important header is Cache-Control. It's a comma-separated list of directives, each controlling one aspect of caching behavior. Here is a real production header and what every piece means:

Cache-Control: public, s-maxage=86400, max-age=3600, stale-while-revalidate=60, immutable

Let's unpack each directive. The distinction between max-age and s-maxage is where most engineers get tripped up — they look similar but apply to completely different caches:

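public — this response may be stored by any cache, including shared caches like CDN edges. It is not strictly required: a response carrying s-maxage or max-age is already storable by shared caches, and public mainly matters for responses a shared cache would otherwise refuse to store, such as some responses to authenticated requests. With private, a CDN must not cache it. WHY: HTTP distinguishes private caches (your browser) from shared caches (CDN, proxy) because some responses contain personal data. A bank statement should never sit on a CDN edge where other users could retrieve it.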
max-age=3600 — browsers (and any cache that doesn't have a more specific instruction) should consider this response fresh for 3,600 seconds (1 hour). After that the browser revalidates. WHY: giving browsers a TTL means repeated page visits don't hit the network at all — the response comes from local disk cache in milliseconds.
s-maxage=86400 — shared caches (CDN edges specifically) should consider it fresh for 86,400 seconds (24 hours). This overrides max-age for shared caches. WHY: you often want the CDN to hold content longer than the browser does. A browser cache miss hits the CDN (still fast), but a CDN miss hits the origin (expensive). So CDN TTL is set longer to maximize edge hits while browsers revalidate more frequently to avoid showing stale content.
immutable — the content will never change while it's fresh; don't bother revalidating even when the user reloads the page. WHY: JS bundles named with a content hash (e.g., app.a1b2c3.js) cannot change: if the content changes, the filename changes. Before immutable was introduced, browsers wasted network requests revalidating these assets on every user-initiated reload.
private — only the user's own browser may cache this. CDN edges must pass it straight to the origin and never store it. This is the killer misconfiguration: a developer adds private to a cacheable product page because they're worried about user-specific data, and their CDN cache hit ratio drops to 0% overnight.
no-cache — confusingly, this does NOT mean "don't cache." It means "cache it, but always revalidate before using it." A conditional GET (with If-None-Match) is sent on every request; if the origin confirms nothing changed, a 304 comes back instantly. WHY this exists: you want the speed benefit of revalidation (304 is tiny, no body) while guaranteeing users never see stale content for even a second.
no-store — this one actually means "don't cache, ever, anywhere." For genuinely sensitive data (bank transactions, medical records). The CDN forwards every request to the origin, adding its own latency overhead.
must-revalidate — once the TTL expires, caches must not serve the stale version even if the origin is down. WHY: for financial or safety-critical data where serving outdated information is worse than serving an error. Contrast with stale-if-error (Section 8) which does the opposite.

The decision tree above maps the most common scenarios. The critical fork is that very first question — user-specific or not? Developers often set private "just to be safe" on pages that are actually shared content. That conservative choice costs every user a full origin round trip on every request.
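How a cache turns these directives into a freshness lifetime can be sketched in Python. This is a simplified model under stated assumptions: it ignores the Expires header and heuristic freshness, and `parse_cache_control` and `freshness_lifetime` are names invented for this example.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value-or-True}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"') if value else True
    return directives

def freshness_lifetime(header: str, shared_cache: bool):
    """Seconds the response may be reused without revalidation.

    Returns None when this cache must not store the response at all.
    """
    d = parse_cache_control(header)
    if "no-store" in d or (shared_cache and "private" in d):
        return None
    if "no-cache" in d:
        return 0  # storable, but must revalidate before every reuse
    if shared_cache and "s-maxage" in d:
        return int(d["s-maxage"])  # s-maxage overrides max-age for shared caches
    return int(d.get("max-age", 0))

header = "public, s-maxage=86400, max-age=3600, stale-while-revalidate=60, immutable"
print(freshness_lifetime(header, shared_cache=True))   # CDN edge
print(freshness_lifetime(header, shared_cache=False))  # browser
```

Running the same function on `private, max-age=600` shows the misconfiguration described above: the browser gets 600 seconds, the CDN gets `None`.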

ETag and Conditional GETs — How "Not Modified" Works

An ETag (entity tag) is a fingerprint for a response body, usually a hash of the content or a version identifier. The origin sends it in the response. Later, once the cached copy has expired, the edge (or browser) sends that fingerprint back in a request header called If-None-Match. If the content hasn't changed, the origin replies with 304 Not Modified: no body, just headers. This is the cheapest possible cache refresh: you get confirmation the content is still valid for almost zero bandwidth cost.

Here is the actual three-step wire conversation. Read it as a script with three lines spoken by two computers:

# Origin sends on first response:
HTTP/1.1 200 OK
ETag: "a1b2c3d4e5f6"
Cache-Control: public, s-maxage=86400
Content-Length: 45230

# Edge sends when TTL expires (conditional GET):
GET /api/products HTTP/1.1
If-None-Match: "a1b2c3d4e5f6"

# Origin replies if nothing changed:
HTTP/1.1 304 Not Modified
ETag: "a1b2c3d4e5f6"
Cache-Control: public, s-maxage=86400
# (no body: saves 45,230 bytes of bandwidth)

In the first block, the origin says "here's the full 45 KB response, and its fingerprint is a1b2c3d4e5f6; cache it for a day." The edge stores both the bytes and the fingerprint. A day later, once the TTL has expired, the edge doesn't ask "give me everything again". It asks "do you still have that same fingerprint?" by sending the If-None-Match header. In the third block, the origin checks its own current fingerprint, sees it matches, and replies with just 304 Not Modified and no body at all. Forty-five kilobytes of bandwidth saved on what is effectively a "still valid, carry on" handshake.

The 304 flow is the unsung hero of web performance. The edge re-confirms freshness for almost no cost: just headers, no body. Compare this to a response with no validator (neither ETag nor Last-Modified): with nothing to compare, the origin must send the full 45 KB body every time the TTL expires, even if nothing changed. For high-traffic APIs this difference is significant.
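The origin's side of this exchange can be sketched in Python. This is a minimal model, assuming the ETag is a truncated SHA-256 of the body and ignoring weak validators and the `*` wildcard; `make_etag` and `respond` are hypothetical names.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Fingerprint the body; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, s-maxage=86400"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""  # fingerprint matches: headers only
    return 200, headers, body

current = b"x" * 45230
status, headers, _ = respond(current)                      # first fetch
revalidated = respond(current, headers["ETag"])            # content unchanged
after_change = respond(b"new content", headers["ETag"])    # content changed
print(status, revalidated[0], after_change[0])
```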

Vary: Teaching the CDN About Variants

Here's a subtle but important problem. Your origin serves gzip-compressed responses to browsers that support it, and uncompressed responses to old clients. The URL is identical: GET /api/data.json. But the response bodies are completely different. If the CDN caches the gzip version and serves it to a client that can't handle gzip, that client gets garbage bytes. The fix is the Vary header.

Vary: Accept-Encoding tells the CDN: "the response varies based on the Accept-Encoding request header, so cache a separate copy for each distinct value." The CDN splits its cache: one entry for Accept-Encoding: gzip, another for Accept-Encoding: br (Brotli), another for clients that send no encoding preference.

The warning at the bottom of the diagram is real. Vary: Cookie is the classic disaster. Every unique cookie value creates its own cache entry. Since session cookies are unique per user, the CDN effectively cannot cache anything — it stores one entry per user per URL and never gets a hit. Most CDNs (Cloudflare, CloudFront) strip or ignore Vary: Cookie entirely on cacheable responses to prevent this; others let it explode your cache storage. Know your CDN's behavior before you ship Vary.
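The difference between a low-cardinality and a high-cardinality Vary header is easy to see in a small model. The sketch below builds a variant key from the request headers named in Vary; `variant_key` is a hypothetical helper, and real CDNs typically normalize Accept-Encoding before doing this.

```python
def variant_key(url: str, vary: str, request_headers: dict) -> str:
    """Extend the URL with the value of every request header listed in Vary."""
    req = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return url + "|" + "&".join(f"{n}={req.get(n, '')}" for n in names)

url = "/api/data.json"

# Five requests, two distinct encodings: two cache entries.
encodings = ["gzip", "br", "gzip", "br", "gzip"]
by_encoding = {variant_key(url, "Accept-Encoding", {"Accept-Encoding": e})
               for e in encodings}

# A thousand users with unique session cookies: a thousand cache entries.
by_cookie = {variant_key(url, "Cookie", {"Cookie": f"session={i}"})
             for i in range(1000)}

print(len(by_encoding), len(by_cookie))  # prints: 2 1000
```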

The Cache-Control contract in one rule: Set Cache-Control: public, s-maxage=N on every shared resource. Use immutable for hashed filenames. Use private only for genuinely user-specific data. Never set private "just to be safe" on shared content — you are paying for a CDN that will do nothing.

Cache-Control is the contract between your origin and every cache in the world. The s-maxage vs max-age distinction tells CDN edges how long to hold content independently from browser TTLs. ETags enable 304 Not Modified revalidation — the cheapest possible cache refresh. Vary splits the cache by request dimension but must be used with care: Vary: Cookie destroys hit ratio. The single most common CDN misconfiguration is setting private on cacheable content, reducing the edge hit ratio to zero.

Section 8

Cache Behaviors — TTL, stale-while-revalidate, stale-if-error

TTL — Time To Live — is the number most engineers know. Set s-maxage=86400 and the CDN serves your content for 24 hours, then discards it and asks the origin again. Simple. But in production, a naive TTL creates a brutal binary: content is either fully cached (fast) or fully expired (origin hit). There's no middle ground. The middle ground is what stale-while-revalidate and stale-if-error provide — and once you understand them, you won't want to operate without them.

The TTL Problem: The Thundering Herd at Expiry

Imagine your homepage has a 60-second TTL and receives 10,000 requests per minute (about 167 per second). At second 61, the TTL expires, and every incoming request finds a stale cache entry. What happens next depends on the CDN. With request coalescing, one request triggers a refetch to the origin and the others wait for its result. Without it, every request that arrives before the refetch completes goes to the origin at once (the thundering herd problem): the origin jumps from zero requests to the full 10,000 requests/minute rate in an instant, every 60 seconds like clockwork. This is why systems built with only TTLs sometimes have spiky origin load that looks mysterious on a dashboard.
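A rough estimate of the burst size helps when reading such a dashboard. The arithmetic below assumes a single edge cache and an origin that takes 200 ms to answer; both the latency figure and the function name are assumptions made for illustration.

```python
def origin_requests_at_expiry(requests_per_minute: float,
                              origin_latency_s: float,
                              coalescing: bool) -> int:
    """Estimate how many requests reach the origin while one refetch is in flight."""
    if coalescing:
        return 1  # one refetch; everyone else waits for its result
    arrivals_during_refetch = requests_per_minute / 60 * origin_latency_s
    return max(1, round(arrivals_during_refetch))

herd = origin_requests_at_expiry(10_000, 0.2, coalescing=False)
coalesced = origin_requests_at_expiry(10_000, 0.2, coalescing=True)
print(herd, coalesced)  # prints: 33 1
```

The burst grows with origin latency: a slow origin stays "in flight" longer, so more requests pile onto it, which in turn can slow it further.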

stale-while-revalidate: Serve First, Refresh in Background

The solution is a behavior called stale-while-revalidate, defined in RFC 5861 as an extension to HTTP caching (the core caching rules live in RFC 9111). Here's the idea in one sentence: after the TTL expires, keep serving the old cached copy to users while simultaneously sending a background refresh request to the origin. The user gets an instant response (no waiting), and the cache silently updates for the next request. Nobody waits. Nobody thunders.

Cache-Control: public, s-maxage=60, stale-while-revalidate=30

Reading this header: the CDN caches the response for 60 seconds (s-maxage=60). After 60 seconds, the entry is "stale." For the next 30 seconds (stale-while-revalidate=30), when a request comes in the CDN does two things simultaneously: it serves the stale cached copy immediately (zero latency penalty), AND fires a background GET to the origin to fetch a fresh copy. Once the fresh copy arrives, it replaces the stale one. Users in that 30-second window never experience a cache miss — they see a response that might be up to 90 seconds old at most, but they get it instantly.

The impact of that last line is significant. Without stale-while-revalidate: every 60 seconds, up to N concurrent requests hit the origin. With it: each edge cache sends the origin roughly one refresh request per 60-second cycle, while all user-facing requests are served at cache speed. This is especially powerful for news sites, dashboards, or APIs where content changes frequently but millisecond-freshness is not required.

stale-if-error: Your Origin Goes Down, Your Site Stays Up

The second resilience directive is stale-if-error. The premise: your origin server is having an incident. It's returning 500s, timing out, or simply unreachable. Normally, a CDN edge that can't reach the origin returns a 502 or 503 to the user. With stale-if-error, it instead serves whatever cached copy it has — even if that copy is past its TTL — for up to N additional seconds. Your users see slightly stale content instead of an error page.

Cache-Control: public, s-maxage=300, stale-if-error=86400

Reading this: fresh for 5 minutes; if the origin fails, serve stale content for up to 24 hours. For most websites — blogs, marketing pages, documentation, product listings — showing content that's 4 hours old during an incident is vastly preferable to showing a 503 error page. Users can still browse. Conversion doesn't crater. The incident becomes invisible to most visitors.

This diagram illustrates why stale-if-error is sometimes called "poor man's high availability." It won't help you if the content absolutely must be current (a live stock price, a flash sale countdown), but for the vast majority of web content it turns an origin incident from a user-visible outage into a silent degradation. Cloudflare, Fastly, and AWS CloudFront all support this directive natively (CloudFront added support for both stale-while-revalidate and stale-if-error in May 2023).

Production recipe for most pages: Cache-Control: public, s-maxage=300, stale-while-revalidate=60, stale-if-error=86400. Content is fresh for 5 minutes. In the next 60 seconds after expiry, stale is served while the cache refreshes invisibly. If the origin is down at any point, serve up to 24 hours of stale content instead of an error. Almost every cacheable page benefits from all three directives together.
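The way the three directives in this recipe interact can be written as a small decision function. It is a simplified model, not a CDN's real state machine: `origin_healthy` stands for the outcome of the edge's fetch attempt, and the function name and return strings are made up for this sketch.

```python
def edge_decision(age_s: int, origin_healthy: bool,
                  s_maxage: int = 300, swr: int = 60, sie: int = 86400) -> str:
    """Decide what an edge does with a cached object of the given age."""
    if age_s <= s_maxage:
        return "serve fresh"
    stale_for = age_s - s_maxage
    if not origin_healthy:
        # stale-if-error: prefer old content over an error page
        return "serve stale (origin error)" if stale_for <= sie else "return error"
    if stale_for <= swr:
        # stale-while-revalidate: answer now, refresh behind the scenes
        return "serve stale, revalidate in background"
    return "fetch from origin, then serve"

for age, healthy in [(100, True), (330, True), (400, True), (4000, False)]:
    print(age, healthy, "->", edge_decision(age, healthy))
```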

TTL alone creates thundering-herd spikes at expiry. stale-while-revalidate (RFC 5861) eliminates this by serving stale content immediately while refreshing in the background, so the origin sees roughly one request per TTL cycle from each edge cache regardless of traffic volume. stale-if-error keeps your site serving when the origin fails, turning hard outages into silent degradations. Together these three directives make CDN caching resilient rather than fragile.

Section 9

CDN Invalidation — Purge, Surrogate Keys, Cache Busting

Setting a long TTL is the right call for performance — it keeps content at the edge for hours or days and slashes origin load. But long TTLs create a problem: what happens when the content actually changes? A product price updates. A blog post is corrected. A security vulnerability in a JavaScript file requires an emergency patch. You can't wait 24 hours for the CDN to notice. You need to invalidate the cached copy immediately. Three techniques exist, each with different speed, cost, and complexity trade-offs.

Method 1: URL Purge — Simple but Slow

The most obvious approach: tell the CDN "delete the cached copy of this specific URL." Every major CDN exposes an API for this. Here's what it looks like for Cloudflare:

curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://example.com/products/shoes-42.html","https://example.com/products/shoes-42.jpg"]}'

Reading the call line by line: you POST to a Cloudflare endpoint scoped to your zone (your domain); the bearer token authenticates you; the JSON body lists the exact URLs you want gone. The CDN receives this request and tells every edge node to drop those URLs from their cache. The next request for each URL will miss the cache and fetch fresh content from the origin.

The downside is propagation time. Cloudflare advertises purge propagation in "under 30 seconds" globally. Fastly is typically faster (closer to 150 ms in many cases). AWS CloudFront's invalidation is well-known to be slower — documentation says propagation can take "up to 15 minutes," though in practice it's often under a minute. The key point: URL purge is not instantaneous. During propagation, some edge nodes have the old content and some have triggered new fetches. For a product price change this delay is usually fine. For a critical security patch, you may also want method 3 (cache-busting URL) as a belt-and-suspenders approach.

URL purge also has a scaling problem: if your product catalog has 500,000 URLs and a supplier updates their entire inventory, you need to purge all 500,000 of them. CDNs cap how many URLs one call may list and rate-limit the calls themselves (Cloudflare, for example, accepts 30 URLs per purge call on most plans, so 500,000 URLs means roughly 16,700 calls). For bulk invalidation, you need something smarter.
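To get a feel for the scale, the sketch below chunks a catalog into purge batches and counts the API calls. The batch size of 30 is an assumed per-call limit used only for illustration; substitute the limit your provider and plan document. `batches` is a hypothetical helper.

```python
from itertools import islice

def batches(urls, batch_size):
    """Yield successive lists of at most batch_size URLs."""
    it = iter(urls)
    while chunk := list(islice(it, batch_size)):
        yield chunk

urls = [f"https://example.com/products/{i}.html" for i in range(500_000)]

BATCH_SIZE = 30  # assumption: per-call limit, varies by provider and plan
all_batches = list(batches(urls, BATCH_SIZE))
print(len(all_batches))      # prints: 16667
print(len(all_batches[-1]))  # prints: 20
```

Each batch would become one purge request body, and the calls still have to fit under the API rate limit.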

Method 2: Surrogate Keys / Cache Tags — Fast and Surgical

Instead of targeting URLs, you tag cache entries at fill-time with logical identifiers, then purge by tag. The origin adds a special header to responses that groups related resources together. Here's Fastly's flavor (called Surrogate-Key) and Cloudflare's flavor (called Cache-Tag):

# Fastly Surrogate-Key header (sent by origin at cache fill time):
HTTP/1.1 200 OK
Surrogate-Key: product-42 category-shoes inventory-supplier-7
Cache-Control: public, s-maxage=3600

# Cloudflare Cache-Tag header (same idea, different name):
HTTP/1.1 200 OK
Cache-Tag: product-42,category-shoes,supplier-7
Cache-Control: public, s-maxage=3600

Both blocks attach tags to the response at the moment it's cached. Fastly takes space-separated tags in a header called Surrogate-Key; Cloudflare takes comma-separated tags in a header called Cache-Tag. The CDN stores these tags as metadata next to the cached object — the user's browser never sees them (the CDN strips them before forwarding). Now the product page, the category page listing shoes, and the inventory widget are all stamped with product-42. When the product's price changes, a single API call purges everything tagged product-42 — regardless of how many URLs that covers. This is the "fan-out purge": one tag, thousands of URLs, propagated globally in under a second on Fastly and Cloudflare.

# Purge everything tagged product-42 on Fastly (sub-second propagation):
curl -X POST "https://api.fastly.com/service/$SERVICE_ID/purge/product-42" \
  -H "Fastly-Key: $API_TOKEN"

# Cloudflare cache-tag purge:
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"tags":["product-42","category-shoes"]}'

This is the preferred method for large e-commerce sites, news platforms, and any CMS-backed site where content is structured and taggable. Because the CDN strips Surrogate-Key and Cache-Tag before responding to browsers, the tag remains purely a CDN-internal grouping mechanism.
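Conceptually, a tag purge is a reverse index from tag to URLs. The toy cache below models that index in memory; it is an illustration of the fan-out idea, not how Fastly or Cloudflare implement it, and the `TaggedCache` class is hypothetical.

```python
from collections import defaultdict

class TaggedCache:
    """Toy edge cache with a tag -> URLs index for fan-out purges."""

    def __init__(self):
        self.objects = {}               # url -> cached body
        self.by_tag = defaultdict(set)  # tag -> urls stamped with it

    def store(self, url: str, body: str, surrogate_key: str) -> None:
        self.objects[url] = body
        for tag in surrogate_key.split():  # space-separated, Fastly style
            self.by_tag[tag].add(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every object carrying the tag; return how many URLs it covered."""
        urls = self.by_tag.pop(tag, set())
        for url in urls:
            self.objects.pop(url, None)
        return len(urls)

cache = TaggedCache()
cache.store("/products/shoes-42.html", "<html>...</html>", "product-42 category-shoes")
cache.store("/category/shoes.html", "<html>...</html>", "category-shoes product-42 product-43")
cache.store("/products/hat-7.html", "<html>...</html>", "product-7 category-hats")

purged = cache.purge_tag("product-42")
print(purged, sorted(cache.objects))
```

One call removes both pages that mention product 42 and leaves the unrelated hat page cached.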

Method 3: Cache-Busting URLs — Instant and Zero-Config

The third approach sidesteps invalidation entirely: if a resource changes, change its URL. The old URL stays in the CDN cache and ages out via TTL (nobody requests it anymore). The new URL cold-starts its own cache entry with fresh content. No purge API call required. No propagation delay. No CDN-specific configuration.

<!-- Before the change -->
<script src="/static/app.a1b2c3.js"></script>
<!-- After the change -->
<script src="/static/app.d4e5f6.js"></script>

Look closely at the two <script> tags. The filename in each one literally contains a fingerprint of the file's contents (a1b2c3 before, d4e5f6 after). When even one byte of the JavaScript changes, the build tool computes a new fingerprint and emits a new filename. The HTML page that references the script gets re-rendered with the new filename, so browsers and CDN edges naturally request the new URL — guaranteeing they fetch the new content. No purge call needed; the URL itself is the invalidation signal. Modern build tools (Webpack, Vite, esbuild, Parcel) do this automatically — they hash file contents and embed the hash in the filename. The HTML that references these files is itself a short-lived asset (often with a TTL of 5 minutes or less). So the pattern is: static assets get a very long TTL (1 year is common: max-age=31536000, immutable) because the URL guarantees freshness; the HTML gets a short TTL and is the only thing you ever need to purge.
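What a build tool does at this step can be approximated in a few lines. The sketch assumes a SHA-256 digest truncated to eight hex characters; real bundlers choose their own hash function and length, and `hashed_name` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath

def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Embed a fingerprint of the content in the filename: app.js -> app.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"

before = hashed_name("app.js", b"console.log('v1');")
after = hashed_name("app.js", b"console.log('v2');")
print(before, after, before != after)
```

The same bytes always produce the same name, so an unchanged file keeps its URL (and its cache entries) across deploys.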

In production, the three methods combine: cache-busting URLs for all static assets (JS, CSS, images, fonts) with max-age=31536000, immutable; surrogate keys for HTML and API responses with short-to-medium TTLs; URL purge as a manual escape hatch for edge cases. The combination means your CDN hit ratio stays high (long TTLs on the majority of traffic) while updates propagate correctly (surrogate key purge on content changes).

The origin-offload math: If 95% of your traffic is static assets (hashed JS/CSS/images) cached for 1 year, and 5% is HTML cached for 5 minutes with surrogate-key invalidation, your origin handles roughly 5% of raw request volume under normal circumstances — and even less if HTML cache hit ratio is high. The combination of long TTLs + targeted invalidation is how major CDN deployments achieve 98–99% cache hit ratios.
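The offload arithmetic can be made explicit. The hit ratios below (99.9% for hashed static assets, 80% for HTML) are assumed values chosen for illustration, not measurements, and `origin_share` is a hypothetical helper.

```python
def origin_share(traffic_mix) -> float:
    """traffic_mix: iterable of (share_of_requests, cache_hit_ratio) pairs.

    Returns the fraction of all requests that reach the origin.
    """
    return sum(share * (1 - hit_ratio) for share, hit_ratio in traffic_mix)

# Assumed mix: 95% static assets at a 99.9% hit ratio, 5% HTML at 80%.
mix = [(0.95, 0.999), (0.05, 0.80)]
to_origin = origin_share(mix)
print(f"origin sees {to_origin:.2%} of requests, edge hit ratio {1 - to_origin:.2%}")

# Worst case for HTML: every HTML request misses, static always hits.
worst = origin_share([(0.95, 1.0), (0.05, 0.0)])
print(f"upper bound: {worst:.0%}")
```

Under these assumptions the origin sees about 1.1% of requests, which lands in the 98–99% hit-ratio range quoted above; the 5% figure is the upper bound when no HTML request hits the cache.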

CDN invalidation comes in three flavors: URL purge (simple, universal, 30s–15min propagation), surrogate keys (surgical fan-out purge, sub-second propagation, requires CDN support), and cache-busting URLs (instant, zero CDN config, works only for immutable assets). Production systems use all three together: cache-busting for static assets with year-long TTLs, surrogate keys for structured content, and URL purge as a manual escape hatch.

Section 10

Edge Compute — Cloudflare Workers, Lambda@Edge, Fastly Compute

Caching static and semi-static content is what CDNs were built for. But modern applications need more than that: they need to run logic — checking a JWT, picking an A/B test variant, redirecting mobile users, personalizing a cached page — before serving a response. The old answer was: send the request to the origin for the logic, then maybe cache the result. The new answer is: run the logic at the edge, inside the CDN itself, with no origin round trip at all. This is edge compute.

The idea is to deploy tiny functions — not full application servers — to every CDN edge node simultaneously. A user in Tokyo gets their logic executed in Tokyo. No request crosses the Pacific. The function runs in under a millisecond, produces a response, and the user is none the wiser that there was no "backend" involved. For the right use cases this is transformative: you get origin-level logic at CDN-level latency.

The diagram captures the key shift: with edge compute, the CDN edge stops being a pure cache layer and becomes a thin application layer. Logic that used to require an origin round trip now runs 10–20 ms from the user, regardless of where the origin lives. The origin still handles database queries and business-critical writes, but the stateless, logic-only work moves to the edge.

The Three Platforms Compared

Cloudflare Workers — V8 Isolates, Effectively Zero Cold Start

Cloudflare Workers run JavaScript (or WebAssembly) inside V8 isolates — the same JavaScript engine that powers Chrome, but without a full browser. Each Worker is an isolate: a lightweight, isolated JavaScript context that initializes in roughly 5 ms. WHY isolates instead of containers? A Docker container needs to boot a full OS process (~100–500 ms cold start). A V8 isolate just needs to initialize a JavaScript context — a few MB of memory and roughly 5 ms. Cloudflare runs many isolates per machine, sharing one V8 engine across them. Cloudflare itself argues this is effectively a zero cold start in practice: the network round trip between client and the Cloudflare edge typically exceeds the isolate initialization time, so the Worker is ready by the time the request arrives.

Workers run on Cloudflare's globally distributed network (around 330 cities and growing). A Worker you deploy runs globally within seconds; there's no "region" to configure. The programming model is the Fetch API (a request-in, response-out function), which is intentionally limited: Workers have no filesystem access, no general-purpose socket interface (outbound TCP goes through the runtime's connect() API), and a 10 ms CPU time limit per request on the free tier, with higher limits on paid plans. These constraints exist to enforce the stateless, fast-execution model.

// Cloudflare Worker: A/B test at the edge
export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    // Stable bucketing: hash the user's IP into bucket A or B
    const ip = request.headers.get('CF-Connecting-IP') ?? '0.0.0.0';
    const bucket = (simpleHash(ip) % 2 === 0) ? 'a' : 'b';

    // Rewrite URL to variant: no origin round trip for the routing decision
    url.pathname = `/variants/${bucket}${url.pathname}`;

    // Fetch from origin (or cache) with the rewritten URL
    const response = await fetch(url.toString(), request);
    const newResponse = new Response(response.body, response);
    newResponse.headers.set('X-AB-Variant', bucket);
    return newResponse;
  }
};

function simpleHash(str) {
  let h = 0;
  for (const ch of str) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

This Worker runs in the Cloudflare PoP nearest to the user. The A/B routing decision — which in a traditional architecture would require an origin call to a feature-flag service — happens entirely at the edge. The only origin call is for the actual page content.

Lambda@Edge — Node.js/Python, CloudFront-Integrated

AWS Lambda@Edge runs standard Lambda functions at CloudFront edge locations. You write a Lambda function (Node.js or Python), associate it with a CloudFront distribution, and specify when it fires: on viewer request, origin request, origin response, or viewer response. These four hook points let you intercept and modify requests/responses at different stages of the CloudFront pipeline.

The trade-off vs Cloudflare Workers: Lambda@Edge has higher cold start latency, typically in the range of 50–200 ms for the first invocation in a region after a period of inactivity, though warm invocations are fast. The functions also run in specific AWS regions (not every CloudFront PoP). Deployment packages can be larger than a Worker script: up to 50 MB for origin-facing triggers, though viewer-facing triggers are capped at 1 MB. Lambda@Edge integrates tightly with AWS IAM, CloudWatch, and other AWS services, which is valuable if your infrastructure is already on AWS.

// Lambda@Edge: add security headers on viewer response
exports.handler = async (event) => {
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  // Add security headers at the edge; no origin change required
  headers['strict-transport-security'] = [{
    key: 'Strict-Transport-Security',
    value: 'max-age=63072000; includeSubDomains; preload'
  }];
  headers['x-content-type-options'] = [{
    key: 'X-Content-Type-Options',
    value: 'nosniff'
  }];
  headers['x-frame-options'] = [{
    key: 'X-Frame-Options',
    value: 'DENY'
  }];

  return response;
};

Reading the function top to bottom: the handler pulls the outgoing response object out of the CloudFront event, then mutates its headers dictionary to attach three security headers — HSTS (which tells browsers "only ever talk to me over HTTPS"), X-Content-Type-Options: nosniff (which prevents the browser from second-guessing Content-Type and accidentally executing data as a script), and X-Frame-Options: DENY (which blocks the page from being embedded in another site's iframe for clickjacking). Returning the response object pushes it back into the CloudFront pipeline. Adding security headers at the edge like this is a common Lambda@Edge use case: the origin doesn't need to be modified (useful for legacy apps), and the headers are guaranteed on every response regardless of which origin path served it.

Fastly Compute — WebAssembly, Sub-millisecond Cold Start

Fastly Compute (previously Compute@Edge) takes a different approach from Cloudflare Workers. Instead of V8 isolates running JavaScript, it uses WebAssembly (WASM) as the execution model. Your code, written in Rust, Go, JavaScript, or any language that compiles to WASM, is compiled to a WASM binary and deployed to Fastly's edge. The WASM runtime starts in under a millisecond, so, as with Cloudflare Workers, cold starts are negligible in practice, and each instance is strictly sandboxed at the WASM boundary.

WHY WASM instead of V8 isolates? Language flexibility is the headline benefit: Rust code running at the Fastly edge gives you memory safety, predictable performance, and no garbage collection pauses. For compute-heavy tasks (image manipulation, cryptographic operations, tight parsing loops), ahead-of-time compiled Rust WASM can outperform JIT-compiled JavaScript and tends to have more consistent latency. Fastly positions Compute for use cases that need predictable, high-throughput compute at the edge, not just lightweight routing logic.

// Fastly Compute: JWT verification in Rust WASM
use fastly::http::StatusCode;
use fastly::{Error, Request, Response};

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    // Verify JWT from Authorization header before passing to origin
    let auth = req.get_header_str("Authorization").unwrap_or("");
    if !auth.starts_with("Bearer ") {
        return Ok(Response::from_status(StatusCode::UNAUTHORIZED)
            .with_body_text_plain("Missing token"));
    }

    let token = &auth[7..];
    if !verify_jwt(token) {
        return Ok(Response::from_status(StatusCode::FORBIDDEN)
            .with_body_text_plain("Invalid token"));
    }

    // Token valid: pass request to the backend named "my_origin"
    Ok(req.send("my_origin")?)
}

fn verify_jwt(token: &str) -> bool {
    // Placeholder only: a real implementation would use a JWT library and a
    // shared secret from the Fastly Secret Store
    !token.is_empty()
}

JWT verification at the edge is particularly powerful: your origin never receives an unauthenticated request. The edge acts as a zero-trust enforcement layer — only valid-token requests ever consume origin resources, which reduces both load and attack surface.
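For readers who want to see what a real check involves, here is a minimal HS256 verification sketch in Python using only the standard library. It assumes a shared-secret (HMAC) token and checks only the signature and the exp claim; `verify_hs256` and `sign_hs256` are hypothetical helpers, and a production service should use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: bytes) -> str:
    """Issue a token (used here only to produce test input)."""
    header_b64 = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_b64 = b64url_encode(json.dumps(payload).encode())
    sig = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    return f"{header_b64}.{payload_b64}.{b64url_encode(sig)}"

def verify_hs256(token: str, secret: bytes, now=None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        payload = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(sig_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return False
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return False
    if header.get("alg") != "HS256":  # never trust the token to pick the algorithm
        return False
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return False
    exp = payload.get("exp")
    if exp is None:
        return True
    current = time.time() if now is None else now
    return isinstance(exp, (int, float)) and exp > current

token = sign_hs256({"sub": "user-1", "exp": 2000}, b"edge-secret")
print(verify_hs256(token, b"edge-secret", now=1000))  # valid and unexpired
print(verify_hs256(token, b"wrong-secret", now=1000)) # signature mismatch
```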

Edge-Side Includes: Composing Cached Pages with Dynamic Fragments

One more edge compute capability worth knowing: Edge-Side Includes (ESI). The idea is to serve a mostly-cached HTML page but "punch holes" in it where dynamic, user-specific fragments go. The edge fetches the dynamic fragment (from a fast backend API) and stitches it into the cached page before delivering to the user.

<html>
  <body>
    <h1>Blue Sneakers — $89</h1>
    <img src="/products/shoe-42.jpg">
    <esi:include src="/api/inventory/shoe-42" />
    <esi:include src="/api/cart/badge" />
  </body>
</html>

The edge caches the outer HTML for an hour (avoiding expensive origin renders) while injecting fresh inventory and cart data per-request from fast microservices. The user sees a complete, personalized page with accurate inventory — but 95% of the content came from the CDN cache. ESI is supported natively by Varnish and Fastly, and can be implemented in Cloudflare Workers via custom logic.
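The stitching step can be sketched in a few lines of Python. This is a toy model of the idea, not how Varnish or Fastly implement ESI: it handles only self-closing `<esi:include src="..."/>` tags, and `render_esi` and the `fetch_fragment` callback are hypothetical names.

```python
import re
from typing import Callable

# Matches self-closing include tags only; real ESI has more tags and attributes.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def render_esi(cached_html: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replace every include tag with the fragment fetched for its src URL."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), cached_html)


# The outer page comes from cache; the fragments come from fast backends.
cached_page = (
    "<h1>Blue Sneakers</h1>"
    '<esi:include src="/api/inventory/shoe-42" />'
    '<esi:include src="/api/cart/badge" />'
)
fragments = {
    "/api/inventory/shoe-42": "<p>3 left in stock</p>",
    "/api/cart/badge": "<span>Cart (2)</span>",
}
print(render_esi(cached_page, fragments.__getitem__))
```

In a real deployment `fetch_fragment` would be an HTTP call (itself possibly cached with a short TTL), and a failed fragment fetch needs a fallback so one slow microservice cannot break the whole page.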

Edge compute runs JavaScript, Python, or WebAssembly functions at the CDN PoP nearest to each user, eliminating origin round trips for stateless logic. Cloudflare Workers (V8 isolates, ~5 ms initialization across ~330 cities, effectively zero cold start in practice), Lambda@Edge (Node.js/Python, tighter AWS integration, higher cold start), and Fastly Compute (WASM, Rust-friendly, sub-millisecond startup) are the three dominant platforms. Key use cases: A/B testing, JWT verification, request normalization, security headers, and ESI page composition. Edge compute does not replace the origin; it is a thin logic layer that handles stateless decisions without crossing the globe.

Section 11

CDN Pricing Models — Bandwidth, Requests, Compute

CDN pricing looks deceptively simple until your first invoice. There are three independent dimensions being charged, and the relative weight of each depends entirely on your traffic profile. A video streaming service cares almost entirely about bandwidth. A high-frequency API cares primarily about request count. A serverless edge platform cares about compute time. Pick the wrong CDN for your workload and you pay 5–10× more than you should.

The Three Pricing Dimensions

Bandwidth (per GB egress): the amount of data transferred from CDN edge nodes to end users. This is the dominant cost for most sites. A 500 MB video file served to 100,000 users generates 50 TB of egress (500 MB × 100,000). At even modest per-GB rates that adds up quickly. Bandwidth pricing typically tiers down as volume increases: the first 10 TB is more expensive per GB than the next 40 TB. WHY: CDNs have fixed infrastructure costs but marginal bandwidth costs drop at scale, so they pass some savings to volume customers.

Requests (per million) — some CDNs charge separately for each HTTP request regardless of response size. A site serving a large HTML page as a single request costs less here than a site serving 150 small resources (JS chunks, API calls, images) per page load. Request pricing matters enormously for image-heavy sites, SPAs with many small JS bundles, and APIs with frequent small payloads. A CDN that looks cheap on bandwidth can be expensive on requests if your traffic pattern is many-small rather than few-large.

Edge compute (per invocation / CPU time) — each edge function execution costs money. Cloudflare Workers Free tier gives 100,000 requests/day at no cost; paid plans charge per million requests above the included quota. Lambda@Edge charges per request and per 1 ms of compute duration (rounded up). These costs are usually small relative to bandwidth for typical workloads, but a CPU-heavy Worker that runs 10 ms per request on millions of requests per day can accumulate meaningful cost.
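A rough way to see which dimension dominates your bill is to price the same rate card against two traffic profiles. The sketch below is an estimate under stated assumptions: `RateCard` and `monthly_cost` are hypothetical names, the rates are flat (no volume tiers, no free quota), and the example figures are illustrative placeholders, not a quote from any provider.

```python
from dataclasses import dataclass


@dataclass
class RateCard:
    usd_per_gb: float                          # bandwidth egress
    usd_per_10k_requests: float                # per-request fee
    usd_per_million_invocations: float = 0.0   # edge compute


def monthly_cost(rates: RateCard, gb_egress: float, requests: int,
                 edge_invocations: int = 0) -> dict:
    """Split a monthly bill into its three dimensions (flat rates assumed)."""
    bandwidth = round(gb_egress * rates.usd_per_gb, 2)
    request_fees = round(requests / 10_000 * rates.usd_per_10k_requests, 2)
    compute = round(edge_invocations / 1_000_000 * rates.usd_per_million_invocations, 2)
    return {
        "bandwidth": bandwidth,
        "requests": request_fees,
        "compute": compute,
        "total": round(bandwidth + request_fees + compute, 2),
    }


rates = RateCard(usd_per_gb=0.085, usd_per_10k_requests=0.010)

# Few-large: 100,000 downloads of a 500 MB video (50 TB = 50,000 GB).
video = monthly_cost(rates, gb_egress=50_000, requests=100_000)

# Many-small: 500 million API calls moving only 100 GB in total.
api = monthly_cost(rates, gb_egress=100, requests=500_000_000)

print(video)
print(api)
```

With these placeholder rates the video profile is almost entirely bandwidth, while the API profile is almost entirely request fees, which is why the same CDN can be cheap for one workload and expensive for another.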

Provider Comparison (Approximate Public List Prices)

The prices below are approximate and sourced from publicly available list pricing at time of writing. Actual pricing varies by volume, region, contract, and changes over time — always check the provider's current pricing page before making decisions.

Provider	Pricing Model	Bandwidth (approx.)	Requests	Best for
Cloudflare	Flat monthly + bandwidth overage	Generous free tier; Pro ~$20/mo includes bandwidth; Enterprise contract	Included in plan for most tiers	Most web workloads; best free tier; Workers ecosystem
AWS CloudFront	Pay-per-GB + pay-per-request	~$0.085/GB first 10 TB/mo (US/EU); lower in higher tiers	~$0.010 per 10k HTTPS requests (US/Canada), ~$0.012 in EU	AWS-native stacks; S3/ALB origins; Lambda@Edge
Fastly	Pay-per-request + bandwidth	Usage-based; billed per request + GB	Charged per request; no included quota	High-traffic media, streaming, Fastly Compute use cases
Akamai	Enterprise contract (committed use)	Negotiated; typically significant commitments required	Included in enterprise packages	Large enterprises; media delivery; edge security at scale
Bunny.net	Pay-per-GB (very low)	~$0.005–$0.01/GB (among lowest on market)	Low per-request cost	Budget-conscious; high-bandwidth static assets; storage + CDN combo

The Hidden Cost: Double-Paying on Cache Misses

There's a billing trap that catches engineers who don't think carefully about cache hit ratio. When a CDN cache misses, the edge fetches the resource from your origin, and that origin fetch traverses the public internet (or your cloud provider's network). You can pay twice: once for the CDN's egress delivering the response to the user, and once for the origin's egress delivering the resource to the CDN edge. If your CDN cache hit ratio is only 40%, then 60% of all traffic incurs double egress cost. AWS is a notable exception when you stay inside its ecosystem: AWS does not charge for data transferred from AWS origins such as S3, EC2, or an ALB to CloudFront, so a miss costs only the CloudFront egress. Put a third-party CDN in front of an EC2 or S3 origin, however, and every miss incurs AWS internet egress charges on top of the CDN's own charges.

The chart makes it visual: hit ratio is not linear in its impact on cost. Going from 60% to 90% hit ratio reduces origin load from 40% to 10% — a 4× reduction. Going from 90% to 99% reduces it from 10% to 1% — another 10× reduction. The majority of CDN cost optimization effort should focus on identifying why cache hit ratio is low (misconfigured Cache-Control, query parameter pollution in cache keys, excessive Vary headers) and fixing it. Each percentage point of hit ratio improvement reduces both latency and bill.
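The non-linearity is easy to check numerically. The helper names below are hypothetical, and the egress multiplier assumes the worst case where both the CDN-to-user leg and the origin-to-CDN leg are billed.

```python
def origin_share(hit_ratio: float) -> float:
    """Fraction of requests that still reach the origin."""
    return 1.0 - hit_ratio


def origin_reduction(before: float, after: float) -> float:
    """How many times smaller origin load becomes when hit ratio improves."""
    return origin_share(before) / origin_share(after)


def egress_multiplier(hit_ratio: float) -> float:
    """Billed GB per GB delivered when origin egress is charged on every miss."""
    return 1.0 + origin_share(hit_ratio)


for before, after in [(0.60, 0.90), (0.90, 0.99)]:
    print(f"{before:.0%} -> {after:.0%}: origin load drops "
          f"{origin_reduction(before, after):.1f}x")

for ratio in (0.40, 0.90, 0.99):
    print(f"hit ratio {ratio:.0%}: {egress_multiplier(ratio):.2f} GB billed per GB delivered")
```

The last nine points of hit ratio (90% to 99%) cut origin load more than the previous thirty (60% to 90%), which is why chasing the long tail of misses pays off.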

CDN billing has three dimensions: bandwidth egress, request count, and edge compute time. Cloudflare's flat-monthly model suits most web workloads; CloudFront's per-GB plus per-request model suits AWS-native architectures; Fastly's usage-based model suits high-throughput streaming; budget alternatives like Bunny.net offer very low per-GB rates for static asset delivery. Cache misses incur double egress (CDN + origin), so maximizing hit ratio is the single most impactful cost optimization available. At 99% hit ratio, origin handles 1% of traffic — the CDN bill is a fraction of what it would cost to serve all traffic from origin compute.

Section 12

CDN Architecture — How a Request Actually Flows

You've seen what CDNs do at a high level. Now let's trace a single HTTP request from the moment a user types a URL to the moment pixels appear on screen, millisecond by millisecond. Understanding this flow is what separates engineers who configure CDNs competently from engineers who diagnose CDN problems quickly.

The Full Request Path — Cache HIT

The happy path: the content is already cached at the nearest edge node. Here is the complete sequence:

DNS resolution — the user's browser resolves cdn.example.com. The authoritative DNS server returns either an Anycast IP (same IP globally, BGP routes to nearest PoP) or a GeoDNS response (different IP per geographic region). This takes ~10–50 ms on first lookup, near-zero from local DNS cache on repeat visits.
TCP + TLS handshake to the nearest edge: the browser opens a TCP connection to the resolved IP. Because it's Anycast/GeoDNS, this hits the nearest CDN PoP, not the origin. For a Sydney user, the TCP handshake is one round trip of roughly 10–20 ms. TLS 1.3 adds one more round trip of the same size. Total handshake: ~20–40 ms.
HTTP request arrives at edge node — the edge receives the request. It hashes the cache key (URL, possibly normalized query params, possibly Accept-Encoding variant) and looks up the result in its local memory and SSD cache. Modern CDN edges hold gigabytes of cache per node in a tiered storage hierarchy: hot content in DRAM, warm content in NVMe SSD.
Cache HIT: respond immediately — if the cache entry exists and is within TTL, the edge returns it directly. Time from request arriving at edge to response sent: single-digit milliseconds for small objects in DRAM cache. The origin is never contacted. Total time-to-first-byte for the Sydney user: ~40–60 ms.
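The key computation and TTL check in steps 3 and 4 can be sketched as follows. This is a simplified model, not any vendor's implementation: the `TRACKING_PARAMS` list, the `cache_key` canonical form, and the `EdgeCache` class are assumptions chosen for illustration, and real edges use tiered DRAM/SSD stores rather than a dictionary.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

# Parameters that never change the response; the exact list is an assumption.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def cache_key(url: str, accept_encoding: str = "") -> str:
    """Hash a normalized URL plus the encoding variant the client can accept."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = sorted((k, v) for k, v in params if k not in TRACKING_PARAMS)
    offered = {t.split(";")[0].strip().lower() for t in accept_encoding.split(",")}
    variant = "br" if "br" in offered else "gzip" if "gzip" in offered else "identity"
    canonical = f"{parts.netloc.lower()}{parts.path}?{urlencode(kept)}|{variant}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class EdgeCache:
    """In-memory stand-in for an edge node's cache."""

    def __init__(self) -> None:
        self._entries = {}  # key -> (body, expires_at)

    def put(self, key: str, body: bytes, ttl_s: float, now: float) -> None:
        self._entries[key] = (body, now + ttl_s)

    def get(self, key: str, now: float) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            return None  # MISS: never stored, or past its TTL
        return entry[0]  # HIT: served without contacting origin
```

Sorting the query parameters and dropping tracking parameters means `?b=2&a=1&utm_source=mail` and `?a=1&b=2` share one cache entry instead of fragmenting the hit ratio.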

The Full Request Path — Cache MISS with Shield Tier

The less-happy but still-good path: the content isn't in the nearest edge's cache. Without a shield tier, the edge goes directly to the origin. With a shield tier, there's an intermediate step — and it's a powerful one.

A shield tier (also called an "origin shield") is a second, smaller layer of CDN nodes that sits between edge nodes and the origin. Instead of every edge PoP going independently to the origin on a miss, they all forward misses to the shield tier first. The shield has a larger, more persistent cache. If the shield has the content, it returns it and the origin is spared. If the shield also misses, only the shield fetches from origin, not every edge PoP simultaneously. Cloudflare calls this Tiered Cache (formerly Argo Tiered Caching); Fastly calls it shielding; AWS CloudFront calls it Origin Shield.

The shield tier multiplies the origin-offload effect dramatically. Here's the arithmetic the diagram summarizes: suppose your CDN has 300 edge PoPs, and your edge cache hit ratio is 90%. Without a shield tier, 10% of all requests escape to the origin, and each of those 300 edges independently fetches from origin on a miss, meaning the origin potentially handles up to 300 simultaneous fetches for the same popular resource during a traffic burst. With a two-tier cache, those 300 edge misses collapse into a handful of shield requests. If the shield hit ratio is 80% (of misses), the origin sees only 2% of total traffic instead of 10%. One line of CDN configuration (enable Origin Shield) achieves a 5× reduction in origin load.
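The two-tier arithmetic reduces to multiplying the miss ratios of each tier. A small sketch, with `origin_fraction` as a hypothetical helper name:

```python
def origin_fraction(edge_hit: float, shield_hit: float = 0.0) -> float:
    """Share of all requests that reach origin: edge miss times shield miss."""
    return (1.0 - edge_hit) * (1.0 - shield_hit)


without_shield = origin_fraction(edge_hit=0.90)
with_shield = origin_fraction(edge_hit=0.90, shield_hit=0.80)

print(f"without shield: {without_shield:.1%} of traffic reaches origin")
print(f"with shield:    {with_shield:.1%} of traffic reaches origin")
print(f"reduction:      {without_shield / with_shield:.1f}x")
```

Note that the shield hit ratio is measured only over requests that already missed at the edge, which is why a seemingly modest 80% still removes four fifths of the remaining origin traffic.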

Putting It All Together: A Complete First-Visit Flow

Here is the complete sequence for a first visit to a page that isn't cached anywhere yet:

User opens browser → types URL → presses Enter. Browser checks local DNS cache (empty on first visit), asks OS resolver, which asks recursive resolver, which asks authoritative DNS. Returns Anycast IP. ~50 ms.
Browser opens TCP connection to Anycast IP. BGP routing in the internet's backbone steers the SYN packet to the nearest CDN PoP. TCP SYN-ACK round trip: ~10–20 ms for a nearby user.
TLS 1.3 handshake. One additional round trip, same ~10–20 ms. CDN edge presents TLS certificate (CDN manages certificates for your domain automatically).
Browser sends HTTP GET. Edge receives it, computes cache key, checks local DRAM and SSD cache. MISS — content not cached at this edge yet.
Edge forwards to shield tier (if configured). Shield checks its larger, more persistent cache. MISS — also not cached (first ever request).
Shield sends GET to origin. Origin receives the request, renders the page or reads from its database, and returns the full response with Cache-Control: public, s-maxage=300, stale-while-revalidate=60. This leg might take 50–200 ms depending on origin complexity.
Shield receives response, stores a copy keyed by URL + Vary dimensions. Forwards response to edge.
Edge receives response, stores a copy in its local cache (DRAM for hot content). Forwards to browser.
Browser receives and renders. Total time-to-first-byte: ~200–400 ms for first visit, depending on distance and origin response time.
Second request to the same edge (from any user routed to that PoP within the TTL): steps 1–4 only, cache HIT at step 4. Total TTFB: ~40–60 ms. No origin contact.
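The first-visit flow above can be simulated with a few small classes to see how a shield changes what the origin experiences. This is a toy model under simplifying assumptions (no TTLs, no eviction, sequential requests); `Origin`, `CacheTier`, and `origin_fetches` are hypothetical names.

```python
class Origin:
    """Counts how many times the origin is actually asked for content."""

    def __init__(self) -> None:
        self.fetches = 0

    def get(self, url: str) -> str:
        self.fetches += 1
        return f"body of {url}"


class CacheTier:
    """An edge or shield node: answer from cache, else ask the next tier up."""

    def __init__(self, upstream) -> None:
        self.store = {}
        self.upstream = upstream

    def get(self, url: str) -> str:
        if url not in self.store:          # MISS: forward upstream and keep a copy
            self.store[url] = self.upstream.get(url)
        return self.store[url]             # HIT on every later request


def origin_fetches(num_edges: int, use_shield: bool) -> int:
    origin = Origin()
    upstream = CacheTier(origin) if use_shield else origin
    edges = [CacheTier(upstream) for _ in range(num_edges)]
    for edge in edges:
        edge.get("/index.html")  # first visitor at this PoP misses locally
        edge.get("/index.html")  # second visitor is a local hit
    return origin.fetches


print("no shield:  ", origin_fetches(300, use_shield=False))
print("with shield:", origin_fetches(300, use_shield=True))
```

Without the shield every PoP pays its own origin fetch; with it, the first miss fills the shield and the remaining PoPs are served from there.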

The 1-line config that 5×es your origin capacity: Enable "Origin Shield" or "Tiered Caching" in your CDN dashboard (one checkbox or one API flag in Cloudflare, Fastly, and CloudFront). On a site with 300 edge PoPs and 90% edge hit ratio, this collapses up to 300 simultaneous origin fetches per cache-miss event into roughly 1. No code changes. No infrastructure scaling. Just cache topology reconfiguration.

A CDN request flows through DNS resolution (Anycast/GeoDNS → nearest PoP), TCP+TLS handshake to the local edge (~30 ms), cache key lookup, and either a HIT (response in 5 ms from edge DRAM) or a MISS (forwarded to shield tier, then to origin if needed). The two-tier architecture — edge nodes backed by an origin shield — is the structural reason CDNs can achieve 98%+ origin offload. Enabling origin shield is typically the single highest-leverage CDN configuration change available: it multiplies origin-offload without code changes. After the first miss, every subsequent edge hit serves users at local PoP latency (~10–50 ms) regardless of origin location.

Section 13

Static Asset Patterns — Immutable URLs, Asset Hashing, Long TTLs

Here's the most reliable CDN pattern in existence: never change the content behind a URL; change the URL itself. It sounds like a riddle, but it's behind almost every high-performance web app you've used. Let's unpack what that means and why it works so well.

The Problem with Mutable URLs

Imagine you have a JavaScript file at https://example.com/static/app.js. You want the CDN to cache it for a long time so users don't re-download it on every visit. You set Cache-Control: max-age=86400 (one day). But now you deploy a bug fix. The URL is the same (app.js) but the content has changed. Every browser and CDN edge that cached the old version will keep serving the broken code for up to 24 hours. You have two bad options: either set a short TTL (constant re-downloading, killing your CDN hit ratio) or set a long TTL and accept stale content for hours after deploys.

This is the classic cache invalidation problem — the notorious "one of the two hard things in computer science." The elegant solution modern build tools use sidesteps it entirely by making the URL encode the content itself.

Content Hashing: Baking the Fingerprint Into the Filename

Modern build tools like Webpack, Vite, and Next.js compute a content hash — a short cryptographic fingerprint of the file's bytes — and embed it in the filename at build time. The result looks like:

app.js     → dist/app.a3f7c291.js
styles.css → dist/styles.9e4b12d0.css
logo.png   → dist/logo.d8c4e7fa.png

The hash (a3f7c291, 9e4b12d0, etc.) is derived directly from the file content via a fast hashing algorithm like MD5 or SHA. If a single byte of app.js changes, the hash changes completely (avalanche effect), producing a new filename. If the content is identical across deploys — for example, a library you haven't touched — the hash stays the same and the CDN re-uses the existing cache entry.
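A minimal sketch of the fingerprinting step, assuming SHA-256 truncated to 8 hex characters. Real build tools choose their own hash function and length, and `hashed_name` is a hypothetical helper, not a Webpack or Vite API.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a truncated content digest before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = hashed_name("dist/app.js", b"console.log('v1');")
v2 = hashed_name("dist/app.js", b"console.log('v2');")
unchanged = hashed_name("dist/app.js", b"console.log('v1');")

print(v1)          # dist/app.<8 hex chars>.js
print(v1 != v2)    # a one-byte change yields a different filename
print(v1 == unchanged)  # identical bytes always map to the same filename
```

Because the name is a pure function of the bytes, rebuilding an untouched file reproduces the same URL and the CDN keeps serving its existing cache entry.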

Now the magic: because the filename IS the content, you can set a permanent cache TTL with no risk of serving stale files. If the file changes, it gets a new URL, which is a fresh cache miss, and the new version is fetched. If the URL is already in cache, you know (barring an astronomically unlikely hash collision) that the cached version is correct.

# These assets will never change at this URL, so cache them forever
location ~* \.(js|css|png|woff2)$ {
    # max-age=31536000 = 1 year in seconds
    # immutable = "I promise this URL's bytes will never change"
    add_header Cache-Control "public, max-age=31536000, immutable";
}

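The immutable directive tells browsers: while this response is still fresh, don't revalidate it, even when the user reloads the page. Without it, a reload typically triggers a conditional request for every asset. Firefox and Safari honour the directive; Chrome does not implement it but gets a similar effect because its reload behaviour no longer revalidates fresh subresources. Either way, returning users skip a round of revalidation requests, which is a real speed boost.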

The diagram above shows the full flow. Source files enter the build tool, which computes a content hash and bakes it into each output filename. The CDN caches these with a one-year TTL. On redeploy, the old hashed URLs sit quietly in cache until they naturally expire (browsers won't request them because the HTML now references new filenames). The new hashed URLs are fetched once — a cache miss — then cached for another year. The origin handles exactly one request per unique asset per CDN edge, no matter how many millions of users hit the page.

The Critical Exception: HTML Has a Short TTL

There's one file that can't be content-hashed: your HTML pages. HTML is the index that references all those hashed asset URLs. If you cache HTML for a year and then deploy new assets, users will fetch the old HTML — which points to the old hashed filenames — and your perfectly valid new assets will never be seen. HTML must have a short TTL so browsers and CDN edges revalidate it quickly after each deploy.

This two-tier strategy is the foundation of every serious CDN deployment for web assets. HTML gets a short TTL (60 seconds to a few minutes, plus stale-while-revalidate so returning users don't wait for the revalidation). Hashed assets get a one-year immutable TTL. The practical result: after a deploy, a user might see the old HTML for up to 60 seconds — but once their browser fetches the new HTML, all new asset URLs are cold cache misses and they get the fresh code immediately. Meanwhile, unchanged assets are served from the CDN's cache without touching origin at all.
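The two-tier policy can be expressed as one small routing function. The sketch below is an assumption-laden illustration: the regex for recognizing hashed filenames, the 300-second stale-while-revalidate window, and the fallback policy are choices made here, and `cache_control_for` is a hypothetical name.

```python
import re

# A hashed asset has at least 8 hex characters before a known extension.
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css|png|woff2)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header based on whether the URL is content-hashed."""
    if HASHED_ASSET.search(path):
        # Bytes can never change at this URL: cache for a year.
        return "public, max-age=31536000, immutable"
    if path.endswith(".html") or path.endswith("/"):
        # HTML is the index of hashed URLs: keep it short-lived.
        return "public, max-age=60, stale-while-revalidate=300"
    # Unhashed assets get a conservative default (an assumption).
    return "public, max-age=300"


for p in ("/dist/app.a3f7c291.js", "/index.html", "/", "/static/app.js"):
    print(f"{p:28} {cache_control_for(p)}")
```

Matching on the hash pattern rather than on the file extension alone avoids the trap of marking an unhashed `app.js` as immutable.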

Hit ratios in practice: With content-hashed assets and a one-year TTL, production CDN deployments routinely achieve 98–99.5% byte-hit ratios on static assets. Your origin bandwidth bill can drop by 100× on the first deploy. The only traffic hitting origin is: (1) HTML revalidation requests every 60 seconds per edge (small), and (2) new asset filenames on each deploy (small). Everything else is served entirely from CDN cache.

Content hashing solves cache invalidation by making the filename encode the content — if bytes change, the URL changes, automatically forcing a cache miss. Set max-age=31536000, immutable on hashed assets for a one-year permanent TTL with 99%+ hit ratios. The HTML page must use a short TTL (60s–5min) because it's the directory pointing to those hashed URLs — old HTML pointing to old hashed filenames is the one failure mode to watch for.

Section 14

Video Delivery — HLS, DASH, and Byte-Range Caching

Video is the heaviest workload a CDN handles. A single user watching a 1080p stream consumes 5–8 Mbps continuously for as long as they watch. Multiply that by millions of concurrent viewers during a live event and you're moving tens of terabits per second globally (5 million viewers at 6 Mbps is 30 Tbps). The CDN techniques for video are fundamentally different from static file caching, and understanding them explains why modern streaming services can scale to any audience size.

The Core Idea: Chop the Video Into Tiny Pieces

Serving a 2-hour movie as a single 6 GB file would be catastrophic for caching. Every edge would have to store the whole thing. Users who seek to the middle would have to download from the beginning. Bandwidth would be wasted on content the user skips. Instead, two dominant streaming protocols exist — both built on the same insight: split the video into short segments and serve each one as a regular HTTP request.

Apple's flavor of this splitting trick is HLS (HTTP Live Streaming). It chops video into segments of 2–10 seconds each (stored as .ts files in a format called MPEG Transport Stream) and creates a small text file called a manifest — basically a playlist (.m3u8) that lists the segments in order. The browser or mobile app downloads the manifest first, then fetches segments one by one, staying a few segments ahead of playback so the user never sees a buffering pause. HLS is natively supported on iOS and Safari and is the protocol behind almost all mobile streaming in the US.

The vendor-neutral version of the same idea is DASH (Dynamic Adaptive Streaming over HTTP) — an ISO/IEC standard. DASH works identically in concept but uses .m4s segments (fragmented MP4 format) and an .mpd manifest. Because it isn't owned by any single company, YouTube, Netflix, and most non-Apple platforms standardized on it.

Both protocols produce the same architecture at the CDN level: thousands of small HTTP requests, each cacheable independently. This is the key insight that makes CDN video delivery work.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.006,
segment_0000.ts    ← 6-second chunk, ~4.5 MB at 1080p
#EXTINF:6.006,
segment_0001.ts
#EXTINF:6.006,
segment_0002.ts
...
#EXTINF:5.980,
segment_1439.ts    ← last segment of a 2.4-hour film
#EXT-X-ENDLIST     ← marks VOD (video on demand); absent for live streams

This .m3u8 manifest for a 2.4-hour movie lists 1,440 segments (at 6 seconds each). Each segment is a cacheable HTTP object. Once a segment is cached on a CDN edge, every user who watches that part of the video from that edge gets it from cache — zero origin requests after the first viewer.
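To show how the segment count falls out of the duration, here is a small generator for a VOD playlist of this shape. It is a sketch, not a packager: real encoders cut segments on keyframes, so durations vary slightly, and `build_vod_playlist` and the `segment_NNNN.ts` naming are assumptions mirroring the example above.

```python
import math


def build_vod_playlist(duration_s: float, segment_s: float = 6.0) -> str:
    """Emit a minimal HLS VOD media playlist with fixed-length segments."""
    count = math.ceil(duration_s / segment_s)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_s)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i in range(count):
        # The final segment may be shorter than the target duration.
        length = min(segment_s, duration_s - i * segment_s)
        lines.append(f"#EXTINF:{length:.3f},")
        lines.append(f"segment_{i:04d}.ts")
    lines.append("#EXT-X-ENDLIST")  # present for VOD, absent for live
    return "\n".join(lines) + "\n"


playlist = build_vod_playlist(duration_s=2.4 * 3600)
segments = [line for line in playlist.splitlines() if line.endswith(".ts")]
print(len(segments), segments[0], segments[-1])
```

Each line ending in `.ts` is one independently cacheable HTTP object, so the playlist length is also the number of cache entries per rendition.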

The diagram shows the full pipeline. A video encoder produces multiple ABR (Adaptive Bitrate) variants of each segment — the same 6-second clip encoded at 1080p, 720p, 480p, and 240p. All variants are pushed to origin storage. The CDN caches each segment independently. The player fetches the manifest, measures its download speed for recent segments, and requests whichever bitrate it thinks its connection can handle without buffering. If you're on a fast Wi-Fi connection, you get 1080p. If you duck into a tunnel and bandwidth drops, the player smoothly steps down to 480p for a few segments, then steps back up when you emerge. No buffering, seamless transitions.
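The player's bitrate decision can be approximated by a simple throughput rule. Real players combine throughput with buffer level and smoothing; the bitrate ladder values, the 0.8 safety factor, and the name `pick_rendition` below are all assumptions for illustration.

```python
# Hypothetical bitrate ladder in kilobits per second.
LADDER_KBPS = {"1080p": 6000, "720p": 3000, "480p": 1500, "240p": 400}


def pick_rendition(measured_kbps: float, ladder: dict = LADDER_KBPS,
                   safety: float = 0.8) -> str:
    """Choose the highest rendition that fits within a safety-scaled throughput."""
    budget = measured_kbps * safety  # leave headroom for throughput dips
    affordable = [(kbps, name) for name, kbps in ladder.items() if kbps <= budget]
    if not affordable:
        return min(ladder, key=ladder.get)  # nothing fits: fall back to the lowest
    return max(affordable)[1]


for throughput in (10_000, 4_000, 2_500, 300):
    print(f"{throughput:>6} kbps measured -> {pick_rendition(throughput)}")
```

Because every rendition of every segment is a separate URL, switching quality is just a matter of requesting the next segment from a different rendition, and the CDN needs no special support for it.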

Byte-Range Requests: Seeking Without Re-Downloading

Even with segments, users sometimes seek to a specific timestamp inside a segment. HTTP byte-range requests — the Range: bytes=X-Y header — let a browser ask for only a portion of a file. CDNs support this natively: if the full segment is already in cache, the CDN slices out the requested byte range locally without asking origin. If the segment isn't cached yet, the CDN fetches it from origin and simultaneously stores the full segment in cache for future requests.
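Serving a range from an already cached object is a local slicing operation. The sketch below handles a single range only (`bytes=X-Y`, `bytes=X-`, and the suffix form `bytes=-N`); multi-range requests and conditional headers are out of scope, and `serve_range` is a hypothetical name.

```python
import re
from typing import Optional, Tuple


def serve_range(cached: bytes, range_header: str) -> Tuple[int, Optional[str], bytes]:
    """Return (status, Content-Range value, body) for a single-range request."""
    total = len(cached)
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if match is None or not (match.group(1) or match.group(2)):
        return 200, None, cached  # unparseable Range: serve the full body
    first, last = match.group(1), match.group(2)
    if first == "":
        # Suffix form: the final N bytes of the object.
        start, end = max(total - int(last), 0), total - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return 200, None, cached  # invalid spec: ignore the Range header
        end = min(int(last), total - 1) if last else total - 1
    if start >= total:
        return 416, f"bytes */{total}", b""  # range starts past the end
    return 206, f"bytes {start}-{end}/{total}", cached[start:end + 1]


segment = bytes(range(10))  # stand-in for a cached video segment
print(serve_range(segment, "bytes=2-5"))
print(serve_range(segment, "bytes=-3"))
print(serve_range(segment, "bytes=20-30"))
```

Note that HTTP ranges are inclusive on both ends, hence the `end + 1` when slicing.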

Live streaming vs VOD: For video-on-demand, segments are immutable once written — the CDN can cache them with long TTLs. For live streaming, the manifest (.m3u8 / .mpd) must have a very short TTL (2–6 seconds) because it's updated every few seconds with new segment filenames. New segments themselves are immutable once they exist — cache them long. This distinction is critical: if you accidentally cache a live manifest too long, viewers see a stalled stream.

The Scale Numbers

A popular livestream event can fan out to millions of concurrent viewers within minutes. With a traditional origin-per-viewer model, you'd need millions of simultaneous connections to a single server, which is impossible. With CDN-based HLS delivery, origin ingests one stream and pushes it to the CDN. The CDN's edge nodes replicate segments locally. When a million viewers in Tokyo all request the same segment, the Tokyo PoP serves it from its local cache, so origin sees at most one request per segment per PoP. If there are 300 PoPs globally and 1,440 segments in a 2.4-hour event, origin handles at most 432,000 segment fetches per bitrate rendition, regardless of viewer count. Every additional viewer is essentially free from an origin perspective.
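The upper bound is a straight multiplication, and viewer count does not appear in it. A short sketch, with `max_origin_fetches` as a hypothetical helper and the four-rendition ladder as an assumption:

```python
def max_origin_fetches(pops: int, segments: int, renditions: int = 1) -> int:
    """Worst case: every PoP fetches every segment of every rendition once."""
    return pops * segments * renditions


single = max_origin_fetches(pops=300, segments=1440)
ladder = max_origin_fetches(pops=300, segments=1440, renditions=4)

print(f"one rendition:   {single:,} origin fetches at most")
print(f"four renditions: {ladder:,} origin fetches at most")
```

A shield tier lowers the bound further, since edge PoPs then fetch from the shield rather than from origin.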

HLS and DASH solve video delivery by splitting files into thousands of short HTTP segments, each cacheable independently on CDN edges. Adaptive bitrate switching lets the player select quality based on real-time bandwidth — seamlessly stepping between 240p and 4K. Byte-range requests enable in-segment seeking from cache. Live manifests need short TTLs (2–6 s); segments are immutable once written and can be cached long. Origin load scales with the number of PoPs, not the number of viewers.

Section 15

API Acceleration & Dynamic Content

Most teams think CDN = static files. That's half the story — and the less interesting half. Modern CDNs also dramatically accelerate dynamic content: API responses, personalized HTML, search results, and even entire page renders. If you're leaving dynamic acceleration off, you're paying for the CDN's network but using only a fraction of its power. Let's look at three techniques.

Technique 1: Connection Reuse — Free Latency Savings

Every time a user makes an HTTPS request to your origin, their browser must open a TCP connection and negotiate a TLS handshake with the origin server. From the user's location in, say, Tokyo to your origin in Virginia, that's 2–3 round trips × 160 ms each = 320–480 ms just in setup overhead before a byte of response travels.

A CDN solves this because it already has persistent, pre-warmed TCP+TLS connections to your origin. The CDN maintains a pool of keep-alive connections (often hundreds) between its edge nodes and your origin. When your Tokyo user makes a request, the edge terminates the user's connection locally (10 ms round trip) and multiplexes the request through an existing connection to origin. The TCP + TLS setup overhead to origin happens once and is amortized across thousands of requests. The user pays 2–3 round trips at 10 ms (20–30 ms of handshake with the edge) instead of 2–3 round trips at 160 ms (320–480 ms with the origin). The request itself still crosses the ocean once on the warm connection, but the setup cost is gone.

The diagram shows the contrast. Without a CDN, every new connection from Tokyo to Virginia pays up to 480 ms of TCP+TLS setup. With a CDN, the user handshakes with the Tokyo edge at 10 ms per round trip, and the edge uses its pre-established connection pool to Virginia. This technique accelerates every dynamic request, even ones the CDN can't cache, because the long-haul connection setup is already done. It's a free win that many teams don't realize they're getting.
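A back-of-the-envelope model of time-to-first-byte makes the saving concrete. It assumes three handshake round trips (TCP plus an older two-round-trip TLS), zero origin processing time, and an edge-to-origin RTT equal to the user-to-origin RTT; the function names are hypothetical.

```python
def ttfb_direct_ms(origin_rtt_ms: float, handshake_round_trips: int = 3) -> float:
    """Handshake with the origin, then one round trip for the request itself."""
    return origin_rtt_ms * handshake_round_trips + origin_rtt_ms


def ttfb_via_edge_ms(edge_rtt_ms: float, origin_rtt_ms: float,
                     handshake_round_trips: int = 3) -> float:
    """Handshake and request to the edge, plus one warm-connection origin trip."""
    user_leg = edge_rtt_ms * handshake_round_trips + edge_rtt_ms
    return user_leg + origin_rtt_ms  # no handshake on the edge-to-origin leg


direct = ttfb_direct_ms(origin_rtt_ms=160)
via_edge = ttfb_via_edge_ms(edge_rtt_ms=10, origin_rtt_ms=160)
print(f"direct to origin: {direct:.0f} ms")
print(f"via CDN edge:     {via_edge:.0f} ms")
```

Under these assumptions an uncacheable request drops from about 640 ms to about 200 ms purely because the long-haul handshakes disappear.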

Technique 2: Short-TTL API Caching — Even 5 Seconds Matters

A common misconception: "Our API returns user-specific data, so we can't cache it." That's true for responses that contain session-specific content. But many API endpoints return the same data for all users (or for all users in a region): product listings, search results, trending posts, public pricing, weather data. These can be cached at the CDN edge with a short TTL.

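The math here is counterintuitive. If your product listing endpoint handles 10,000 requests per second and you set a 5-second CDN TTL, each edge node sends at most one request to origin per 5-second window (the fetch that refills the expired entry, assuming concurrent misses are coalesced). That's 12 origin requests per minute per edge, or 720 per hour, instead of 36 million per hour. Measured against a single edge or a shield tier, a 5-second TTL reduces origin load by about 99.998%; even with 300 edges each refreshing independently (216,000 requests per hour), the reduction is still about 99.4%. The data is at most 5 seconds stale, which is entirely acceptable for most product catalogs.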
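The offload from a short TTL can be computed directly. The sketch assumes each edge refreshes an entry exactly once per TTL window (concurrent misses are coalesced) and that the entry is requested continuously; the helper names are hypothetical.

```python
def origin_requests_per_hour(ttl_s: float, edges: int = 1) -> float:
    """One refresh per TTL window per edge, regardless of client traffic."""
    return 3600 / ttl_s * edges


def offload(client_rps: float, ttl_s: float, edges: int = 1) -> float:
    """Fraction of client requests that never reach the origin."""
    return 1.0 - origin_requests_per_hour(ttl_s, edges) / (client_rps * 3600)


print(f"client requests/hour:       {10_000 * 3600:,}")
print(f"origin requests/hour (1):   {origin_requests_per_hour(5):,.0f}")
print(f"origin requests/hour (300): {origin_requests_per_hour(5, edges=300):,.0f}")
print(f"offload, 1 edge:    {offload(10_000, 5):.3%}")
print(f"offload, 300 edges: {offload(10_000, 5, edges=300):.1%}")
```

Origin load depends only on the TTL and the number of caches refreshing independently, not on how many users are asking.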

Combine short TTL with stale-while-revalidate for APIs. Set Cache-Control: public, max-age=5, stale-while-revalidate=30. Once the 5 seconds are up, the CDN keeps answering from the stale copy (for up to 30 more seconds) while it revalidates in the background, so no user waits on the origin. Under steady traffic the origin still sees roughly one request per 5 seconds per edge instead of thousands per second; what changes is that the refresh happens off the user's critical path. For endpoints that change infrequently, this is the highest-leverage cache header you can write.
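The three states a cached response moves through under `max-age` plus `stale-while-revalidate` can be written as a small decision function. This is a simplified reading of the header semantics; `swr_decision` and its return labels are hypothetical.

```python
def swr_decision(age_s: float, max_age_s: float, swr_s: float) -> str:
    """Classify a cached response by its age."""
    if age_s < max_age_s:
        return "fresh"             # serve from cache, no origin contact
    if age_s < max_age_s + swr_s:
        return "stale-revalidate"  # serve stale now, refresh in the background
    return "miss"                  # too old: the user waits for the origin


# Cache-Control: public, max-age=5, stale-while-revalidate=30
for age in (0, 4, 5, 20, 34, 35, 60):
    print(f"age {age:>2}s -> {swr_decision(age, max_age_s=5, swr_s=30)}")
```

Only the third state puts the origin on the user's critical path, and with steady traffic an entry is refreshed long before it gets that old.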

Technique 3: Edge-Side Includes (ESI) — Assembling Pages at the Edge

Sometimes a page is 95% public content (cacheable for hours) but has a small dynamic fragment — a shopping cart count, a logged-in username, a personalized recommendation block. Traditionally, the whole page would be marked Cache-Control: private and served from origin for every user, because of that small dynamic slice. ESI (Edge Side Includes) solves this by letting the CDN assemble the page at the edge from separately cached fragments.

ESI markup is simple: you mark sections of an HTML template with <esi:include src="..."> tags. When the CDN edge processes the response, it fetches and inserts the named fragments — some from local cache, some from origin. Akamai has supported ESI for decades; Varnish and Fastly have native ESI support. The practical impact: pages that were previously uncacheable because of a small dynamic slice become 90–99% cacheable, with only the dynamic fragment touching origin.

Dynamic content acceleration uses three techniques. Connection reuse eliminates the 320–480 ms TCP/TLS overhead on every request by keeping pre-warmed persistent connections between CDN edges and your origin. Short-TTL API caching (5–30 seconds) can reduce origin request volume by 99%+ for public-ish endpoints — even 5 seconds of caching dramatically cuts load at scale. ESI composes pages from independently-cached fragments at the edge, making pages with small dynamic slices mostly cacheable. None of these require the response to be fully cacheable long-term.

Section 16

CDN Security — DDoS Mitigation, WAF, Bot Management, TLS Termination

A CDN is security infrastructure whether you treat it that way or not. When you route all your traffic through a CDN's global network, you're also routing all attack traffic through it — and those 300+ globally distributed edge nodes are simultaneously your biggest defence and your most important security boundary. Understanding how CDNs handle DDoS, malicious requests, bots, and TLS makes the difference between "CDN as a performance tool" and "CDN as a security layer."

DDoS Mitigation — Volume Absorbed by Distribution

Imagine someone wants to take your website offline. The simplest way is to flood it with so many fake requests that real users can't get through — like a mob blocking the door of a coffee shop so paying customers can't enter. When that mob is made of thousands of hijacked computers all around the world hitting your server at once, that's called a DDoS (Distributed Denial of Service) attack. The fundamental defence a CDN provides is structural: there's no single server to overwhelm. Instead of your one origin handling all traffic, attack traffic is absorbed across hundreds of edge nodes worldwide.

Cloudflare reported that in February 2023 it mitigated what it described as the largest HTTP DDoS attack on record at the time, peaking at 71 million requests per second, a genuinely astronomical number, coming from over 30,000 IP addresses across numerous cloud providers (subsequent attacks have since surpassed this, including a 201M rps record set in August 2023 and multi-Tbps network-layer attacks in 2025). The key reason Cloudflare could absorb it is Anycast routing: when traffic arrives destined for Cloudflare's IP addresses, BGP routes it to the nearest PoP. The attack was automatically distributed across hundreds of PoPs worldwide; no single location received an overwhelming fraction. A server at a startup's data center would have been unreachable in seconds. Cloudflare's edge absorbed it without a customer-visible outage.

WAF — Web Application Firewall

Beyond raw flood absorption, a CDN can also act as a smart bouncer at the door — reading each incoming request and turning away the ones that look like attacks before they reach your application code. That smart-bouncer layer is called a WAF (Web Application Firewall). It sits at the CDN edge and inspects every incoming HTTP request for malicious patterns. When a request arrives, the WAF checks it against rule sets — some provided by the CDN vendor, some custom. Rules look for things like:

SQL injection — query parameters like ?id=1' OR '1'='1 that attempt to manipulate backend database queries
XSS (Cross-Site Scripting) — inputs containing <script> tags or JavaScript event handlers designed to execute in another user's browser
Path traversal — requests like ../../../../etc/passwd trying to read files outside the web root
Known CVE patterns — signatures for specific known vulnerabilities (e.g., Log4Shell, Spring4Shell)
Rate limiting — a single IP making 1,000 requests per minute gets throttled or blocked
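Of the rules listed above, rate limiting is the easiest to sketch. Below is a fixed-window counter per client IP; production WAFs typically use sliding windows or token buckets with shared state across nodes, and `FixedWindowLimiter` is a hypothetical class, not a vendor API.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per IP in each `window_s`-second window."""

    def __init__(self, limit: int = 1000, window_s: int = 60) -> None:
        self.limit = limit
        self.window_s = window_s
        # Old windows are never evicted here; a real limiter must expire them.
        self.counts = defaultdict(int)

    def allow(self, ip: str, now: float) -> bool:
        window = int(now // self.window_s)
        self.counts[(ip, window)] += 1
        return self.counts[(ip, window)] <= self.limit


limiter = FixedWindowLimiter(limit=3, window_s=60)
decisions = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(decisions)                               # the fourth request is throttled
print(limiter.allow("203.0.113.7", now=61))    # a new window resets the count
print(limiter.allow("198.51.100.9", now=3))    # other clients are unaffected
```

The fixed-window approach is simple but lets a client burst up to twice the limit across a window boundary, which is one reason sliding windows are common in practice.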

The critical advantage: the WAF runs at the CDN edge, before the request ever touches your origin. A SQL injection attempt is blocked in Tokyo, never reaching your database in Virginia. Cloudflare WAF, AWS WAF (a separately billed service that attaches to CloudFront distributions), Fastly's WAF, and Akamai Kona Site Defender all follow this pattern.

Bot Management

Not all automated traffic is malicious — search engine crawlers are bots you want. But credential stuffing bots, scrapers, scalper bots (buying limited-edition products), and ad fraud bots are expensive. CDN-based bot management distinguishes legitimate humans and good bots from malicious ones through a combination of:

JavaScript fingerprinting: serving a JS challenge to suspicious clients; real browsers execute it, simple HTTP clients don't
Behavioural analysis: humans move the mouse, pause between clicks, and navigate non-linearly; bots click in milliseconds with perfect precision
Reputation scoring: known bot IP ranges, Tor exit nodes, hosting provider IP blocks known for abuse
CAPTCHAs: a last resort for ambiguous requests

TLS Termination at the Edge — and What Happens Behind It

When a user connects to your site over HTTPS, the TLS handshake terminates at the CDN edge, not at your origin. This means the user's encrypted session ends at the Tokyo edge node; the edge decrypts the request, inspects it (WAF, cache lookup), and either serves from cache or forwards to origin. The edge-to-origin connection is a separate TLS session — or, in some configurations, plain HTTP over a private network between the CDN and your origin. The user always has an encrypted connection; the internal CDN-to-origin path depends on your configuration.

Modern CDNs also support HTTP/3 + QUIC on the user-to-edge leg. QUIC is a UDP-based protocol designed to solve TCP's major latency problems, particularly head-of-line blocking, where a single lost packet stalls every multiplexed stream on the connection until it is retransmitted. On mobile networks where packet loss is common, HTTP/3 can reduce perceived latency noticeably. Cloudflare, Fastly, and Akamai all support HTTP/3 at the edge; the edge-to-origin leg typically stays on HTTP/1.1 or HTTP/2 over TLS, depending on the vendor.

CDNs are inherently DDoS-resistant because attack traffic is diluted across hundreds of PoPs, so no single location receives a crushing volume. WAF at the edge blocks SQL injection, XSS, and CVE exploits before they reach origin. Bot management uses JS challenges, behavioural analysis, and IP reputation to separate humans from automated abusers. TLS terminates at the edge (the user's encrypted session ends there); edge-to-origin uses a separate connection, often HTTP/1.1 or HTTP/2 over TLS, or a private network path. HTTP/3 + QUIC at the edge improves mobile latency by eliminating TCP's head-of-line blocking.

Section 17

Common Pitfalls & Production Incidents

CDNs have some of the sneakiest failure modes in web infrastructure. Misconfiguring one header can silently reduce your cache hit ratio to zero. Getting one directive wrong can serve one user's session data to a completely different user — a GDPR-violating catastrophe. These are the seven most common production mistakes, each documented from real incident patterns.

The mistake: Setting Cache-Control: max-age=3600 and wondering why the CDN hit ratio is 0%. The max-age directive applies to every cache, browsers included, while s-maxage applies only to shared caches (CDNs and other proxy caches) and overrides max-age there. A spec-compliant shared cache falls back to max-age when s-maxage is absent, but vendor behaviour varies: some CDNs cache only certain file types by default or need an explicit caching rule, and if private is present no CDN will cache the response at all.

Why it bites: The CDN appears to be working (requests are routing through it, latency is lower), but it's acting as a pure proxy — forwarding every request to origin with no caching benefit. You're paying CDN bandwidth prices for zero hit ratio improvement.

The fix: Always explicitly set s-maxage for any response you want CDN-cached:

# Wrong: browser caches 1 hour, CDN may not cache at all
Cache-Control: max-age=3600

# Correct: CDN caches 1 hour, browser caches 5 minutes
Cache-Control: public, max-age=300, s-maxage=3600

The split lets you give browsers a short TTL (so users get updates within 5 minutes) while the CDN holds a longer cache (reducing origin load more aggressively).
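To make the browser/CDN split concrete, here is a minimal sketch of how a cache following the standard HTTP caching rules (RFC 9111) would pick a freshness lifetime from a Cache-Control value. The function names are assumptions for this example, and the sketch ignores vendor-specific defaults, Expires, and heuristic freshness, all of which real CDNs layer on top.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_seconds(cache_control: str, shared: bool) -> Optional[int]:
    """Return the explicit freshness lifetime, or None if this cache must not store it.

    shared=True models a CDN edge; shared=False models a browser cache.
    """
    d = parse_cache_control(cache_control)
    if "no-store" in d or (shared and "private" in d):
        return None
    # s-maxage only applies to shared caches and overrides max-age there.
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return None


if __name__ == "__main__":
    header = "public, max-age=300, s-maxage=3600"
    print("CDN TTL:", freshness_seconds(header, shared=True))
    print("Browser TTL:", freshness_seconds(header, shared=False))
```

Setting s-maxage explicitly removes any dependence on how a particular vendor treats a bare max-age.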

The mistake: A response contains a Set-Cookie header (setting a session ID, A/B test bucket, or preference) AND a CDN-cacheable Cache-Control directive. Depending on the vendor and its configuration, the CDN may cache the response including the cookie (some CDNs strip or bypass Set-Cookie responses by default; others do not). The next user who makes the same request gets the same response, including the cookie that was set for the first user. That user's browser now holds a session cookie that belongs to someone else.

Why it bites: This is a catastrophic security and privacy incident. Depending on what the cookie contains, it could expose authentication tokens, personal data, or account access. At minimum it's a GDPR violation. It's also hard to detect from monitoring — CDN hit ratios look great, but something is deeply wrong.

The fix: Either strip cookies before caching (most CDNs have cookie-stripping rules in their config), or ensure that any response containing a Set-Cookie header also sets Cache-Control: private, no-store. Better yet: serve unauthenticated/public responses from a separate origin path that never sets cookies.

# Dangerous: CDN will cache this including the Set-Cookie header
HTTP/1.1 200 OK
Cache-Control: public, max-age=300
Set-Cookie: session_id=abc123; Secure; HttpOnly

# Safe approach: origin sets no-store for any response touching cookies
Cache-Control: private, no-store
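One way to enforce the safe approach is a last-step guard at the origin that rewrites Cache-Control whenever a cookie is being set. The sketch below is framework-agnostic and works on a plain dict of headers; harden_headers is a hypothetical helper name, not part of any library.

```python
from typing import Dict


def harden_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers that is never shared-cacheable if it sets a cookie."""
    out = dict(headers)
    sets_cookie = any(name.lower() == "set-cookie" for name in out)
    if sets_cookie:
        # Remove any existing Cache-Control, whatever its capitalisation.
        for name in [n for n in out if n.lower() == "cache-control"]:
            del out[name]
        out["Cache-Control"] = "private, no-store"
    return out


if __name__ == "__main__":
    risky = {
        "Cache-Control": "public, max-age=300",
        "Set-Cookie": "session_id=abc123; Secure; HttpOnly",
    }
    print(harden_headers(risky)["Cache-Control"])
```

Running the guard as the final step of response processing means a cookie added by any earlier layer still forces the response out of shared caches.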

The mistake: Adding Cache-Control: private to responses that have no personalisation — public JS bundles, images, fonts, public API responses. private explicitly instructs shared caches (including CDN edges) not to store the response. Often this happens because a developer copies a header from a session-handling endpoint and applies it globally, or because a framework sets private as the default for all responses.

Why it bites: Every single request is a cache miss. The CDN forwards 100% of requests to origin. You get the CDN's latency improvement (connection reuse, geographic routing) but zero offload benefit. Your origin receives full production traffic and scales accordingly — you're paying for both CDN bandwidth and full origin capacity unnecessarily.

The fix: Audit your response headers. Separate responses into three buckets: (1) fully public: use public, s-maxage=...; (2) private/user-specific: use private, no-store; (3) mixed: strip personalisation from the cacheable part, or use Edge Side Includes (ESI) to compose the cached and personalised fragments at the edge.

The mistake: Marketing teams add UTM tracking parameters to URLs (?utm_source=google&utm_campaign=spring2024). Analytics tools add ?gclid=... (Google Click ID) or ?fbclid=... (Facebook Click ID). These parameters are meaningless to your server — the same HTML page is returned regardless. But to the CDN, /products?utm_source=google and /products?utm_source=email are different cache keys — two separate cached objects.

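Why it bites: With even 10 common UTM parameter combinations, your cache key space grows 10×. Each variant has to be fetched and warmed separately and sees only a tenth of the traffic, so entries are far more likely to expire or be evicted before they are reused, and a page that would have had a 90% hit ratio can fall to a small fraction of that. Click IDs such as gclid and fbclid are worse still: they are unique per click, so those requests never hit. At scale, this can turn a well-configured CDN into an effectively uncached origin proxy.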

The fix: Configure your CDN to normalize the cache key by stripping known tracking parameters. Cloudflare, CloudFront, and Fastly all support cache key manipulation rules:

# Cloudflare Cache Rules: strip UTM/gclid/fbclid from cache key
Cache Key: strip query params matching: utm_*, gclid, fbclid, msclkid

# The page still receives the full URL (for analytics)
# but the CACHE LOOKUP uses the stripped URL
# Result: all UTM variants share one cache entry
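The normalization itself is simple enough to sketch in a few lines. The function below is an illustration of the idea rather than any vendor's implementation; the parameter list and the choice to sort the remaining parameters are assumptions for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Parameters that never change the response body (assumed list for this sketch).
TRACKING_PARAMS = {"gclid", "fbclid", "msclkid"}
TRACKING_PREFIXES = ("utm_",)


def cache_key(url: str) -> str:
    """Build a cache key from the path plus only the meaningful query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    kept.sort()  # make parameter order irrelevant to the key
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


if __name__ == "__main__":
    urls = [
        "/products?utm_source=google&utm_campaign=spring2024",
        "/products?utm_source=email",
        "/products?gclid=abc123",
        "/products",
    ]
    print({cache_key(u) for u in urls})
```

All four sample URLs collapse to a single key, while the origin can still be sent the original URL for analytics.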

The mistake: You deploy a fix for a critical bug. Your CDN edge nodes still have the old response cached with a 24-hour TTL. Until that TTL expires — up to 24 hours later — every user in every geography sees the buggy version. You didn't configure an automated purge in your deploy pipeline.

Why it bites: CDN caching is invisible in normal operation — it feels like the server is just "responding normally." When a deploy goes out, developers intuitively expect users to see the new version immediately. Without explicit purging, they won't. Critical bug fixes, security patches, and content corrections all require proactive cache invalidation to take effect quickly.

The fix: Integrate cache purges into your CI/CD pipeline. Every production deploy should trigger a targeted purge via the CDN API. For HTML pages (which can't be content-hashed), purge by URL path or surrogate key. For hashed assets, purging isn't needed — the new hash creates a new URL automatically. Also: keep HTML TTLs short (60 s–5 min) as a safety net in case purge automation fails.
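As one possible shape for the deploy-time purge step, the sketch below builds a purge-by-URL request for Cloudflare's purge_cache endpoint using only the standard library. The zone ID, API token, and URLs are placeholders, the helper name is hypothetical, and other CDNs expose different purge APIs; the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import List

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id: str, api_token: str, urls: List[str]) -> urllib.request.Request:
    """Build (but do not send) a purge-by-URL request for the given zone."""
    body = json.dumps({"files": urls}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; a real pipeline would read these from its secret store.
    request = build_purge_request(
        "YOUR_ZONE_ID", "YOUR_API_TOKEN", ["https://example.com/", "https://example.com/pricing"]
    )
    print(request.get_method(), request.full_url)
```

In a pipeline, the request would be passed to urllib.request.urlopen after the deploy step succeeds, with a failed purge treated as a failed deploy.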

The mistake: Adding Vary: User-Agent to responses to serve different content to mobile vs desktop. Or Vary: Accept-Encoding, Accept-Language, User-Agent. The Vary header tells the CDN: "a different version of this response exists for each unique value of these request headers." User-Agent has thousands of distinct values. Accept-Language has hundreds. The CDN must store one cache entry per combination.

Why it bites: Cache hit ratios collapse toward zero because each request likely has a unique combination of User-Agent + Accept-Language values. It also opens a cache-busting denial-of-service vector: an attacker can deliberately send unusual header values to force cache misses and drive origin load.

The fix: Never use Vary: User-Agent. Serve a responsive design that works for all devices, or use JavaScript to load device-specific resources client-side. For compression, Vary: Accept-Encoding is standard and CDNs handle it well (they collapse it to a small number of encoding variants). For internationalisation, use different URL paths per locale (/fr/, /de/) rather than Vary: Accept-Language.

# Catastrophic: creates thousands of cache variants
Vary: User-Agent, Accept-Language

# Acceptable: CDN handles 2-3 encoding variants (gzip, br, identity)
Vary: Accept-Encoding

# Better for i18n: use path-based routing, no Vary needed
/en/products/ → Cache-Control: public, s-maxage=3600
/fr/products/ → Cache-Control: public, s-maxage=3600

The mistake: Your origin server has a bug that causes it to return HTTP 500 (Internal Server Error) for a few minutes during a deploy. The problematic response happens to have a Cache-Control: max-age=3600 header (perhaps set globally in your framework, not intended to apply to error responses). The CDN dutifully caches the 500 response. For the next hour, every user hitting that URL gets a cached error — even after your origin is healthy again and returning 200s.

Why it bites: The incident is over at the origin, but CDN edges keep serving the cached error for the full TTL duration. Support tickets flood in, engineers are confused why origin logs show no errors, and it takes a manual purge (which someone has to know to do) to clear the cached 500.

The fix: Configure your CDN to never cache 4xx/5xx responses, or to cache them only for a very short duration (5–30 seconds maximum) to prevent thundering herd during transient outages. Most CDNs have error caching settings distinct from success caching. Also: enable stale-if-error — this tells the CDN to serve a stale cached success response during origin errors rather than surfacing the error to users:

# Serve stale content for up to 1 day if origin returns an error
Cache-Control: public, max-age=3600, stale-if-error=86400

# CDN vendor config: never cache 4xx/5xx
# Cloudflare: Cache Rules → "Cache Status: BYPASS for response status 4xx/5xx"
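The stale-if-error decision can be sketched as a small function. This is a simplified model of the behaviour described in RFC 5861, not any CDN's actual code: CachedEntry and choose_response are names invented for this example, and the function assumes the cached entry is already stale and the origin has just been contacted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Origin statuses treated as errors for stale-if-error purposes.
ERROR_STATUSES = {500, 502, 503, 504}


@dataclass
class CachedEntry:
    body: str
    age: int             # seconds since the entry was stored
    max_age: int         # freshness lifetime in seconds
    stale_if_error: int  # extra seconds the entry may be served on origin error


def choose_response(entry: Optional[CachedEntry], origin_status: int, origin_body: str) -> Tuple[int, str]:
    """Decide what the edge returns after revalidating a stale entry with origin."""
    if entry is not None and origin_status in ERROR_STATUSES:
        staleness = entry.age - entry.max_age
        if staleness <= entry.stale_if_error:
            # Origin is failing: keep serving the last good copy.
            return 200, entry.body
    return origin_status, origin_body


if __name__ == "__main__":
    entry = CachedEntry(body="cached page", age=4000, max_age=3600, stale_if_error=86400)
    print(choose_response(entry, 503, "error page"))
```

With the header values from the example above, an entry that went stale 400 seconds ago is still served when origin returns a 503.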

The seven most common CDN pitfalls: (1) a missing s-maxage can leave the CDN caching nothing; (2) caching Set-Cookie responses can serve one user's session to another; (3) global private headers kill hit ratios on public assets; (4) UTM/tracking query params explode the cache keyspace by 10–100×; (5) missing deploy purges serve old content for hours post-deploy; (6) Vary: User-Agent creates thousands of cache variants per URL; (7) caching error responses leaves users stuck on cached failures after origin recovers. Each pitfall is fixable with the right header or CDN rule once you know to look for it.

Section 18

Practice Exercises — Build Your Intuition

CDN knowledge only becomes intuition through practice. These five exercises are designed to move you from "I read about it" to "I can reason about it from first principles." Work through each one before reading the solution; the struggle is where the learning happens.

A user is located in São Paulo, Brazil. Your origin server is in Frankfurt, Germany. The straight-line geographic distance is approximately 9,500 km. Fiber routing adds roughly 30% to this (cables don't go in straight lines across oceans).

Calculate the minimum one-way propagation delay from São Paulo to Frankfurt.
Calculate the minimum round-trip time (RTT).
A TLS 1.3 page load requires 3 round trips (TCP handshake + TLS handshake + HTTP request) before the first byte arrives. What is the minimum TTFB from propagation delay alone?
If a CDN has a PoP in São Paulo that is 8 ms away from the user, how much does TTFB improve?

Light travels at ~200,000 km/s in fiber (two-thirds of the speed of light in vacuum). Multiply the fiber distance by 1.3 to get the routing distance, then divide by fiber speed to get one-way delay.

Step by step:

Fiber distance: 9,500 km × 1.3 = 12,350 km
One-way propagation: 12,350 km ÷ 200,000 km/s = 61.75 ms ≈ 62 ms
Round-trip time (RTT): 62 ms × 2 = 124 ms
TTFB (3 RTTs): 124 ms × 3 = 372 ms — just for propagation, before any processing
With CDN PoP at 8 ms: TTFB = 8 ms × 2 × 3 = 48 ms — a 324 ms improvement (87% reduction)

Takeaway: Geography is a hard floor. The CDN's 8 ms edge moves that floor from 372 ms to 48 ms. Without a CDN, no amount of server optimisation can bring TTFB below 372 ms, because that time is spent entirely on the wire.
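The same arithmetic as a small function, for trying other city pairs. The function and parameter names are choices made for this sketch; the 200,000 km/s fiber speed, the 1.3 routing factor, and the 3 round trips come from the exercise. Note that the unrounded result for São Paulo to Frankfurt is 370.5 ms; the 372 ms figure comes from rounding the one-way delay up to 62 ms first.

```python
FIBER_SPEED_KM_PER_S = 200_000  # roughly two-thirds of the speed of light in vacuum


def ttfb_floor_ms(straight_line_km: float, routing_overhead: float = 1.3, round_trips: int = 3) -> float:
    """Minimum time to first byte from propagation delay alone, in milliseconds."""
    fiber_km = straight_line_km * routing_overhead
    one_way_ms = fiber_km / FIBER_SPEED_KM_PER_S * 1000
    return one_way_ms * 2 * round_trips


def edge_ttfb_floor_ms(one_way_ms: float, round_trips: int = 3) -> float:
    """Same floor when the one-way delay to a nearby PoP is already known."""
    return one_way_ms * 2 * round_trips


if __name__ == "__main__":
    origin = ttfb_floor_ms(9_500)
    edge = edge_ttfb_floor_ms(8)
    print(f"origin floor: {origin:.1f} ms, edge floor: {edge:.1f} ms")
    print(f"improvement: {origin - edge:.1f} ms ({(origin - edge) / origin:.0%})")
```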

Design appropriate Cache-Control headers for each of the following response types. Think about: who can cache it (browser only? CDN too?), how long it stays fresh, what happens when it's stale, and whether it's user-specific.

A JavaScript bundle with a content-hashed filename (app.a3f7c291.js)
A logged-in user's profile page (/account/settings)
A public marketing homepage (/) that changes only on deploys (roughly once per week)
A product listing API endpoint (/api/products) returning the same catalog for all users, updated every 10 minutes

For (1): think immutability. For (2): think privacy. For (3): think about deploy frequency and purge strategy. For (4): think about short-TTL caching + stale-while-revalidate math.

# (1) Content-hashed JS bundle: cache forever, it's mathematically immutable
Cache-Control: public, max-age=31536000, immutable
# Reasoning: filename changes on content change → URL = content fingerprint
# No need to ever revalidate; new content = new URL = new cache entry

# (2) User profile page: private, never cache at CDN
Cache-Control: private, no-store
# Reasoning: contains personal data; CDN caching would risk cross-user leaks;
# no-store prevents even browser disk caching (use private, no-cache instead if the
# browser may keep a copy but must revalidate it with origin before every reuse)

# (3) Public homepage: short CDN TTL + stale-while-revalidate safety net
Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=3600
# Reasoning: max-age=60 means browsers recheck every minute (catches deploys quickly);
# s-maxage=300 means CDN holds for 5 min (reduces origin load);
# stale-while-revalidate=3600 means users never wait for revalidation; stale is served instantly
# Pair with deploy pipeline purge for immediate invalidation after deploys

# (4) Public product listing API: short TTL to absorb traffic spikes
Cache-Control: public, max-age=30, s-maxage=30, stale-while-revalidate=120
# Reasoning: 30-second TTL with 10K req/s = origin sees ~3 req/30s per edge = ~6/min per edge
# At 50 global PoPs: ~300 origin requests/min instead of 600,000 req/min, a 2000× reduction
# stale-while-revalidate=120 means users always get instant responses; revalidation is background
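The offload estimate for the product listing API can be reproduced with a short calculation. The sketch assumes roughly three origin fetches per TTL window per PoP, which is what the ~6/min per edge figure implies (a PoP usually has more than one cache server); that count and the function name are assumptions, not measured values.

```python
def origin_requests_per_minute(ttl_seconds: float, pops: int, fetches_per_ttl_per_pop: float = 1) -> float:
    """Estimate origin load when each PoP refetches a URL a few times per TTL window."""
    windows_per_minute = 60 / ttl_seconds
    return pops * fetches_per_ttl_per_pop * windows_per_minute


if __name__ == "__main__":
    user_requests_per_minute = 10_000 * 60  # 10K req/s at the edge
    origin_load = origin_requests_per_minute(ttl_seconds=30, pops=50, fetches_per_ttl_per_pop=3)
    print(f"origin: {origin_load:.0f} req/min")
    print(f"reduction: {user_requests_per_minute / origin_load:.0f}x")
```

The key property is that origin load depends on the TTL and the number of caches, not on user traffic, so a traffic spike at the edge leaves the origin request rate unchanged.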

You're debugging why a CDN's hit ratio for the /search page is only 2%, even though the page looks the same for every user. Examine these URLs from your CDN access logs and identify the problem:

/search?q=sneakers&utm_source=google&utm_medium=cpc&utm_campaign=spring2024
/search?q=sneakers&utm_source=email&utm_medium=newsletter&utm_campaign=spring2024
/search?q=sneakers&utm_source=facebook&utm_medium=social&utm_campaign=spring2024
/search?q=sneakers&gclid=CjwKCAiA...randomId
/search?q=sneakers&fbclid=IwAR3...randomId
/search?q=sneakers

All these URLs return the identical HTML. What's causing the 2% hit ratio? What's the fix?

Count how many distinct cache entries the CDN is creating for the same logical page. What do UTM parameters and click IDs do to the cache key?

The problem: Each URL with different UTM parameter values or a unique gclid/fbclid creates a separate cache entry at the CDN. Since gclid and fbclid values are unique per ad click (they're tracking IDs), every single ad click creates a brand new cache entry, which means effectively a 0% hit ratio for ad traffic. UTM parameters create dozens of variants per URL (source × medium × campaign combinations).

The fix: Configure CDN cache key normalization to strip known tracking parameters before the cache lookup. The full URL (including params) is still forwarded to origin for analytics, but the cache key uses the stripped version:

# Cache key lookup uses: /search?q=sneakers
# Origin receives: /search?q=sneakers&utm_source=google&...
# Result: all UTM/gclid/fbclid variants share ONE cache entry
# Expected hit ratio improvement: 2% → ~80%+

You're designing the CDN invalidation strategy for a large e-commerce site. The site has these resource types with these update patterns:

JS/CSS bundles — rebuilt with new content hashes on every deploy
Product detail HTML pages (/product/12345) — served from CDN with a 10-minute TTL; when a product's price or stock status changes, the page must update within 2 minutes
Homepage (/) — updated manually by the marketing team up to 5 times per day; must show new content within 5 minutes
Static images (/img/product-12345.jpg) — change rarely; when updated they're uploaded with a new filename

Design an invalidation strategy for each resource type. What mechanism, what TTL, and what triggers the invalidation?

Think about which resources need explicit purging vs which can rely on URL-based cache busting. Consider what events trigger each type of change.

Resource-by-resource strategy:

JS/CSS bundles: No invalidation needed. Content hashing means each deploy produces new filenames. Old URLs expire naturally from CDN cache over the following year (browsers no longer request them once HTML points to new filenames). TTL: max-age=31536000, immutable.
Product detail pages: Use surrogate key (cache tag) invalidation. Each product page response includes a header like Cache-Tag: product-12345. When product 12345's price or stock changes (an event in the e-commerce backend), the deploy pipeline or backend sends a purge-by-tag API call to the CDN. All edges holding the tagged response drop it immediately. TTL: s-maxage=600 (10 min as a backstop; surrogate key purge handles the 2-minute SLA). Cloudflare Cache Tags, Fastly surrogate keys, and Akamai cache tags all support this pattern.
Homepage: A short TTL (s-maxage=60, stale-while-revalidate=300) handles most cases within 1 minute. For the 5-minute SLA on manual updates: trigger a URL-based purge via CDN API when the marketing team publishes a change (a CMS webhook calling the CDN purge endpoint). This ensures the new content appears within seconds of publish, regardless of TTL.
Static images: Same as JS/CSS bundles — new filename on update means new cache entry automatically. No invalidation mechanism needed. TTL: max-age=31536000, immutable.

The pattern: Immutable files → URL-based cache busting (no purge). Mutable shared content → surrogate keys for surgical invalidation. Rarely-updated public pages → short TTL + CMS-triggered URL purge. User-specific content → never cached.
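Surrogate-key purging is easiest to understand as an index from tag to cached URLs. The toy in-memory cache below illustrates that mechanism only; TaggedCache and its methods are invented for this sketch and say nothing about how any CDN implements tags internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy edge cache that supports purge-by-tag."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, body: str, tags: Iterable[str]) -> None:
        self._entries[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[str]:
        return self._entries.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every entry carrying the tag; return how many were removed."""
        urls = self._urls_by_tag.pop(tag, set())
        return sum(1 for url in urls if self._entries.pop(url, None) is not None)


if __name__ == "__main__":
    cache = TaggedCache()
    cache.store("/product/12345", "<html>old price</html>", ["product-12345", "category-shoes"])
    cache.store("/category/shoes", "<html>listing</html>", ["category-shoes"])
    removed = cache.purge_tag("product-12345")
    print(removed, cache.get("/product/12345"), cache.get("/category/shoes") is not None)
```

Because a response can carry several tags, one price-change event can surgically drop the product page while leaving unrelated entries warm, or drop a whole category by purging the category tag.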

Your company is planning CDN costs for the next year. You have these numbers:

Total traffic volume: 100 TB per month served to end users
CDN cache hit ratio: 95% (5% of requests are cache misses, fetching from origin)
CDN bandwidth cost: $0.085 per GB (all traffic served by CDN edges to users)
Origin egress cost: $0.09 per GB (AWS data transfer out from your EC2/S3 origin)

Calculate: (a) monthly CDN bandwidth bill, (b) monthly origin egress bill, (c) total monthly CDN + origin bill, (d) what the origin egress bill would have been without a CDN, (e) the monthly saving from using the CDN.

95% hit ratio means 5% of 100 TB flows from origin → CDN edge. The CDN pays egress on 100 TB to end users regardless of hit ratio. Origin only pays egress on cache misses.

Step-by-step calculation:

Total traffic: 100 TB = 100,000 GB
(a) CDN bandwidth bill: 100,000 GB × $0.085 = $8,500/month
Origin traffic (cache misses only): 100,000 GB × 5% = 5,000 GB origin egress
(b) Origin egress bill (with CDN): 5,000 GB × $0.09 = $450/month
(c) Total monthly bill: $8,500 + $450 = $8,950/month
(d) Origin egress without CDN: 100,000 GB × $0.09 = $9,000/month — plus massive EC2 costs to handle 20× more load
(e) Direct bandwidth saving: $9,000 − $8,950 = $50/month in raw bandwidth (nearly break-even on bandwidth alone — the savings come from origin compute scaling, not bandwidth)

The deeper insight: At $0.085/GB CDN vs $0.09/GB origin egress, the CDN costs slightly less per GB even before hit ratio savings. But the real savings aren't on bandwidth — they're on origin infrastructure. Without the CDN, your origin handles 100 TB of traffic, requiring much larger (and more expensive) compute/database capacity. With the CDN absorbing 95%, your origin handles 5 TB — potentially reducing your server bill by 10×. For a server running at $5,000/month at full load, CDN-assisted scaling could bring that to $500/month, saving $4,500/month in compute on top of the bandwidth calculation.
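For planning with different volumes or hit ratios, the calculation can be wrapped in a small function. The function name and the returned field names are choices made for this sketch; the prices are the ones given in the exercise, and compute savings are deliberately left out because they depend on your own infrastructure.

```python
from typing import Dict


def monthly_costs(total_gb: float, hit_ratio: float, cdn_per_gb: float, origin_per_gb: float) -> Dict[str, float]:
    """Bandwidth-only cost model: CDN egress to users plus origin egress on misses."""
    cdn_bill = total_gb * cdn_per_gb
    origin_with_cdn = total_gb * (1 - hit_ratio) * origin_per_gb
    origin_without_cdn = total_gb * origin_per_gb
    return {
        "cdn_bill": round(cdn_bill, 2),
        "origin_with_cdn": round(origin_with_cdn, 2),
        "total_with_cdn": round(cdn_bill + origin_with_cdn, 2),
        "origin_without_cdn": round(origin_without_cdn, 2),
        "bandwidth_saving": round(origin_without_cdn - (cdn_bill + origin_with_cdn), 2),
    }


if __name__ == "__main__":
    for name, value in monthly_costs(100_000, 0.95, 0.085, 0.09).items():
        print(f"{name}: ${value:,.2f}")
```

Re-running it with a lower hit ratio shows how quickly the bandwidth saving turns negative, which is why hit ratio is the number to watch.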

These five exercises cover five core CDN skills: (1) computing the physics floor for latency to understand what a CDN can and cannot improve; (2) designing Cache-Control headers for four real content types; (3) diagnosing a cache key explosion from UTM parameters; (4) building a per-resource-type invalidation strategy using content hashing, surrogate keys, short TTLs, and CMS webhooks; (5) modelling CDN costs to see that the primary saving is compute scaling, not bandwidth arbitrage.

Section 19

Bug Studies — When CDNs Go Wrong in Production

Theory is one thing. Production incidents are another. The four bugs below are drawn from real patterns that teams have hit: a session leak that exposed user A's data to user B, a Vary header that shattered cache efficiency, an accidentally cached error response that took a site offline globally for an hour, and a deploy-triggered thundering herd that killed the origin. Each one is an easy mistake to make, and each one is completely preventable once you know the pattern.

Bug 1 — The Dual-Cookie Disaster: Caching Authenticated Sessions

Incident: A retail site cached its product-page HTML at the CDN edge. The HTML was mostly static — product name, price, description. But the backend injected the user's cart count and display name into the HTML as a small snippet at the top. When user A loaded a product page and the CDN cached the response, user B in the same region got user A's name and cart count. Customer support received hundreds of reports of "I see someone else's account." The root cause: no Vary: Cookie header and no private directive on the Cache-Control.

What Went Wrong

The backend returned Cache-Control: max-age=300 with no qualifiers. From the CDN's perspective, this means "this response is publicly cacheable for 5 minutes." The CDN dutifully cached the first response it received — which happened to include user A's injected data — and served it to every subsequent visitor for the next 5 minutes. The page looked static but wasn't. The key lesson: any response that contains user-specific data must either be marked private (never cache at a shared proxy) or use a Vary: Cookie header so the CDN creates a separate cache entry per session token. In practice, Vary: Cookie effectively disables caching because every session cookie is unique — so the real fix is to strip user-specific data from the cached HTML and load it client-side via a separate authenticated API call.

The diagram above shows the failure path: User A's authenticated response gets cached at the edge. When User B arrives with a completely different session cookie, the CDN sees a cache HIT on the same URL and serves User A's data verbatim. No fraud occurred — the CDN followed the HTTP spec precisely — but the server gave it wrong instructions.

# Origin server response headers: the bug
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=300
# Missing: private, no-store, or Vary: Cookie
# The CDN sees "public, cacheable for 5 minutes" and stores it

<div class="user-greeting">Hello, {{user.displayName}}! Cart: {{user.cartCount}} items</div>

# Option A: mark the whole page as private (never cached at CDN)
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: private, max-age=0

# Option B (better): strip user data from HTML, load it via a separate call
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: public, s-maxage=300, stale-while-revalidate=60
# HTML no longer contains user data, so it is safe to cache

# Companion endpoint (not cached, always authenticated):
GET /api/me/cart → Cache-Control: private, no-store

Lesson: If there is even a single user-specific byte in an HTML response, the entire response must be marked private — or the user data must be removed from the cached HTML and fetched separately by JavaScript. There is no middle ground. CDNs follow the HTTP spec, not your intentions.

Bug 2 — The Vary: User-Agent Cache Fragmentation Bomb

Incident: A media company added Vary: User-Agent to their API responses so they could serve a slightly different JSON payload to mobile vs desktop clients. Within 24 hours, their CDN cache-hit ratio dropped from ~87% to roughly 5%. Costs tripled. Origin load spiked. The on-call engineer spent two hours assuming it was a traffic spike before realizing the CDN was now maintaining thousands of separate cache entries — one for every unique User-Agent string, of which Chrome alone has dozens of distinct versions sending subtly different UA strings.

What Went Wrong

The Vary header tells a CDN: "treat every unique value of this request header as a separate cached resource." For headers with a small number of distinct values — like Accept-Encoding: gzip vs Accept-Encoding: br — this is fine, because there are only two variants. But User-Agent is a disaster. Every browser version sends a different UA string. Chrome 122, Chrome 123, Chrome 123.0.6312.107 — all different. Safari on iOS 17.3 vs iOS 17.4 — different. A CDN edge with 10,000 users might see 8,000 unique UA strings, creating 8,000 separate cache shards for the same logical resource. Almost every request is a miss because the exact UA string has rarely been seen before.

The left side shows a healthy Vary: Accept-Encoding setup — two variants (gzip and Brotli), nearly every request is a hit. The right side shows the Vary: User-Agent catastrophe — thousands of unique User-Agent strings produce thousands of independent cache entries, most of which expire before they're ever reused.

# Origin returns different JSON for mobile vs desktop
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, s-maxage=120
Vary: User-Agent

# Result: CDN creates a new cache entry for every unique UA string
# Chrome 122, Chrome 123, Safari 17.3, Firefox 124... all separate shards

# Option A: normalize the UA at the CDN layer before hitting origin
# Add a CDN rule that maps User-Agent → X-Device-Type: mobile|desktop|tablet
# Then vary only on the normalized header (only 2-3 variants, so it is safe):
HTTP/1.1 200 OK
Vary: X-Device-Type
Cache-Control: public, s-maxage=120

# Option B (simplest): serve a single universal JSON, do device adaptation client-side
# Remove Vary: User-Agent entirely, leaving one cache entry per URL
Cache-Control: public, s-maxage=120
# No Vary header (or only Vary: Accept-Encoding which is fine)

Lesson: Only Vary on headers with a small, bounded set of values. Accept-Encoding is safe (2-3 values). Accept-Language can be dangerous (hundreds of locales). User-Agent is always a disaster. If you need device-specific responses, normalize the signal into a custom header with 2-3 values first.
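A minimal sketch of the normalization step follows: it maps an arbitrary User-Agent string to one of three buckets that could then be sent as the X-Device-Type header. The substring checks are deliberately crude assumptions for illustration; production device detection is usually a managed CDN feature or a maintained library.

```python
def device_type(user_agent: str) -> str:
    """Collapse a raw User-Agent string into mobile, tablet, or desktop."""
    ua = user_agent.lower()
    # Android tablets typically omit the "Mobile" token that phones include.
    if "ipad" in ua or "tablet" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "mobi" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Mobile/15E148 Safari/604.1",
        "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Mobile Safari/537.36",
    ]
    print(len(set(samples)), "distinct UA strings ->", sorted({device_type(s) for s in samples}))
```

However many distinct UA strings arrive, the cache sees at most three variants per URL.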

Bug 3 — Accidentally Caching a 500 Error for One Hour

Incident: During a deploy, a backend service threw a 500 for about 90 seconds while containers were restarting. The response included the application's default error template — which happened to return Cache-Control: max-age=3600 in the response headers (the developer had set a blanket header in middleware and never thought about error responses). The CDN dutifully cached the 500 error page and continued serving it globally for the next hour. The deploy finished in 90 seconds. The outage lasted 60 minutes — not because the origin was broken, but because the CDN was faithfully serving a stale disaster.

What Went Wrong

Most caching headers are set in framework middleware or reverse proxy config as a blanket rule: "cache everything for X seconds." That's fine for 200 responses. But HTTP error responses are still HTTP responses — they have bodies and headers, and CDNs will cache them too if the Cache-Control header says so. The fix is simple: configure your CDN to never cache 4xx or 5xx responses, or configure your application middleware to emit Cache-Control: no-store on any non-2xx response. Most CDNs also offer a "cache error responses for N seconds" override you can set to 0.

# Django middleware: blanket Cache-Control on every response
class CacheControlMiddleware:
    def __init__(self, get_response):
        # Django passes the next handler in the chain at startup
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # BUG: applies to ALL responses including 500s
        response['Cache-Control'] = 'public, max-age=3600'
        return response

# Django middleware: choose Cache-Control based on the status code
class CacheControlMiddleware:
    def __init__(self, get_response):
        # Django passes the next handler in the chain at startup
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        if response.status_code >= 400:
            # Never cache error responses at the CDN
            response['Cache-Control'] = 'no-store'
        else:
            response['Cache-Control'] = 'public, s-maxage=3600'
        return response

# Also configure at the CDN layer as a safety net:
# CloudFront: set Error Caching Minimum TTL = 0 for 4xx/5xx
# Cloudflare: Cache Rules → "Cache Status: BYPASS" when response code != 200

Lesson: Never apply blanket caching headers without filtering on HTTP status code. Add a CDN-level rule that sets TTL = 0 for all 4xx and 5xx responses as a belt-and-suspenders defense. Test this during your next deploy drill by intentionally returning a 503 and checking whether the CDN caches it.

Bug 4 — The Deploy Thundering Herd: Invalidating Everything at Once

Incident: An e-commerce site with ~50,000 cached objects at the CDN deployed a new version. Their deploy script called the CDN's purge API with a wildcard pattern to clear all assets. Within 2 seconds, every one of those 50,000 objects was evicted from every edge node globally. Incoming traffic — which had been comfortably served from the cache at 98% hit ratio — suddenly hit the origin for every single request simultaneously. The origin handled about 800 requests/second at peak cache-warmed load. The thundering herd produced 40,000 requests/second. The origin ran out of memory in 11 seconds and began OOM-killing application containers. The site was effectively down for 8 minutes.

What Went Wrong

A thundering herd — also called a cache stampede — happens when a large number of cache entries expire or are invalidated simultaneously, and all the traffic that was being served from cache suddenly floods the origin at once. The danger is proportional to your cache hit ratio: a 98% hit ratio means your origin is sized for 2% of traffic. If you invalidate everything simultaneously, it suddenly faces 50 times the load it was designed for. The fixes involve either staggering invalidations (purge in small batches with delays), using rolling deploys that warm the new cache before routing traffic, or using hash-based immutable filenames so you never need a global purge — old files stay cached and only new filenames are fetched.
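The proportionality between hit ratio and stampede size, and the batching idea behind staggered purges, can both be sketched in a few lines. The function names are invented for this example, and the pause between batches is left to the caller.

```python
from typing import Iterator, List


def origin_load_multiplier(hit_ratio: float) -> float:
    """How many times normal origin load arrives if the whole cache goes cold at once."""
    return 1 / (1 - hit_ratio)


def purge_batches(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield paths in small batches so purges can be spaced out over time."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


if __name__ == "__main__":
    multiplier = origin_load_multiplier(0.98)
    print(f"98% hit ratio -> {multiplier:.0f}x origin load after a full purge")
    print(f"800 req/s warm -> {800 * multiplier:,.0f} req/s cold")
    for batch in purge_batches(["/css/main.css", "/js/app.js", "/api/config.json"], 2):
        print("purge", batch)  # a real script would call the CDN API, then sleep
```

At a 98% hit ratio the multiplier is 50, which turns the 800 requests/second the origin normally sees into the 40,000 requests/second from the incident.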

Figure: origin request rate (green line) against the origin's capacity ceiling (dashed yellow line). While the cache is warm, the origin sees a tiny fraction of traffic. The moment the wildcard purge fires, every CDN edge node starts forwarding requests at once and the spike towers above the capacity ceiling. OOM kills begin within seconds.

# deploy.sh: purges everything at once
npm run build
aws s3 sync dist/ s3://my-bucket/

# Purge ALL assets from CloudFront simultaneously
aws cloudfront create-invalidation \
  --distribution-id EXXXXXXXXXXXXXX \
  --paths "/*"

# This instantly evicts every object from every edge node globally.
# Origin now faces 100% of user traffic with zero cache warmth.

# Solution A: use content-hashed filenames, so no purge is needed at all
# Build produces: main.8f3a2b1c.js, vendor.4d9e7f0a.js
# Old filenames stay cached indefinitely, new filenames just get fetched fresh
# Update index.html to reference new filenames (index.html has short TTL)
aws cloudfront create-invalidation \
  --distribution-id EXXXXXXXXXXXXXX \
  --paths "/index.html" "/service-worker.js"
# Only 2 objects invalidated, so no thundering herd

# Solution B: stagger invalidation in batches (if you can't use hashed names)
PATHS=(
  "/css/main.css"
  "/js/app.js"
  "/api/config.json"
  # ...etc
)
for path in "${PATHS[@]}"; do
  aws cloudfront create-invalidation \
    --distribution-id EXXXXXXXXXXXXXX \
    --paths "$path"
  sleep 2  # brief pause between each invalidation
done

# Solution C: blue-green deploy; warm the new origin fully before routing traffic
# Switch CDN origin to the new backend only after a health check passes

Lesson: Wildcard cache purges on high-traffic sites are a loaded gun. The safest deploy strategy is content-hashed filenames — you never purge anything; you just reference new filenames in the HTML entry point, and only the HTML entry point needs a short TTL or a single targeted invalidation.

All four production CDN bugs follow the same meta-pattern: the CDN did exactly what it was told to do, but the instructions were wrong. Session leaks come from missing private directives. Cache fragmentation comes from high-cardinality Vary headers. Error caching comes from blanket middleware headers that don't filter on status code. Thundering herds come from invalidating large caches in a single burst. In every case, the CDN is innocent; it's the configuration that failed.

Section 20

Real-World CDN Architectures — How the Big Players Are Built

CDN products look similar from the outside: you point your domain at the provider (a CNAME to its hostname, or a delegation to its nameservers) and traffic starts being served from the edge. But under the hood, five different companies made radically different architectural bets. Netflix built their own hardware and shipped it to ISPs. Cloudflare bet on Anycast and software-defined everything. Akamai's hierarchical design predates most of the other players. CloudFront leverages AWS's existing infrastructure. Fastly bet on instant purge as a competitive differentiator. Understanding these architectures tells you why each CDN is the right answer for different use cases.

Netflix Open Connect — The CDN That Moves Iron

Netflix's approach to CDN is unlike anything else: they build custom hardware appliances called Open Connect Appliances (OCAs) and physically ship them to Internet Service Providers (ISPs) who agree to install them inside their data centers. This is not a hosted service — it is Netflix-owned hardware running inside Comcast's, AT&T's, and hundreds of other ISPs' facilities worldwide.

The WHY is pure economics. Video streaming traffic is enormous — during peak evening hours, Netflix historically accounted for a substantial share of downstream internet traffic in North America. Every GB of video that Comcast carries from a Netflix data center in Virginia costs transit bandwidth money for both parties. If Netflix's hardware is physically inside Comcast's facility, that GB of video never crosses the internet — it travels from the OCA directly to the subscriber's home on Comcast's internal network. Transit cost: essentially zero. Latency: extremely low. For ISPs, the math is straightforward: install free Netflix hardware, reduce transit costs significantly.

Netflix pre-warms OCAs with popular content proactively. Each night, Netflix analyzes viewing predictions — what content will be popular tomorrow in each metro area — and pre-positions it on the local OCAs via an internal fill network. By the time viewers click play the next evening, the video is already sitting on the appliance 10 ms from their home router. At peak hours, a substantial majority of Netflix traffic is served directly from OCA hardware inside ISP facilities rather than from Netflix's own data centers.

The key insight: Netflix moves the content close to users before they request it. The OCA is pre-filled each night based on predicted demand; by the time users click play, the video is already on hardware inside the ISP's own building.

Cloudflare — Global Anycast and the Software-Defined CDN

Cloudflare's architecture centers on a single global Anycast IP address space. When you enable Cloudflare on your domain, your DNS records resolve to Cloudflare's IPs — and those exact same IP addresses are announced from 300+ cities simultaneously via Border Gateway Protocol. Every Cloudflare point of presence in the world advertises "I own this IP." When a user's request leaves their router, the internet's routing infrastructure automatically directs it to the nearest Cloudflare node — not by DNS lookup, but by BGP path selection in routers along the way.

Every Cloudflare PoP runs the same software stack — including Cloudflare Workers (V8 isolates), WAF rules, rate limiting, bot management, and cache. This means a request can be processed, filtered, rewritten, and responded to entirely at the edge without any involvement from your origin. HTTP/3 with QUIC is the default transport between users and Cloudflare edges, which meaningfully improves performance on high-latency or lossy mobile connections because QUIC's connection establishment is faster and packet loss doesn't stall all streams simultaneously (unlike TCP + HTTP/2).

The Anycast routing also provides automatic DDoS resilience: a volumetric attack from a botnet hitting a single Cloudflare IP is automatically distributed across all 300+ PoPs globally rather than hitting a single location. The attack is absorbed by the full weight of Cloudflare's network capacity instead of landing on your single origin.

Akamai — The Hierarchical Pioneer

Akamai invented the commercial CDN in 1998 and its architecture reflects a different era's constraints and a different design philosophy. Rather than Anycast (which requires BGP coordination at massive scale), Akamai uses GeoDNS: when a browser resolves your domain, Akamai's authoritative DNS servers inspect the geographic location of the resolver and return the IP address of the Akamai edge node nearest to that resolver. Each user gets a different IP in their DNS response, pointed to their nearest edge.
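The selection step of GeoDNS can be sketched in a few lines. This is an illustrative model only, not Akamai's actual mapping system: it assumes the resolver's coordinates are already known (a real deployment would look them up from the resolver's IP in a GeoIP database and would also weigh load and link health), and the edge names, coordinates, and documentation-range IP addresses are all hypothetical.

```python
import math

# Hypothetical edge nodes: name -> (latitude, longitude, IP address).
# The IPs come from the documentation-only ranges reserved by RFC 5737.
EDGES = {
    "tokyo": (35.68, 139.69, "192.0.2.10"),
    "frankfurt": (50.11, 8.68, "198.51.100.10"),
    "virginia": (38.95, -77.45, "203.0.113.10"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def resolve(resolver_lat, resolver_lon):
    """Return (edge_name, ip) for the edge geographically closest to the resolver."""
    name = min(
        EDGES,
        key=lambda n: haversine_km(resolver_lat, resolver_lon, EDGES[n][0], EDGES[n][1]),
    )
    return name, EDGES[name][2]


if __name__ == "__main__":
    print(resolve(34.69, 135.50))   # a resolver located in Osaka
    print(resolve(48.86, 2.35))     # a resolver located in Paris
```

Note that the input is the resolver's location, not the user's. That is the structural weakness of GeoDNS: the answer is only as good as the assumption that the resolver sits near the person making the request.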

Akamai's topology is hierarchical: thousands of edge nodes in cities worldwide sit at the bottom, fetching from a smaller number of regional parent nodes, which in turn fetch from Akamai's origin shield layer before finally reaching your origin. This means your origin sees traffic only from a handful of Akamai regional nodes, not from thousands of edges — a significant reduction in origin-facing load. The trade-off vs Anycast is that GeoDNS failover is slower (DNS TTL-bound, typically 30s to several minutes) compared to BGP failover which can reroute in seconds.

Historically, Akamai's pricing has leaned toward per-request billing rather than pure bandwidth billing — a model better suited to enterprises with many small, high-value requests (financial data, media manifests) than consumer streaming sites with massive bulk video transfers.

AWS CloudFront + S3 + Origin Shield

CloudFront's defining feature is its deep integration with the AWS ecosystem. An S3 bucket can be set as a CloudFront origin in minutes; CloudFront handles SSL termination, HTTPS enforcement, and signed URLs to control access to private S3 objects. The tight integration means S3 requests from CloudFront use AWS's internal network rather than the public internet — lower latency to the origin, more predictable bandwidth.

Origin Shield, launched in October 2020, adds a centralized caching layer between CloudFront's edge nodes and your origin. Without Origin Shield, each of CloudFront's hundreds of edge locations (publicly stated as 750+ PoPs across 100+ cities and 50+ countries) can independently make a cache-fill request to your origin on a miss. With Origin Shield, all misses from all edge locations first funnel through a single regional Origin Shield node, which consolidates them into a much smaller number of origin requests. A 100-node cache miss storm against your origin becomes a single request. This is particularly valuable during a thundering herd scenario or a cache warm-up after a fresh deploy.
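The collapsing effect can be illustrated with a toy model. This is a sketch under simplifying assumptions, not CloudFront's implementation: requests arrive one after another (a real shield also coalesces concurrent in-flight misses), nothing ever expires, and the class and function names are invented for the example.

```python
class Origin:
    """Stands in for your origin server and counts how often it is hit."""

    def __init__(self):
        self.fetches = 0

    def fetch(self, url):
        self.fetches += 1
        return f"body of {url}"


class Cache:
    """A cache layer that fills misses from an upstream (another Cache or the Origin)."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def fetch(self, url):
        if url not in self.store:
            # Miss: fill from the next layer up and remember the result.
            self.store[url] = self.upstream.fetch(url)
        return self.store[url]


def origin_fetches(edge_count, use_shield):
    """Count origin requests when every edge misses on the same URL once."""
    origin = Origin()
    upstream = Cache(origin) if use_shield else origin
    edges = [Cache(upstream) for _ in range(edge_count)]
    for edge in edges:
        edge.fetch("/assets/main.js")
    return origin.fetches


if __name__ == "__main__":
    print(origin_fetches(100, use_shield=False))
    print(origin_fetches(100, use_shield=True))
```

Without the shield, each of the 100 edges fills from the origin independently; with it, only the first miss travels all the way back and the other 99 are answered by the shield layer.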

Lambda@Edge and CloudFront Functions provide compute at the edge — the former for heavier workloads (can run full Node.js or Python, sub-second latency), the latter for ultra-lightweight header manipulation and request rewriting (sub-millisecond, JavaScript only). The pricing model is per-GB data transfer out plus per 10,000 HTTPS requests — important to model before choosing CloudFront for very high request-count workloads.

Fastly — Instant Purge and the Power-User CDN

Fastly made a deliberate engineering bet: be the CDN for teams that need programmatic control over caching behavior. Many CDNs have historically offered purge with propagation times measured in tens of seconds to minutes. Fastly engineered for sub-second global purge as a first-class feature, and the mechanism that made this possible is surrogate keys (also called cache tags). When your origin returns a response, it includes a Surrogate-Key header listing one or more space-separated logical tags: Surrogate-Key: product-42 category-shoes sale-summer. When product 42 changes, you ask Fastly's purge API to purge the key product-42, and every cached object tagged with that key is invalidated globally within moments, regardless of how many URLs that affects. This makes Fastly excellent for content-heavy sites that update frequently.
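The data structure behind tag-based purging is a reverse index from key to URLs. The sketch below is an in-memory illustration of that idea, not Fastly's implementation or API: TaggedCache and its methods are hypothetical names, and a real CDN must also propagate the purge to every edge node.

```python
from collections import defaultdict


class TaggedCache:
    """A cache that can invalidate by surrogate key instead of by URL."""

    def __init__(self):
        self._objects = {}                      # url -> cached body
        self._urls_by_key = defaultdict(set)    # surrogate key -> urls tagged with it

    def put(self, url, body, surrogate_key_header):
        """Store a response and index it under each space-separated key."""
        self._objects[url] = body
        for key in surrogate_key_header.split():
            self._urls_by_key[key].add(url)

    def get(self, url):
        return self._objects.get(url)

    def purge_key(self, key):
        """Drop every cached object tagged with `key`; return how many were removed."""
        urls = self._urls_by_key.pop(key, set())
        removed = 0
        for url in urls:
            if self._objects.pop(url, None) is not None:
                removed += 1
        return removed


if __name__ == "__main__":
    cache = TaggedCache()
    cache.put("/products/42", "shoe page", "product-42 category-shoes")
    cache.put("/categories/shoes", "shoe listing", "category-shoes")
    cache.put("/products/7", "hat page", "product-7")
    print(cache.purge_key("category-shoes"))    # number of objects removed
    print(cache.get("/products/7"))             # untouched by the purge
```

One purge call removes both the product page and the category listing without the caller knowing either URL. Index entries left behind under other keys are tolerated here: they can only cause an unnecessary purge later, never a missed one.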

Fastly's configuration language is VCL (Varnish Configuration Language), which gives engineers deep control over cache logic — custom cache key construction, request routing, header manipulation, A/B testing at the edge. It's significantly more powerful than most CDNs' point-and-click rule interfaces, but requires actual programming skill. Fastly's publicly known enterprise customers include GitHub, which uses Fastly to serve asset files and documentation, taking advantage of instant purge to reflect content updates within seconds rather than waiting for TTL expiry.

Five CDNs, five distinct architectural bets: Netflix moves hardware to ISPs and pre-positions content nightly; Cloudflare uses Anycast for automatic DDoS absorption and a uniform software stack across 300+ cities; Akamai's hierarchical GeoDNS topology reduces origin load at the cost of slower failover; CloudFront's Origin Shield collapses distributed miss traffic into a single origin request; Fastly's surrogate-key instant purge enables content-driven cache invalidation at scale. The right CDN depends on your traffic shape, origin sensitivity, and operational team's comfort with configuration complexity.

Section 21

Common Misconceptions — Mental-Model Corrections

These are the beliefs that feel correct but aren't — the ones that cause real production mistakes. Each one is a mental model that seemed reasonable until someone tested it in production. Read each correction before it costs you a P1 incident.

The misconception: CDNs cache static files. Dynamic content — personalized pages, API responses, search results — can't be cached, so CDNs don't help with them.

The reality: CDNs can and do cache dynamic content, with the right TTL and cache-key configuration. A product listing page that's the same for all users can be cached for 60 seconds with s-maxage=60, stale-while-revalidate=30. The edge then fetches it from the origin about once per 60-second window, so if an edge sees 50 requests per minute for that page, roughly 98% of them are served from cache; at only one request per minute, almost nothing is. API responses (JSON from a public endpoint like a currency exchange rate or sports score) can be cached for 5-30 seconds, dramatically reducing origin load and latency. The CDN also provides network acceleration for genuinely uncacheable content: the TCP/TLS connection is terminated at the nearest edge node, and the origin connection reuses a persistent warm connection, cutting connection establishment time even for cache misses. And edge compute (Cloudflare Workers, Lambda@Edge) can execute actual application logic at the CDN layer, eliminating the origin round trip entirely for many request types.
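How much a short TTL buys depends on request rate. A rough model, assuming evenly spaced requests at a single edge, exactly one origin fetch per TTL window, and ignoring stale-while-revalidate and early eviction; expected_hit_ratio is an illustrative helper, not a CDN feature:

```python
def expected_hit_ratio(requests_per_second: float, ttl_seconds: float) -> float:
    """Approximate hit ratio when each TTL window costs one origin fetch."""
    requests_per_window = requests_per_second * ttl_seconds
    if requests_per_window <= 1:
        # At most one request per window: every request finds an expired entry.
        return 0.0
    return 1.0 - 1.0 / requests_per_window


if __name__ == "__main__":
    for per_minute in (1, 10, 50, 600):
        ratio = expected_hit_ratio(per_minute / 60, ttl_seconds=60)
        print(f"{per_minute:>4} req/min with a 60 s TTL -> {ratio:.1%} from the edge")
```

The takeaway is that short-TTL caching pays off on hot paths. A page requested once a minute gains almost nothing from a 60-second TTL, while a page requested ten times a second is served from cache nearly every time.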

The misconception: When I call the CDN's purge API, the old content disappears immediately from all edge nodes.

The reality: Purge propagation time varies widely by CDN and plan tier. Fastly's surrogate-key purge is genuinely sub-second because of its purpose-built architecture. Cloudflare purges issued via the dashboard or API typically complete in 30 seconds or less, but can take longer during high-traffic periods. AWS CloudFront invalidations are notorious for taking 1 to 15 minutes to propagate. Akamai purge on legacy configurations could take even longer. The implication: if your system requires content changes to be reflected within seconds globally, you cannot rely on purge alone. You need content-hashed URLs (for immutable assets), surrogate keys with a CDN that supports fast propagation, or an architecture that uses short TTLs (10-30 seconds) as the primary freshness mechanism and treats purge as a best-effort acceleration tool, not a hard guarantee.

The misconception: Cloudflare offers free DDoS protection, so any site on Cloudflare's free tier is protected against all DDoS attacks.

The reality: Cloudflare's free tier does provide genuine L3/L4 (network-layer) DDoS protection — volumetric floods of UDP/TCP packets are absorbed by Cloudflare's global network. However, sophisticated Layer 7 (application-layer) attacks — HTTP floods that look like legitimate browser traffic — require the paid WAF tier to configure custom rate limiting rules, bot management, and challenge pages. A resourceful attacker sending 50,000 "legitimate-looking" GET requests per second to your login page will flow through the free tier's CDN to your origin. The free tier also has rate limits on Cloudflare Workers usage, and some advanced security features (custom firewall rules, business logic protection) are enterprise-only. The free tier is genuinely useful for small sites — but mission-critical applications need at least the Pro tier for meaningful L7 protection.

The misconception: CDNs route users to the nearest edge node using Anycast — all major CDNs do this the same way.

The reality: Anycast routing (same IP announced from many locations; BGP routes to the nearest one) is used by Cloudflare, Fastly, and a number of others. But Akamai historically used GeoDNS as its primary routing mechanism: the DNS server looks at the requesting resolver's IP address and returns the IP of the nearest Akamai edge for that geography. These two approaches have different failure characteristics: Anycast failover happens in seconds when BGP re-routes; GeoDNS failover is bounded by DNS TTL (typically 30 seconds to several minutes). They also behave differently when the user's DNS resolver is far from the user, as with a VPN or a distant public resolver: GeoDNS picks the edge nearest the resolver (or the VPN exit), not the user, leading to sub-optimal edge selection. Knowing which mechanism a CDN uses matters when you're designing for high availability or serving globally distributed users with VPNs.

The misconception: Running code at the CDN edge (Cloudflare Workers, Lambda@Edge, Fastly Compute) is always faster than running it at the origin, because the edge is physically closer to the user.

The reality: Edge compute is faster for latency-bound workloads that don't need the database: request rewriting, A/B testing flag evaluation, header manipulation, simple auth token validation, serving static personalization from KV stores. But for workloads that require a database query, edge compute adds a second network hop: user → edge → database. If the database is in us-east-1 and the user is in Sydney, the edge node in Sydney still needs to cross the Pacific to reach the database. You've shortened the user's connection setup to a Sydney-to-Sydney handshake, but every database query still pays the full edge→origin round trip. Cold start latency is another factor: V8 isolates (Cloudflare Workers) have sub-millisecond cold starts, but Lambda@Edge Node.js runtimes can take 100-500 ms on a cold start. For heavy workloads (database-intensive, complex computation, large memory footprints), the origin running on a dedicated server in a well-provisioned data center is often faster end-to-end than edge compute that must cross the globe to fetch data anyway.
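A back-of-the-envelope latency model makes the trade-off concrete. The sketch below assumes warm connections, sequential database queries, and round-trip times that are placeholders rather than measurements (10 ms user to edge, 160 ms across the Pacific, 1 ms origin to a co-located database); both function names are hypothetical.

```python
def edge_latency_ms(user_to_edge_rtt, edge_to_db_rtt, db_queries, compute_ms, cold_start_ms=0.0):
    """Request handled at the edge; each database query crosses to the remote region."""
    return user_to_edge_rtt + db_queries * edge_to_db_rtt + compute_ms + cold_start_ms


def origin_latency_ms(user_to_origin_rtt, origin_to_db_rtt, db_queries, compute_ms):
    """Request handled at the origin, which sits next to the database."""
    return user_to_origin_rtt + db_queries * origin_to_db_rtt + compute_ms


if __name__ == "__main__":
    # Placeholder RTTs for a Sydney user and a database in us-east-1.
    for queries in (0, 1, 3):
        edge = edge_latency_ms(10, 160, queries, compute_ms=5)
        origin = origin_latency_ms(160, 1, queries, compute_ms=5)
        print(f"{queries} queries: edge {edge:.0f} ms, origin {origin:.0f} ms")
```

Under these assumptions the edge wins clearly when no database is involved, roughly breaks even at one query, and loses badly once a request needs several sequential queries, because each one repeats the long-haul round trip.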

The misconception: HTTP/3 is a transport-layer swap — same HTTP semantics, just QUIC instead of TCP. The practical difference for users is negligible.

The reality: The user experience differences are real, particularly on mobile or high-latency connections. QUIC has two key advantages over TCP + TLS. First, 0-RTT connection establishment: for returning visitors, QUIC can resume a connection and send data in the very first packet (0 round trips), while TLS 1.3 over TCP requires at minimum 1 round trip before data can flow. Second, no head-of-line blocking at the transport layer: HTTP/2 over TCP multiplexes multiple streams over one TCP connection, but a single dropped packet blocks all streams until retransmission. QUIC implements multiplexing in the transport layer so a lost packet only delays its own stream, not others. These differences are measurable on mobile networks with 2-5% packet loss rates. The upgrade is largely invisible in configuration — Cloudflare enables HTTP/3 by default — but understanding why it's better helps you decide whether it's worth the infrastructure cost of deploying QUIC on your own origin or whether you're happy letting the CDN handle the QUIC termination.

The misconception: Setting max-age=300 (or s-maxage=300) means the CDN will serve your content from cache for exactly 5 minutes, then fetch fresh content.

The reality: max-age sets the maximum freshness lifetime: the upper bound, not the guaranteed duration. A CDN edge node is free to evict a cached object before its TTL expires if the edge node is under memory pressure, if an LRU (Least Recently Used) eviction policy decides the object hasn't been accessed recently enough, or if the CDN operator's infrastructure has a lower effective TTL limit. A popular asset on a busy edge node will usually stay cached for its full TTL. An obscure, rarely requested asset might be evicted after 30 seconds and treated as a cache miss on the next request. This matters for cache warming strategies: don't assume a TTL means "this object is definitely cached." Monitor actual cache hit ratios per content path, not just your TTL settings.
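The interaction between TTL and LRU eviction is easy to see in a toy cache. This is a minimal sketch, assuming capacity is counted in objects rather than bytes and that the caller supplies the clock so the behaviour is deterministic; EdgeCache is an illustrative class, not any vendor's eviction policy.

```python
from collections import OrderedDict


class EdgeCache:
    """A fixed-capacity cache with both TTL expiry and LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()   # url -> (body, expires_at), oldest use first

    def put(self, url, body, max_age, now):
        self._entries[url] = (body, now + max_age)
        self._entries.move_to_end(url)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry

    def get(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return None                         # miss: never cached, or evicted early
        body, expires_at = entry
        if now >= expires_at:
            del self._entries[url]
            return None                         # miss: freshness lifetime ran out
        self._entries.move_to_end(url)          # mark as recently used
        return body


if __name__ == "__main__":
    cache = EdgeCache(capacity=2)
    cache.put("/a", "A", max_age=300, now=0)
    cache.put("/b", "B", max_age=300, now=1)
    cache.get("/a", now=2)                      # /a is now more recently used than /b
    cache.put("/c", "C", max_age=300, now=3)    # cache is full, so /b is evicted
    print(cache.get("/b", now=4))               # gone after 3 s despite max-age=300
    print(cache.get("/a", now=4))               # still cached
```

The object at /b had 297 seconds of freshness left when it was evicted, which is the point: max-age caps how long an entry may be served, while capacity and access patterns decide how long it actually survives.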

Seven mental-model corrections: CDNs accelerate dynamic content too (via short TTLs and edge compute); purge propagation ranges from sub-second to minutes depending on the CDN, so it is not reliably instant; Cloudflare free tier protects L3/L4 but not sophisticated L7 attacks; not every CDN routes with Anycast (Akamai has historically relied on GeoDNS); edge compute can be slower when a database round-trip is required; HTTP/3 provides real improvements on lossy mobile connections; and max-age is a freshness ceiling, not a guarantee.

Section 22

Operational Playbook — Pick, Onboard, Monitor, and Optimize a CDN

Picking a CDN is the easy part. The hard part is onboarding it correctly, configuring it so it actually improves performance rather than adding a mysterious failure point, and tuning it over time as your traffic patterns evolve. This playbook walks through five stages of a production CDN deployment.

The five stages are a loop, not a one-time checklist. Traffic patterns change, costs evolve, and what worked at 10,000 daily users may need rethinking at 10 million. Let's walk through each stage.

Stage 1 — Pick the Right CDN for Your Geography and Budget

The first question is geography: where are your users? CDN performance depends entirely on whether the CDN has PoPs close to your actual user base. A CDN with excellent US and European coverage but no presence in Southeast Asia is the wrong choice for an Indonesian startup. Before comparing CDN features, plot your user geography from your analytics tool, then check each CDN candidate's PoP map and compare coverage in your top-5 user regions.

The second question is cost model. CDN pricing has three broad patterns:

Free tier + paid add-ons (Cloudflare): Cloudflare's free tier covers unlimited bandwidth with no per-GB charges; costs come from add-ons (Argo smart routing, Workers beyond free quota, WAF). Excellent for hobby projects, MVPs, and even large sites whose main cost is bandwidth rather than compute.
Pay-per-GB (AWS CloudFront, GCP Cloud CDN): Billed per GB of data transferred out, plus per 10,000 HTTPS requests. Cost scales with usage — good for unpredictable or low-volume workloads; can become expensive at very high volume. Run cost projections before committing.
Committed bandwidth / flat-rate (Fastly, enterprise Akamai): Monthly commit with per-Gbps pricing above the commit. Better unit economics at high volume; requires forecasting. Fastly's instant-purge and VCL configuration make it worth the premium for large content-driven sites.
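A cost projection for the last two models is a few lines of arithmetic. The sketch below is deliberately generic: every rate in it is a made-up placeholder, not any provider's published price, and it simplifies the committed model to a per-GB overage (real contracts are often priced per Gbps, as noted above). Substitute the numbers from your own quotes.

```python
def pay_per_gb_cost(gb_out, https_requests, price_per_gb, price_per_10k_requests):
    """Monthly cost under a pay-per-GB plus per-10,000-requests model."""
    return gb_out * price_per_gb + (https_requests / 10_000) * price_per_10k_requests


def committed_cost(gb_out, commit_gb, commit_price, overage_per_gb):
    """Monthly cost under a flat commit with per-GB overage above the commit."""
    overage_gb = max(0.0, gb_out - commit_gb)
    return commit_price + overage_gb * overage_per_gb


if __name__ == "__main__":
    # Placeholder workload and rates, for illustration only.
    gb_out, requests = 50_000, 1_000_000_000
    usage_based = pay_per_gb_cost(gb_out, requests, price_per_gb=0.08, price_per_10k_requests=0.01)
    committed = committed_cost(gb_out, commit_gb=40_000, commit_price=2_000.0, overage_per_gb=0.03)
    print(f"pay-per-GB: {usage_based:,.2f}  committed: {committed:,.2f}")
```

Running the projection at your current traffic and again at two and ten times that volume shows where the lines cross. Pay attention to the request term: for workloads with many small responses it can rival the bandwidth term.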

Stage 2 — Onboard: DNS, SSL, and Origin Rules

Onboarding is a DNS change. You update your domain's DNS record from pointing at your origin IP to pointing at the CDN's CNAME (or Anycast IP for Cloudflare). Browsers and resolvers then route to the CDN instead of directly to your server. A few things to set up carefully during onboarding:

SSL / TLS certificates: Most CDNs provision free certificates automatically on plan activation (Cloudflare calls this Universal SSL). Verify HTTPS is working and HSTS headers are set before the launch. Also configure end-to-end encryption between the CDN and your origin (Full SSL in Cloudflare parlance, ideally Full (strict) so the origin certificate is validated); don't accept CDN-to-origin plain HTTP in production.
Origin rules / page rules: Configure which paths to cache and for how long. Static assets under /assets/ → s-maxage=31536000, immutable. API endpoints → private, no-store by default (opt-in specific read-only endpoints for caching). HTML pages → per-page judgement.
Origin host header: Ensure the CDN forwards the correct Host header to your origin. If your origin expects api.yoursite.com but the CDN forwards yourorigin.internal, virtual-host-based routing breaks.
Cache-Control headers on your origin: Audit your origin's response headers now. If you're not currently emitting explicit Cache-Control headers, the CDN will use its default behavior (which varies by CDN and is rarely what you want).
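One way to make the origin's caching policy explicit is a single function that maps a request path and status code to a Cache-Control value. The sketch below follows the rules listed above; the /api/ prefix and the HTML fallback value are assumptions added for illustration, since the right HTML policy is a per-page judgement.

```python
def cache_control_for(path: str, status_code: int = 200) -> str:
    """Pick a Cache-Control header value for a response."""
    if status_code >= 400:
        # Never let the CDN cache error responses.
        return "no-store"
    if path.startswith("/assets/"):
        # Content-hashed static assets: cache for a year, never revalidate.
        return "public, max-age=31536000, s-maxage=31536000, immutable"
    if path.startswith("/api/"):
        # APIs are uncacheable by default; opt specific endpoints in elsewhere.
        return "private, no-store"
    # Assumed default for HTML: browsers revalidate, the CDN holds it briefly.
    return "public, max-age=0, s-maxage=60"


if __name__ == "__main__":
    for path, status in [("/assets/main.8f3a2b.js", 200), ("/api/cart", 200), ("/", 200), ("/", 503)]:
        print(f"{status} {path:<24} -> {cache_control_for(path, status)}")
```

Keeping the policy in one place means the audit described above becomes a code review rather than a hunt through scattered handlers, and the status-code check comes first so no path rule can accidentally make an error cacheable.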

Stage 3 — Test: Verify Cache Behavior Before Traffic Hits

Before routing production traffic through the CDN, verify that caching is actually working the way you configured it. Use curl -I to inspect response headers and look for CDN-specific cache status headers:

# Cloudflare adds CF-Cache-Status: HIT / MISS / EXPIRED / BYPASS
curl -sI https://your-domain.com/assets/main.js | grep -i '^\(cf-cache-status\|cache-control\|age\):'

# CloudFront adds X-Cache: Hit from cloudfront / Miss from cloudfront
curl -sI https://your-domain.com/assets/main.js | grep -i '^\(x-cache\|cache-control\|age\):'

# Make the request twice: first should be MISS, second should be HIT
# If second is still MISS, the response is not being treated as cacheable

Each curl -I sends a HEAD request (headers only, no body) and the grep filters output down to the cache-related lines. The CDN-specific header (CF-Cache-Status on Cloudflare, X-Cache on CloudFront) is the source of truth: it tells you whether the request was served from cache or had to go to origin. The two-shot test (fire the same URL twice) is the most important diagnostic in this section. If the second request doesn't flip from MISS to HIT, repeat it a few times, since consecutive requests can land on different cache servers within a PoP. If it still misses, the most likely cause is that the origin isn't sending cacheable Cache-Control headers; fix that at the origin before reaching for CDN-side overrides.
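The two-shot check is simple to automate once the headers are captured. The helper below is a sketch that only interprets header dictionaries (fetching them is left to whatever HTTP client you already use); it recognises the two header names mentioned above, and the function names are illustrative.

```python
def cache_status(headers: dict) -> str:
    """Normalise a CDN cache-status header to an upper-case word such as HIT or MISS."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "cf-cache-status" in lowered:                  # Cloudflare: "HIT", "MISS", ...
        return lowered["cf-cache-status"].strip().upper()
    if "x-cache" in lowered:                          # CloudFront: "Hit from cloudfront"
        words = lowered["x-cache"].split()
        return words[0].upper() if words else "UNKNOWN"
    return "UNKNOWN"


def two_shot(first_headers: dict, second_headers: dict):
    """Return (first_status, second_status, cached) for two requests to the same URL."""
    first = cache_status(first_headers)
    second = cache_status(second_headers)
    return first, second, second == "HIT"


if __name__ == "__main__":
    print(two_shot({"CF-Cache-Status": "MISS"}, {"CF-Cache-Status": "HIT"}))
    print(two_shot({"X-Cache": "Miss from cloudfront"}, {"X-Cache": "Miss from cloudfront"}))
```

Wiring this into a post-deploy smoke test for a handful of representative URLs catches header regressions before users do.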

Also test from multiple geographic regions using tools like WebPageTest (choose test locations in different continents) or the CDN's own latency diagnostic tools. Measure Time to First Byte (TTFB) from at least 3 continents — this is the clearest signal of whether geographic edge distribution is working. A CDN-served TTFB from a distant location should be 30-80 ms; a non-CDN request from the same location may be 200-600 ms.

Stage 4 — Monitor: Dashboards, Alerts, and the Metrics That Matter

Once live, watch these metrics continuously. A sudden change in any of them (a drop in hit ratio, or a spike in origin egress, latency, or errors) is a leading indicator of a configuration problem, a pricing surprise, or a content issue:

Cache hit ratio (by path group): Separate your static assets from your HTML pages from your API endpoints. A healthy site has 95%+ hit ratio on static assets. HTML might be 60-80%. API endpoints should be 0% unless you've explicitly configured caching. A sudden drop in static asset hit ratio often means a cache-busting query parameter leaked into production.
Origin egress (bytes served directly by your origin): Track this in absolute terms. If CDN hit ratio stays the same but your origin is serving more bytes, traffic has grown — great! If origin egress spikes without a traffic growth explanation, something has broken the cache.
p50 / p95 / p99 TTFB by region: Edge latency should be consistent and low. A p99 spike in a specific region often indicates a CDN PoP is having issues and routing is falling back to a more distant node, or that your origin is slow for requests that miss the cache.
Origin error rate (4xx/5xx): Warn when this crosses 1%. High origin error rates combined with caching can amplify errors (see Bug 3 in Section 19).

Configure alerts: hit ratio drops below 80% for static assets → PagerDuty. Origin error rate exceeds 2% → PagerDuty. Origin egress doubles in 5 minutes (possible cache purge incident or traffic spike) → alert.
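Those three rules translate directly into a small evaluation function. This is a sketch of the thresholds stated above, assuming the metrics have already been collected by your monitoring system; the function name and the ("page" / "alert") severity labels are illustrative, and the actual paging integration is out of scope.

```python
def evaluate_alerts(static_hit_ratio, origin_error_rate, egress_now, egress_5min_ago):
    """Return a list of (severity, message) tuples for the thresholds breached."""
    alerts = []
    if static_hit_ratio < 0.80:
        alerts.append(("page", "static asset hit ratio below 80%"))
    if origin_error_rate > 0.02:
        alerts.append(("page", "origin error rate above 2%"))
    # Guard against a zero baseline so a cold start does not trigger the rule.
    if egress_5min_ago > 0 and egress_now >= 2 * egress_5min_ago:
        alerts.append(("alert", "origin egress doubled within 5 minutes"))
    return alerts


if __name__ == "__main__":
    # Example readings: hit ratio collapsed and egress tripled, errors are fine.
    for severity, message in evaluate_alerts(0.42, 0.005, egress_now=900.0, egress_5min_ago=300.0):
        print(f"[{severity}] {message}")
```

A hit-ratio collapse and an egress spike firing together, as in the example readings, is the signature of a purge incident or a leaked cache-busting parameter rather than organic growth.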

Stage 5 — Optimize: Squeeze the Last 10% of Performance

Once you have baselines, look for specific optimization opportunities:

Fix cache key leakage: Pull a report of your top cache MISS URLs. Anything that should be a HIT but isn't is usually a rogue query parameter. Add a CDN rule to strip tracking parameters (utm_source, utm_medium, gclid, fbclid) from cache keys before storage. Most CDNs call this "Cache Key normalization" or "Ignore Query String."
Add s-maxage separately from max-age: Use s-maxage for shared cache (CDN) TTL and max-age for browser cache TTL. A product page might have max-age=60, s-maxage=600 — browsers get a fresh copy every minute, but the CDN can serve it for 10 minutes and absorb the traffic burst.
Hash filenames for immutable assets: If you're still using versioned query strings (main.js?v=123), migrate to hash-in-filename (main.8f3a2b.js). Query-string versioning is fragile because cache-key handling of query strings varies: a CDN or proxy configured to ignore query strings treats ?v=123 and ?v=124 as the same object and keeps serving the old file. Filename hashing is more universally respected and allows truly immutable caching (Cache-Control: max-age=31536000, immutable).
Enable Brotli compression: Brotli compresses text-based assets 15-25% smaller than gzip. Most CDNs can compress at the edge rather than requiring your origin to do it. Smaller responses = faster transfers = lower bandwidth cost.
Enable HTTP/3: One toggle in most CDN dashboards. No code changes required. Users on modern browsers (Chrome, Safari, Firefox) automatically negotiate HTTP/3. Measurable improvement on mobile networks.
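Cache key normalization, the first item in the list above, can be sketched with the standard library. This is an illustration of the idea rather than any CDN's rule engine: it strips the four tracking parameters named above, and it additionally lower-cases the host and sorts the remaining parameters, which are assumptions beyond what the text prescribes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that never change the response and should not fragment the cache.
TRACKING_PARAMS = {"utm_source", "utm_medium", "gclid", "fbclid"}


def normalized_cache_key(url: str) -> str:
    """Return the URL with tracking parameters removed and the rest in stable order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_PARAMS
    ]
    kept.sort()   # so ?a=1&b=2 and ?b=2&a=1 share one cache entry
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(normalized_cache_key("https://Example.com/p?utm_source=x&id=42&gclid=abc"))
    print(normalized_cache_key("https://example.com/p?b=2&a=1"))
```

Only strip parameters you are certain the origin ignores. Removing one that does change the response would serve the wrong content from cache, which is worse than a low hit ratio.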

The operational CDN lifecycle has five stages: choose based on geographic PoP coverage and cost model; onboard by updating DNS, configuring SSL, and setting up origin rules; test cache behavior explicitly before routing production traffic; monitor hit ratio, origin egress, regional TTFB, and error rates with real alerts; optimize by fixing cache key leakage, separating s-maxage from max-age, adopting hashed filenames, and enabling Brotli and HTTP/3.

Section 23

Cheat Sheet & Glossary — The 30-Second Recap

A quick-reference grid of the core CDN concepts, followed by a glossary of every term you'll encounter in CDN documentation, job descriptions, and incident reports.

Quick-Reference Cheat Sheet

PoP: A Point of Presence is a physical data center running CDN software, the closest location to a user where the CDN can serve cached content or terminate connections.

Anycast: A single IP address announced from hundreds of locations simultaneously via BGP. The internet's routing infrastructure automatically directs each user to the nearest location. Used by Cloudflare, Fastly.

GeoDNS: The DNS server returns a different IP based on where the resolver is located. Used by Akamai and historically by CloudFront. Slower to fail over than Anycast (DNS TTL-bound).

Cache-Control: The HTTP response header that tells browsers and CDN edges how long to cache a response. The most important CDN configuration lever you have. Missing or wrong Cache-Control is the root cause of 80% of CDN problems.

s-maxage: The s-maxage directive in Cache-Control applies only to shared caches (CDN edges, proxies), not browsers. Use it to set a longer CDN TTL while keeping a shorter browser TTL with max-age.

immutable: The immutable directive tells the browser (and CDN) that this response will never change for the given URL. Use only with hashed filenames. Enables aggressive long-term caching with no revalidation overhead.

Surrogate keys: A tag (header value) attached to cached responses at origin. Calling the CDN's purge API with that tag instantly invalidates every object tagged with it, regardless of URL. Enables content-driven invalidation without wildcard purges.

ESI: Edge Side Includes, a templating language processed at the CDN edge that assembles a response from separately-cached fragments. Useful for pages that are 95% static with 5% dynamic: the static shell is cached; only the dynamic fragment hits the origin.

HTTP/3: The third major version of HTTP, using QUIC (UDP-based) instead of TCP. Benefits: 0-RTT connection resume for returning visitors, no head-of-line blocking at the transport layer. Enabled with a toggle on most CDNs.

Hashed filenames: Name assets with a hash of their contents: main.8f3a2b.js. The URL changes only when the content changes. Combine with immutable for zero-revalidation caching. Deploy without CDN purge.

Glossary

PoP: Point of Presence — a CDN data center or co-location facility in a specific city where edge nodes are housed.
Anycast: A routing strategy where one IP address is advertised from many physical locations via BGP. Packets are delivered to the topologically nearest node, not a specific machine.
GeoDNS: DNS that returns different IP addresses depending on the geographic location of the querying resolver. Different from Anycast — routing happens in DNS, not in the network layer.
BGP: Border Gateway Protocol — the routing protocol that determines paths between autonomous systems on the internet. Anycast CDNs use BGP to announce their IP from many locations simultaneously.
edge compute: Running application code (JavaScript, WASM, Rust) on CDN edge nodes rather than on the origin server. Examples: Cloudflare Workers, Lambda@Edge, Fastly Compute@Edge.
ESI: Edge Side Includes — an XML-based templating language processed at the CDN edge to compose pages from separately-cached or dynamically-fetched fragments.
surrogate key: A tag added to cached responses (via Surrogate-Key or Cache-Tag header) enabling group invalidation — one API call can purge all objects sharing a tag. Pioneered by Fastly, now supported by most enterprise CDNs.
Vary header: An HTTP response header listing which request headers were used to select the response. The CDN creates separate cache shards for each unique combination of the listed header values. Dangerous with high-cardinality headers like User-Agent.
conditional GET: A GET request with an If-None-Match (ETag-based) or If-Modified-Since header. The origin returns 304 Not Modified if the resource hasn't changed, saving transfer bandwidth while confirming freshness.
byte-range request: An HTTP request for a specific byte range of a resource (Range: bytes=0-1048575 asks for the first 1 MiB). Used by video players for seeking and for byte-range-addressed HLS/DASH segments; CDNs must handle these correctly and can serve them from a cached full response.
origin shield: A centralized intermediate caching layer between CDN edge nodes and the origin server. All cache misses from all edges funnel through origin shield first, collapsing a distributed miss storm into a single origin fetch.
tiered caching: A CDN topology with multiple cache layers (edge → regional parent → origin shield → origin). A miss at the edge is filled from the regional parent rather than the origin, reducing origin load and inter-region bandwidth costs.
cold start: The latency penalty incurred when an edge compute function (Cloudflare Worker, Lambda@Edge) needs to initialize a new execution context because no warm context is available. V8 isolate cold starts are on the order of a few milliseconds and are often effectively hidden from the user; Lambda@Edge Node.js cold starts can be 100–500 ms.
HTTP/3: The third major HTTP version, using QUIC (a UDP-based transport) instead of TCP. Key advantages: 0-RTT reconnection for returning clients, no transport-layer head-of-line blocking across multiplexed streams.
QUIC: A transport protocol developed at Google, where the name originally stood for Quick UDP Internet Connections, and now standardized by the IETF as RFC 9000 (which treats QUIC as a name rather than an acronym). Implements reliable delivery, flow control, and congestion control over UDP rather than TCP. The foundation of HTTP/3.
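To make the Vary entry above concrete, here is a rough sketch of how a cache might derive a variant key from the headers named in Vary. It is a simplified model under stated assumptions: `variant_key` is a hypothetical helper, header values are compared verbatim, and real CDNs typically normalize values such as Accept-Encoding before keying on them.

```python
from typing import Dict, Tuple

VariantKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def variant_key(url: str, vary: str, request_headers: Dict[str, str]) -> VariantKey:
    """Build a cache key from the URL plus the request headers named in Vary.

    Header names are case-insensitive, so they are lowercased; a header the
    client did not send contributes an empty value.
    """
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return url, tuple((name, lowered.get(name, "")) for name in names)


# Three hypothetical requests for the same URL.
sample_requests = [
    {"Accept-Encoding": "gzip", "User-Agent": "BrowserA/1.0"},
    {"accept-encoding": "gzip", "User-Agent": "BrowserB/2.0"},
    {"Accept-Encoding": "br", "User-Agent": "BrowserC/3.0"},
]

for vary in ("Accept-Encoding", "User-Agent"):
    variants = {variant_key("/app.js", vary, r) for r in sample_requests}
    print(vary, len(variants))
```

With these three requests, keying on Accept-Encoding yields two variants while keying on User-Agent yields three, one per client, which is why high-cardinality headers in Vary erode the hit ratio.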

Ten cheat-sheet cards cover the core CDN primitives: PoP, Anycast, GeoDNS, Cache-Control, s-maxage, immutable, surrogate keys, ESI, HTTP/3, and hashed filenames. The glossary defines 15 terms that appear consistently in CDN documentation, incident reports, and system design interviews. With these definitions internalized, you can read any CDN vendor's documentation without stumbling on jargon.
