Networking Foundations

HTTP/1.0 → HTTP/1.1 → HTTP/2 → HTTP/3

The web's delivery truck got upgraded three times — each version exists because the previous one had a specific bottleneck that made the internet feel slow. Here's the full story, with real people, real RFCs, real math, and real commands you can run right now.

Section 1

TL;DR β€” The Delivery Truck Versions

Mental Model — Delivery Truck Versions: Think of HTTP as the delivery truck that carries web pages from a server to your browser. Version 1.0 was a truck that made one trip, delivered one package, then drove back to the garage and shut off the engine. If you ordered 50 items, the truck made 50 separate round trips. Version 1.1 kept the engine running between deliveries — but it could still only carry one package at a time. Version 2 upgraded to a truck that carries many packages per trip. Version 3 swapped the truck for a helicopter that doesn't get stuck in traffic jams.

Every time you load a website, your browser talks to a server using a set of rules called HTTP (HyperText Transfer Protocol — the language browsers and servers speak; when your browser says "GET /index.html," it is speaking HTTP, and every webpage you have ever loaded was delivered via this protocol). The first version was designed in 1991 by Tim Berners-Lee at CERN — when a "webpage" was one screen of text with a few links. It worked fine for that era. The problem is that a modern web page (like Amazon's homepage) needs 237 separate files — images, scripts, stylesheets, fonts, API calls. Fetching those files one at a time, with a brand-new connection each time, is like ordering 237 packages and having each one delivered by a different truck that drives to the warehouse and back. Each version of HTTP exists because someone measured the waste, identified the bottleneck, and engineered a fix.

The Full Timeline — People, RFCs, and Breakthroughs:
HTTP/0.9 (1991) — Tim Berners-Lee at CERN. A one-line protocol: no headers, no status codes.
HTTP/1.0 (1996, RFC 1945) — Berners-Lee and Fielding. Headers and status codes, but one connection per request.
HTTP/1.1 (1997, RFC 2068, later RFC 7230) — Roy Fielding et al. Keep-alive and chunked transfer, but still serial requests.
HTTP/2 (2015, RFC 7540) — Mike Belshe (Google), based on SPDY (2009). Multiplexing!
HTTP/3 (2022, RFC 9114) — Jim Roskind's QUIC over UDP. No TCP blocking.
The bottleneck each version fixed: 1.0, connection waste; 1.1, can't pipeline safely; 2, TCP head-of-line blocking; 3, TCP itself. 31 years of evolution: 1991 (Berners-Lee at CERN) → 2022 (RFC 9114, QUIC standardized). Each version was created because someone measured a specific bottleneck and engineered a fix.

Here is what each version actually changed, in plain English. HTTP/1.0 opened a brand-new TCP connection (a reliable two-way communication channel between your browser and the server; opening one requires a 3-step handshake — SYN → SYN-ACK → ACK — that takes one full round trip before any data can flow) for every single file — brutally wasteful. HTTP/1.1 fixed that by keeping the connection open (called "keep-alive"), but requests still had to wait in line, one after another. HTTP/2 let multiple requests fly over a single connection at the same time — a technique called multiplexing: sending multiple independent streams of data over the same connection simultaneously, like a highway with many lanes where cars don't have to wait for the car in front to arrive before leaving. But HTTP/2 still ran on TCP, and when one packet got lost, TCP froze everything until that one packet was retransmitted. HTTP/3 replaced TCP entirely with a new transport called QUIC (Quick UDP Internet Connections — originally designed by Jim Roskind at Google; it runs on UDP instead of TCP, supports 0-RTT reconnection, and doesn't freeze all streams when one packet is lost).

One-line takeaway: Each HTTP version exists because someone measured a real bottleneck and built a targeted fix. 1.0 wasted connections. 1.1 wasted request slots. 2 couldn't survive packet loss. 3 fixed the transport layer itself.
Section 2

The Scenario — Why Amazon.com Takes 237 Requests

Open Chrome DevTools on Amazon's homepage (press F12, click the Network tab, then reload). Watch the waterfall. You will see 237 separate HTTP requests fire off — one for the HTML document, dozens for CSS and JavaScript bundles, over a hundred for product images, font files, tracking pixels, and API calls. The server can generate each response in under 5 milliseconds. But the page still takes 3-4 seconds to fully load. Where is the time going?

Think First: If the server can respond in 5ms, and you need 237 responses, that is only 1.2 seconds of server work total. But the page takes 3-4 seconds. What is eating the other 2-3 seconds? Think about what happens BEFORE the server can even start generating a response.

The answer is protocol overhead. Before your browser can receive a single byte of a file, it has to open a connection to the server (a TCP handshake — the 3-step process that establishes a reliable connection: your browser sends SYN, the server replies SYN-ACK, your browser sends ACK; this takes one full round-trip time, typically 20-100ms depending on distance), then negotiate encryption (a TLS handshake — the encryption negotiation between your browser and the server; TLS 1.2 adds 2 more round trips, TLS 1.3 reduces it to 1, and without it your data travels in plaintext), and THEN send the actual request. If every file requires its own connection, the overhead adds up fast.

Let's do the math. Say you are in London and the server is in Virginia (Amazon's us-east-1). The round-trip time is about 80ms. Opening a TCP connection costs 1 RTT (80ms). TLS 1.2 costs 2 more RTTs (160ms). Then the actual request-response is 1 RTT (80ms). That is 320ms just to fetch ONE file. Multiply by 237 files, and you would be waiting 75 seconds if the browser fetched them one at a time.
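The arithmetic above can be sketched in a few lines of Python (the 80ms RTT and TLS 1.2's two round trips are the assumptions from the text):

```python
# Back-of-envelope cost of fetching one file over a fresh HTTPS connection.
# Assumes 80ms RTT (London <-> Virginia) and TLS 1.2 (2 round trips).
RTT_MS = 80

tcp_handshake = 1 * RTT_MS       # SYN -> SYN-ACK -> ACK
tls_handshake = 2 * RTT_MS       # TLS 1.2: ClientHello ... Finished
request       = 1 * RTT_MS       # GET -> 200 OK + body

per_file_ms = tcp_handshake + tls_handshake + request
print(per_file_ms)               # 320 ms for ONE file

files = 237
print(per_file_ms * files / 1000)  # 75.84 s if fetched strictly one at a time
```

Swap in TLS 1.3 (one round trip instead of two) and the per-file cost drops to 240ms; the serial total is still hopeless, which is why the fix had to come from parallelism, not faster handshakes.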

The Cost of ONE Request — London to Virginia (80ms RTT):
TCP handshake: SYN → SYN-ACK → ACK = 1 RTT = 80ms
TLS 1.2 handshake: ClientHello → ServerHello + Cert → Key Exchange → Finished = 2 RTTs = 160ms
HTTP exchange: GET /style.css → 200 OK + body = 1 RTT = 80ms
Total per file: 4 RTTs = 320ms — and the server only worked for 5ms of that

Of course, browsers don't fetch one file at a time. Chrome opens up to 6 parallel connections per domain. That means it can download 6 files simultaneously β€” but each of those 6 connections still has to pay the full TCP + TLS setup cost. With 237 files and 6 connections, the browser makes about 40 batches of requests. That is still too slow, and it is why every version of HTTP tried to reduce this per-request overhead.
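Here is a rough Python model of that batching — a sketch only, assuming the 320ms setup and 80ms RTT figures from earlier; real browsers overlap work far more aggressively:

```python
import math

RTT_MS, FILES, MAX_CONNS = 80, 237, 6

batches = math.ceil(FILES / MAX_CONNS)
print(batches)                 # 40 rounds of up to 6 parallel requests

# Crude latency floor: the 6 connections set up in parallel
# (TCP + TLS 1.2 = 3 RTTs, paid once), then each round costs ~1 RTT.
setup_ms = 3 * RTT_MS
total_ms = setup_ms + batches * RTT_MS
print(total_ms / 1000)         # ~3.44 s of pure protocol latency
```

Even this idealized floor lands in the 3-4 second range observed in DevTools — before counting server time, bandwidth, or DNS.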

Open DevTools right now on any website. Click the Protocol column header in the Network tab (you may need to right-click the column header and enable it). You will see h2 (HTTP/2) or h3 (HTTP/3) next to most requests. These newer protocols exist specifically because the math above was unacceptable.

What You See in Chrome DevTools → Network Tab:
www.amazon.com — 200 — h2 — 482 kB
styles-main.css — 200 — h2 — 87 kB
app-bundle.js — 200 — h2 — 312 kB
analytics.js — 200 — h3 — 45 kB
product-img-1.jpg — 200 — h2 — 128 kB
237 requests | 4.2 MB transferred | Load: 3.42s | DOMContentLoaded: 1.87s
The core problem: Modern websites need 50-300 separate HTTP requests. If the protocol wastes time on connection setup, serialization, or header redundancy, no amount of server optimization will fix your load times. The protocol itself is the bottleneck — and that is exactly what HTTP/2 and HTTP/3 were designed to fix.

Now let's rewind to the beginning. To understand WHY each version was built, we need to start with the original HTTP — the one Tim Berners-Lee typed into a text editor at CERN in 1991.

Section 3

The First Attempt — HTTP/1.0 and the One-Trip Truck

The year is 1991. Tim Berners-Lee is at CERN (the European Organization for Nuclear Research in Geneva, Switzerland, where he worked as a software engineer and went on to invent the World Wide Web, HTML, HTTP, and the first web browser — all to help physicists share research documents), and he has a problem: physicists need to share research papers, but every department uses a different document system. His solution is a linked document system — he calls it the "World Wide Web." He needs a simple way for one computer to ask another computer for a document. So he designs a protocol so simple it fits in a few paragraphs.

Here is what the very first HTTP looked like. You could literally type it by hand using telnet — a command-line tool that opens a raw TCP connection to any server and lets you type text directly. Before web browsers existed, this is how people tested HTTP: by typing the requests character by character:

Terminal — Telnet to info.cern.ch (1991)
# This is what the FIRST web request looked like in 1991
# HTTP/0.9 — literally one line

$ telnet info.cern.ch 80
Trying 188.184.21.108...
Connected to info.cern.ch.

GET /hypertext/WWW/TheProject.html
                                        # ← That's it. One line. No headers.

<TITLE>The World Wide Web project</TITLE>
<NEXTID N="55">
<H1>World Wide Web</H1>The WorldWideWeb (W3) is a wide-area
<A NAME=0 HREF="WhatIs.html">hypermedia</A> information retrieval
initiative aiming to give universal access to a large universe of documents...

Connection closed by foreign host.      # ← Server closes connection immediately

That is HTTP/0.9 — the very first version. No headers. No status codes. No content types. Just GET /path and the server sends back HTML and hangs up. It worked beautifully when a "website" was one page of text with a few blue links.

By 1996, the web had grown. People wanted images on pages. They wanted to know if a page existed or not (404 errors). They wanted servers to tell them what kind of file they were sending (is this HTML or a JPEG?). So Berners-Lee, along with Roy Fielding and Henrik Frystyk Nielsen, formalized HTTP/1.0 in RFC 1945. This version added headers, status codes, and content types — the building blocks of the modern web.

Terminal — HTTP/1.0 request with headers
# HTTP/1.0 — now with headers!
# You can try this RIGHT NOW:

$ telnet www.example.com 80
GET / HTTP/1.0                          # ← Version declared
Host: www.example.com                   # ← Headers added
User-Agent: Mozilla/1.0                 # ← Identify yourself
Accept: text/html                       # ← Tell server what you want
                                        # ← Blank line = end of request

HTTP/1.0 200 OK                         # ← Status code!
Content-Type: text/html                 # ← "Here's what I'm sending"
Content-Length: 1256                    # ← "Here's how much"
Date: Sun, 06 Nov 1994 08:49:37 GMT    # ← Timestamp

<html><body>Example Domain...</body></html>

Connection closed by foreign host.      # ← Connection DIES after every response

The key improvement: headers let the browser and server negotiate — what format do you want? What encoding do you support? Is the content compressed? This was essential for the multimedia web. But there was a brutal cost hidden in the design.

Think First: Look at that last line: "Connection closed by foreign host." That means the TCP connection is destroyed after every single response. If a page has 10 images, how many TCP connections does the browser have to open? And how much time does that waste if each connection takes 80ms to set up?
HTTP/1.0 — A Brand-New Connection for Every Single File:
TCP OPEN → GET /index.html → response → TCP CLOSE (80ms handshake + 80ms request = 160ms, connection destroyed ✗)
TCP OPEN → GET /style.css → response → TCP CLOSE (160ms again, connection destroyed ✗)
TCP OPEN → GET /app.js → response → TCP CLOSE (160ms, and again ✗)
TCP OPEN → GET /logo.png → response → TCP CLOSE (160ms, every single time ✗)
4 files = 4 connections = 640ms total, 320ms of it spent on TCP handshakes alone

Every single request paid the full price of opening a TCP connection (1 RTT — round-trip time, the time for a packet to travel from your browser to the server and back: roughly 80ms from London to Virginia, roughly 150ms from Mumbai to California; pure physics, since nothing beats the speed of light in fiber), doing the actual request-response (1 RTT), and then tearing the connection down. For a page with 10 images, that is 10 TCP handshakes — 800ms of pure overhead before a single byte of image data is delivered. In 1996, most pages had fewer than 10 resources, so this was tolerable. But the web was about to get much more complex.
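To make that cost concrete, here is the same arithmetic as a tiny Python sketch (assuming 80ms RTT and the plaintext web of 1996, so no TLS), contrasted with a persistent connection that pays the handshake only once:

```python
RTT_MS = 80
files = 10  # ten images on a 1996-era page

# HTTP/1.0: every file pays a fresh TCP handshake plus its request-response
http10_ms = files * (RTT_MS + RTT_MS)
print(http10_ms)        # 1600 ms total, 800 ms of it pure handshake overhead

# With a persistent connection (the keep-alive 1.1 would introduce):
# one handshake, then the same ten serial request-responses on the open pipe
keepalive_ms = RTT_MS + files * RTT_MS
print(keepalive_ms)     # 880 ms -- the handshake cost is paid exactly once
```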

Year | Average Resources per Page | TCP Handshake Waste (80ms RTT) | Problem Severity
1996 | ~5 files | 400ms | Tolerable — dial-up was already slow
2000 | ~25 files | 2,000ms | Noticeable — users complain
2010 | ~80 files | 6,400ms | Painful — developers hack around it
2024 | ~70-300 files | 5,600-24,000ms | Impossible without HTTP/2+
Why was it designed this way? In 1991, Berners-Lee was solving a document retrieval problem — "give me this paper." One request, one document, done. Nobody imagined a page would need 237 separate files. The connection-per-request model was perfectly rational for its era. It just did not scale to the multimedia web.
Section 4

Where It Breaks — Three Problems That Forced a Redesign

HTTP/1.1 fixed the worst sin of 1.0 — it kept connections alive so you didn't pay a fresh TCP handshake for every file. That was a huge win. But as websites grew from 10 resources to 80+, three new bottlenecks emerged. Each one is measurable, each one is mathematical, and each one drove the design of HTTP/2.

Problem 1 — Connection Overhead Still Adds Up

Even with keep-alive, browsers open multiple TCP connections to speed things up. Each connection still pays the full TCP + TLS setup cost. A typical page in 2012 needed about 80 resources. Let's measure the real cost with curl — a command-line tool for transferring data with URLs; its -w flag prints timing metrics like connection time, TLS handshake time, and total transfer time, invaluable for diagnosing HTTP performance:

Terminal — Measuring connection overhead
# Measure the REAL cost of a single HTTPS connection
$ curl -w "time_connect: %{time_connect}\ntime_appconnect: %{time_appconnect}\ntime_starttransfer: %{time_starttransfer}\ntime_total: %{time_total}\n" -o /dev/null -s https://www.amazon.com

time_connect:       0.024       # TCP handshake: 24ms
time_appconnect:    0.068       # TLS handshake done: 68ms (44ms for TLS alone)
time_starttransfer: 0.153       # First byte arrived: 153ms
time_total:         0.412       # Everything downloaded: 412ms

# That's 68ms of pure overhead BEFORE any content flows.
# Now multiply: 80 resources ÷ 6 connections = ~14 connection setups
# 14 × 68ms = 952ms wasted on handshakes alone

Run that command yourself — replace the URL with any site you like. The time_connect is your TCP handshake, time_appconnect is TCP + TLS combined. The gap between time_appconnect and time_starttransfer is how long the server took to generate the response. You will almost always find that the handshake overhead dwarfs the server processing time.

Think First: If each connection costs 68ms to set up, and you need 80 files, why doesn't the browser just open 80 connections at once? What would happen to the server if every visitor did that?

Problem 2 — The 6-Connection Limit per Domain

Browsers deliberately limit themselves to 6 simultaneous TCP connections per domain. Why? Because if Chrome opened 80 connections to Amazon, and 10 million users did the same thing at the same time, Amazon's servers would have to juggle 800 million open connections. The server would collapse. So browsers cooperate — they open at most 6 connections and queue the remaining requests.

The math is simple but painful. If you have 80 resources and 6 connections, the browser fetches files in batches of 6. That is 80 ÷ 6 = 14 rounds (rounding up). Each round has to wait for the slowest file in that batch to finish before the next batch can use that connection slot. If one file in a batch takes 200ms (maybe a large image), all other connections in that batch sit idle waiting for the next round.
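A toy simulation of that queuing behaviour — the per-file timings are invented for illustration, and the strict wait-for-the-whole-batch rule is the simplified model from the text (real browsers reuse a lane as soon as it frees up):

```python
def page_load_ms(file_times_ms, slots=6):
    """Naive batch scheduling: each round of `slots` files must fully
    finish, so every round waits for its slowest member."""
    total = 0.0
    for i in range(0, len(file_times_ms), slots):
        total += max(file_times_ms[i:i + slots])
    return total

# Eleven fast 20ms files plus one slow 200ms image in the first batch
times = [20, 20, 200, 20, 20, 20] + [20] * 6
print(page_load_ms(times))      # 220.0 -- round 1 stalls on the 200ms image
print(page_load_ms([20] * 12))  # 40.0 -- without it, just two 20ms rounds
```

One slow image more than quintuples the total in this model, even though eleven of the twelve files were ready in 20ms.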

80 Resources ÷ 6 Connections = 14 Rounds of Waiting:
The browser queues 80 files against 6 TCP connections (the per-domain maximum) to amazon.com. Round 1 fills the lanes with style.css, app.js, hero-image.jpg (SLOW — it blocks its lane), font.woff2, analytics.js, and logo.svg; the round only ends when the slowest file finishes. Round 2 starts product-1.jpg, product-2.jpg, icon.svg, and so on — 12 more rounds follow.
The Hack: Domain Sharding (2008-2015). Since the 6-connection limit is per DOMAIN, developers split resources across subdomains: img1.cdn.com (6 conn) + img2.cdn.com (6 conn) + img3.cdn.com (6 conn) = 18 parallel connections. Clever, but it added DNS lookups and complexity. HTTP/2 made this obsolete.

Developers got creative. Since the 6-connection limit is per domain, they spread their files across multiple subdomains — img1.cdn.com, img2.cdn.com, img3.cdn.com. This hack was called domain sharding: serving resources from multiple subdomains so the browser opens 6 connections to EACH subdomain, multiplying your parallelism. It was common from 2008-2015, and it worked — but it was a bandaid on a protocol-level limitation. Each extra domain added a DNS lookup (~20-50ms) and a new TLS handshake. HTTP/2 would make this hack obsolete (and actually harmful — domain sharding with HTTP/2 is worse because it defeats multiplexing).

Problem 3 — Head-of-Line Blocking in Pipelining

HTTP/1.1 had a feature called pipelining — letting the browser send multiple requests on one connection without waiting for each response — that was supposed to solve the serialization problem. The idea was clever: instead of waiting for a response before sending the next request, the browser could fire off 5 requests in a row on the same connection. The server would process them and send responses back. Sounds perfect, right?

Here's the catch: the HTTP/1.1 spec requires responses to come back in the same order as the requests. If you request style.css, app.js, and hero.jpg in that order, the server MUST send the CSS response first, then the JS, then the image — even if the image was ready first. If the CSS file takes 500ms to generate (maybe it's dynamically compiled), the JS and image sit on the server fully ready but unable to be sent. This is called head-of-line blocking — the first item in the queue blocks everything behind it.
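The FIFO rule can be sketched in a few lines of Python, using the generation times from the example above (wire time ignored):

```python
def response_send_times(gen_ms):
    """HTTP/1.1 pipelining: responses leave in request order, so each one
    waits for its own generation AND for everything queued ahead of it."""
    sent, clock = [], 0.0
    for ready_at in gen_ms:
        clock = max(clock, ready_at)   # can't leave before it's ready...
        sent.append(clock)             # ...or before everything ahead of it
    return sent

gen = [500, 5, 3]                      # style.css, app.js, hero.jpg
sent = response_send_times(gen)
print(sent)                            # [500.0, 500.0, 500.0]
print([s - g for s, g in zip(sent, gen)])  # [0.0, 495.0, 497.0] ms wasted
```

Reorder the list so the slow response comes last and the waits vanish — but HTTP/1.1 gives the server no way to do that.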

Head-of-Line Blocking — Responses MUST Be In Order:
The browser sends three requests: ① GET /style.css, ② GET /app.js, ③ GET /hero.jpg. The server processes all three: style.css takes 500ms (compiling SASS); app.js is ready in 5ms but WAITING; hero.jpg is ready in 3ms but WAITING. The response queue is FIFO (first in, first out), so the 500ms style.css response blocks the queue: style.css leaves at 500ms, app.js at 505ms, hero.jpg at 508ms. app.js waited 495ms for NOTHING. hero.jpg waited 497ms for NOTHING.

This problem was so bad that no major browser ever enabled pipelining by default. Firefox had it as a hidden flag (network.http.pipelining). Chrome never shipped it. The feature existed in the spec but was dead on arrival because head-of-line blocking made it unreliable. The only real fix was to rethink the entire protocol — which is exactly what happened next.

Summary of the three bottlenecks: (1) Connection overhead — 80 resources times handshake cost equals wasted seconds. (2) 6-connection limit — 80 ÷ 6 = 14 rounds of queuing. (3) Head-of-line blocking — pipelining was unusable because responses had to stay in order. All three problems share a root cause: HTTP/1.x treats one connection as one serial pipe. The fix? Let one connection carry many independent streams at the same time.
Section 5

The Breakthrough — Multiplexing Changes Everything

In 2009, Mike Belshe and Roberto Peon at Google were staring at the same numbers we just calculated. Google's business literally depends on page load speed — they had measured that an extra 500ms of latency costs them 20% of traffic. So Belshe's team built an experimental protocol called SPDY (pronounced "speedy") that attacked all three bottlenecks at once — multiplexing, header compression, and server push. It was so successful that it became the foundation for HTTP/2, standardized as RFC 7540 in 2015.

The core idea is beautifully simple. Instead of treating a TCP connection as a single pipe that carries one request at a time, SPDY turned it into a highway with numbered lanes. Each request gets a unique stream ID — a number that tags every frame in the connection, so stream 1 might carry style.css, stream 3 app.js, and stream 5 hero.jpg, all flowing over the same TCP connection simultaneously (odd numbers are client-initiated, even numbers server-initiated) — and the server can send responses for different streams in any order, interleaved freely. This is called multiplexing — many independent conversations on one connection.
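A minimal Python sketch of the idea — not the real HTTP/2 frame format, just the reassemble-by-stream-ID principle:

```python
# Frames arrive interleaved on ONE connection, each tagged with a stream ID.
# Odd IDs are client-initiated streams, as in HTTP/2.
frames = [
    (1, b"body { "),       # a chunk of style.css on stream 1
    (3, b"function "),     # a chunk of app.js on stream 3
    (1, b"margin: 0 }"),   # more CSS -- freely interleaved
    (3, b"init() {}"),
]

streams = {}
for stream_id, chunk in frames:
    streams[stream_id] = streams.get(stream_id, b"") + chunk

print(streams[1])   # b'body { margin: 0 }'   -- style.css reassembled
print(streams[3])   # b'function init() {}'   -- app.js reassembled
```

Because each chunk carries its stream ID, the sender can emit them in whatever order the data becomes ready, and the receiver still reassembles every stream correctly.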

Think First: If multiplexing means all requests go on one connection, and we no longer need 6 parallel connections, what happens to domain sharding? Would you still spread files across subdomains with HTTP/2?
The Highway Analogy — One Lane vs. Many Lanes:
HTTP/1.1 is a single-lane road: .css goes first while .js, .jpg, and .woff wait behind it — one at a time, everyone waits, and .css must finish before .js can go. HTTP/2 is a multi-lane highway: stream 1 carries .css, stream 3 carries .js, stream 5 carries .jpg, stream 7 carries .woff — all at once, nobody waits. The result: HTTP/1.1 moves 80 files serially in 14 rounds; HTTP/2 moves 80 files over 1 connection in 1 round.
The Path from SPDY to HTTP/2: 2009 — Google ships SPDY (Belshe & Peon). 2012 — the IETF picks SPDY as the starting point for HTTP/2. 2015 — HTTP/2 is standardized as RFC 7540. 2022 — HTTP/3 becomes RFC 9114, with QUIC replacing TCP.

SPDY was so successful that the IETF (the Internet Engineering Task Force — the body that standardizes internet protocols; when it publishes an RFC, or Request for Comments, that document becomes the official specification every browser and server follows) adopted it as the starting point for HTTP/2. In 2015, HTTP/2 became an official standard (RFC 7540). You can see it in action right now:

Terminal — Verifying HTTP/2 with curl
# See HTTP/2 in action — the -v flag shows protocol negotiation
$ curl --http2 -v -o /dev/null -s https://www.google.com 2>&1 | head -20

* Connected to www.google.com (142.250.80.4) port 443
* ALPN: server accepted h2                    # ← Server says "yes, I speak HTTP/2"
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://www.google.com/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [:scheme: https]
> GET / HTTP/2                                 # ← Request sent on stream 1
< HTTP/2 200                                   # ← Response on stream 1
< content-type: text/html; charset=UTF-8

# The key line is "ALPN: server accepted h2"
# ALPN = Application-Layer Protocol Negotiation
# It happens DURING the TLS handshake — zero extra round trips

Notice the line ALPN: server accepted h2. That is the browser and server agreeing to use HTTP/2 during the TLS handshake itself — no extra round trip needed. The protocol negotiation is piggybacked on encryption setup, which is clever engineering. From this point on, every request and response is a numbered stream, and the days of head-of-line blocking at the HTTP layer are over.

Impact: Google measured a 15-50% reduction in page load time after deploying SPDY across their services. Facebook, Twitter, and Akamai saw similar improvements. Today, over 60% of all web traffic uses HTTP/2 — it is the default protocol for nearly every modern website. If you run the curl command above against any major website, you will almost certainly see h2.
Section 6

How It Works — Version by Version

Now that you understand why each version was created, let's look at how each one works under the hood. Each card below covers one HTTP version — what it added, what it fixed, and a real command you can run to see it in action.

HTTP/1.1 — Keep-Alive, Chunked Transfer, Host Header

HTTP/1.1 (RFC 2068, later refined in RFC 7230-7235) was the workhorse of the web for nearly 20 years. Three features made it a massive improvement over 1.0:

1. Persistent connections (keep-alive). In 1.0, every response ended with the server slamming the connection shut. In 1.1, the connection stays open by default. The browser sends multiple requests over the same TCP pipe — no more paying the handshake cost for every file. This single change reduced page load times by roughly 40% for typical sites.

2. Chunked transfer encoding. Before 1.1, the server had to know the exact size of the response before sending it (the Content-Length header). That meant generating the entire page in memory, measuring its size, and then sending it. Chunked encoding lets the server send data in pieces as it generates them — essential for dynamically generated pages and streaming responses.
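Chunked encoding is simple enough to sketch by hand: each chunk is a hex length, CRLF, the bytes, CRLF, and a zero-length chunk ends the body. A minimal Python encoder:

```python
def encode_chunked(pieces):
    """Wrap byte strings in HTTP/1.1 chunked transfer encoding."""
    out = b""
    for piece in pieces:
        out += f"{len(piece):x}\r\n".encode() + piece + b"\r\n"
    return out + b"0\r\n\r\n"   # zero-length chunk terminates the body

# The server can emit pieces as it generates them -- no Content-Length needed
body = encode_chunked([b"<html>", b"<body>Hi</body>", b"</html>"])
print(body)
# b'6\r\n<html>\r\nf\r\n<body>Hi</body>\r\n7\r\n</html>\r\n0\r\n\r\n'
```

This is why the terminal capture below shows Transfer-Encoding: chunked with no Content-Length header: the terminator chunk, not a byte count, marks the end of the body.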

3. The Host header. This sounds trivial but it changed the economics of the web. Before 1.1, every website needed its own IP address (a numerical label, like 93.184.216.34, that identifies a server on the internet; IPv4 addresses are limited to about 4.3 billion — not nearly enough for every website to have its own) because the server had no way to know which website the browser wanted. The Host header tells the server "I want example.com" — so one server with one IP address can host thousands of different websites. This is called virtual hosting: the server reads the Host header to decide which website to serve. Without it, the internet would have run out of IP addresses in the 2000s.
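The routing the Host header enables is essentially a dictionary lookup. A toy sketch (the site names and paths here are invented for illustration):

```python
# One server, one IP address, many websites -- chosen by the Host header.
DOCROOTS = {
    "example.com":      "/var/www/example",
    "blog.example.com": "/var/www/blog",
}

def docroot_for(headers):
    """Pick a site's document root from the request's Host header."""
    host = headers.get("Host", "").split(":")[0]  # strip an optional port
    return DOCROOTS.get(host, "/var/www/default")

print(docroot_for({"Host": "blog.example.com"}))  # /var/www/blog
print(docroot_for({"Host": "example.com:8080"}))  # /var/www/example
print(docroot_for({}))                            # /var/www/default
```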

Terminal — HTTP/1.1 keep-alive in action
# Force HTTP/1.1 and watch the connection stay open
$ curl -v --http1.1 -o /dev/null -s https://www.example.com 2>&1 | grep -i "connection\|HTTP/"

> GET / HTTP/1.1                    # ← Using HTTP/1.1
> Host: www.example.com             # ← Host header (virtual hosting!)
> Connection: keep-alive            # ← "Don't close after responding"
< HTTP/1.1 200 OK
< Connection: keep-alive            # ← Server agrees to keep it open
< Transfer-Encoding: chunked        # ← Chunked! Size unknown upfront

# The connection stays open for more requests
# No new TCP handshake needed for the next file
HTTP/1.1 Keep-Alive — One Connection, Multiple Requests:
On a single TCP connection that stays open, the browser sends GET /style.css → 200, then GET /app.js → 200, then GET /logo.png → 200. Requests are still serial (one at a time), but there is NO handshake between them. Savings: 3 files on 1 connection = 2 fewer TCP handshakes = 160ms saved (at 80ms RTT). But requests remain sequential — the browser must wait for the .css response before sending the .js request.
Real impact: Keep-alive alone cut average page load times by about 40%. But requests were still serial on each connection — the browser had to wait for one response before sending the next request on that same pipe. This is why browsers open 6 connections: to get parallelism through brute force.
Think First: HTTP/1.1 sends headers as plain text. A typical request has headers like User-Agent (120 bytes), Cookie (250 bytes), Accept (80 bytes) — about 800 bytes total. For a page with 80 requests, that is 64,000 bytes of headers. Most of these headers are identical across all 80 requests. How much bandwidth could you save if you only sent each unique header value once and used a short index number for repeats?

HTTP/2 — Binary Frames, Multiplexing, HPACK, Server Push

HTTP/2 (RFC 7540, 2015) was a fundamental redesign of how data moves on the wire. While HTTP/1.1 sends plain text that a human can read with telnet, HTTP/2 sends binary frames — compact, machine-optimized packets. You can't telnet into an HTTP/2 server and type requests by hand anymore. That's the tradeoff: human readability for machine performance.

The four big features:

1. Binary framing layer. Every piece of data is wrapped in a small, fixed-format frame with a type, length, stream ID, and flags. Frames from different streams can be interleaved freely — that is how multiplexing works. The server sends a chunk of CSS, then a chunk of JS, then more CSS, all tagged with stream IDs so the browser reassembles them correctly.

2. Streams and multiplexing. A "stream" is a logical channel within the connection. Each request-response pair gets its own stream. Streams are independent — if one stream stalls, the others keep flowing (at the HTTP layer; TCP-level blocking is still possible, which is why HTTP/3 exists). Streams can also have priorities, so the browser can tell the server "send CSS before images."

3. HPACK header compression. HTTP/1.1 headers are verbose and repetitive. Every request sends the same User-Agent, Accept, Cookie, and Host headers — often 800+ bytes, repeated hundreds of times. HPACK fixes this with a clever trick: both the browser and server maintain a shared dynamic table — a list of recently-seen header name-value pairs, maintained independently by both sides. The first time a header is sent, HPACK stores it with an index number. On subsequent requests, only the index is sent — a single byte instead of hundreds.

Example: the header cookie: session=abc123def456... might be 250 bytes. On the first request, HPACK sends all 250 bytes and assigns it index 62. On the next 79 requests, it sends just the number 62 — one byte instead of 250. For a page with 80 requests carrying the same cookie, that is nearly 250 × 79 = 19,750 bytes saved just for one header. Across all headers, HPACK typically saves 85-95% of header bytes.
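The mechanism can be mimicked with a toy table in Python. This is the indexing idea only, not HPACK's real wire format or Huffman coding:

```python
class ToyHeaderTable:
    """Shared-table idea behind HPACK: the first occurrence of a header
    is sent in full; repeats are sent as a one-byte index reference."""
    def __init__(self):
        self.index_of = {}

    def wire_bytes(self, header):
        if header in self.index_of:
            return 1                          # just the index
        self.index_of[header] = 62 + len(self.index_of)
        return len(header.encode())           # full literal the first time

table = ToyHeaderTable()
cookie = "cookie: session=" + "x" * 234       # a 250-byte cookie header
first = table.wire_bytes(cookie)
repeats = sum(table.wire_bytes(cookie) for _ in range(79))
print(first, repeats)   # 250 79 -- vs 250 * 80 = 20,000 bytes sent in full
```

80 requests carry this cookie for 329 bytes total instead of 20,000 — the same effect the example above describes, minus the real encoder's extra tricks.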

4. Server push. The server can send resources the browser hasn't asked for yet. When you request index.html, the server knows you'll need style.css next, so it pushes the CSS without waiting for the browser to discover and request it. Smart idea in theory — but in practice, servers often pushed resources the browser already had cached, wasting bandwidth. Chrome removed server push support in 2022. We will cover this in detail in Section 7.

Terminal — HTTP/2 verbose output
# See HTTP/2 binary framing in action
$ curl --http2 -v -o /dev/null -s https://www.cloudflare.com 2>&1

* ALPN: server accepted h2            # ← Protocol negotiated during TLS
* using HTTP/2
* [HTTP/2] [1] OPENED stream for /    # ← Stream ID 1
* [HTTP/2] [1] [:method: GET]         # ← Pseudo-headers (HTTP/2 uses : prefix)
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: www.cloudflare.com]
> GET / HTTP/2
< HTTP/2 200
< content-encoding: br                # ← Brotli compression (common with H2)
< cf-ray: 8a3f2...                    # ← Cloudflare edge server

# Notice: no "Connection: keep-alive" — HTTP/2 connections are always persistent
# Notice: pseudo-headers use : prefix instead of "Host:" header
HPACK — How Header Compression Saves 85-95% of Bytes:
Request #1 sends full headers (812 bytes): :method: GET, :path: /products, cookie: session=abc123def456..., user-agent: Mozilla/5.0 (Windows..., accept: text/html,application/xhtml..., accept-encoding: gzip, deflate, br, accept-language: en-US,en;q=0.9, and 5 more headers — all stored in the dynamic table. Request #2 sends just indices (28 bytes): :path: /cart is the only new value; cookie, user-agent, accept, accept-encoding, and accept-language go out as indices 62-66. Only :path changed — everything else is an index. 812 bytes → 28 bytes = 96.5% reduction; over 80 requests, 64,960 bytes shrink to 2,240. HPACK also uses Huffman coding on new values, compressing strings by 25-40%.
Bottom line: HTTP/2 fundamentally changed web performance. One connection handles everything (multiplexing), headers shrink by 85-95% (HPACK), and the server can proactively push resources. Most websites today run on HTTP/2 — check any site in DevTools and you will see h2 in the Protocol column.

HTTP/2 fixed the HTTP-layer problems brilliantly. But it still runs on TCPTransmission Control Protocol — a reliable, ordered delivery protocol that has powered the internet since 1974. TCP guarantees every byte arrives, in order, but this guarantee comes at a cost: if one packet is lost, TCP freezes the ENTIRE connection until that packet is retransmitted., and TCP has a fundamental problem that cannot be fixed without replacing it: TCP-level head-of-line blocking.

Here's the issue. TCP guarantees that bytes arrive in order. If the server sends packets for streams 1, 3, and 5, and the packet for stream 3 gets lost on the network, TCP does not know that streams 1 and 5 are independent. TCP sees one stream of bytes, and it freezes everything until the lost packet for stream 3 is retransmitted and received. Streams 1 and 5 are perfectly fine, but TCP blocks them anyway. This is the exact same head-of-line problem from HTTP/1.1, just pushed down one layer.

The fix was radical: stop using TCP. Jim Roskind at Google designed a new transport protocol called QUIC (originally "Quick UDP Internet Connections") that runs on UDPUser Datagram Protocol — a simple protocol that sends packets without guarantees. Unlike TCP, UDP doesn't establish connections, doesn't guarantee order, and doesn't retransmit lost packets. QUIC builds its own reliability and ordering ON TOP of UDP, but only per-stream — so one lost packet doesn't block other streams. instead of TCP. QUIC builds its own reliability and stream management, but with a crucial difference: each stream is independent at the transport level. If a packet for stream 3 is lost, only stream 3 stalls. Streams 1 and 5 keep flowing.
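A toy model makes the difference concrete. This is a sketch, not a protocol implementation: tag each packet with its stream ID, lose one packet, and ask which packets the application can consume before the retransmission arrives.

```python
# Toy model of head-of-line blocking. Under TCP semantics, delivery is one
# ordered byte stream, so nothing after the lost packet can be handed to the
# application. Under QUIC semantics, ordering is enforced per stream, so only
# the stream that lost a packet stalls.

def deliverable(packets, lost_index, per_stream):
    """Return the packets the application can consume before retransmission.

    packets:     list of stream IDs, in send order
    lost_index:  position of the lost packet
    per_stream:  True = QUIC-style per-stream ordering, False = TCP-style
    """
    if not per_stream:
        # TCP: one ordered byte stream -- everything after the hole waits
        return packets[:lost_index]
    lost_stream = packets[lost_index]
    # QUIC: only LATER packets of the SAME stream wait for the hole
    return [s for i, s in enumerate(packets)
            if i != lost_index and not (s == lost_stream and i > lost_index)]

send_order = [1, 3, 5, 1, 3, 5]   # three interleaved streams
print(deliverable(send_order, 1, per_stream=False))  # TCP:  [1]
print(deliverable(send_order, 1, per_stream=True))   # QUIC: [1, 5, 1, 5]
```

With TCP semantics, one lost stream-3 packet strands every later packet; with per-stream ordering, streams 1 and 5 deliver everything.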

Diagram — Why TCP Blocks Everything vs. Why QUIC Doesn't. HTTP/2 over TCP: TCP sees ONE ordered byte stream (S1 S3 S5 S1 S3 ...); when one packet is lost, ALL streams freeze — S1 and S5 are blocked too — until TCP retransmits the lost S3 packet and everything resumes. That wastes a full RTT (80ms in this example) for every stream, and on lossy networks (mobile, WiFi) this happens constantly. HTTP/3 over QUIC: QUIC sees THREE separate streams — stream 1 (.css) and stream 5 (.jpg) keep flowing while only stream 3 (.js) waits for its retransmission. On a 2% packet loss network, HTTP/3 is ~15% faster than HTTP/2 (Cloudflare measurements, 2022).

Two more killer features of HTTP/3:

0-RTT connection resumption. When you revisit a site, QUIC remembers the server's encryption keys from last time. It can send data on the very first packet — zero round trips of setup. TCP + TLS 1.3 needs at minimum 1 RTT before data flows: even with TLS 1.3's own 0-RTT early data, the TCP handshake itself still costs a full round trip. On a 150ms RTT connection (like Mumbai to California), that is 150ms saved on every reconnection.

Connection migration. Here is a scenario that happens a million times a day: you are scrolling Twitter on your phone, you walk from your living room (WiFi) to outside (cellular). Your phone switches networks, your IP address changes, and every TCP connection dies — because TCP connections are identified by the pair of IP addresses. QUIC connections are identified by a Connection IDA random identifier that both client and server use to recognize an ongoing QUIC connection. Since it's not tied to IP addresses, the connection survives network changes — your phone can switch from WiFi to cellular without dropping a single stream. that survives network changes. Your phone switches from WiFi to cellular, the QUIC connection continues seamlessly, and you keep scrolling without a reload.

Terminal — HTTP/3 in action
# See HTTP/3 (QUIC) in action — requires curl 7.66+ with HTTP/3 support
$ curl --http3 -v -o /dev/null -s https://cloudflare.com 2>&1

* Connecting to cloudflare.com (104.16.132.229)
* QUIC cipher: TLS_AES_256_GCM_SHA384      # ← Encryption built into QUIC
* using HTTP/3                                # ← Running over UDP, not TCP
* h3 [:method: GET]
* h3 [:path: /]
* h3 [:scheme: https]
> GET / HTTP/3
< HTTP/3 301                                  # ← HTTP/3 response!
< alt-svc: h3=":443"; ma=86400               # ← "I support HTTP/3 for 24 hours"

# Notice: no TCP handshake at all
# QUIC combines crypto + transport in a single handshake
# "alt-svc" header tells browsers "you can upgrade to h3 next time"
Current adoption: As of 2024, roughly 25-30% of web traffic uses HTTP/3, depending on who is measuring. Google, Cloudflare, Facebook, and Apple all support it. Browsers try HTTP/2 first, and if the server advertises HTTP/3 support via the alt-svc header, the browser switches to HTTP/3 on the next request. You can check any site at https://http3check.net.

Here is the full comparison across every metric that matters for web performance. Keep this table bookmarked — it shows up in system design interviews constantly.

Feature HTTP/1.0 HTTP/1.1 HTTP/2 HTTP/3
Year / RFC 1996 / RFC 1945 1997 / RFC 2068 2015 / RFC 7540 2022 / RFC 9114
Transport TCP TCP TCP QUIC (over UDP)
Connection setup 1 RTT (TCP) + 2 RTT (TLS 1.2) 1 RTT (TCP) + 2 RTT (TLS 1.2) 1 RTT (TCP) + 1 RTT (TLS 1.3) 1 RTT (combined) or 0-RTT (resumption)
Multiplexing No No (pipelining broken) Yes (stream IDs) Yes (stream IDs)
HOL blocking HTTP-layer HTTP-layer TCP-layer None (per-stream)
Header compression None None HPACK (85-95%) QPACK (85-95%)
Server push No No Yes (deprecated 2022) Yes (rarely used)
Connection migration No No No Yes (Connection ID)
Format Text Text Binary frames Binary frames
Connections needed 1 per request 6 per domain (hack) 1 per domain 1 per domain
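To make the "Connection setup" row concrete, here is a quick sketch of the arithmetic, assuming an illustrative 80ms round-trip time (the RTT value is mine, not from the table):

```python
# Rough setup-cost math for the "Connection setup" row above,
# assuming an illustrative 80 ms round-trip time.
RTT_MS = 80

setup_rtts = {
    "HTTP/1.1 + TLS 1.2": 1 + 2,  # TCP handshake + TLS 1.2 handshake
    "HTTP/2 + TLS 1.3":   1 + 1,  # TCP handshake + TLS 1.3 handshake
    "HTTP/3 (new conn)":  1,      # QUIC combines transport + crypto setup
    "HTTP/3 (0-RTT)":     0,      # resumption: data rides the first packet
}

for name, rtts in setup_rtts.items():
    print(f"{name}: {rtts} RTT = {rtts * RTT_MS} ms before the first byte")
```

On this assumed link, the setup tax drops from 240ms (HTTP/1.1 with TLS 1.2) to 0ms (resumed HTTP/3).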
Interview tip: When asked "Compare HTTP/1.1, HTTP/2, and HTTP/3," focus on three things: (1) multiplexing (serial vs. concurrent), (2) head-of-line blocking (which layer it happens at), and (3) connection setup cost (RTTs). These three differences explain 90% of the performance gap between versions.
Section 7

Going Deeper — The Internals That Interviewers Love

The previous sections covered what each version does. This section covers how it does it — the wire-level details that come up in senior-level system design interviews. Each topic below is a common follow-up question after you explain the HTTP versions.

Think First HTTP/2 multiplexes all requests on one TCP connection. HTTP/1.1 uses 6 parallel connections. If a single TCP packet is lost, HTTP/2 freezes ALL streams on that one connection. HTTP/1.1 only freezes the one connection that lost the packet — the other 5 keep flowing. In what network conditions would HTTP/1.1 actually outperform HTTP/2? Think about packet loss rates.

In HTTP/1.1, the protocol is text-based. You can literally read the bytes on the wire: GET / HTTP/1.1\r\nHost: example.com\r\n\r\n. This is human-friendly but machine-hostile — the parser has to scan for line endings, handle variable-length headers, and deal with ambiguous whitespace.

HTTP/2 replaces this with a binary framing layer. Every piece of data is wrapped in a fixed-format frameThe smallest unit of communication in HTTP/2. Every frame has a 9-byte header containing: length (3 bytes), type (1 byte), flags (1 byte), and stream identifier (4 bytes). The body follows. This fixed structure makes parsing extremely fast — no scanning for delimiters.. Think of frames like shipping containers — each one is a standard size, has a label on the front, and can be loaded/unloaded by machines without human inspection.

The three key concepts:

Frame — the smallest unit. A frame has a 9-byte header (length, type, flags, stream ID) followed by the payload. The main types are:

  • HEADERS frame — carries compressed HTTP headers (method, path, status, cookies)
  • DATA frame — carries the response body (the actual HTML, CSS, image bytes)
  • SETTINGS frame — negotiates connection parameters (max streams, window size)
  • WINDOW_UPDATE frame — flow control (tells the sender "you can send more data now")
  • PUSH_PROMISE frame — server push announcement (deprecated in most browsers)
  • RST_STREAM frame — cancels a single stream without killing the connection

Stream — a bidirectional flow of frames sharing the same stream ID. One stream = one request-response pair. Streams are numbered: odd numbers are client-initiated (requests), even numbers are server-initiated (push). Stream 0 is the control stream for connection-level frames like SETTINGS.

Message — a complete HTTP request or response, consisting of one HEADERS frame followed by zero or more DATA frames. The browser reassembles interleaved frames by grouping them by stream ID.
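Because the header is fixed-format, a parser is tiny. Here is a minimal sketch of parsing the 9-byte frame header described above (a handful of frame types only, not a full HTTP/2 implementation):

```python
# Parse the fixed 9-byte HTTP/2 frame header: 24-bit length, 8-bit type,
# 8-bit flags, then 1 reserved bit + 31-bit stream ID (RFC 7540, section 4.1).
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x3: "RST_STREAM",
               0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x8: "WINDOW_UPDATE"}

def parse_frame_header(header: bytes):
    assert len(header) == 9
    length = int.from_bytes(header[0:3], "big")
    frame_type = FRAME_TYPES.get(header[3], hex(header[3]))
    flags = header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # clear reserved bit
    return length, frame_type, flags, stream_id

# A HEADERS frame: 66-byte payload, END_HEADERS flag (0x04), stream 3
raw = bytes([0x00, 0x00, 0x42, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03])
print(parse_frame_header(raw))  # (66, 'HEADERS', 4, 3)
```

No scanning for delimiters: read 9 bytes, learn the payload length, jump to the next frame.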

Diagram — HTTP/2 Frame Structure. The 9-byte header: Length (3 bytes, e.g. 000042 = 66-byte payload), Type (1 byte, e.g. 01 = HEADERS), Flags (1 byte, e.g. 04 = END_HEADERS), and Stream ID (4 bytes, e.g. 00000003 = stream 3), followed by a variable-length payload of HPACK-compressed headers or raw body data (e.g. 82 86 84 41 8a 08 9d 5c 0b 81 70 dc 78 0f 03 ...). On the wire, frames interleave freely — S1 HEADERS, S3 HEADERS, S1 DATA, S5 HEADERS, S3 DATA, S1 DATA, S5 DATA, S3 DATA (end) — three streams mixed together, and the browser reassembles them by matching stream IDs.
Why binary? Text parsing is surprisingly expensive. HTTP/1.1 parsers have to handle edge cases like headers split across TCP packets, optional whitespace, and line folding. The binary frame format eliminates all ambiguity — the parser reads exactly 9 bytes, knows the frame length, and jumps to the next frame. This makes HTTP/2 parsers roughly 5x faster than HTTP/1.1 parsers.

Both HPACK (HTTP/2) and QPACK (HTTP/3) solve the same problem: HTTP headers are repetitive and bloated. But they do it differently because of a fundamental constraint — HPACK requires ordered delivery, and QUIC does not guarantee that.

How HPACK works (three mechanisms):

1. Static table. HPACK includes a pre-defined table of 61 common header entries that never change. Entry 2 is :method: GET, entry 16 is accept-encoding: gzip, deflate, and so on. When the browser sends GET, it just sends the number 2. Both sides have the same table built in — no negotiation needed.

2. Dynamic table. For headers NOT in the static table (like your specific cookie value), HPACK adds them to a shared dynamic table. The first time you send cookie: session=abc123, it gets stored as, say, index 62. Subsequent requests send 62 instead of the full string. The server maintains an identical dynamic table, so it knows what 62 means.

3. Huffman coding. For header values that must be sent in full (like the first occurrence of a cookie), HPACK uses Huffman codingA compression technique where common characters get short bit sequences and rare characters get longer ones. In HPACK's Huffman table, the letter 'e' is encoded as 5 bits instead of 8, while rare characters like '~' use 15 bits. On average, ASCII text compresses by 25-40%. to compress the string further. Common letters like 'e' and 'a' use fewer bits than rare characters like '~'. This typically saves 25-40% on new header values.
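The first two mechanisms can be sketched as a toy encoder. This is not wire-accurate HPACK (the real dynamic table gives the newest entry index 62 and shifts older ones, and literals are Huffman-coded); it only illustrates the indexing idea:

```python
# Toy sketch of HPACK's indexing idea (NOT wire-accurate HPACK):
# a built-in static table of common headers, plus a dynamic table that
# both sides grow identically so later requests can send bare indices.

STATIC = {(":method", "GET"): 2, ("accept-encoding", "gzip, deflate"): 16}

class Encoder:
    def __init__(self):
        self.dynamic = {}            # header -> index (dynamic space starts at 62)

    def encode(self, headers):
        out = []
        for h in headers:
            if h in STATIC:
                out.append(("index", STATIC[h]))
            elif h in self.dynamic:
                out.append(("index", self.dynamic[h]))
            else:                    # first occurrence: send literally, then remember
                self.dynamic[h] = 62 + len(self.dynamic)
                out.append(("literal", h))
        return out

enc = Encoder()
req = [(":method", "GET"), ("cookie", "session=abc123")]
print(enc.encode(req))  # request 1: the cookie travels as a literal
print(enc.encode(req))  # request 2: the cookie is now just index 62
```

After the first request, a multi-hundred-byte cookie collapses to a single small integer on the wire.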

Why QPACK exists: HPACK's dynamic table has a problem. When the encoder adds entry 62, the decoder must process that addition before it can decode any frame referencing index 62. In HTTP/2 over TCP, frames arrive in order, so this works fine. But in HTTP/3 over QUIC, frames can arrive out of order — a frame referencing index 62 might arrive before the frame that created index 62. QPACK solves this by using a separate unidirectional stream for table updates, with explicit acknowledgments before new indices can be referenced.

Feature HPACK (HTTP/2) QPACK (HTTP/3)
Static table entries 61 99 (expanded)
Dynamic table Shared, implicit sync Shared, explicit ack stream
Ordering requirement Frames must arrive in order Works with out-of-order delivery
Huffman coding Yes (same table) Yes (same table)
Compression ratio 85-95% 85-95% (similar)
HOL-blocking risk None (TCP guarantees order) Mitigated (ack-based sync)
Real-world numbers: Google measured that HPACK reduces header overhead from an average of 800 bytes per request to 30-50 bytes after the first request on a connection. For a page with 80 requests, that is ~60KB saved — meaningful on mobile networks where every kilobyte counts.

The concept of server push is elegant: when the browser requests index.html, the server knows the browser will need style.css and app.js next (because they're referenced in the HTML). Instead of waiting for the browser to parse the HTML, discover the CSS link, and send a new request, the server proactively pushes those files alongside the HTML response.

The mechanism uses a PUSH_PROMISE frame: the server announces "I'm going to send you style.css on stream 2" before actually sending it. The browser can cancel the push with a RST_STREAM if it doesn't want the file (for example, if it already has it cached).

There's also a simpler approach using HTTP headers:

Terminal — Server push via Link header
# Server response with push hints
HTTP/2 200
content-type: text/html
link: </style.css>; rel=preload; as=style    # ← "Push this too"
link: </app.js>; rel=preload; as=script       # ← "And this"

# The server sends style.css and app.js BEFORE the browser asks
# Saves 1 full RTT per pushed resource

Why it failed: In practice, server push had two fatal flaws:

1. Cache ignorance. The server does not know what the browser has cached. If a user visited the site 5 minutes ago and already has style.css in their cache, the server pushes it again anyway — wasting bandwidth. The browser can send RST_STREAM to cancel, but by then the server has already started sending data. On high-latency connections, significant bytes are wasted before the cancel arrives.

2. Complexity for marginal gain. Getting push right required careful tuning — push too much and you waste bandwidth; push too little and you don't see benefits. Most CDNs and web servers never implemented it well. The performance gain (saving 1 RTT) was often offset by the wasted bytes from pushing cached resources.

Chrome removed server push support in Chrome 106 (September 2022). Firefox followed. The replacement is 103 Early Hints — a simpler mechanism where the server sends a 103 status code with preload hints while it's still generating the full response. The browser uses those hints to start fetching resources without the server actually sending them:

103 Early Hints — the replacement for server push
# Step 1: Server sends 103 immediately (while still computing the page)
HTTP/2 103 Early Hints
link: </style.css>; rel=preload; as=style
link: </app.js>; rel=preload; as=script

# Step 2: Browser starts fetching style.css and app.js immediately
# (respecting cache — if already cached, no fetch needed!)

# Step 3: Server finishes computing and sends the real response
HTTP/2 200 OK
content-type: text/html
...

# Result: same time savings, but cache-aware and simpler
Interview context: If asked about server push, explain the concept, why it was designed, and why it failed. Then mention 103 Early Hints as the replacement. This shows you understand both the theory and the real-world engineering tradeoffs.

QUIC's 0-RTT resumption is one of its biggest selling points. When you revisit a website, QUIC remembers the server's encryption parametersSpecifically, the server's public key and the negotiated cipher suite from the previous connection. QUIC stores these in a "session ticket" on the client. On reconnection, the client uses this ticket to encrypt the very first packet — no handshake needed. from your last visit and can send encrypted data in the very first packet. Zero round trips before data flows. On a 150ms RTT connection, this saves a noticeable 150ms.

But 0-RTT has a security risk: replay attacks. Here's the scenario:

Imagine you send a 0-RTT request that says "transfer $100 to Alice." An attacker sitting on the network captures that encrypted packet. They can't decrypt it (the encryption is solid), but they don't need to — they can simply replay the exact same encrypted packet to the server. The server decrypts it, sees a valid request from you, and transfers another $100 to Alice. The attacker just doubled the transaction by re-sending your own packet.

Diagram — 0-RTT Replay Attack: The Tradeoff for Speed. You send a 0-RTT request (POST /transfer {amount: 100, to: "alice"}) to the server. An attacker captures the encrypted packet and replays the exact same bytes — and the server processes it AGAIN. Mitigation: only allow idempotent requests in 0-RTT. GET /products ✓ (replaying a GET just fetches the same page twice — harmless); POST /transfer ✗ (replaying a POST doubles the transaction — dangerous).

The mitigation is straightforward: only allow idempotentAn operation is idempotent if doing it twice produces the same result as doing it once. GET is idempotent (fetching a page twice gives you the same page). POST is usually not idempotent (submitting a form twice might create two records). requests in 0-RTT. That means GET, HEAD, and OPTIONS are fine — replaying a GET just fetches the same page twice, which is harmless. POST, PUT, and DELETE must wait for the full 1-RTT handshake, where the server assigns a unique nonce that prevents replay.

Servers can also implement additional protections: single-use session tickets (a ticket can only be used for 0-RTT once), short ticket lifetimes (expire after seconds, not hours), and application-level replay detection (check if a request was already processed).
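In code, the server-side gating amounts to a few checks. A minimal sketch with a hypothetical accept_early_data helper (the names are mine, not from any QUIC library):

```python
# Sketch of server-side 0-RTT gating (hypothetical helper, not a real API):
# only idempotent methods may be served from 0-RTT early data, and a session
# ticket may authorize early data at most once.

IDEMPOTENT = {"GET", "HEAD", "OPTIONS"}

def accept_early_data(method: str, ticket_already_used: bool) -> bool:
    if ticket_already_used:      # single-use tickets defeat straight replays
        return False
    return method in IDEMPOTENT  # replaying a GET is harmless

print(accept_early_data("GET",  ticket_already_used=False))  # True
print(accept_early_data("POST", ticket_already_used=False))  # False: wait 1-RTT
print(accept_early_data("GET",  ticket_already_used=True))   # False: likely replay
```

Anything rejected here simply waits for the full handshake; the request is delayed by one RTT, not refused.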

Critical for interviews

If you mention 0-RTT in an interview, the follow-up question is almost always "What about replay attacks?" Knowing the answer — only idempotent methods, single-use tickets, short lifetimes — shows that you understand the security implications of protocol optimizations, not just the performance benefits.

Section 8

Variations — Protocols Built on Top of HTTP

HTTP is a request-response protocol — the client asks, the server answers. But many real-world applications need different communication patterns: continuous streams of data, real-time bidirectional chat, or compact binary serialization for microservice-to-microservice calls. Three protocols extend or replace HTTP for these use cases. Understanding when to use which is a staple of system design interviews.

Think First Your microservice architecture has 50 services. Service A calls Service B 10,000 times per second. Each call sends a JSON payload averaging 2 KB, and the response averages 5 KB. That is 70 MB/sec of raw JSON. If you switched to Protocol Buffers (60-70% smaller), how much bandwidth would you save per second? Per day? What about header compression savings if you use HTTP/2 with HPACK?

gRPC — HTTP/2 Binary Protocol for Microservices

gRPCGoogle Remote Procedure Call — an open-source RPC framework that uses HTTP/2 for transport and Protocol Buffers for serialization. Created by Google in 2015, it is the de facto standard for microservice-to-microservice communication at companies like Google, Netflix, Slack, Square, and Lyft. is a framework built by Google for making remote procedure calls between services. Think of it as a turbocharged version of REST APIs — instead of sending JSON over HTTP/1.1, gRPC sends compact binary data over HTTP/2. The key pieces:

Protocol Buffers (protobuf). Instead of writing JSON like {"name": "Alice", "age": 30} (27 bytes), you define a schema in a .proto file and protobuf encodes the same data in roughly 8-10 bytes — 60-70% smaller. The schema also generates type-safe client and server code automatically in any language (Go, Java, Python, C#, etc.).
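To make the size claim concrete, here is a hand-rolled sketch of the protobuf wire format for a hypothetical message Person { string name = 1; int32 age = 2; } (real code would use the generated protobuf classes, not this):

```python
import json

# Hand-rolled protobuf wire encoding for a hypothetical schema:
#   message Person { string name = 1; int32 age = 2; }
# Simplified: assumes the name is under 128 bytes and age fits one varint byte.
def encode_person(name: str, age: int) -> bytes:
    out = bytearray()
    out += bytes([(1 << 3) | 2])            # field 1, wire type 2 (length-delimited)
    out += bytes([len(name)]) + name.encode()
    out += bytes([(2 << 3) | 0])            # field 2, wire type 0 (varint)
    out += bytes([age])                     # varint fits one byte for age < 128
    return bytes(out)

proto = encode_person("Alice", 30)
as_json = json.dumps({"name": "Alice", "age": 30}).encode()
print(len(proto), "bytes as protobuf")   # 9 bytes
print(len(as_json), "bytes as JSON")     # 28 bytes with this spacing, ~3x larger
```

Two one-byte field tags plus the raw values is the whole message; the field names themselves never travel on the wire, which is where most of the savings come from.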

Four communication patterns:

Diagram — gRPC: Four Communication Patterns. (1) Unary (most common): one request, one response — like REST's GetUser(id). (2) Server streaming: one request, a stream of responses — stock tickers, log tailing. (3) Client streaming: a stream of requests, one response — file upload, sensor data. (4) Bidirectional streaming: both sides stream — chat, multiplayer games. Who uses gRPC in production: Google (10B+ calls/sec across all services), Netflix (migrated internal APIs from REST), Slack (real-time messaging backend infra), Square and Lyft (Envoy + gRPC service mesh). gRPC is the default choice for internal microservice communication at most large-scale companies.
Terminal — Testing a gRPC service with grpcurl
# grpcurl is like curl but for gRPC services
# List available services
$ grpcurl -plaintext localhost:50051 list
greet.GreetService
health.HealthService

# Call a unary RPC
$ grpcurl -plaintext -d '{"name": "Alice"}' localhost:50051 greet.GreetService/SayHello
{
  "message": "Hello, Alice!"
}

# Server streaming — watch prices flow in real time
$ grpcurl -plaintext -d '{"symbol": "GOOGL"}' localhost:50051 stock.StockService/StreamPrices
{"price": 141.23, "timestamp": "2024-01-15T10:00:01Z"}
{"price": 141.45, "timestamp": "2024-01-15T10:00:02Z"}
{"price": 141.38, "timestamp": "2024-01-15T10:00:03Z"}
... (keeps streaming until cancelled)
When to use gRPC vs REST: Use gRPC for internal service-to-service calls where performance matters (10-100x faster serialization than JSON), you control both client and server, and you want auto-generated type-safe clients. Use REST for public-facing APIs (browsers can't call gRPC directly without a proxy), third-party integrations, and simple CRUD operations where JSON readability is valued.

HTTP is a request-response protocol — the client always speaks first, and the server can only reply. But what about applications where the server needs to send data to the client at any time, without the client asking? Think chat apps, live sports scores, collaborative editing (Google Docs), or multiplayer games. The server needs to push data the instant something happens.

WebSocketA protocol (RFC 6455) that provides full-duplex, bidirectional communication over a single TCP connection. It starts as a regular HTTP request, then "upgrades" to a persistent WebSocket connection where both sides can send messages at any time. solves this by starting as a normal HTTP request and then upgrading the connection to a different protocol entirely. After the upgrade, both the client and server can send messages at any time — no more request-response restriction. The connection stays open until either side explicitly closes it.

Diagram — WebSocket: HTTP Upgrade to Full-Duplex. The browser sends GET / HTTP/1.1 with Upgrade: websocket and Connection: Upgrade; the server answers HTTP/1.1 101 Switching Protocols with Upgrade: websocket. From that point on the connection speaks WebSocket: both sides send messages anytime ({"type": "message", "text": "Hello!"}, {"type": "typing", "user": "Carol"}, ...) with no request-response pattern. The connection stays open until closed.
Terminal — Testing WebSocket with wscat
# wscat is a command-line WebSocket client
$ npm install -g wscat

# Connect to a WebSocket echo server
$ wscat -c wss://echo.websocket.org
Connected (press CTRL+C to quit)

> Hello, WebSocket!
< Hello, WebSocket!          # ← Server echoes back instantly

> {"action": "subscribe", "channel": "stocks"}
< {"status": "subscribed", "channel": "stocks"}
< {"price": 141.23}          # ← Server pushes data WITHOUT a request
< {"price": 141.45}          # ← And again...

# The connection stays open — server can send anytime
# No polling, no long-polling, no hacks

Important clarification: WebSocket is NOT HTTP/2. It is a completely separate protocol (RFC 6455) that uses HTTP only for the initial handshake. After the upgrade, the connection speaks the WebSocket protocol, which has its own binary frame format. WebSocket predates HTTP/2 (WebSocket was standardized in 2011, HTTP/2 in 2015) and solves a different problem — HTTP/2 is about efficient request-response multiplexing, while WebSocket is about persistent bidirectional messaging.

When to use WebSocket: Use it when the server needs to push data to the client in real time without the client asking — chat, notifications, live dashboards, collaborative editing, multiplayer games. Do NOT use it for regular page loads or API calls — HTTP/2 is better for those because it has built-in multiplexing, compression, and caching that WebSocket lacks.

Sometimes you don't need full bidirectional communication. Many real-time features are actually one-directional — the server sends updates to the client, but the client rarely sends anything back. Think live sports scores, stock tickers, build log streaming, or notification feeds. For these cases, Server-Sent EventsA simple standard (part of the HTML5 spec) for server-to-client streaming over a regular HTTP connection. The server sends text events in a specific format, and the browser's EventSource API handles connection management, auto-reconnection, and event parsing automatically. (SSE) is simpler and more appropriate than WebSocket.

SSE is beautifully simple. It is just a normal HTTP response with Content-Type: text/event-stream that never ends. The server keeps writing events in a specific text format, and the browser's built-in EventSource API handles everything — including automatic reconnection if the connection drops (WebSocket does NOT auto-reconnect; you have to code that yourself).

Terminal — Watching SSE events with curl
# -N disables buffering so events appear in real time
$ curl -N https://example.com/events

# SSE format β€” simple text, human-readable
event: score-update
data: {"match": "Arsenal vs Chelsea", "score": "2-1"}

event: score-update
data: {"match": "Arsenal vs Chelsea", "score": "2-2"}

event: notification
data: {"text": "Half-time: Arsenal 2-2 Chelsea"}

# Events keep flowing until connection closes
# If the connection drops, the browser reconnects automatically
# with a Last-Event-ID header so the server can resume from where it left off
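The format is simple enough to parse by hand. A minimal sketch (simplified: real SSE also supports comments, event IDs, and multiple data: lines that get joined with newlines):

```python
# Minimal parser for the text/event-stream format shown above: events are
# separated by a blank line, and each line inside an event is "field: value".
# The browser's EventSource API does this (plus reconnection) for you.
def parse_sse(raw: str):
    events, current = [], {}
    for line in raw.splitlines():
        if line == "":                       # blank line ends the event
            if current:
                events.append(current)
            current = {}
        elif ":" in line:
            field, _, value = line.partition(":")
            current[field] = value.lstrip(" ")
    if current:                              # flush a trailing unterminated event
        events.append(current)
    return events

stream = (
    "event: score-update\n"
    'data: {"score": "2-1"}\n'
    "\n"
    "event: notification\n"
    'data: {"text": "Half-time"}\n'
    "\n"
)
for ev in parse_sse(stream):
    print(ev["event"], ev["data"])
```

Running this prints one line per event, which is exactly how an EventSource handler would see them.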
Diagram — SSE vs. WebSocket: Choosing the Right Tool. Server-Sent Events (SSE): ✓ auto-reconnection (built-in), ✓ works with HTTP/2 multiplexing, ✓ uses standard HTTP (proxies and CDNs work), ✓ simple text format that is easy to debug; ✗ one-way only (server → client), ✗ text only (no binary data). WebSocket: ✓ bidirectional (both sides send), ✓ binary and text data supported, ✓ lower latency (no HTTP overhead per message); ✗ no auto-reconnection (DIY), ✗ problematic with some proxies/firewalls, ✗ separate protocol (no HTTP caching/compression). Rule of thumb: if the client only READS real-time data → SSE. If the client also SENDS real-time data → WebSocket.
Modern trend: SSE is making a comeback. AI chat interfaces (like ChatGPT's streaming responses) use SSE to stream tokens to the browser as the model generates them. The text/event-stream format is perfect for this — each token arrives as an event, and the browser renders it incrementally. If you have built or used any LLM-powered chat interface, you have used SSE.
Section 9

At Scale — Who Actually Built This Stuff

Protocol specs are written in conference rooms, but they prove themselves in production. Four companies pushed HTTP forward by shipping real code to billions of users — and the data they collected shaped the standards we use today. These aren't academic case studies; they're the reason your browser speaks HTTP/2 and HTTP/3 right now.

Diagram — Who Pushed HTTP Forward: Real Companies, Real Numbers. Google (SPDY → HTTP/2): 55% faster page loads; Chrome shipped SPDY in 2012, the IETF standardized HTTP/2 in 2015, SPDY was deprecated in 2016; Google also invented QUIC, which became HTTP/3. Cloudflare: first CDN with HTTP/3, launched September 2019; 25% of web traffic is now H3; the biggest winners are mobile users in India and Africa, whose high-latency networks benefit most from 0-RTT. Akamai: the largest CDN in the world, carrying ~30% of web traffic; its server push verdict — only 1% of pushes useful — confirmed push was a failed experiment (Chrome removed it in 2022). Meta (Facebook): custom QUIC stack (mvfst); all mobile app traffic runs through it, with 6% fewer request errors and 20% better tail latency; open-sourced on GitHub.

Google — SPDY to HTTP/2 (The Experiment That Changed the Web)

In 2009, Google's Mike Belshe and Roberto Peon were staring at Chrome's performance data and asking a simple question: why are web pages still this slow? TCP connections were being opened and closed constantly. Headers were being sent over and over in plain text. One slow resource blocked everything else. So they built an experimental protocol called SPDY (pronounced "speedy") and baked it directly into Chrome.

The results were dramatic: 55% faster page loads in controlled tests. SPDY introduced multiplexing (many requests on one connection), header compression, and server push — ideas that sounded exotic in 2009 but are standard today. Chrome shipped SPDY support in 2012. Firefox and Opera followed. By the time the IETF working group sat down to design HTTP/2, they didn't start from scratch — they started from SPDY. The final RFC 7540 (published May 2015) is essentially SPDY with the rough edges polished off. Google deprecated SPDY in 2016 once HTTP/2 had taken over.

The pattern: Google builds it → ships it in Chrome → proves it at scale → donates the design to the IETF → the world standardizes it. They did this with SPDY (→ HTTP/2), with QUIC (→ HTTP/3), and with Brotli (→ Content-Encoding: br). When one company controls both the browser and the server fleet, they can run protocol experiments that nobody else can.

Cloudflare flipped the switch on HTTP/3 support in September 2019 — making them the first major CDN to offer it. Their edge servers sit between your browser and origin servers, and they serve roughly 20% of all websites. That means any site on Cloudflare automatically got HTTP/3 capability without changing a single line of code.

Today, about 25% of all web traffic uses HTTP/3. But the gains aren't evenly distributed. Cloudflare's data shows that users on high-latency, lossy networks — mobile users in India, Africa, Southeast Asia — see the biggest improvements. That makes sense: QUIC's 0-RTT connection resumption and per-stream loss recovery matter most when the network is unreliable. A user on fiber in San Francisco barely notices the difference; a user on 3G in Lagos sees pages load noticeably faster.

Why CDNs matter for protocol adoption: You don't need to wait for every origin server to upgrade. If the CDN speaks HTTP/3 to the browser and HTTP/2 (or even 1.1) to the origin, the user gets the benefit immediately. This is why protocol adoption moves faster than you'd expect — CDNs act as protocol translators.

Akamai handles about 30% of all global web traffic. When they added HTTP/2 server push support, they tracked every push across their entire network. The results were sobering: only 1% of server pushes were actually useful. The other 99% were wasted bandwidth — pushing resources the browser already had in cache.

The problem is fundamental: the server doesn't know what the browser already has. You push style.css, but the browser cached it yesterday. You push a font file, but the user's system has it locally. Without a mechanism to say "I already have this, don't send it," server push is a guessing game — and servers guess wrong almost every time. Akamai's data was one of the key reasons Chrome removed server push support entirely in 2022. The replacement? 103 Early Hints — which tells the browser what to fetch rather than force-feeding it bytes.

Facebook didn't just adopt QUIC — they built their own implementation called mvfst (pronounced "move fast," naturally). Every request from the Facebook, Instagram, and WhatsApp mobile apps now runs through mvfst. With roughly 3 billion monthly mobile users, they're running one of the largest QUIC deployments on the planet.

The results across their mobile fleet: 6% fewer request errors and 20% better tail latency (that's the p99 — the slowest 1% of requests). That might sound modest in percentage terms, but at Facebook's scale, 6% fewer errors means millions fewer failed API calls per day. The tail latency improvement means the unluckiest users (bad signal, congested cell tower, switching from Wi-Fi to cellular) have a noticeably better experience. Meta open-sourced mvfst on GitHub, and it's become a reference implementation for anyone building custom QUIC stacks.

Section 10

Anti-Lessons β€” Things That Sound True But Aren't

Every time a new protocol version ships, a wave of blog posts and conference talks oversimplify the story. "Just upgrade and everything is faster!" "HTTP/3 is always better!" These half-truths sound reasonable, but they lead to bad decisions in production. Here are three claims that experienced engineers hear all the time β€” and why they fall apart when you measure.

"Should I Upgrade?" β€” It Depends How many resources per page? < 10 resources HTTP/2 overhead may SLOW you down Binary framing + HPACK state = extra cost 50+ resources HTTP/2 multiplexing shines here Single connection, parallel streams HTTP/3 caveat: ~3-5% of networks block UDP entirely Always deploy HTTP/3 + HTTP/2 fallback via Alt-Svc header β€” never HTTP/3 alone

"HTTP/2 is always faster than HTTP/1.1"

Not always. HTTP/2 adds overhead that HTTP/1.1 doesn't have: binary framing (every message gets wrapped in frames), HPACK compression state (both sides maintain header tables), and connection management complexity. If your site serves a handful of resources — say, a minimal single-page app with one HTML file, one CSS file, and one JS bundle — that overhead can actually make things slower than plain HTTP/1.1.

HTTP/2 wins big when there are many parallel requests on the same connection (50+ resources on a page load). For a minimalist API that serves JSON responses, the multiplexing benefit is negligible because there's usually only one request in flight at a time.

The fix: Measure before and after. Don't assume.

# Compare HTTP/1.1 vs HTTP/2 for the same resource
curl -w "HTTP/1.1: %{time_total}s\n" --http1.1 -so /dev/null https://example.com
curl -w "HTTP/2:   %{time_total}s\n" --http2   -so /dev/null https://example.com

"HTTP/1.1 performance hacks are harmless under HTTP/2"

Wrong — they actively hurt. This is probably the most common migration mistake. Developers spent years perfecting HTTP/1.1 performance hacks, and they're reluctant to remove them. But every single one of these hacks is counterproductive with HTTP/2:

Domain sharding β€” You split resources across img1.cdn.com, img2.cdn.com, img3.cdn.com to get around the 6-connections-per-domain limit. With HTTP/2, all resources multiplex on one connection. Sharding forces the browser to open multiple connections, defeating the entire point. Each extra domain also requires a separate TLS handshake and DNS lookup.

CSS sprites β€” You combined dozens of icons into one big image to reduce requests. With HTTP/2, each icon can be its own file and they all download in parallel. Sprites force users to download unused pixels and make your CSS more complex for no benefit.

JS/CSS concatenation β€” You merged all JavaScript into one giant file so there's only one request. With HTTP/2, the browser can fetch many small files just as efficiently. Worse, one giant bundle means the entire thing must be re-downloaded when any part changes, and it blocks parsing until the whole file arrives.

Migration rule: When upgrading to HTTP/2, remove all HTTP/1.1 hacks. They're not just unnecessary β€” they make HTTP/2 perform worse than it should.
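A quick way to audit for leftover sharding is to list the distinct hostnames a saved page references: on HTTP/2, seeing img1/img2/img3-style asset domains is a red flag. A minimal sketch (the heredoc is a stand-in page and the regex is not a robust HTML parser; point grep at your own saved HTML):

```shell
# shard-audit.sh — list distinct hostnames a page references.
# The heredoc below is a stand-in page; replace it with your own saved HTML.
cat > index.html <<'EOF'
<img src="https://img1.cdn.com/a.png">
<script src="https://img2.cdn.com/b.js"></script>
<img src="https://img1.cdn.com/c.png">
EOF

# Extract scheme://host pairs and deduplicate.
grep -oE 'https?://[^/"]+' index.html | sort -u
# → https://img1.cdn.com
#   https://img2.cdn.com
```

More than one or two distinct asset hosts here means the browser is opening (and TLS-handshaking) a separate HTTP/2 connection per host, which is exactly the waste multiplexing was meant to eliminate.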

"You can deploy HTTP/3 on its own"

Not yet. HTTP/3 runs on QUIC, which runs on UDP. And here's the problem: a significant chunk of the internet's infrastructure — corporate firewalls, hotel Wi-Fi captive portals, some ISPs, government networks — blocks UDP traffic that isn't DNS. These middleboxes were built in an era when "UDP from a browser" meant something suspicious.

Estimates suggest 3-5% of networks drop QUIC packets entirely. That doesn't sound like much until you do the math: 3% of 5 billion internet users is 150 million people who can't use HTTP/3 at all. This is why every HTTP/3 deployment must include HTTP/2 fallback.

The mechanism is the Alt-Svc (Alternative Service) header. The server first responds over HTTP/2, and includes a header like:

Alt-Svc: h3=":443"; ma=86400

This tells the browser: "I also speak HTTP/3 on port 443, and you can remember this for 86,400 seconds (24 hours)." The browser will try HTTP/3 on the next request. If UDP is blocked, it silently falls back to HTTP/2. The user never notices β€” but you need both protocols running on your server.
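On the server side, the dual-stack setup plus the Alt-Svc advertisement can be sketched in nginx (a sketch, assuming an nginx 1.25+ build with QUIC support; the certificate paths are placeholders):

```nginx
# Serve HTTP/2 over TCP and HTTP/3 over UDP on the same port,
# and advertise HTTP/3 via Alt-Svc. Sketch only; paths are placeholders.
server {
    listen 443 ssl;     # TCP: HTTP/1.1 + HTTP/2
    listen 443 quic;    # UDP: HTTP/3 (requires a QUIC-enabled nginx build)
    http2 on;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # "I also speak HTTP/3 on :443; remember this for 24 hours"
    add_header Alt-Svc 'h3=":443"; ma=86400' always;
}
```

The key design point: the `ssl` and `quic` listeners share port 443, so a client whose network drops UDP simply stays on the TCP listener with no visible failure.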

Section 11

Common Mistakes β€” Things That Bite You in Production

These aren't hypothetical β€” they're the mistakes that show up in post-mortems, Hacker News threads, and 3 AM Slack messages. Every one of them has a simple fix, but you have to know they exist first. If you're running a web server or building APIs, scan this list and check whether any apply to you right now.

Think First Your web server is running nginx with default settings. You run curl -v https://yoursite.com and see HTTP/1.1 200 OK instead of HTTP/2. You have a valid TLS certificate. What is the most likely configuration issue? What single nginx directive would you add to fix it?
Not Enabling HTTP/2 on Your Server

HTTP/2 has been standardized since 2015, yet a surprising number of servers still serve HTTP/1.1 by default. The issue is that HTTP/2 requires TLS in practice (browsers refuse plaintext HTTP/2), so if you haven't set up HTTPS, you're stuck on 1.1 no matter what.

Enabling it is usually one line of config (on nginx 1.25.1+ a standalone http2 on; directive replaces the listen flag, which is now deprecated):

# nginx.conf β€” add 'http2' to the listen directive
server {
    listen 443 ssl http2;
    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
}
# httpd.conf β€” enable the http2 module
LoadModule http2_module modules/mod_http2.so
Protocols h2 http/1.1

Verify it's working: curl -sI --http2 https://yoursite.com | grep -i "^HTTP" β€” you should see HTTP/2 200.

Keeping Domain Sharding After Migrating

We covered this in Section 10, but it deserves its own callout because it's so common. If you migrated to HTTP/2 but left your domain sharding in place (img1.cdn.com, img2.cdn.com, etc.), you're actively making things worse. The browser opens a separate HTTP/2 connection to each domain, which means you lose the multiplexing benefit and pay extra DNS + TLS costs for each shard.

Fix: Consolidate all resources onto a single domain (or two at most β€” one for your origin, one for your CDN). Let HTTP/2 multiplexing handle the parallelism.

Long Cache Lifetimes Without Cache-Busting

Setting Cache-Control: max-age=31536000 (one year) sounds great for performance — the browser never re-fetches the file. But when you deploy a new version of app.js, users are stuck with the old one until the cache expires. They won't even know there's a new version.

Fix: Use content-hashed filenames. Instead of app.js, serve app.a3f8b2c.js. The hash changes when the content changes, so the browser treats it as a new file. Now you CAN set max-age=31536000 safely because the filename itself is the cache-buster. Every modern build tool (Webpack, Vite, esbuild) does this automatically.
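The mechanics are simple enough to sketch in shell: hash the file's contents and embed a short prefix of the hash in the name. (The sample file and the 7-character prefix are arbitrary illustrative choices; real bundlers do this for you.)

```shell
# hash-name.sh — derive a content-hashed filename, the way build tools do.
# app.js here is a stand-in bundle; in a real build it's your bundler's output.
printf 'console.log("hello");\n' > app.js

hash=$(sha256sum app.js | cut -c1-7)   # first 7 hex chars are plenty
cp app.js "app.${hash}.js"
echo "app.${hash}.js"                  # safe to cache for a full year
```

Any change to the file's bytes produces a different hash, hence a different URL, hence a guaranteed cache miss; unchanged files keep their name and stay cached.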

Not Compressing Responses

Every text-based response (HTML, CSS, JS, JSON, SVG) should be compressed. Brotli (Content-Encoding: br) is the modern standard — it produces files 15-25% smaller than gzip at similar CPU cost. All modern browsers support it over HTTPS.

Check whether your server compresses responses:

# Request with Brotli support, check response encoding
curl -sI -H "Accept-Encoding: br, gzip" https://yoursite.com | grep -i content-encoding
# Should show: content-encoding: br

If you see no Content-Encoding header, your responses are being sent uncompressed β€” potentially 3-5x larger than they need to be.
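If you're on nginx, Brotli comes from the third-party ngx_brotli module rather than core. A minimal sketch, assuming that module is compiled in (the types list is illustrative; text/html is compressed by default):

```nginx
# Enable Brotli responses. Sketch only; requires the ngx_brotli module.
brotli on;
brotli_comp_level 6;   # 4-6 is a common balance of CPU cost vs. size
brotli_types text/css application/javascript application/json image/svg+xml;
```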

Ignoring TTFB

TTFB (Time To First Byte) is the time from when the browser sends a request to when it receives the very first byte of the response. It captures everything that happens before rendering can even begin: DNS lookup, TCP connect, TLS handshake, server processing, and network transit. A good TTFB is under 200ms; above 600ms is a problem. Developers obsess over front-end rendering speed but ignore TTFB, which is often the real bottleneck.

# Measure TTFB for any URL
curl -w "DNS: %{time_namelookup}s | TCP: %{time_connect}s | TLS: %{time_appconnect}s | TTFB: %{time_starttransfer}s | Total: %{time_total}s\n" -so /dev/null https://yoursite.com

If TTFB is high but DNS/TCP/TLS are fast, the problem is server-side processing. If DNS is slow, switch resolvers. If TLS is slow, check your certificate chain (too many intermediate certs).
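Single TTFB samples are noisy, so average several runs before drawing conclusions. A sketch with illustrative numbers; in practice you would collect the samples with curl -w "%{time_starttransfer}\n" -so /dev/null URL in a loop:

```shell
# avg-ttfb.sh — average several TTFB samples; one run is too noisy to trust.
# The sample values below are illustrative stand-ins for real curl output.
samples="0.182 0.197 0.175 0.210 0.188"

printf '%s\n' $samples | awk '{ s += $1 } END { printf "avg TTFB: %.3fs\n", s/NR }'
# → avg TTFB: 0.190s
```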

Not Using 103 Early Hints

103 Early Hints is the replacement for server push. Instead of force-feeding bytes to the browser (which failed 99% of the time, per Akamai's data), the server sends a lightweight informational response that says "while I'm generating the real page, start fetching these resources":

HTTP/1.1 103 Early Hints
Link: </style.css>; rel=preload; as=style
Link: </app.js>; rel=preload; as=script

HTTP/1.1 200 OK
Content-Type: text/html
...actual page content...

The browser receives the 103 response instantly (before the server has finished generating HTML), starts fetching CSS and JS in parallel, and by the time the 200 response arrives, those critical resources are already downloading. Chrome, Firefox, and Cloudflare all support this. If you're still relying on <link rel="preload"> in the HTML head, you're too late β€” the browser has to parse the HTML first before it discovers those preload hints. 103 Early Hints arrives before the HTML even exists.

Section 12

Interview Playbook β€” "Explain HTTP/1.1 vs HTTP/2 vs HTTP/3"

This question shows up in system design rounds, backend interviews, and even frontend interviews. The interviewer isn't looking for a memorized spec β€” they want to see that you understand why each version exists, what problem it solved, and what trade-offs it introduced. Your answer depth should match the role level. Here's a cheat sheet for each level:

Diagram: Answer Depth by Level. Junior: name the three versions (1.1 = persistent connections; 2 = multiplexing, many at once; 3 = UDP instead of TCP); the key is showing you know the basics. Mid-level: explain HOL blocking in 1.1, binary framing plus stream IDs, HPACK header compression, and why TCP HOL blocking remains in H2; the key is explaining WHY each upgrade happened. Senior/Staff: QUIC internals (user-space UDP), 0-RTT security trade-offs, BBR congestion control on QUIC, and the migration story (Alt-Svc, fallback); the key is production trade-offs.

What to say: "HTTP/1.1 keeps the connection open between requests, but it can only handle one request at a time per connection. So browsers open 6 parallel connections to work around that. HTTP/2 fixed this with multiplexing β€” you get one connection that handles many requests simultaneously using streams. HTTP/3 switched from TCP to a new protocol called QUIC built on UDP, which eliminates a problem called head-of-line blocking at the transport layer."

What to know:

  • HTTP/1.1 β†’ persistent connections, but one-at-a-time per connection
  • HTTP/2 β†’ binary protocol, multiplexed streams, header compression
  • HTTP/3 β†’ QUIC (UDP-based), 0-RTT, no TCP HOL blocking
  • All three are backwards-compatible (servers negotiate the best version)
Bonus points: Mention that you can see which version is being used in Chrome DevTools (Network tab β†’ Protocol column). This shows the interviewer you've actually looked at this in practice, not just read about it.

What to say: "The core problem each version solves is latency from unnecessary round trips. HTTP/1.1 had head-of-line blocking at the application layer β€” responses had to be returned in order. HTTP/2 fixed that with binary framing: each request/response pair gets a stream ID, and frames from different streams can interleave on the same connection. But HTTP/2 still runs on TCP, which has its own head-of-line blocking β€” if one TCP packet is lost, the kernel holds up ALL streams until it's retransmitted. HTTP/3 solves this by replacing TCP with QUIC, where each stream has independent loss recovery."

Key details to mention:

  • HPACK β€” HTTP/2's header compression uses a static table (61 common headers) plus a dynamic table built during the connection. This removes the redundancy of sending the same headers (cookies, user-agent) on every request.
  • Server Push failed β€” It was in the HTTP/2 spec but only 1% of pushes were useful (Akamai data). Chrome removed it. The replacement is 103 Early Hints.
  • TLS is mandatory in practice β€” The HTTP/2 spec allows plaintext (h2c), but no browser supports it. Real-world HTTP/2 always means TLS.

What to say: "The most interesting architectural decision in HTTP/3 is moving the transport protocol to user space. QUIC implements congestion control, loss recovery, and connection management in the application layer rather than the kernel. This means protocol upgrades don't require OS kernel updates β€” you deploy a new QUIC version like you deploy a new app version. Google uses this to iterate on congestion control algorithms like BBR (Bottleneck Bandwidth and Round-trip propagation time) without waiting for Linux kernel releases."

Advanced topics to discuss:

  • 0-RTT security trade-offs β€” 0-RTT data is vulnerable to replay attacks. An attacker can capture the initial flight and replay it. This is why 0-RTT should only carry idempotent requests (GET, not POST). Servers must implement replay protection (strike registers, or limiting 0-RTT to safe requests).
  • BBR on QUIC β€” Traditional TCP uses loss-based congestion control (Cubic/Reno): when a packet is lost, assume congestion and back off. BBR instead measures the actual bandwidth and RTT of the path, and targets operating at the bandwidth-delay product. This is especially effective on networks where packet loss isn't caused by congestion (wireless, mobile).
  • Connection migration β€” TCP connections are identified by the 4-tuple (src IP, src port, dst IP, dst port). Change any one and the connection breaks. QUIC uses a Connection ID instead β€” so when your phone switches from Wi-Fi to cellular (IP changes), the QUIC connection survives.
  • Deployment reality β€” The Alt-Svc header mechanism means HTTP/3 is always opt-in. First connection is HTTP/2, browser learns HTTP/3 is available, subsequent connections try QUIC. You need both stacks running. Firewalls that block UDP (common in enterprises) force permanent HTTP/2 fallback.
Section 13

Practice Exercises β€” Hands-On with HTTP

Reading about HTTP versions is one thing. Actually seeing the differences with your own eyes is what makes the knowledge stick. These exercises go from "open a browser tab" to "capture raw protocol frames." Pick the level that matches your experience and work your way up.

Exercise 1: Check Which HTTP Version Your Favorite Sites Use Beginner

Open Chrome DevTools (F12), go to the Network tab, and right-click any column header to enable the Protocol column. Now reload the page. You'll see h2 (HTTP/2) or h3 (HTTP/3) next to each request. Try it on google.com, youtube.com, github.com, and a small personal blog. Are they all on the same version? Which resources use which version?

Time target: 5 minutes

Google and YouTube will show h3 β€” they pioneered QUIC. GitHub typically shows h2. Small blogs may still show http/1.1 if they haven't enabled HTTP/2 on their server. If you see a mix on one site (some resources h2, some h3), it means the browser is negotiating per-connection β€” likely the CDN supports HTTP/3 but the origin server doesn't.

Exercise 2: Compare Load Times with curl Intermediate

Use curl to request the same resource using different HTTP versions and compare the timing. Pick a resource-heavy page (a news site homepage works well):

# HTTP/1.1
curl -w "HTTP/1.1 β†’ DNS: %{time_namelookup} | Connect: %{time_connect} | TTFB: %{time_starttransfer} | Total: %{time_total}\n" --http1.1 -so /dev/null https://www.cloudflare.com

# HTTP/2
curl -w "HTTP/2   β†’ DNS: %{time_namelookup} | Connect: %{time_connect} | TTFB: %{time_starttransfer} | Total: %{time_total}\n" --http2 -so /dev/null https://www.cloudflare.com

Run each command 5 times and average the results. For a single resource, the difference is often small. The real win with HTTP/2 comes from page loads with many resources β€” which curl doesn't do (it fetches one URL). This exercise teaches you what curl can and can't measure.

Time target: 10 minutes

Exercise 3: Explore the HPACK Static Table Intermediate

HTTP/2's header compression (HPACK) has a static table of 61 pre-defined entries (header names, many paired with common values), defined in RFC 7541 Appendix A. Find the table and answer these questions:

  • Which header is at index 1? (Hint: it's a pseudo-header)
  • How many different status codes are pre-defined? Which ones?
  • Is content-encoding: gzip in the table? What about content-encoding: br?
  • Why do you think these specific 61 entries were chosen?

Time target: 15 minutes

Index 1 is :authority (the pseudo-header for the host). The table includes status codes 200, 204, 206, 304, 400, 404, and 500. The gzip question is a trick: the table's only encoding value is accept-encoding: gzip, deflate at index 16; content-encoding appears at index 26 as a bare header name with no value, and br appears nowhere, because Brotli was standardized after HPACK. The entries were chosen by analyzing which headers appeared most frequently in real web traffic, and QPACK (HTTP/3's equivalent) defines its own, larger static table based on newer traffic patterns.

Exercise 4: Set Up Nginx with HTTP/2 Advanced

Spin up a local Nginx server with HTTP/2 enabled. You'll need a TLS certificate (use a self-signed cert for local testing). Write the full config from scratch β€” don't copy-paste a tutorial. Key steps:

  1. Generate a self-signed cert: openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem -days 365
  2. Write an nginx.conf with listen 443 ssl http2
  3. Create a test HTML page with 20+ small images (to see multiplexing in action)
  4. Verify with: curl -k --http2 -I https://localhost (the -k flag ignores the self-signed cert warning)

Time target: 30 minutes

worker_processes 1;
events { worker_connections 128; }

http {
    server {
        listen 443 ssl http2;
        server_name localhost;

        ssl_certificate     /path/to/cert.pem;
        ssl_certificate_key /path/to/key.pem;

        root /var/www/html;
        index index.html;
    }
}
Exercise 5: Capture HTTP/2 Frames in Wireshark Expert

This is the deepest you can go without writing your own protocol implementation. Install Wireshark, capture traffic to an HTTP/2 site, and decode the frames:

  1. Start Wireshark capture on your primary network interface
  2. In Chrome, visit an HTTPS site. To decrypt the traffic, set the SSLKEYLOGFILE environment variable before launching Chrome, then point Wireshark to that log file (Edit β†’ Preferences β†’ Protocols β†’ TLS β†’ Pre-Master-Secret log filename)
  3. Apply the display filter: http2
  4. Identify these frame types: HEADERS (request/response headers), DATA (response body), SETTINGS (connection parameters), WINDOW_UPDATE (flow control)
  5. Find the stream IDs β€” odd numbers are client-initiated, even numbers are server-initiated (in HTTP/2, only server push ever used them), and stream 0 carries connection-level control frames like SETTINGS

Time target: 45-60 minutes

# Linux/Mac: Set the TLS key log file before launching Chrome
export SSLKEYLOGFILE=~/tls-keys.log
google-chrome &

# Windows (PowerShell): Set environment variable
$env:SSLKEYLOGFILE = "$HOME\tls-keys.log"
Start-Process chrome

In Wireshark, go to Edit β†’ Preferences β†’ Protocols β†’ TLS and set the (Pre)-Master-Secret log filename to the same path. Now all HTTPS traffic from that Chrome session will be decryptable. You'll see the actual HTTP/2 frames instead of encrypted blobs. Look for the SETTINGS frame at the start of each connection β€” it shows the maximum concurrent streams, initial window size, and header table size that each side agreed to.

Section 14

Cheat Cards β€” Bookmark This Section

Six cards. Everything you need when you're configuring a server, debugging a slow page, or answering an interview question. All real data, all real commands.

Version Comparison
HTTP/1.0  1 req per connection
HTTP/1.1  persistent, pipelining
          (6 conn per domain)
HTTP/2    binary, multiplexed
          HPACK, server push (dead)
HTTP/3    QUIC (UDP), 0-RTT
          per-stream loss recovery
Connection Math
HTTP/1.1 page load (100 files):
  6 conn Γ— ~17 files each
  = 17 serial round trips
  = 17 Γ— RTT overhead

HTTP/2 page load (100 files):
  1 conn Γ— 100 streams
  = 1 round trip (parallel)
  = massive RTT savings
Handshake Rounds
TCP only:       1 RTT
TCP + TLS 1.2:  3 RTT
TCP + TLS 1.3:  2 RTT
QUIC (new):     1 RTT (TLS built in)
QUIC (resume):  0 RTT
Debug Commands
curl --http1.1 -I URL  # force v1.1
curl --http2 -I URL    # force v2
curl --http3 -I URL    # force v3

# Timing breakdown:
curl -w "TTFB: %{time_starttransfer}
Total: %{time_total}" -so /dev/null URL

# Check compression:
curl -sI -H "Accept-Encoding: br"
Server Config
Nginx:
  listen 443 ssl http2;

Apache:
  Protocols h2 http/1.1

Node.js:
  http2.createSecureServer()

Caddy:
  HTTP/2 + H3 automatic
Key Headers
Alt-Svc: h3=":443"; ma=86400
  β†’ tells browser H3 is available

Content-Encoding: br
  β†’ Brotli compression (best)

Cache-Control: max-age=31536000
  β†’ 1 year cache (use with hashes)

103 Early Hints
  β†’ preload before HTML is ready
Section 15

Connected Topics β€” Where to Go Next

HTTP doesn't exist in isolation β€” it's one layer in a stack. Understanding TCP/UDP explains why QUIC exists. Understanding DNS explains the first step before any HTTP request. Understanding load balancers explains what happens at scale. Pick the topic that fills the biggest gap in your knowledge.