TL;DR: The One-Minute Version
- What compression actually is: finding patterns in bytes and replacing them with shorter codes
- The two big families: lossless (exact original recovery) vs lossy (close enough, humans can't tell)
- The three-way trade-off every compressor makes: ratio vs encode speed vs decode speed
- Why algorithm choice translates directly to cloud bills, latency, and CPU cost
Compression is the art of finding patterns in data and replacing them with shorter codes. If the original data can be recovered exactly from those codes, that's lossless compression. If you instead discard tiny details humans won't miss to get even smaller files, that's lossy. The same idea has powered the internet since 1977, but new variants keep arriving because shaving 5% off a billion daily requests adds up to a real difference in bandwidth, latency, and cost.
Picture the string "Hello World Hello World Hello World". It's 35 bytes written out in full. But notice: the phrase "Hello World" repeats three times. A compressor sees that pattern and writes it as [Hello World × 3], just 15 bytes. It found the redundancy and replaced it with a compact reference. Decompression is just reading the reference and expanding it back. That's the entire idea. Everything else is refinement.
The core idea: Real-world data is almost never random; it has patterns. The same words repeat in text. Adjacent pixels in an image are often nearly identical. Compression algorithms find those patterns and replace them with a compact shorthand. Decompression reverses the process: expand the shorthand back into the original data. The trick is finding an encoding scheme where the shorthand is always shorter than the original.
Two big families: Lossless compression reconstructs the original bytes exactly, which is required for code, JSON, databases, and executables where a single flipped bit is a bug. Lossy compression discards details the human senses can't easily detect: imperceptible audio frequencies in MP3, redundant color information in JPEG. Lossy trades a little quality for dramatically smaller files, which is why a 50 MB raw audio track becomes a 4 MB MP3.
The three trade-offs: Every compressor balances three dials: compression ratio (how small does it get?), encode speed (how fast can you compress?), and decode speed (how fast can you decompress?). Shrinking more usually costs more CPU time. Brotli at level 11 gets amazing ratios but takes ~40× longer to compress than gzip. LZ4 compresses 5× faster than gzip but leaves files 20-30% bigger. Choosing means knowing which dial matters most in your workload.
Why You Need This: The Real Cost of Uncompressed Data
Here's a number that will change how you think about compression: AWS charges roughly $0.08 per GB of data transferred out to the internet. That sounds small. But at scale, it's the difference between a startup that stays profitable and one that bleeds cash.
Imagine your API returns typical JSON responses averaging 50 KB each. Your service gets 100 million requests per day. Do the math: 100M × 50 KB = 5,000 GB transferred, which is 5 TB of egress per day. At $0.08/GB, that's $400 per day, or about $146,000 per year. Enable gzip, and those 50 KB payloads compress down to roughly 12 KB. Same 100M requests: 1,200 GB per day. Bill: $96/day, or about $35,000/year. That's roughly $111,000 saved annually for one config line, and the saving scales linearly with traffic and payload size.
The CPU cost of compression (typically well under a millisecond for a payload of this size) is small compared to those numbers. This is why engineers at companies like Cloudflare, Netflix, and Google treat compression as a first-class concern, not a nice-to-have.
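Estimates like this most often go wrong at the unit conversion, so it can help to write the arithmetic down as code. The following is a minimal sketch; the function name `daily_egress_cost` is hypothetical, and the price and payload sizes are simply the rough figures used in the example above, with decimal units (1 GB = 1,000,000 KB).

```python
def daily_egress_cost(requests_per_day: int, payload_kb: float, usd_per_gb: float) -> float:
    """Return the daily egress bill in USD, using decimal units (1 GB = 1,000,000 KB)."""
    gb_per_day = requests_per_day * payload_kb / 1_000_000
    return gb_per_day * usd_per_gb

# Inputs taken from the worked example; swap in your own traffic and pricing.
uncompressed = daily_egress_cost(100_000_000, 50, 0.08)
gzipped = daily_egress_cost(100_000_000, 12, 0.08)

print(f"uncompressed: ${uncompressed:,.0f}/day, ${uncompressed * 365:,.0f}/year")
print(f"gzip        : ${gzipped:,.0f}/day, ${gzipped * 365:,.0f}/year")
print(f"saving      : ${(uncompressed - gzipped) * 365:,.0f}/year")
```

Because the result is linear in every input, multiplying traffic or payload size by ten multiplies both the bill and the saving by ten.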
Your database stores 1 billion JSON documents averaging 2 KB each, which is 2 TB total. The documents are user profiles: fields like "name", "email", "user_id", "created_at" repeat in every single document. Before reading on: what compression ratio could you realistically achieve on this data?
Text compression on repetitive JSON typically achieves a 5-10× ratio, meaning 2 TB compresses to 200-400 GB. The key reason: field names like "name", "email", "user_id" are stored literally in every document. A dictionary compressor (LZ77 and descendants) treats those repeating strings as a shared dictionary and replaces each one with a 2-3 byte reference. The more repetition, the better the ratio. At 10×, you save 1.8 TB of storage. At AWS EBS pricing (~$0.08/GB-month), that's ~$144 per month, or roughly $1,700 per year, saved on one table's storage alone.
The graph above uses real AWS pricing on a realistic workload. Your numbers will vary (payloads larger or smaller, traffic higher or lower, CDN pricing different), but the shape is always the same. Uncompressed dominates. Even gzip, a 32-year-old algorithm, cuts the bill by 75-80% on typical JSON. Brotli shaves another 10-15% on top of that.
Beyond cloud bills, compression also speeds up the user's experience. A 50 KB JSON response over a 4G mobile connection (typical ~10 Mbps downlink) takes ~40ms. At 12 KB, that drops to ~10ms. On a slow 3G connection (1 Mbps), 50 KB = ~400ms, a noticeable pause. 12 KB = ~96ms, which feels instant. Compression is free latency improvement for your users.
Mental Model: Why Patterns Compress
Compression works because most real-world data is not random. Random data, like the output of a true random number generator, is already as compact as it can be. There's no pattern to exploit. But almost everything we actually store or transmit is deeply patterned.
Consider a photo of a clear blue sky. Millions of pixels, but they're almost all the same shade of blue. An image with a thousand pixels of exactly RGB(100, 149, 237) doesn't need to store that triple a thousand times; it can say "blue, 1000 times" and be vastly smaller. Or consider a text file: the word "the" appears roughly every 14 words in English. A compressor sees that repetition and replaces every occurrence after the first with a tiny back-reference. The more patterned your data, the more a compressor can save.
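The difference between patterned and random data is easy to observe directly. The sketch below uses Python's standard `zlib` module (a DEFLATE implementation) on two inputs of the same size; the repeated "Hello World " sample is an artificial best case chosen for illustration, not a realistic workload.

```python
import os
import zlib

patterned = b"Hello World " * 1000          # highly repetitive: 12,000 bytes
random_bytes = os.urandom(len(patterned))   # same size, no exploitable structure

for label, data in [("patterned", patterned), ("random", random_bytes)]:
    packed = zlib.compress(data, 6)
    print(f"{label:9s}: {len(data):,} -> {len(packed):,} bytes "
          f"(ratio {len(packed) / len(data):.3f})")
```

The repetitive input shrinks to a tiny fraction of its size, while the random input comes out slightly larger than it went in because only the container overhead is added.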
Different algorithms exploit different kinds of patterns. The four main pattern types, and the algorithm families that exploit them, are shown below.
In practice, modern compressors like zstandard and brotli layer these approaches: LZ77 handles the repeated-string redundancy first, then Huffman (or the faster ANS, Asymmetric Numeral Systems) handles the frequency skew in the remaining output. Combining multiple techniques is why modern compressors are so much better than RLE alone.
The three axes every compressor balances:
- Compression ratio: how small does the output get? Expressed as compressed_size ÷ original_size. Smaller is better. A ratio of 0.25 means "25% of original": you saved 75%.
- Encode speed: how long does compressing take? This is paid once each time data is written or transmitted. For dynamically generated HTTP responses that means every request, so encode speed matters enormously.
- Decode speed: how long does decompressing take? This is paid every time someone reads the data. For web assets served to millions of browsers, decode speed is critical because you can't make users' CPUs faster.
Compress-once, read-many (static web assets, ML model weights, CDN content): Pay for slow Brotli compression at deploy time. Users decode fast, forever. Best ratio + fast decode wins.
Compress-many, read-once (HTTP API responses, streaming data): You compress every request in real time. Fast encode is critical. gzip or zstd at default levels win.
Compress-once, read-once (logs in transit, event streams): Absolute fastest encode. Snappy or LZ4. Ratio matters less because the data won't be read many times.
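To see the three dials move, one option is to time a single compressor at different levels. This is a rough sketch using the standard `zlib` module; the sample payload is synthetic (an assumption made for the example), and the timings it prints depend entirely on your hardware, so treat them as relative rather than absolute.

```python
import time
import zlib

# Synthetic, repetitive JSON-like sample (an assumption; use real payloads in practice).
sample = b'{"user_id": 12345, "name": "Alice", "plan": "pro", "active": true}\n' * 2000

def measure(level: int) -> tuple[float, float, float]:
    """Return (ratio, encode_ms, decode_ms) for zlib at the given level."""
    t0 = time.perf_counter()
    packed = zlib.compress(sample, level)
    encode_ms = (time.perf_counter() - t0) * 1000
    t0 = time.perf_counter()
    restored = zlib.decompress(packed)
    decode_ms = (time.perf_counter() - t0) * 1000
    assert restored == sample  # lossless round trip
    return len(packed) / len(sample), encode_ms, decode_ms

for level in (1, 6, 9):
    ratio, enc, dec = measure(level)
    print(f"level {level}: ratio {ratio:.4f}, encode {enc:.2f} ms, decode {dec:.2f} ms")
```

Running the same loop over your own payloads, and over other libraries if you have them installed, is usually more informative than any published benchmark.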
Core Concepts: The Vocabulary You Need
Six terms come up in every compression discussion. You'll see them in documentation, interviews, and architecture reviews. Here they are in plain English first, then the precise definition.
Lossless Compression
The output bytes can be exactly restored to the original input, bit for bit, byte for byte. You can compress and decompress a thousand times and get the same result every time. Required for anything where a single bit flip is catastrophic: source code, JSON APIs, database files, executables, ZIP archives, PDFs.
Examples: gzip, brotli, zstandard, LZ4, snappy, DEFLATE, bzip2, LZMA, PNG, FLAC.
Lossy Compression
The output is "close enough": similar to the original, but not identical. Bits that humans don't notice are discarded permanently. You can never recover the exact original. The trade-off: dramatically smaller files at the cost of some quality. For media content, "good enough" and "10× smaller" is an excellent deal.
Examples: JPEG (images), MP3/AAC (audio), H.264/H.265/AV1 (video), WebP (images with lossy mode).
Dictionary Coding
The most important technique in modern compression. You maintain a "dictionary" of substrings seen earlier in the data. When you encounter a string you've seen before, you write a back-reference: "go to offset X and copy N bytes", typically 2-3 bytes instead of the original string. LZ77 is the classic dictionary coder. Every modern general-purpose compressor (gzip, zstd, brotli, snappy, LZ4) is primarily a dictionary coder.
Entropy Coding
A second pass used on top of dictionary coding. After LZ77 reduces repetition, the remaining symbols still have some frequency skew: some values appear more than others. Entropy coding exploits this by assigning shorter bit sequences to more common symbols. Huffman coding builds an optimal binary tree: 'e' might get a 3-bit code, 'z' a 14-bit code. The average goes below 8 bits per character. ANS (Asymmetric Numeral Systems) is the modern replacement: faster than Huffman, similar ratio.
Block vs Streaming Compression
Block compressors gather a chunk of input, analyze it in full, then compress. They can look backward and forward across the entire block to find the best matches, which gives a better ratio. But you have to wait for the whole block before sending the first byte. Good for files, bad for real-time streams.
Streaming compressors process bytes as they arrive. They compress immediately with only a backward-looking window. Slightly worse ratio, but much lower latency. gzip in streaming mode works this way for HTTP/1.1 chunked transfer.
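As a minimal sketch of the streaming style, the standard `zlib` module exposes incremental compressor and decompressor objects. The generator name `stream_compress` and the log-line input are made up for this example; the point is only that output can be emitted before the whole input has been seen.

```python
import zlib

def stream_compress(chunks):
    """Yield compressed bytes as input chunks arrive, without buffering the whole input."""
    comp = zlib.compressobj(6)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:            # the compressor may hold data back until it has enough
            yield out
    yield comp.flush()     # emit whatever is still buffered, plus the trailer

chunks = [b"event: login user=alice\n"] * 500
compressed = b"".join(stream_compress(chunks))

decomp = zlib.decompressobj()
restored = decomp.decompress(compressed) + decomp.flush()
print(len(b"".join(chunks)), "->", len(compressed), "bytes")
```

Note that `compress()` can legitimately return an empty bytes object for a given chunk, which is why the final `flush()` is essential.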
Compression Ratio
The standard measure: compressed_size ÷ original_size. A ratio of 0.3 means the compressed version is 30% the size of the original, so you saved 70%. Often expressed as a multiplier instead: "3.3× compression." Or as percentage saved: "70% reduction." All three mean the same thing.
Practical benchmarks: gzip on JSON: 0.15-0.25 (75-85% savings). gzip on already-compressed JPEG: ~0.98 (2% savings, essentially useless). zstd on logs: 0.10-0.20 (80-90% savings).
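Since the three ways of quoting a result get mixed up constantly, here is a small sketch that converts between them. The helper name `describe` is hypothetical.

```python
def describe(original_size: int, compressed_size: int) -> dict[str, float]:
    """Express one compression result as a ratio, a multiplier, and a percentage saved."""
    ratio = compressed_size / original_size
    return {
        "ratio": ratio,                       # compressed / original, smaller is better
        "multiplier": 1 / ratio,              # "N x compression", larger is better
        "percent_saved": 100 * (1 - ratio),   # share of bytes eliminated
    }

stats = describe(original_size=1000, compressed_size=300)
print(f"ratio {stats['ratio']:.2f} = {stats['multiplier']:.1f}x = "
      f"{stats['percent_saved']:.0f}% saved")
```

When reading benchmarks, check which convention is in use: some tools report original ÷ compressed as the "ratio", which inverts the meaning of "smaller is better".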
A Brief History: From Shannon to Zstandard
Every compressor you use today traces back to a small number of seminal papers. Understanding the lineage demystifies why modern formats are the way they are. It's not arbitrary: each generation fixed a specific problem with the previous one.
The Theory Era (1948-1980s)
Claude Shannon's 1948 paper "A Mathematical Theory of Communication" proved something profound: every data source has a theoretical minimum size called its entropy. You can never compress below it, no matter how clever your algorithm. This gave compression research a target.
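To make the entropy bound a little more concrete, the sketch below computes order-0 Shannon entropy, H = -Σ p·log2(p), over byte frequencies. One caveat worth stating loudly: this order-0 figure is a floor only for coders that treat each byte independently. Dictionary coders exploit context between bytes and can do better on repetitive input, so it is an illustration of the concept rather than a limit on gzip or zstd.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy: H = -sum(p * log2(p)) over byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

text = b"abracadabra abracadabra abracadabra"
h = entropy_bits_per_byte(text)
print(f"{h:.2f} bits/byte; order-0 floor is about "
      f"{h * len(text) / 8:.1f} bytes of {len(text)}")
```

A buffer containing every byte value equally often scores the maximum of 8 bits per byte, which is the formal version of "random data does not compress".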
In 1952, David Huffman published Huffman coding, which he had devised as a graduate student for a term paper, outdoing the coding method of his own professor, Robert Fano. By assigning shorter bit patterns to more frequent symbols, he got ~50% compression on English text, which is optimal for single-symbol frequency coding.
Then in 1977, Jacob Ziv and Abraham Lempel invented LZ77, the sliding-window dictionary coder. Instead of coding one symbol at a time, it replaced entire substrings with back-references. This was a qualitative leap. LZ78 followed in 1978. Every major general-purpose compressor since then is a descendant.
The Deployment Era (1990s)
Phil Katz combined LZ77 + Huffman into DEFLATE for his PKZIP software, with PKZIP 2.0 (DEFLATE) shipping in 1993. It was fast, practical, and brilliantly engineered. Jean-loup Gailly and Mark Adler wrote a free, patent-unencumbered implementation of DEFLATE, first in the gzip command-line tool (1992), then packaged as the zlib library (1995). gzip became the Unix standard and zlib became the embedded compression library used by everything from the Linux kernel to web browsers.
Why did gzip win the web? Patent-free licensing, universal OS support, and good-enough performance. HTTP/1.1 (1997) standardized the Accept-Encoding: gzip header. Every web server in existence supports it. That installed base is why gzip still handles ~70% of HTTP responses today: not because it's the best algorithm, but because it's everywhere.
The Speed Era (2011): Snappy & LZ4
By 2011, Google engineers noticed that gzip was too slow for their use case: compressing data in memory between services, where the CPU cost of high-ratio compression outweighed the network savings. So they wrote Snappy with an explicit goal: be 3-5× faster than gzip, even if the ratio is 20-30% worse.
LZ4, by Yann Collet (who later wrote Zstandard), took this even further. LZ4 compresses at ~500 MB/s single-core, so fast that enabling it on RAM-to-RAM transfers can actually be a net win even when you have plenty of bandwidth. The insight: CPU time has a cost too, and sometimes speed is worth more than ratio.
The Modern Era (2013-now): Brotli & Zstandard
Brotli (Google, 2013) was engineered specifically for web asset delivery. It includes a pre-built static dictionary of 13,000+ common web strings (HTML tags, CSS keywords, JavaScript boilerplate). That dictionary lets it get better ratios on web content without burning extra CPU. Browsers started shipping support in 2016; it's now the default for static asset delivery on Cloudflare, Fastly, and most CDNs.
Zstandard (Facebook, Yann Collet, 2016) is arguably the most important general-purpose compressor in production today. It achieves better ratios than gzip at the same encode speed, which makes it essentially a free upgrade. Facebook, the Linux kernel, Python distributions, and PostgreSQL all use it. The "right default" for new systems: use zstd.
Every modern general-purpose compressor (gzip, zstd, brotli, snappy, LZ4) is fundamentally an LZ77-style matcher, usually followed by an entropy coder; Snappy and LZ4 skip the entropy stage to maximize speed. The differences are in the engineering details: window size (how far back to search for matches), dictionary handling (static vs dynamic vs trained), and entropy coder choice (Huffman vs ANS vs none). Knowing this, you can instantly understand why a "new" compressor claims to be better than gzip: it's always a refinement on one of these three axes.
When to Compress, When Not To
Compression isn't free. It costs CPU on encode and decode. And it has one hard rule: you cannot meaningfully compress already-compressed data. JPEGs, MP4 videos, and encrypted payloads are essentially random noise from the compressor's perspective, with no patterns to exploit. Running gzip over a JPEG will typically make the file slightly larger, not smaller, because the gzip header overhead gets added with essentially zero savings. Knowing what to compress (and what not to) is as important as knowing how to compress.
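The "already compressed" rule can be checked without any media files by compressing the same data twice. In this sketch the input is synthetic log-like text built from a small made-up vocabulary with a fixed random seed; the first pass removes most of the redundancy, so the second pass has almost nothing left to work with.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = ["error", "warning", "user", "login", "timeout", "request", "cache", "miss",
         "hit", "retry", "db", "query", "ok", "failed", "session", "token"]
text = " ".join(rng.choice(vocab) for _ in range(5000)).encode()

once = zlib.compress(text, 9)    # first pass removes most of the redundancy
twice = zlib.compress(once, 9)   # second pass sees near-random bytes

print(f"original    : {len(text):,} bytes")
print(f"compressed  : {len(once):,} bytes")
print(f"recompressed: {len(twice):,} bytes")
```

On typical data the recompressed size lands within a few bytes of the once-compressed size, usually slightly above it.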
The flowchart above gives you an instant answer for 90% of decisions. Let's add concrete numbers to the six main use cases:
HTTP Responses
Algorithm: gzip (universal support) + Brotli for modern browsers.
Why: Virtually every browser and CDN supports both. Brotli gets ~10-15% better ratio on text content than gzip, with no meaningful cost. Enable both: serve Brotli when the client supports it (Accept-Encoding: br), gzip as fallback.
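The "Brotli if supported, gzip as fallback" rule amounts to a small piece of header parsing. The function below is a simplified sketch with a hypothetical name, `pick_encoding`; it treats `q=0` as a refusal but otherwise ignores q-value ranking, which real servers and frameworks handle more carefully.

```python
def pick_encoding(accept_encoding: str, preferred=("br", "gzip")) -> str:
    """Return the first server-preferred coding the client accepts, else 'identity'."""
    accepted = set()
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        params = params.replace(" ", "").lower()
        if params in ("q=0", "q=0.0", "q=0.00", "q=0.000"):
            continue                      # client explicitly refuses this coding
        if name.strip():
            accepted.add(name.strip().lower())
    for coding in preferred:              # server preference order: Brotli first
        if coding in accepted:
            return coding
    return "identity"                     # send the response uncompressed

print(pick_encoding("gzip, deflate, br"))
print(pick_encoding("gzip, deflate"))
print(pick_encoding(""))
```

In production this decision is normally made by the web server or CDN rather than application code; the sketch only shows the logic being applied.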
Typical savings: 70-85% on HTML/JSON/CSS/JS (and on SVG, which is XML text). Near zero on JPEG/PNG/WebP, which are already compressed.
Database Storage
Algorithm: LZ4 (default) or Zstandard (better ratio, slightly more CPU).
Why: Databases compress at page or block level. LZ4 is fast enough that decompression doesn't measurably increase query latency on modern hardware. MongoDB, PostgreSQL (TOAST compression, via the default_toast_compression setting), Cassandra, ClickHouse, and RocksDB all offer a fast codec such as LZ4 or Snappy, and most expose zstd as an option.
Typical savings: 3-8× on text-heavy tables (logs, events, JSON columns).
Backups & Archives
Algorithm: Zstandard at level 19-22, or LZMA for maximum ratio.
Why: You compress once, store for months or years, restore rarely. Slow encode is fine. You want the smallest possible archive to minimize storage cost. Zstd at level 22 approaches bzip2/LZMA ratios while still decoding fast, which is critical when you need rapid recovery during an incident.
Typical savings: 80-90% on database dumps and log archives.
Real-Time Logs & Metrics
Algorithm: Snappy (Google's default) or LZ4.
Why: Logs are written constantly, at high throughput, and often processed only once (aggregated and discarded). Encode speed dominates. Snappy and LZ4 both achieve sub-millisecond compression of 64-256 KB blocks. Kafka uses Snappy as its most common compression codec. The ratio (typically 4-6× on log data) is quite good despite the speed focus.
Already-Compressed Media
Algorithm: None. Store as-is.
Why: JPEG, PNG, MP3, MP4, WebP, HEIC, and ZIP/7z files are already compressed. Their byte distribution is close to random from a lossless compressor's perspective: dictionary matches are rare, and Huffman coding can't help. Running gzip over a JPEG adds ~18 bytes of header and makes the file 0.1% larger. It's wasted CPU and disk write.
In-Memory / IPC Data
Algorithm: LZ4 or no compression.
Why: When data moves between processes on the same machine (shared memory, sockets, or in-memory caches like Redis), the "network" is extremely fast. Compression only helps if the CPU time to compress/decompress is less than the time saved on transfer. LZ4 at ~500 MB/s is often fast enough to be net-positive. Anything slower (gzip, zstd) usually isn't worth it for pure in-memory paths.
Compressing secrets together with attacker-influenced data is a serious security risk, not just a performance concern. In 2012, the CRIME attack (Compression Ratio Info-leak Made Easy) demonstrated that compressing data before encrypting it leaks information about the plaintext through the ciphertext length. An attacker who can inject chosen bytes and observe the resulting ciphertext size can deduce the original plaintext byte-by-byte, because guesses that match the secret compress smaller.
In 2013, BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) showed the same attack works against HTTPS responses compressed with gzip at the HTTP layer. TLS-level compression was the channel CRIME abused, and TLS 1.3 removed it entirely. On ordering: encrypting first and then compressing is pointless, because ciphertext looks random and will not shrink. Compressing first and then encrypting is the only order that saves space, but it is dangerous whenever a secret and attacker-controlled input share one compression context. The practical rule: let the application layer handle compression and let TLS handle encryption, and do not compress responses that mix secrets (session tokens, CSRF tokens) with user-supplied content.
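The length side channel is easy to reproduce in miniature. In the sketch below everything is hypothetical (the token value, the page template, the function name `response_size`), and the guess covers the whole token at once for clarity, whereas a real attack recovers it one byte at a time. Encryption is left out because it preserves length, so the compressed size is what an observer would effectively see.

```python
import zlib

SECRET = "csrf_token=9f8e7d6c5b4a3210"   # hypothetical secret embedded in the page

def response_size(attacker_input: str) -> int:
    """Size of the compressed body when attacker input shares a page with the secret."""
    body = f"<p>You searched for: {attacker_input}</p><input value='{SECRET}'>"
    return len(zlib.compress(body.encode(), 9))

right = response_size("csrf_token=9f8e7d6c5b4a3210")   # guess matches the secret
wrong = response_size("csrf_token=Zk3Qp0Lm7RbXw5Nc")   # same length, wrong value

print("matching guess:", right, "bytes")
print("wrong guess   :", wrong, "bytes")
```

The matching guess lets the compressor replace the secret with a single back-reference, so that response is measurably shorter even though both inputs have identical length.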
Side-by-Side: Same JSON, Three Algorithms
Let's make the trade-offs concrete. The same JSON object, a realistic user profile, compressed with gzip, zstd, and brotli:
import gzip, json, time
payload = json.dumps({
"user_id": "usr_8f3a2b1c",
"name": "Alice Johnson",
"email": "alice.johnson@example.com",
"created_at": "2024-01-15T10:30:00Z",
"plan": "pro",
"preferences": {
"theme": "dark",
"language": "en",
"notifications": True
},
"tags": ["power-user", "early-adopter", "api-access"]
}).encode()
t0 = time.perf_counter_ns()
compressed = gzip.compress(payload, compresslevel=6) # default level
encode_ms = (time.perf_counter_ns() - t0) / 1e6
t0 = time.perf_counter_ns()
decompressed = gzip.decompress(compressed)
decode_ms = (time.perf_counter_ns() - t0) / 1e6
print(f"Original : {len(payload):,} bytes")
print(f"gzip lvl6 : {len(compressed):,} bytes")
print(f"Ratio : {len(compressed)/len(payload):.2f} ({100*(1-len(compressed)/len(payload)):.0f}% savings)")
print(f"Encode : {encode_ms:.3f} ms")
print(f"Decode : {decode_ms:.3f} ms")
# Illustrative output (exact sizes and timings vary by library version and hardware):
# Original : 258 bytes
# gzip lvl6 : 191 bytes <- gzip header overhead is ~18 bytes; small payloads hurt
# Ratio : 0.74 (26% savings)
# Encode : 0.041 ms
# Decode : 0.012 ms
#
# Note: compression ratio improves dramatically on larger payloads.
# At 50 KB, gzip typically achieves 75-85% savings on JSON.
import zstandard, json, time
payload = json.dumps({
"user_id": "usr_8f3a2b1c",
"name": "Alice Johnson",
"email": "alice.johnson@example.com",
"created_at": "2024-01-15T10:30:00Z",
"plan": "pro",
"preferences": {
"theme": "dark",
"language": "en",
"notifications": True
},
"tags": ["power-user", "early-adopter", "api-access"]
}).encode()
cctx = zstandard.ZstdCompressor(level=3) # level 3 = "everyday": fast + good ratio
dctx = zstandard.ZstdDecompressor()
t0 = time.perf_counter_ns()
compressed = cctx.compress(payload)
encode_ms = (time.perf_counter_ns() - t0) / 1e6
t0 = time.perf_counter_ns()
decompressed = dctx.decompress(compressed)
decode_ms = (time.perf_counter_ns() - t0) / 1e6
print(f"Original : {len(payload):,} bytes")
print(f"zstd lvl3 : {len(compressed):,} bytes")
print(f"Ratio : {len(compressed)/len(payload):.2f} ({100*(1-len(compressed)/len(payload)):.0f}% savings)")
print(f"Encode : {encode_ms:.3f} ms")
print(f"Decode : {decode_ms:.3f} ms")
# Illustrative output (exact sizes and timings vary by library version and hardware):
# Original : 258 bytes
# zstd lvl3 : 178 bytes <- better than gzip even at the fast default level
# Ratio : 0.69 (31% savings)
# Encode : 0.018 ms <- ~2x faster encode than gzip
# Decode : 0.008 ms <- ~1.5x faster decode than gzip
#
# At level 19 (archive): further 5-10% better ratio, ~20x slower encode.
# Use level 3 for real-time; level 19 for backups.
import brotli, json, time
payload = json.dumps({
"user_id": "usr_8f3a2b1c",
"name": "Alice Johnson",
"email": "alice.johnson@example.com",
"created_at": "2024-01-15T10:30:00Z",
"plan": "pro",
"preferences": {
"theme": "dark",
"language": "en",
"notifications": True
},
"tags": ["power-user", "early-adopter", "api-access"]
}).encode()
# Quality 4 = fast mode (good for dynamic responses)
# Quality 11 = maximum (only for static assets pre-compressed at deploy time)
t0 = time.perf_counter_ns()
compressed_q4 = brotli.compress(payload, quality=4)
encode_q4_ms = (time.perf_counter_ns() - t0) / 1e6
t0 = time.perf_counter_ns()
compressed_q11 = brotli.compress(payload, quality=11)
encode_q11_ms = (time.perf_counter_ns() - t0) / 1e6
print(f"Original : {len(payload):,} bytes")
print(f"Brotli q=4 : {len(compressed_q4):,} bytes ({encode_q4_ms:.3f} ms encode)")
print(f"Brotli q=11 : {len(compressed_q11):,} bytes ({encode_q11_ms:.3f} ms encode)")
# Illustrative output (exact sizes and timings vary by library version and hardware):
# Original : 258 bytes
# Brotli q=4 : 173 bytes (0.022 ms encode) <- similar to zstd at small size
# Brotli q=11 : 155 bytes (1.840 ms encode) <- far slower encode but best ratio
#
# Brotli q=11 is used for CDN pre-compression:
# compress once at deploy, serve to millions of users.
# NEVER use q=11 for dynamic/real-time API responses:
# the encode cost (~2ms per response) would dominate request latency.
The code shows the key insight: compression ratio at small payload sizes (258 bytes) looks modest, at 26-40% savings. But at realistic API payload sizes (5-50 KB), the same algorithms achieve 70-85% savings on JSON because there's far more repeated structure to exploit. Always benchmark with your actual payload sizes, not synthetic micro-examples.
Run-Length Encoding: The Simplest Compression
Before we get to the clever algorithms, let's look at the dumbest one that actually works. Run-Length Encoding (RLE) has exactly one idea: if you see the same byte repeated several times in a row, write the count and the byte once instead of the byte many times. The string AAAAABBB (8 bytes) becomes 5A3B (4 bytes). That's it.
It's almost insultingly simple, which is exactly why it's a great place to start understanding compression. You can hold the whole algorithm in your head at once, which makes the WHY obvious: it only works when data contains long runs of identical values. Natural text almost never does (when did you last write "aaaaaaa" on purpose?), but bitmap images with large solid-colored regions absolutely do.
Where RLE Wins, and Where It Spectacularly Fails
RLE crushes data that has long runs of the same value. Classic real-world examples:
- Bitmap images with solid areas: a logo with a big white background has thousands of identical white pixels in a row. RLE can shrink those dramatically.
- Fax transmissions (CCITT Group 3/4): faxed documents are mostly white (blank page) with occasional black lines. Runs of white pixels are very long.
- Classic Windows BMP files: Windows supported RLE-encoded BMP natively (BI_RLE8 and BI_RLE4 variants) for exactly this reason.
But try RLE on natural English text and it EXPANDS the file. Hello (5 bytes) becomes 1H 1e 2l 1o (8 bytes), because English text almost never has consecutive duplicate characters. RLE sees no runs to exploit, and its overhead makes things worse. This is the fundamental insight: the right algorithm depends entirely on the structure of your data.
# Naive RLE encoder: outputs (count, byte) pairs as a list
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    if not data:
        return []
    result = []
    count = 1
    current = data[0]
    for byte in data[1:]:
        if byte == current and count < 255:  # cap run at 255 so count fits in 1 byte
            count += 1
        else:
            result.append((count, current))
            count = 1
            current = byte
    result.append((count, current))  # flush the last run
    return result

def rle_decode(encoded: list[tuple[int, int]]) -> bytes:
    return bytes(byte for count, byte in encoded for _ in range(count))

# Example:
data = b"WWWWWBBBWWW"
enc = rle_encode(data)
print(enc)  # [(5, 87), (3, 66), (3, 87)]  (87='W', 66='B')
print(len(data), "bytes ->", len(enc) * 2, "bytes")  # 11 bytes -> 6 bytes
# Byte-oriented RLE: output is actual bytes [count, byte, count, byte, ...]
# Simplified for clarity: runs shorter than 3 are emitted as (1, byte) pairs
# rather than as a packed literal block, so non-run bytes still cost 2 bytes each.
def rle_encode_bytes(data: bytes) -> bytes:
    if not data:
        return b""
    out = bytearray()
    i = 0
    while i < len(data):
        # count the run (capped at 127 so the count fits in 7 bits)
        j = i + 1
        while j < len(data) and data[j] == data[i] and (j - i) < 127:
            j += 1
        run_len = j - i
        if run_len >= 3:
            # Encode as run: length byte (3-127) then the value
            out.append(run_len)
            out.append(data[i])
        else:
            # A real format would emit a flagged literal block here;
            # this sketch just emits (1, byte) for each byte.
            for k in range(run_len):
                out.append(1)
                out.append(data[i + k])
        i = j
    return bytes(out)

# Why a literal-block flag matters:
# PackBits (used in TIFF and classic Mac formats) uses a signed header byte.
# The sign distinguishes "N literal bytes follow" from "repeat next byte N times".
# This avoids the overhead of emitting count=1 for every non-run byte.
data = b"AAABCDDDDD"
print(rle_encode_bytes(data))  # b'\x03A\x01B\x01C\x05D'
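For comparison, here is a sketch of the signed-header scheme the comments above describe. It follows the PackBits convention (header 0 to 127 means "copy the next header+1 bytes", header 129 to 255 means "repeat the next byte 257-header times", 128 is a no-op), but it is a simplified illustration rather than a reference implementation: for instance it encodes every run of two as a run, which real encoders sometimes avoid. The function names are made up for this example.

```python
def packbits_encode(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run starting at i (capped at 128, the longest one header can express).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)        # header 129..255: repeat next byte (257 - header) times
            out.append(data[i])
            i += run
        else:
            # Gather literals until a run of 2+ starts or the 128-byte cap is hit.
            start = i
            i += 1
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)    # header 0..127: copy the next (header + 1) bytes
            out += data[start:i]
    return bytes(out)

def packbits_decode(packed: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        i += 1
        if header < 128:                 # literal block
            out += packed[i:i + header + 1]
            i += header + 1
        elif header > 128:               # run
            out += bytes([packed[i]]) * (257 - header)
            i += 1
        # header == 128 is a no-op
    return bytes(out)

data = b"AAABCDDDDD"
packed = packbits_encode(data)
print(packed, len(data), "->", len(packed), "bytes")
```

On this input the two non-run bytes share one header instead of paying for a count each, and the worst case for pure literals drops from doubling the size to roughly one extra byte per 128.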
Huffman Coding: Frequency-Based Codes
RLE only works on runs. What about text where every character appears once, but some characters appear much more often than others? That's where Huffman coding shines.
The idea, invented by David Huffman in 1952 as a student at MIT: instead of giving every character the same 8-bit code, give common characters short codes and rare characters long codes. In English text, 'e' appears ~13% of the time while 'z' appears ~0.07% of the time, so 'e' might get a 3-bit code and 'z' might get a 12-bit code. The average bits-per-character across the whole file comes out much less than 8.
The Algorithm in Plain English
- Count how often each symbol appears. Scan the whole file once and build a frequency table.
- Put each symbol in its own tiny one-node tree, weighted by frequency.
- Repeatedly merge the two lowest-weight trees into one new tree whose weight is the sum. The two sub-trees become left (bit 0) and right (bit 1) children.
- Stop when you have exactly one tree. The path from root to any leaf gives that symbol's code: left = 0, right = 1.
The result: for a small alphabet with heavily skewed frequencies, a file might need on average ~1.95 bits per symbol instead of 8. For 100 symbols, that's 195 bits instead of 800, a 76% reduction. And you can get back the exact original file from the compressed bits plus the Huffman tree (which is stored in the file header).
The Genius: Prefix-Free Codes
There's a subtle brilliance here that makes Huffman codes actually usable. The decoder reads a stream of bits and needs to know when it's finished reading one code and starting the next. Huffman's tree construction guarantees that no short code is the beginning (prefix) of any longer code. This property is called being prefix-free.
Because Huffman codes are prefix-free, the decoder can read bits one at a time, follow the tree from root to leaf, emit the symbol when it hits a leaf, then immediately start at the root again. No look-ahead or separator bytes are needed.
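The merge steps and the prefix-free decoding described above can be sketched in a few lines using a heap. This is a teaching sketch, not how production codecs do it: codes are kept as strings of "0" and "1" for readability rather than packed into bits, the tree is represented implicitly as code tables, and the function names are chosen for this example.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code table by repeatedly merging the two lightest trees."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lightest tree becomes the 0 branch
        w2, _, right = heapq.heappop(heap)   # next lightest becomes the 1 branch
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits one at a time; prefix-freeness means the first match is the symbol."""
    by_code = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    return "".join(out)

message = "abracadabra"
codes = huffman_codes(message)
bits = "".join(codes[ch] for ch in message)
print(codes)
print(f"{len(bits)} bits vs {8 * len(message)} bits uncompressed")
```

For "abracadabra" the frequent 'a' gets a 1-bit code and the other four symbols get 3 bits each, for 23 bits in total. A real format would also have to store the code table alongside the bits, which is why Huffman coding does poorly on very short inputs.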
Limitations and Real-World Use
Huffman coding requires knowing the frequencies upfront, which usually means a two-pass scan (one to count, one to compress). Related techniques such as adaptive Huffman coding and arithmetic coding avoid the two-pass requirement. In practice, Huffman is rarely used standalone anymore; it's the entropy-coding layer INSIDE bigger algorithms: DEFLATE (gzip), JPEG, PNG, and MP3 all use Huffman internally. On English text alone, Huffman achieves roughly a 50-55% size reduction. Combined with a dictionary stage (LZ77), the modern algorithms you use every day are born.
LZ77: Dictionary Coding
Huffman coding squeezes common characters into fewer bits, but it can't help with longer repeated phrases. "the quick brown fox" appearing 50 times in a document still costs many bits per appearance, because Huffman codes individual characters, not sequences. That's the gap LZ77 fills.
Abraham Lempel and Jacob Ziv published this idea in 1977. The insight is elegant: the dictionary you use to find repeated sequences is just the data you've already encoded, that is, the previous bytes of the file itself. No separate vocabulary file to ship. When you see text you've seen before, instead of writing it again, write a "back-reference": "go back N bytes and copy M bytes from there." The decoder re-reads from its own output. It holds the same bytes the encoder had, so it can reproduce any back-reference exactly.
The Sliding Window: How the Dictionary Scales
The "dictionary" LZ77 searches is just the N bytes immediately behind the current position, called the sliding window. When you move forward one byte, the window slides, dropping the oldest byte and gaining the newest. The window size is a fundamental trade-off knob:
- DEFLATE (gzip): 32 KB window. Small enough to fit comfortably in 1990s hardware memory.
- Brotli: up to 16 MB window. Bigger window = more chance to find long matches in large web pages.
- Zstandard (level 22): up to 128 MB window. At maximum level, can match text seen megabytes ago.
Bigger window = better ratio but more memory. On a 1990s PC with 4 MB RAM, a 128 MB window was impossible. On a 2024 server with 64 GB RAM, it's trivial, which is why modern compressors win so decisively over gzip.
Matches also have a minimum length, typically 3 bytes in DEFLATE. A 2-byte back-reference token itself costs 2 bytes to store, so matching fewer than 3 bytes doesn't save anything. A match of 3 bytes saves 1 byte; a match of 100 bytes saves ~98 bytes.
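The sliding window, the back-reference, and the minimum match length fit together as in the toy encoder below. It is a deliberately naive sketch: the match search is brute force (real implementations use hash chains or similar), tokens are Python tuples rather than packed bits, and the names and default parameters are assumptions made for the example.

```python
def lz77_encode(data: bytes, window: int = 4096, min_match: int = 3):
    """Greedy LZ77: emit ('lit', byte) or ('ref', distance, length) tokens."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for start in range(max(0, i - window), i):     # brute-force search of the window
            length = 0
            # A match may run past position i and overlap the bytes being encoded.
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, dist, length = token
            for _ in range(length):          # byte-by-byte so overlapping copies work
                out.append(out[-dist])
    return bytes(out)

data = b"Hello World Hello World Hello World"
tokens = lz77_encode(data)
print(len(tokens), "tokens; last token:", tokens[-1])
```

On this input the first twelve bytes come out as literals and everything after that is a single back-reference ("go back 12, copy 23"). The copy is longer than the distance, which works because the decoder reads bytes it has only just written.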
LZ77-style matching alone typically achieves something like a 40-60% reduction on English text. Adding Huffman entropy coding gives you DEFLATE, which usually reaches 60-70% and has powered gzip, ZIP, and PNG for 30 years.
DEFLATE & gzip: The Universal Standard
You've seen LZ77 and Huffman separately. Now meet their offspring: DEFLATE, the algorithm that combines them, and gzip, the file format that wraps DEFLATE with a header and checksum. These two are nearly synonymous in everyday conversation: "gzip a file" and "compress with DEFLATE" mean the same thing in practice.
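The algorithm-versus-container distinction is visible from Python's standard library. In the sketch below, a negative `wbits` value asks `zlib` for a raw DEFLATE stream with no wrapper, while `gzip.compress` produces the same kind of stream inside the gzip container; the sample data is arbitrary.

```python
import gzip
import zlib

data = b"compression " * 200

# Raw DEFLATE stream: negative wbits tells zlib to omit any header or trailer.
raw = zlib.compressobj(6, zlib.DEFLATED, -15)
deflate_stream = raw.compress(data) + raw.flush()

gz = gzip.compress(data, compresslevel=6)   # same algorithm, wrapped in the gzip container

print("raw DEFLATE:", len(deflate_stream), "bytes")
print("gzip       :", len(gz), "bytes")
print("magic bytes:", gz[:2].hex())          # gzip files start with 1f 8b
print("round trip :", zlib.decompress(deflate_stream, -15) == gzip.decompress(gz) == data)
```

The size gap between the two outputs is the container overhead: a header identifying the format and a trailer carrying a CRC-32 and the original length.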
Phil Katz designed DEFLATE for PKZIP (version 2, 1993), and Peter Deutsch wrote it up as RFC 1951 in 1996. It's been 30 years, and gzip is still the default HTTP compression on roughly 70% of all web responses, according to the HTTP Archive. That's not because it's the best algorithm anymore (it's not) but because it runs everywhere without configuration. Understanding gzip is understanding the web's compression baseline.
The three DEFLATE block types matter because they let the compressor adapt: tiny payloads use fixed tables (no overhead to store a custom table), while large blocks build custom Huffman trees tuned to the exact frequency distribution of that block's content. The compressor chooses dynamically.
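The "gzip wraps DEFLATE" relationship can be checked directly with Python's standard library. The sketch below assumes a gzip member with no optional header fields (which is what `gzip.compress` produces here), so the header is exactly 10 bytes and the trailer 8 bytes; the sample text is made up.

```python
import gzip
import struct
import zlib

data = b"DEFLATE is the algorithm; gzip is the container. " * 50

# Raw DEFLATE: negative wbits tells zlib to emit no header and no trailer.
raw = zlib.compressobj(level=6, wbits=-15)
deflate_bytes = raw.compress(data) + raw.flush()

# gzip container around a DEFLATE stream (mtime=0 keeps the output reproducible).
gz = gzip.compress(data, compresslevel=6, mtime=0)

header, body, trailer = gz[:10], gz[10:-8], gz[-8:]
crc32, isize = struct.unpack("<II", trailer)   # little-endian CRC32 and input length

print("raw DEFLATE:", len(deflate_bytes), "bytes")
print("gzip total: ", len(gz), "bytes (18 bytes of framing)")
print("magic:", header[:2].hex(), "method:", header[2])   # 1f8b, 8 = DEFLATE
print("body inflates to original:", zlib.decompress(body, wbits=-15) == data)
```

The body between header and trailer is an ordinary DEFLATE stream, which is why the two terms are used interchangeably in practice.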
Why gzip Has Lasted 30 Years
Universal Support
Every browser since Netscape 4 (1997) speaks gzip. Every CDN, every web framework, every command-line tool. curl, wget, Python's gzip module, Java's GZIPInputStream, .NET's GZipStream: they all interoperate. The lowest common denominator between any two systems on Earth is almost always gzip. You don't need to negotiate.
Mature Implementations
zlib (1995), the reference implementation of DEFLATE, is one of the most battle-tested libraries on Earth. It has been security-audited repeatedly, handles every edge case, and exists on basically every OS. You will not find a subtle decompression bug in production because zlib has already seen it.
Streaming โ No Buffering
gzip can encode and decode bytes as they arrive. An HTTP server can start sending the first compressed bytes before it has even read the whole response. The decoder on the client side starts decompressing and rendering before the last byte arrives over the wire. This is critical for time-to-first-byte latency.
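A minimal sketch of that streaming behavior, using the standard-library `zlib` module in gzip mode. The helper names `gzip_stream` and `gunzip_stream` and the sample chunks are hypothetical; a real HTTP server decides for itself when to flush.

```python
import zlib


def gzip_stream(chunks, level=6):
    """Yield one gzip-framed piece per input chunk, then a final trailer piece."""
    enc = zlib.compressobj(level=level, wbits=31)   # wbits=31 selects the gzip container
    for chunk in chunks:
        # Z_SYNC_FLUSH forces out everything buffered so far, so the receiver
        # can decode this chunk without waiting for the rest of the response.
        yield enc.compress(chunk) + enc.flush(zlib.Z_SYNC_FLUSH)
    yield enc.flush()                               # final block plus CRC32/length trailer


def gunzip_stream(pieces):
    """Decode pieces as they arrive, yielding plaintext incrementally."""
    dec = zlib.decompressobj(wbits=31)
    for piece in pieces:
        out = dec.decompress(piece)
        if out:
            yield out
    tail = dec.flush()
    if tail:
        yield tail


chunks = [b"<html><head></head>", b"<body>hello</body>", b"</html>"]
pieces = list(gzip_stream(chunks))

# The first piece alone already decodes to the first chunk.
first = zlib.decompressobj(wbits=31).decompress(pieces[0])
decoded = b"".join(gunzip_stream(pieces))
print(first, decoded == b"".join(chunks))
```

Flushing after every chunk costs a little ratio, since each flush ends the current block early. That is the price of lower latency.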
Patent-Free Since Day One
LZW compression (used in GIF) was patented by Unisys, causing years of licensing headaches. DEFLATE was deliberately designed to avoid those patents. Katz published it as open and free. That legal clarity is a big reason every browser vendor implemented it without hesitation.
Where gzip Shows Its Age
- Slow encode at high levels. gzip -9 is 4-5x slower than gzip -6 for a mere 2-3% ratio improvement. The diminishing returns are steep.
- Outdated ratio. Brotli and zstd consistently beat gzip -9's ratio while being faster. There's no compression scenario where gzip is the optimal choice anymore; it wins only on compatibility.
- No dictionary training. gzip can't be trained on your specific data shape. A JSON API that always returns the same keys gets no special benefit; zstd dictionary training would help enormously.
Brotli: The Web's Modern Default
Imagine you could tell a compressor: "I know you're going to compress a web page. Before you even start, I'll give you a dictionary of 13,000 common web phrases (<html>, </div>, function, Content-Type: application/json, charset=utf-8) and you can match against those as if they were already in your sliding window." That's Brotli.
Google released Brotli in 2013, designed specifically for HTTP compression. Every other compressor we've looked at starts from nothing: the sliding window is empty at byte zero. Brotli starts with a 13,000-entry static dictionary of common HTML/CSS/JS phrases baked into the algorithm itself. Even a 200-byte JSON response can match against that dictionary. That's why Brotli wins so decisively on small web payloads where gzip can barely compress at all.
Why Brotli Wins on the Web
- 10-15% smaller than gzip on HTML/CSS/JS at comparable encode levels. For a 100 KB CSS bundle, that's 10-15 KB saved on every page load, every user, every day.
- Small payloads compress dramatically better. A 500-byte API response compressed with gzip is barely smaller than the original, because the sliding window has no history. Brotli's static dictionary has 13,000 matching phrases loaded from the start. That 500-byte JSON payload gets real compression.
- Comparable decode speed to gzip. Decompression is always fast; it's encoding that's expensive. Brotli level 4-6 encodes at a similar speed to gzip level 9 with better output.
- Universal browser support since 2017. Chrome, Firefox, Safari, and Edge all support Accept-Encoding: br. HTTPS is required in practice: browsers only advertise br on secure connections.
The One Place Brotli Loses
Brotli level 11 (maximum) is significantly slower to encode than gzip. It's too slow for on-the-fly dynamic compression of large responses. The solution: use level 11 for pre-compressing static text assets (CSS, JS, SVG) at build time, and level 5-6 for dynamic responses in flight. Most CDNs and frameworks do exactly this automatically.
The ngx_brotli module lets you serve pre-compressed .br files from disk, which is the best approach for static assets.
const { brotliCompress, brotliDecompress, constants } = require('zlib');
const { promisify } = require('util');

const brotliCompressAsync = promisify(brotliCompress);
const brotliDecompressAsync = promisify(brotliDecompress);

async function compressWithBrotli(data, quality = 6) {
  // quality 0-11: 0=fastest/worst, 11=slowest/best
  // For dynamic HTTP responses: quality 4-6 (fast, good ratio)
  // For static asset pre-compression: quality 11 (slow, best ratio)
  const compressed = await brotliCompressAsync(data, {
    params: {
      [constants.BROTLI_PARAM_QUALITY]: quality,
      // BROTLI_PARAM_MODE: GENERIC (0, the default), TEXT (1), FONT (2)
      [constants.BROTLI_PARAM_MODE]: constants.BROTLI_MODE_TEXT,
    }
  });
  return compressed;
}

async function main() {
  const html = Buffer.from('<!DOCTYPE html><html><body><h1>Hello World</h1></body></html>'.repeat(100));
  console.log('Original:', html.length, 'bytes');
  const br6 = await compressWithBrotli(html, 6);
  console.log('Brotli-6:', br6.length, 'bytes', `(${((1 - br6.length/html.length)*100).toFixed(1)}% savings)`);
  const br11 = await compressWithBrotli(html, 11);
  console.log('Brotli-11:', br11.length, 'bytes', `(${((1 - br11.length/html.length)*100).toFixed(1)}% savings)`);
  // Roundtrip check (console.assert only logs on failure, it does not throw)
  const decompressed = await brotliDecompressAsync(br6);
  console.assert(decompressed.equals(html), 'Decompression mismatch!');
  console.log('Roundtrip OK');
}

// Surface errors instead of leaving an unhandled promise rejection
main().catch((err) => { console.error(err); process.exit(1); });
# nginx.conf: serve pre-compressed .br files, fallback to gzip
# Requires: the third-party ngx_brotli module (built in at compile time or loaded as a dynamic module)
http {
    # -- Brotli dynamic compression --
    brotli on;
    brotli_comp_level 6;   # For dynamic: 4-6 is the sweet spot
    # text/html is always compressed; listing it again triggers a "duplicate MIME type" warning
    brotli_types text/css application/javascript
                 application/json image/svg+xml;

    # -- gzip fallback (clients without br support) --
    gzip on;
    gzip_comp_level 6;
    gzip_types text/css application/javascript application/json;
    gzip_vary on;          # Adds "Vary: Accept-Encoding" header

    server {
        listen 443 ssl;

        # -- Static assets: serve pre-compressed files --
        # At build time: brotli -q 11 app.js  -> app.js.br
        # At build time: gzip -9 -k app.js    -> app.js.gz
        location /static/ {
            brotli_static on;   # Serve .br files automatically if client supports it
            gzip_static on;     # Serve .gz files as fallback
        }

        # -- API: dynamic compression --
        location /api/ {
            proxy_pass http://backend;
            # Brotli + gzip already enabled globally above
        }
    }
}
# Comparing Brotli quality levels 1, 4, 6, 9, 11 on a 1 MB JS bundle
# Run from the command line after installing: brew install brotli
# Note: date +%s%3N needs GNU date (on macOS, install coreutils and use gdate)
INPUT="bundle.js" # 1,024 KB uncompressed JavaScript
ORIG_SIZE=$(wc -c < "${INPUT}")
echo "=== Brotli Level Comparison ==="
for LEVEL in 1 4 6 9 11; do
  OUTPUT="bundle.br.level${LEVEL}"
  TIME_START=$(date +%s%3N)
  brotli --force --quality=${LEVEL} --output="${OUTPUT}" "${INPUT}"
  TIME_END=$(date +%s%3N)
  SIZE=$(wc -c < "${OUTPUT}")
  ELAPSED=$((TIME_END - TIME_START))
  # Multiply before dividing so bc's scale=1 does not truncate the ratio to one digit
  RATIO=$(echo "scale=1; 100 - ${SIZE} * 100 / ${ORIG_SIZE}" | bc)
  echo "Level ${LEVEL}: ${SIZE} bytes (${RATIO}% smaller) encode: ${ELAPSED}ms"
done
# Expected output (approximate for a typical minified JS file; sizes shown in KB for readability):
# Level 1: 312 KB (69.5% smaller) encode: 8ms      <- dynamic responses
# Level 4: 285 KB (72.1% smaller) encode: 18ms
# Level 6: 271 KB (73.5% smaller) encode: 35ms     <- good default
# Level 9: 248 KB (75.8% smaller) encode: 210ms
# Level 11: 238 KB (76.8% smaller) encode: 1240ms  <- pre-compress static assets only
# Compare to gzip-9: ~258 KB (74.7% smaller), encode ~90ms
# Brotli-11 wins ratio; Brotli-6 wins speed + decent ratio
Zstandard (zstd): The Modern Workhorse
If you could design a compressor today, knowing everything about modern hardware, what would you build? Facebook's Yann Collet answered that question in 2016 with Zstandard (zstd). The goal was simple and audacious: beat gzip's compression ratio AND beat gzip's speed, simultaneously, without compromise. Four years of iteration later, he did it.
zstd doesn't win by inventing a new fundamental algorithm; it's still LZ77 + entropy coding at its core. It wins through engineering excellence: a larger sliding window, a better entropy coder (Finite State Entropy, an ANS variant), 22 granular compression levels, built-in multi-thread support, and the killer feature gzip never had: dictionary training. You can feed zstd a sample of your actual data, and it builds a custom shared dictionary that makes tiny messages compress dramatically better.
Plot ratio against speed and the story is clear: every zstd level sits above or to the right of gzip on the curve. At level 3 (the default), zstd encodes at roughly 2.5x gzip's speed while matching gzip's ratio. At level 19, it beats gzip-9 ratio by ~8-10% with comparable encode time. There is genuinely no scenario where gzip is the better technical choice versus zstd; only compatibility requirements keep gzip alive.
zstd's Four Superpowers
22 Compression Levels
Levels 1-22 give you a granular knob from "fastest possible" (level 1, ~500 MB/s encode) to "maximum compression" (level 22, slower but extraordinary ratio). The default is level 3, a great middle ground that beats gzip-6 on both speed and ratio simultaneously. Why 22 levels? Because different workloads have radically different latency/CPU budgets, and a coarse 1-9 range isn't precise enough.
Dictionary Training
Train zstd on a sample of your actual data, say 1,000 small JSON messages from your API. It builds a custom shared dictionary tuned to the patterns in YOUR data. Both sides pre-share this dictionary (a few KB file), and small messages that would compress poorly without context now compress dramatically well. This is the killer feature for microservices sending thousands of small payloads per second.
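The effect of a pre-shared dictionary can be sketched with the standard library alone. Note the swap: this uses zlib's preset-dictionary parameter (`zdict`) as a stand-in, not zstd's trained dictionaries, and the "dictionary" here is simply one hand-written sample message rather than something a trainer produced. The field names and values are made up.

```python
import zlib

# Hypothetical payloads: one older message acts as the shared dictionary.
shared_dict = b'{"user_id": 0, "status": "active", "plan": "free", "email": "user@example.com"}'
message = b'{"user_id": 48213, "status": "active", "plan": "pro", "email": "dana@example.com"}'


def deflate(data: bytes, zdict: bytes = None) -> bytes:
    """Compress with zlib, optionally priming the window with a shared dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes = None) -> bytes:
    """Decompress; the receiver must hold the very same dictionary bytes."""
    dec = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return dec.decompress(blob) + dec.flush()


plain = deflate(message)
primed = deflate(message, shared_dict)

print("original:    ", len(message), "bytes")
print("without dict:", len(plain), "bytes")
print("with dict:   ", len(primed), "bytes")
```

Without the dictionary, the compressor sees the keys for the first time and has nothing to point back to. With it, most of the message becomes back-references into bytes both sides already hold.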
Streaming + Frame-Based
zstd can stream (start decoding before the full compressed data arrives) AND supports a frame-based format where you can seek to arbitrary frames within a compressed archive without decompressing from the start. This makes it excellent for large databases and columnar storage formats like Parquet, where you need random access into compressed data.
Multi-Threaded Encoding
Modern CPUs have 16-128 cores, but most compressors are single-threaded (gzip certainly is). zstd has built-in parallel encoding: large inputs are split into chunks, compressed on separate cores simultaneously, then stitched together. On a 32-core server compressing large dataset dumps, zstd can fully utilize all cores; gzip cannot.
Dictionary Training Visualized
Where zstd is Winning
- Linux kernel: zstd has been supported for compressing the kernel image and initramfs since Linux 5.9 (2020) and for kernel modules since 5.13 (2021). Several distributions have switched to it from gzip or xz because it decompresses faster at boot.
- Apache Iceberg / Parquet / ORC: the best columnar compression for analytics workloads. Replaces Snappy as the default in many configurations.
- RocksDB / MongoDB / Cassandra: replacing Snappy and gzip for SSTable compression in storage engines.
- Facebook internal RPC: replaced gzip for essentially all internal service communication, saving significant compute and bandwidth at scale.
- GitHub โ uses zstd for pack file compression in git.
# Install: brew install zstd OR apt install zstd
# Basic usage
zstd file.log # Compress -> file.log.zst (default level 3)
zstd -d file.log.zst # Decompress -> file.log
zstd -19 file.log # Level 19 (high ratio, slow encode)
zstd -1 file.log # Level 1 (fast, moderate ratio)
# Multi-threaded (uses all CPU cores)
zstd --threads=0 big-dataset.json # 0 = auto-detect CPU count
# Streaming (pipe-friendly)
tar --use-compress-program=zstd -cf archive.tar.zst ./data/
tar --use-compress-program=zstd -xf archive.tar.zst
# Compare with gzip on the same file
zstd -3 -k file.log -o file.log.zst3 # -k keeps original
gzip -6 -k file.log
ls -lh file.log file.log.zst3 file.log.gz
# Typical output for a 100MB server log:
# file.log 100 MB (original)
# file.log.zst3 19 MB (81% smaller, encodes in 0.3s)
# file.log.gz 22 MB (78% smaller, encodes in 0.9s)
# Step 1: Collect sample messages (separate files or split a log)
mkdir samples/
# Assuming you have many small JSON files (or split one large file):
split -l 1 -a 4 messages.ndjson samples/msg_ # one JSON line per file; -a 4 widens the suffix (the 2-letter default runs out after 676 files)
# Step 2: Train the dictionary from samples
zstd --train samples/* -o my_api.dict
# Output: Completed with 1024 files, dictionary saved to my_api.dict
# my_api.dict is typically 32-112 KB
# Step 3: Compress using the dictionary
zstd -D my_api.dict message.json -o message.json.zst
zstd -D my_api.dict -d message.json.zst # decompress (needs same dict!)
# Step 4: Compare compression with vs without dict
echo "Without dict:"
zstd -3 -k -f message.json # -f overwrites the message.json.zst written in Step 3
ls -lh message.json message.json.zst
echo "With dict:"
zstd -D my_api.dict message.json -o message_dict.json.zst
ls -lh message.json message_dict.json.zst
# Typical result for a 500-byte JSON message:
# Without dict: 500 bytes -> 320 bytes (36% savings)
# With dict: 500 bytes -> 80 bytes (84% savings)
# Dict overhead: dict ID is stored in frame header (~4 bytes)
import time

import zstandard as zstd  # pip install zstandard

# Load a 1 MB JSON file for testing
with open("data.json", "rb") as f:
    data = f.read()

original_size = len(data)
print(f"Original: {original_size:,} bytes ({original_size/1024:.1f} KB)")
print(f"{'Level':<6} {'Size':<10} {'Savings':<10} {'Encode ms':<12} {'MB/s':<8}")
print("-" * 50)

decompressor = zstd.ZstdDecompressor()

for level in [1, 3, 6, 9, 12, 19, 22]:
    compressor = zstd.ZstdCompressor(level=level)
    # Time encode
    start = time.perf_counter()
    compressed = compressor.compress(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    size = len(compressed)
    savings = (1 - size / original_size) * 100
    speed_mb = (original_size / 1024 / 1024) / (elapsed_ms / 1000)
    print(f"{level:<6} {size:<10,} {savings:<10.1f}% {elapsed_ms:<12.1f} {speed_mb:<8.1f}")
    # Verify roundtrip
    assert decompressor.decompress(compressed) == data, "Roundtrip failed!"
# Expected output (approximate, 1 MB minified JSON):
# Level Size Savings Encode ms MB/s
# 1 310,000 70.0% 1.8 556
# 3 287,000 72.7% 3.2 312
# 6 274,000 73.9% 7.1 141
# 9 265,000 74.8% 18.0 55
# 12 260,000 75.2% 55.0 18
# 19 248,000 76.4% 180.0 5.5
# 22 243,000 76.8% 620.0 1.6
Snappy & LZ4: Speed Over Ratio
Most compression tools try to shrink your data as much as possible. But sometimes that's the wrong goal. Imagine you're a database writing 500 MB of hot data to disk every second. If compression adds 10 ms to every write, you've just bottlenecked the whole write path. What you actually want is compression so fast it's nearly invisible, where the CPU cost is smaller than the saved I/O time. That's the idea behind Snappy and LZ4.
Both were released in 2011: Snappy from Google, LZ4 from Yann Collet (who would later build zstd). Both compress at roughly 500 MB/sec per core, which is faster than most disks and many network links can move data. They don't try for the smallest output; they aim for something that barely slows you down. The trade-off is about 30% worse compression ratio than gzip. On hot, frequently-read data that trade-off almost always wins.
Where Speed-First Compression Wins
These are the three scenarios where "fast enough" beats "smallest possible":
- Network-attached storage: when a server is streaming data between RAM and a network disk, compression is on the critical path. If the compressor can't keep up with the link, it becomes the bottleneck and slows everything down. Snappy and LZ4 are fast enough that the link, not the CPU, usually remains the limit.
- Real-time log pipelines: tools like Logback and Kafka producers compress message batches before sending. At 100k events/second, even a 1 ms compression overhead per batch compounds to seconds of lag. LZ4 compresses a 1 MB log batch in under 2 ms.
- Database storage layers: MongoDB's WiredTiger defaults to Snappy, Cassandra defaults to LZ4, and RocksDB is commonly run with Snappy or LZ4 for active ("hot") data. The reasoning is explicit: hot data is read and written constantly; shaving 30% off ratio is not worth doubling the CPU cost on every read path.
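The three scenarios above all come down to the same break-even arithmetic: compression pays off only if the time spent compressing and decompressing is smaller than the transfer time it saves. A rough sketch follows; the throughput and ratio figures are illustrative assumptions, not measurements, and the stages are simply summed (no pipelining), which is pessimistic.

```python
def transfer_seconds(size_mb, link_mb_s, ratio=1.0,
                     compress_mb_s=None, decompress_mb_s=None):
    """Seconds to move size_mb over a link, optionally compressing first.

    ratio is compressed size / original size; speeds are in MB/s of original data.
    """
    seconds = size_mb * ratio / link_mb_s
    if compress_mb_s:
        seconds += size_mb / compress_mb_s
    if decompress_mb_s:
        seconds += size_mb / decompress_mb_s
    return seconds


# Assumed profiles for illustration only.
FAST = dict(ratio=0.5, compress_mb_s=500, decompress_mb_s=1000)   # LZ4/Snappy-like
DENSE = dict(ratio=0.3, compress_mb_s=50, decompress_mb_s=300)    # slow, high-ratio

for link in (100, 10):   # MB/s
    raw = transfer_seconds(1000, link)
    fast = transfer_seconds(1000, link, **FAST)
    dense = transfer_seconds(1000, link, **DENSE)
    print(f"{link:>4} MB/s link: raw {raw:6.1f}s  fast {fast:6.1f}s  dense {dense:6.1f}s")
```

Under these assumptions the fast codec beats sending raw data on the 100 MB/s link while the dense codec makes things worse. Only on the slow 10 MB/s link does the dense codec catch up.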
Snappy
Born inside Google as the compressor for Bigtable and Hadoop. The design goal was explicit: "fast enough to not be the bottleneck on a 1 Gbps link." Snappy is symmetric: encode and decode run at roughly the same speed (~500 MB/s each). The algorithm is simple LZ77-style: find matching substrings in a 64 KB sliding window, emit back-references. No Huffman coding, no fancy entropy coder. Simplicity keeps the branches few and the CPU predictors happy.
At peak, roughly 70% of all Hadoop-stored data globally used Snappy. It's still the default in Google Cloud's BigTable and in older Cassandra/MongoDB deployments.
LZ4
LZ4 is the evolution. It delivers a slightly better compression ratio than Snappy and a noticeably faster decode speed: decoding can exceed 1 GB/s on modern hardware. This asymmetry matters: in most systems data is read far more often than it's written, so faster decode is worth more than faster encode.
LZ4 also ships in a higher-compression variant called LZ4HC (High Compression). LZ4HC uses an exhaustive search strategy for matches, sacrificing encode speed for a ratio that approaches gzip, while still decoding at full LZ4 speed. This makes LZ4HC useful for write-once, read-many storage.
LZ4 is a common choice for Kafka producers (compression itself is opt-in). RocksDB is often configured with LZ4 for its hottest SST levels.
JPEG: Lossy Image Compression
Text compressors can't throw anything away: every byte of code or JSON must survive exactly. Images are different. A human looking at a photograph simply cannot see tiny differences in high-frequency detail, such as the flickering of light and shadow at the boundary of a leaf against sky. JPEG (finalized in 1992) was the first widely-deployed format to exploit this. It discards the detail you can't see, and in doing so achieves 10-20× compression on photographs with no perceptible quality loss.
The result was transformative. Before JPEG, a single photo over dial-up took minutes. After JPEG, the visual web became possible. Understanding how it works teaches you something deeper: lossy compression is about exploiting the limits of the receiver, in this case human vision.
The 5 Steps, Plain English
- Color space transform (RGB to YCbCr). RGB stores red, green, and blue light levels. JPEG converts to YCbCr: Y is brightness (luminance), Cb is "how blue-ish," Cr is "how red-ish." Why? Because human eyes are ~10× more sensitive to brightness differences than color differences. By separating them, JPEG can treat each channel differently.
- Chroma subsampling. The Cb and Cr color channels are averaged: every 4 neighboring pixels share one color sample (this is called 4:2:0 subsampling). The brightness channel keeps full resolution. This step alone discards 75% of the color samples, which halves the total data, and on natural photographs humans almost never notice because our color vision is lower-resolution than our brightness vision.
- DCT, the frequency transform. Each 8×8 block of pixels is run through a Discrete Cosine Transform (DCT). DCT converts 64 pixel values into 64 frequency components. The top-left component (DC) is the average value of the block. Components toward the bottom-right represent increasingly fine, high-frequency detail: sharp edges, fine texture. The image is now in "frequency space" instead of "pixel space."
- Quantization, where the loss actually happens. Each frequency component is divided by a quality-dependent integer and rounded. High-frequency components get divided by large numbers, which rounds them to zero. Low-frequency components (the coarse shapes) survive. The "quality slider" in Photoshop or ImageMagick directly controls these divisor values. Smaller divisors = more data survives = larger file = better quality.
- Entropy coding. After quantization, most high-frequency coefficients are zero. JPEG runs a zigzag scan (reading the 8×8 block diagonally, naturally grouping zeros together), then Run-Length Encodes the runs of zeros, then Huffman codes the whole stream. This is the lossless final step: it squeezes the already-quantized data without further quality loss.
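The DCT and quantization steps are easier to trust once you see the zeros appear. Below is a minimal pure-Python sketch of both, plus the 4:2:0 sample count. The divisor table is an invented illustration that simply grows with frequency; it is not the table from the JPEG standard, and the two test blocks are made up.

```python
import math

N = 8  # JPEG block size


def dct2(block):
    """Orthonormal 2D DCT-II of an 8x8 block given as 8 rows of 8 numbers."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):              # vertical frequency
        for v in range(N):          # horizontal frequency
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# Illustrative divisors (NOT the standard JPEG table): larger at higher frequencies.
QUANT = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, table=QUANT):
    """Divide each coefficient by its divisor and round: this is the lossy step."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)] for u in range(N)]


def nonzero(q):
    return sum(1 for row in q for c in row if c != 0)


def samples_420(width, height):
    """Samples kept by 4:2:0: full-resolution Y plus quarter-resolution Cb and Cr."""
    return width * height + 2 * (width // 2) * (height // 2)


# Pixel values are level-shifted by 128 before the transform, as JPEG does.
flat = [[200 - 128] * N for _ in range(N)]                              # uniform block
ramp = [[(100 + 2 * col) - 128 for col in range(N)] for _ in range(N)]  # gentle gradient

q_flat = quantize(dct2(flat))
q_ramp = quantize(dct2(ramp))
print("flat block: nonzero coefficients =", nonzero(q_flat), "DC =", q_flat[0][0])
print("ramp block: nonzero coefficients =", nonzero(q_ramp), "of 64")
print("4:2:0 keeps", samples_420(1920, 1080) / (3 * 1920 * 1080), "of the samples")
```

A uniform block collapses to a single DC value, and a smooth gradient to a handful of coefficients in the first row. Those long runs of zeros are exactly what the zigzag scan and run-length coding then exploit.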
Real numbers: JPEG quality 85 (the web standard) typically achieves 10-15× compression on natural photographs with no perceptible loss. Quality 50 reaches 25-30× but introduces visible blocking on edges. JPEG was the dominant image format on the web for nearly 30 years (1995-2022). It's only now being replaced by WebP and AVIF in most new deployments.
Modern Image Formats: WebP, AVIF, JPEG XL
Thirty years after JPEG, the web finally has better options. Three formats have emerged to replace it, each with different engineering trade-offs. None has fully "won" yet, because the web moves at the speed of browser adoption, not algorithm improvement. But if you're building anything image-heavy today, you need to know these.
WebP
When: Google, 2010. Based on: VP8 video codec's intra-frame prediction.
WebP achieves 25-35% smaller files than JPEG at equivalent perceived quality by using better block prediction: instead of coding each block independently like JPEG, it predicts what each block looks like based on surrounding blocks and only encodes the error.
It also supports lossless mode (competes with PNG) and animated mode (replaces GIF with far better quality per byte). Browser support reached ~95% of users once Safari added it in 2020, and it is nearly universal today. If you can't serve AVIF, WebP is the safe default.
AVIF
When: Alliance for Open Media, 2019. Based on: AV1 video codec's still-image frame.
AVIF achieves ~50% smaller files than JPEG at equivalent quality, the biggest leap in 30 years. It uses larger flexible block sizes (up to 128×128 vs JPEG's fixed 8×8), better inter-block prediction, and a superior entropy coder. No visible block artifacts even at aggressive compression.
The downside: encode speed is slow, and encoding a large image can take seconds. Most CDNs pre-encode and cache. Browser support hit ~90% in 2022. It's the new baseline for modern image-heavy sites: Shopify, Etsy, and most major CDNs (Cloudflare Images, AWS CloudFront) now serve AVIF by default.
JPEG XL
When: JPEG Committee, 2021. Designed as: the official JPEG successor from the original standards body.
JPEG XL has two killer features. First: lossless JPEG transcoding. It can re-encode existing JPEG files without any quality loss, just smaller. Second: decode speed. JPEG XL decodes faster than AVIF, making it better for large, high-resolution images where decode time matters (print, medical imaging).
Browser support is patchy: Chrome dropped it in 2023 (citing "insufficient interest"), Safari supports it, Firefox is working on it. For web delivery, AVIF is currently the safer bet. JPEG XL may still win for archival and professional workflows.
The Pragmatic Multi-Format Serve Pattern
Because no single format has 100% browser support, the correct approach is to offer multiple formats and let the browser pick. HTML's <picture> element was built for exactly this:
<!-- Browser tries formats in order and uses the first one it supports -->
<picture>
<!-- Best: AVIF if browser supports it (~90% of users, 2024) -->
<source type="image/avif" srcset="photo.avif">
<!-- Fallback: WebP (~95% of users) -->
<source type="image/webp" srcset="photo.webp">
<!-- Final fallback: JPEG (universal) -->
<img src="photo.jpg" alt="Product photo" width="800" height="600">
</picture>
<!-- Result: AVIF users get 50% smaller file.
WebP users get 30% smaller. JPEG users get baseline.
Zero JS required: pure HTML negotiation. -->
# Cloudflare / Fastly / CloudFront all do this automatically.
# If you self-host via nginx + libvips or imageproxy:
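location ~* \.(jpg|jpeg|png)$ {
    # The response now depends on the Accept header, so tell caches about it
    add_header Vary Accept;
    # Check if browser accepts AVIF
    if ($http_accept ~* "image/avif") {
        rewrite ^(.+)\.(jpg|jpeg|png)$ $1.avif break;
    }
    # Otherwise try WebP
    if ($http_accept ~* "image/webp") {
        rewrite ^(.+)\.(jpg|jpeg|png)$ $1.webp break;
    }
    # Fallback to original (this sketch assumes the .avif/.webp variants exist on disk)
}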
# Pre-generate AVIF and WebP variants at deploy time:
# convert input.jpg -quality 80 output.avif
# cwebp -q 80 input.jpg -o output.webp
Real numbers: Switching from JPEG to AVIF typically saves 40-55% on image bandwidth. At the scale of a site with 10M daily image views averaging 200 KB per image (about 2 TB/day, or 60 TB/month), that's roughly 24-33 TB/month saved. Cloudflare Images and AWS CloudFront both perform automatic format conversion: you store one JPEG; they serve the best format for each browser.
Video Compression Basics
Video is, by far, the dominant consumer of internet bandwidth: roughly 80% of all traffic is video. The scale is almost incomprehensible: a single hour of uncompressed 1080p video at 24fps and 3 bytes per pixel would be about 540 GB (1920 × 1080 × 3 bytes × 24 frames × 3,600 seconds). With modern compression (H.264), that same hour is 2-4 GB. That's a reduction of well over 100×. At Netflix's scale, this difference translates to billions of dollars per year in bandwidth costs.
Video compression builds on image compression (each frame is basically a JPEG) but adds a second axis of redundancy that's unique to video: time. Most video frames are nearly identical to the frame before. A talking head video has a static background. A panning shot moves the whole frame but changes nothing about the objects in it. Exploiting temporal redundancy is where the real savings come from.
The 3 Big Tricks
Inter-frame Prediction
Instead of encoding every frame fully, encode only the difference between frames. If 95% of a frame is identical to the previous one, you only need to encode the 5% that changed. This is called a P-frame (predicted frame). The video decoder reconstructs the full frame by combining the previous frame with the encoded difference.
Why does this work so well? Video has very high temporal coherence: adjacent frames at 30fps are only 33ms apart. Very little changes in 33ms in most real-world video.
Motion Vectors
When an object moves across the screen, a naive approach would encode it as "changed pixels." But a smarter approach says: "this 16×16 block at position (100, 200) is the same as the block at (108, 195) in the previous frame; it just shifted." That shift is a motion vector: two numbers instead of 256 pixel values.
Motion estimation is the most computationally expensive step in video encoding, because the encoder searches the previous frame for the best matching block. Decoders just apply the pre-computed vectors, which is why encoding is much slower than decoding.
Block DCT on Residuals
After motion compensation, there's still a small difference: the residual. This is encoded using the same DCT + quantization approach as JPEG, applied to each block of the residual frame. Even the residual has spatial redundancy that DCT can exploit.
This three-stage pipeline (motion prediction, then residual computation, then DCT coding) is the core of every major video codec from H.264 to AV1. The codecs differ in their specific algorithms, block sizes, and entropy coders, but the structure is universal.
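The first two stages of that pipeline can be sketched with a brute-force block search. This is a toy under stated assumptions: tiny 12×12 single-channel frames, a 4×4 block, a ±2 pixel search range, and sum of absolute differences (SAD) as the matching cost. Real encoders use far larger frames, smarter search patterns, and sub-pixel vectors, and all function names here are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Try every displacement in the search range; return ((dx, dy), cost) of the best."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue                      # candidate block falls off the frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual(cur, ref, bx, by, mv, size=4):
    """What is left to encode after motion compensation."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]


W = H = 12
ref = [[(31 * x + 17 * y) % 256 for x in range(W)] for y in range(H)]
# Current frame: the same content moved 1 px right and 2 px down (edges filled with 0).
cur = [[ref[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(W)]
       for y in range(H)]

mv, cost = best_motion_vector(cur, ref, bx=4, by=4)
res = residual(cur, ref, 4, 4, mv)
print("motion vector:", mv, "SAD:", cost)
```

For a pure translation the residual is all zeros, so the block costs one vector and nothing else. The nested search loops are also a small-scale view of why encoding is so much more expensive than decoding.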
The 4 Dominant Video Codecs
H.264 / AVC (2003)
The workhorse. Nearly 95% of devices can decode H.264 in hardware: phones, TVs, browsers, cameras. It's the lingua franca of video. ~2× better than the older MPEG-2 standard. Royalty-bearing (MPEG LA patent pool), though streaming doesn't require per-viewer royalties for most use cases.
Still the right choice when maximum compatibility matters: live streaming, video conferencing, broadcast. Most hardware encoders (GPU NVENC, Apple VideoToolbox) are still optimized for H.264.
H.265 / HEVC (2013)
~50% better compression than H.264 at equivalent quality. Uses larger blocks (up to 64×64 vs H.264's 16×16) and better motion prediction. Should have been the obvious successor. Wasn't, because of a fractured patent landscape: multiple competing patent pools made licensing complex and expensive.
Adoption stayed modest. Apple devices support it well and 4K Ultra HD Blu-ray uses HEVC, but web adoption is patchy. AV1 has largely superseded it for new open-web deployments.
AV1 (2018)
The Alliance for Open Media (Google, Netflix, Amazon, Apple, Microsoft, Mozilla) built AV1 specifically to be royalty-free and to beat HEVC on ratio. It achieves ~30% better compression than HEVC (roughly 2× better than H.264). Hardware decode support arrived in 2020-2022 (most new phones, Apple Silicon). Browser support is near-universal.
The catch: AV1 encode is slow, 10-50× slower than H.264 at the same quality. Netflix and YouTube pre-encode everything (hours per video), so this is acceptable. Real-time encoding remains a challenge, though hardware encoders are improving.
VP9 (2013)
Google's pre-AV1 open codec. Roughly equivalent to HEVC in quality without the licensing headache. YouTube has served VP9 as its primary codec since 2014, across trillions of video views. Chrome, Firefox, and Edge all support VP9 hardware decode.
VP9 is now the fallback for YouTube when a device doesn't support AV1. It's in maintenance mode: no new major features, but it's stable, widely supported, and entirely royalty-free.
Real numbers: Netflix encodes every title in several codecs (H.264, H.265, VP9, AV1) and 20+ bitrate/resolution combinations, up to 120 variants per title. AV1 saves ~30% bandwidth vs H.264 across their catalog. At Netflix's ~15% of global internet traffic, that ~30% saving is worth billions of dollars per year in egress fees.
Compression in Storage Systems
Every production storage system you'll encounter offers compression. But the knobs, defaults, and trade-offs are all different, and enabling compression on the wrong workload can spike CPU from 30% to 80% and double your tail latency. This section gives you the practical map: what each system does, which compressor it uses by default, and what levers you have.
The core trade-off is always the same: CPU time in exchange for I/O time. Compression wins when your bottleneck is disk or network bandwidth. It loses when your bottleneck is CPU. Most modern cloud workloads are I/O-bound, so compression usually pays off, but always measure with your actual workload before enabling it on a live system.
| System | Default | Options | Granularity | Notes |
|---|---|---|---|---|
| PostgreSQL | pglz | pglz, LZ4 (v14+) | Per-column (TOAST) | Only compresses values larger than about 2 KB via TOAST. LZ4 was added in Postgres 14 and is 3-5× faster than pglz with similar ratio. |
| MySQL / InnoDB | None | zlib (ROW_FORMAT=COMPRESSED) | Per-table | Page-level compression. Reduces storage but adds CPU per read/write. Trade-off: 30-60% smaller, 10-30% more CPU. |
| MongoDB / WiredTiger | Snappy | Snappy, zlib, zstd, none | Per-collection | Snappy default is fast and keeps CPU overhead minimal. Switch to zstd for read-heavy cold collections. |
| Cassandra | LZ4 | LZ4, zstd, deflate, none | Per-table | Discord saved petabytes switching from LZ4 to zstd-9. zstd-9 is slower to write but much smaller storage. |
| Kafka | None (opt-in) | gzip, snappy, lz4, zstd | Per-producer or per-topic | Compresses BATCHES not individual messages. zstd is the modern recommendation. 4-8× throughput gain on typical payloads. |
| Parquet / Iceberg | Snappy | Snappy, gzip, zstd, brotli, none | Per-column | Columnar layout + compression is doubly effective: repeated values within a column compress better than row-mixed data. |
| S3 | None | App-layer before upload | Object-level | S3 itself doesn't compress. You compress before uploading (gzip, zstd) and set Content-Encoding. Intelligent-Tiering reduces cost via storage class, not compression. |
| RocksDB | LZ4 (hot) / zstd (cold) | Per-LSM-level configurable | Per-level | Different compressors per LSM level is the killer feature: fast codecs for the hot upper levels, denser ones for the cold bottom levels. |
Real numbers: Discord's famous switch from Cassandra LZ4 to zstd-9 saved multiple petabytes of storage. Most systems see 40-70% storage reduction with modern compressors on typical application data. Column-oriented formats like Parquet often see 80-90% reduction because identical values in a column compress near-perfectly.
Compression in Transport
Compression isn't only about what you store; it's also about what you send over the wire. HTTP, gRPC, and Kafka all support compression in some form, and objects uploaded to S3 can be compressed by the application first. The good news: it's usually just one header or one config option. The operational details, however, matter a lot, especially when your traffic passes through CDNs, proxies, and multiple application layers.
HTTP Compression
The browser sends Accept-Encoding: br, gzip, deflate, zstd with every request. The server picks a format it supports, compresses the response body, and sets Content-Encoding: br (or gzip/zstd). The browser decompresses transparently.
Content negotiation is per-response: different responses can use different encodings. Most CDNs (Cloudflare, Fastly, CloudFront) handle this automatically. Brotli at the CDN edge is the modern standard; it typically cuts HTML/CSS/JS by 15-25% over gzip.
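The negotiation step can be sketched in a few lines. This is a simplified illustration, not how any particular server implements it: the function names and the server preference order are assumptions, and only the q parameter of each Accept-Encoding entry is interpreted.

```python
def parse_accept_encoding(header):
    """Return {encoding: q} parsed from an Accept-Encoding header value."""
    weights = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        q = 1.0  # an encoding listed without q= is fully acceptable
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        weights[name.strip().lower()] = q
    return weights


def choose_encoding(header, server_preference=("br", "zstd", "gzip")):
    """Pick the first server-preferred encoding the client accepts (q > 0)."""
    weights = parse_accept_encoding(header)
    wildcard = weights.get("*", 0.0)
    for encoding in server_preference:
        if weights.get(encoding, wildcard) > 0:
            return encoding
    return "identity"  # fall back to sending the body uncompressed


print(choose_encoding("br, gzip, deflate, zstd"))
print(choose_encoding("gzip, deflate"))
```

The sketch lets the server's preference win among everything the client accepts; a stricter implementation would also rank candidates by the client's q-values.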
gRPC Compression
gRPC supports per-RPC compression via the grpc-encoding header. Supported algorithms: gzip, deflate, snappy (with custom codec), zstd. Compression is disabled by default, so you must explicitly opt in.
gRPC already uses Protobuf (a binary format that's already compact), so compression gains are smaller than on JSON. For large binary payloads (images, documents) in a gRPC stream, zstd compression can still save 30-50%. For small protobuf messages, skip compression: the overhead isn't worth it.
Kafka Producer Compression
Kafka compresses at the batch level, not the message level. A producer accumulates a batch of messages, compresses the whole batch, and sends one compressed record batch. The broker stores it compressed; consumers decompress. zstd is the modern recommended setting.
Why batch-level? Because each individual message may be too small to compress effectively (the overhead of a compression header exceeds the savings). A batch of 1000 similar log messages can reach a ratio around 20:1, where the same messages compressed one at a time manage roughly 3:1. This is a key design insight: compression works best when applied to large, similar data chunks.
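The effect is easy to reproduce locally. The sketch below uses Python's zlib (DEFLATE) as a stand-in for Kafka's codecs, and the log schema is made up for illustration, so treat the printed numbers as indicative only.

```python
import json
import zlib


def make_log_messages(count):
    """Build `count` similar JSON log lines (hypothetical schema)."""
    return [
        json.dumps({
            "timestamp": 1700000000 + i,
            "level": "INFO",
            "service": "checkout",
            "message": "order processed",
            "orderId": i,
        }).encode("utf-8")
        for i in range(count)
    ]


def individual_size(messages, level=6):
    """Total bytes when every message is compressed on its own."""
    return sum(len(zlib.compress(m, level)) for m in messages)


def batch_size(messages, level=6):
    """Bytes when the newline-joined batch is compressed in one call."""
    return len(zlib.compress(b"\n".join(messages), level))


messages = make_log_messages(1000)
raw = sum(len(m) for m in messages)
print("raw bytes:            ", raw)
print("compressed one by one:", individual_size(messages))
print("compressed as a batch:", batch_size(messages))
```

Compressed individually, each message pays the stream overhead and never sees its neighbours; in the batch, every message after the first is mostly back-references to earlier ones.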
TLS / NEVER compress here
In the early 2000s, TLS actually supported record-level compression (DEFLATE, standardized in RFC 3749). In 2012, the CRIME attack showed this was catastrophically insecure: an attacker who can inject chosen plaintext into a TLS stream can observe how the compressed size changes, and those size differences leak information about encrypted secrets (like session tokens in cookies). BREACH (2013) then applied the same idea to HTTP-level compression of response bodies.
TLS 1.3 (2018) removed compression from the protocol entirely. If you ever encounter a configuration suggesting TLS compression, it's wrong and dangerous. Compress at the application layer, above TLS, where you control what gets compressed and when, and keep secrets out of compressed responses that also reflect attacker-controlled input.
Code: Three Transport Configurations
http {
# Enable gzip (broad compatibility)
gzip on;
gzip_vary on; # Vary: Accept-Encoding header for CDN caching
gzip_proxied any; # Compress even for proxied requests
gzip_comp_level 6; # Sweet spot: good ratio, reasonable CPU
gzip_min_length 1000; # Don't compress tiny responses (overhead not worth it)
gzip_types
text/plain text/css text/javascript
application/javascript application/json
application/xml image/svg+xml;
# Brotli (better than gzip, ~95% browser support)
# Requires ngx_brotli module: https://github.com/google/ngx_brotli
brotli on;
brotli_comp_level 6; # 1-11; level 6 = good ratio at reasonable speed
brotli_types
text/plain text/css application/javascript
application/json image/svg+xml;
# zstd (newest; requires the third-party zstd-nginx-module)
# zstd on;
# zstd_comp_level 3;
}
Key detail: gzip_vary on is mandatory when behind a CDN. It tells the CDN "the response varies by Accept-Encoding" so it caches compressed and uncompressed versions separately. Without it, a CDN might serve a gzip-compressed response to a client that didn't ask for it.
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
public class CompressedProducer {
public static KafkaProducer<String, String> create() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// zstd is the modern recommendation (Kafka 2.1+)
// Options: "none", "gzip", "snappy", "lz4", "zstd"
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
// Larger batches = more redundancy = better compression
// Default: 16384 (16 KB). Increase for compression workloads.
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072); // 128 KB
// Wait up to 10ms to fill a batch before sending
// Gives more messages to compress together
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
return new KafkaProducer<>(props);
}
// WHY batch-level compression wins:
// 1000 log messages compressed together: ~20:1 ratio (logs are repetitive JSON)
// Each message compressed individually: ~3:1 ratio
// The batch context provides the dictionary that makes compression shine.
}
Real numbers: On typical application JSON logs, Kafka with zstd compression achieves 4-8× throughput improvement over no compression. The gain does not come from the CPU doing less work (compression adds CPU work); it comes from smaller batches, which mean fewer bytes on the network and less broker disk I/O. The I/O saved usually outweighs the CPU spent compressing.
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/encoding/gzip" // importing this package registers the gzip compressor

	pb "example.com/myservice/pb" // hypothetical path: replace with your generated protobuf package
)

func main() {
	// Method 1: default compression for ALL RPCs on this channel
	conn, err := grpc.Dial(
		"server:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	client := pb.NewMyServiceClient(conn)

	// Method 2: per-RPC compression (override per call)
	resp, err := client.GetLargeData(
		context.Background(),
		&pb.Request{Id: 123},
		grpc.UseCompressor(gzip.Name), // only THIS call is compressed
	)
	if err != nil {
		log.Fatalf("GetLargeData: %v", err)
	}
	log.Printf("received response: %v", resp)

	// WHY gzip and not something faster?
	// grpc-go ships only a gzip compressor out of the box.
	// For zstd or snappy, implement the encoding.Compressor interface
	// from google.golang.org/grpc/encoding:
	//
	//	type Compressor interface {
	//		Compress(w io.Writer) (io.WriteCloser, error)
	//		Decompress(r io.Reader) (io.Reader, error)
	//		Name() string
	//	}
	//
	// Register with: encoding.RegisterCompressor(myCompressor)
}
When to use gRPC compression: Only when sending large payloads (images, documents, binary blobs) or large lists of protobuf messages. For typical small RPC calls (<10 KB), compression overhead (CPU + header size) exceeds the bandwidth saving. Profile first; don't enable blindly.
Real numbers: Cloudflare reports brotli at the edge cuts global egress 35-55% for text assets (HTML, CSS, JS) compared to no compression. Kafka with zstd versus no compression typically delivers 4-8× throughput improvement on realistic application payloads. For a service processing 1 GB/s of Kafka messages, that's the difference between 8 brokers and 1.
Diagnostic & Benchmarking Tools
- Six real-world command-line tools for measuring and comparing compressors
- How to run lzbench to get apples-to-apples numbers across 30+ algorithms on your own data
- How to train a custom zstd dictionary and test HTTP compression configs live
Reading benchmarks online is helpful, but the only numbers that matter are the ones you get from your actual data. Log lines are very different from JSON API responses, which are very different from sensor readings. The tools below let you measure what happens when your specific bytes go through a compressor, not someone else's test file.
gzip -l file.gz: Inspect any existing gzip file without decompressing it. It prints the compressed size, uncompressed size, and ratio. The -l flag stands for "list": just metadata, nothing is expanded. Use this to audit compressed log archives before you decide whether to re-compress them with a newer algorithm. WHY it's useful: you don't need to decompress a multi-gigabyte archive to find out it's only 2:1 when you expected 5:1. Caveat: the gzip trailer stores the uncompressed size modulo 2^32, so tools that rely on that field misreport files whose original size is 4 GiB or more.
zstd --train: Builds a custom dictionary from a folder of sample files: zstd --train data/*.json -o dict.zst. The training pass scans for common byte sequences shared across your samples and writes a dictionary of at most 112,640 bytes (110 KiB) by default; --maxdict changes the cap. Subsequent compressions that reference this dictionary typically achieve 3-10× better ratios on small messages (under 1 KB). WHY it matters: every message compressed in isolation lacks context; the dictionary supplies that missing shared context.
brotli -q / --lgwin / --large_window: Control the brotli level with -q 0 (fastest, worst ratio) through -q 11 (slowest, best ratio). The back-reference window is set with --lgwin (10-24 bits, so up to 16 MiB in the standard RFC 7932 format); a large window is crucial for big text files where repeated patterns appear far apart. --large_window goes past that limit (up to 30 bits, 1 GiB) but produces a non-standard bitstream that regular brotli decoders, browsers included, may not be able to read. WHY the flag exists: the RFC caps the window for compatibility, so use --large_window only for offline compression where you also control the decoder.
lzbench: lzbench is an open-source benchmark that runs 30+ compression libraries (zlib, zstd, lz4, brotli, snappy, lzma, and more) against the same input file in a single run. Output is a table (CSV is one of the available formats) with ratio, encode speed, and decode speed. Clone it, point it at a sample of your production data, and compare. WHY it's the right tool: published benchmarks use standard corpora (Silesia, Canterbury); lzbench uses your data, which is the only benchmark that predicts your production behaviour.
hyperfine 'gzip < input': hyperfine is a command-line benchmarking tool that runs the command N times (optionally after warmup runs, via --warmup, that fill the disk cache) and gives you a mean wall-clock time with standard deviation. Compare two compressors with: hyperfine 'gzip -6 < data.log' 'zstd -3 < data.log'. WHY this matters over a manual time call: a single timing run is noisy; cache misses, scheduler jitter, and memory pressure all affect the result. hyperfine's statistical summary tells you whether a 10% speed difference is signal or noise.
pv | gzip | wc -c: A Unix pipe micro-benchmark: pv input.log | gzip -6 | wc -c. pv (pipe viewer) shows throughput in real time (e.g., 430 MB/s), gzip compresses the stream, and wc -c counts the output bytes. Swap gzip for zstd or lz4 to compare live throughput. WHY pipe benchmarks work: they measure steady-state streaming throughput with realistic I/O, which is close to what a log pipeline or network stream will experience.
Run lzbench against a 50 MB sample of your production data. The -e flag limits which codecs run so results come back in seconds instead of minutes; codecs are separated by "/" and a level follows its codec after a comma. lzbench exposes gzip's DEFLATE implementation under the name zlib.
# clone and build (one-time)
git clone https://github.com/inikep/lzbench.git
cd lzbench && make
# benchmark against your data sample
./lzbench -ebrotli,6/zlib,6/zstd,3/lz4/snappy ./data-sample.json
# illustrative output (numbers vary with machine and data):
# Compressor        Ratio  Encode MB/s  Decode MB/s
# brotli 1.0.9 -6   3.21   22           342
# zlib (gzip) -6    2.89   61           310
# zstd 1.5 -3       3.04   435          1200
# lz4 1.9           2.11   650          4100
# snappy 1.1        1.97   460          2000
#
# zstd wins on ratio+speed for text JSON.
# lz4/snappy win on speed if ratio is secondary (log pipelines).
# brotli wins on ratio if encode speed is not a constraint.
Training a zstd dictionary is a one-time offline step. You feed it 100-10,000 representative samples, and it writes a binary dictionary file you deploy alongside your application.
# Step 1: gather representative samples (min 100 recommended)
mkdir samples
cp prod-api-payloads/2024-01-*.json samples/
# Step 2: train (output is a binary dictionary, at most 112,640 bytes by default)
zstd --train samples/*.json -o api-dict.zst
# Step 3: compress with dictionary
zstd -D api-dict.zst -3 payload.json -o payload.zst
# Step 4: decompress (must supply the same dict on the other end)
zstd -d -D api-dict.zst payload.zst -o payload_out.json
# Ratio comparison on a 400-byte JSON message:
# Without dict: 400 -> 310 bytes (1.29:1)
# With dict: 400 -> 42 bytes (9.5:1)
# WHY: the dict pre-loads field names like "timestamp", "userId",
# "eventType" that appear in every message, so the compressor
# back-references them instead of re-encoding them each time.
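The same mechanism can be tried without installing zstd. The sketch below swaps in a different technique, zlib's preset dictionary (the zdict argument), as a stand-in for a trained zstd dictionary; the dictionary bytes and the message schema are hand-written assumptions, not the output of zstd --train.

```python
import json
import zlib

# Hypothetical shared dictionary: byte strings that recur in every message.
SHARED_DICT = b'{"timestamp": "userId": "eventType": "page_view", "sessionId": '


def compress_plain(message):
    """Compress a message with no shared context."""
    return zlib.compress(message, 6)


def compress_with_dict(message):
    """Compress with the preset dictionary primed into the window."""
    compressor = zlib.compressobj(level=6, zdict=SHARED_DICT)
    return compressor.compress(message) + compressor.flush()


def decompress_with_dict(blob):
    """The receiver must supply the identical dictionary."""
    decompressor = zlib.decompressobj(zdict=SHARED_DICT)
    return decompressor.decompress(blob) + decompressor.flush()


message = json.dumps({
    "timestamp": 1700000000,
    "userId": 42,
    "eventType": "page_view",
    "sessionId": "abc123",
}).encode("utf-8")

print("raw:         ", len(message))
print("without dict:", len(compress_plain(message)))
print("with dict:   ", len(compress_with_dict(message)))
```

A trained zstd dictionary works on the same principle but is built automatically from samples and is usually far more effective than a hand-written one.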
Once you've updated your nginx config to enable brotli and gzip, use curl to verify the server actually sends compressed responses with the right headers.
# nginx config snippet (requires ngx_brotli module)
gzip on;
gzip_types text/plain text/css application/json application/javascript;
gzip_min_length 256;
gzip_comp_level 6;
brotli on;
brotli_types text/plain text/css application/json application/javascript;
brotli_comp_level 6;
# Test brotli negotiation. Use a real GET and dump only the headers:
# nginx's gzip filter skips header-only (HEAD) responses, so curl -I can mislead.
curl -s -o /dev/null -D - -H "Accept-Encoding: br" https://yoursite.com/api/data
# Expected response headers:
# Content-Encoding: br
# Vary: Accept-Encoding   (requires gzip_vary on)
# Test gzip fallback
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" https://yoursite.com/api/data
# Expected:
# Content-Encoding: gzip
# Measure actual compressed body size
curl -H "Accept-Encoding: br" -s https://yoursite.com/api/data | wc -c
# Compare to uncompressed:
curl -H "Accept-Encoding: identity" -s https://yoursite.com/api/data | wc -c
Common Misconceptions
Compression is one of those topics where a little knowledge leads to confidently wrong decisions. These six misconceptions show up constantly in code reviews, architecture discussions, and incident post-mortems. Each one sounds reasonable until you look at the actual numbers.
Misconception 1: "Higher compression level always means smaller output."
This is mostly true, but the returns diminish sharply past a certain level. With gzip, the difference between level 6 (the default) and level 9 (maximum) is roughly 2-4% smaller output, but encoding takes 3-5× longer. For a 100 KB JSON API response: gzip-6 might yield 18 KB; gzip-9 might yield 17.4 KB. You saved 600 bytes at the cost of up to 5× the CPU per request. That trade-off is almost never worth it for runtime HTTP compression.
Where higher levels DO pay off: offline archival. If you're compressing a 500 GB cold-storage backup that will be read once in two years, spending 10× more CPU on encoding is fine because you only pay that cost once and save storage forever. The rule: use higher levels for data that is written once and read rarely; use default levels for anything in a hot path.
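To see the diminishing returns on your own payloads, a small harness along these lines can help. It is a sketch using Python's zlib (the same DEFLATE algorithm gzip uses); the sample data is synthetic and the helper name is made up, so the exact sizes and timings will differ from the figures quoted above.

```python
import json
import time
import zlib


def level_report(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    report = {}
    for level in levels:
        start = time.perf_counter()
        size = len(zlib.compress(data, level))
        report[level] = (size, time.perf_counter() - start)
    return report


# Synthetic JSON payload standing in for a real API response.
sample = json.dumps([
    {"id": i, "name": f"user-{i % 97}", "tags": ["a", "b", "c"], "score": i * 7 % 1000}
    for i in range(5000)
]).encode("utf-8")

report = level_report(sample)
print("raw bytes:", len(sample))
for level, (size, seconds) in report.items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Run it against a captured production response instead of the synthetic sample before choosing a level.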
Misconception 2: "Compression always saves bandwidth."
This is flat wrong for already-compressed data. JPEG, PNG, MP4, WebP, AVIF, and encrypted blobs are already entropy-optimized; their bytes look like random noise to a general-purpose compressor. Running gzip on a JPEG doesn't meaningfully shrink it: the output is essentially the same size, or slightly larger once the gzip header, trailer, and block framing are added, and you paid CPU for nothing.
This is why web servers should configure gzip_types and brotli_types to list only compressible content types: text/html, text/css, application/json, application/javascript. Exclude image/*, video/*, and application/octet-stream. A common mistake is enabling compression globally and watching CPU and latency go UP because every image response is now run through the compressor for zero gain.
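A quick way to convince yourself is to compress high-entropy bytes and repetitive text side by side. In this sketch, seeded pseudo-random bytes stand in for JPEG or encrypted content (an assumption made so the example needs no sample files), and zlib stands in for gzip.

```python
import random
import zlib


def compressed_sizes(data, level=6):
    """Return (original_size, compressed_size) for one zlib pass."""
    return len(data), len(zlib.compress(data, level))


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(100_000))  # stand-in for JPEG/encrypted bytes
text = b"GET /api/v1/users HTTP/1.1\r\nHost: example.com\r\n" * 2000

noise_in, noise_out = compressed_sizes(noise)
text_in, text_out = compressed_sizes(text)
print(f"high-entropy bytes: {noise_in} -> {noise_out}")
print(f"repetitive text:    {text_in} -> {text_out}")
```

The high-entropy input comes out no smaller than it went in, while the repetitive text collapses to a tiny fraction of its size.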
Misconception 3: "I should compress HTTP responses at the TLS layer for extra security."
This is a serious security mistake. The CRIME attack (2012) demonstrated that combining TLS-layer compression with attacker-controlled input allows an attacker to recover secret cookies byte-by-byte, just by watching how message sizes change. BREACH (2013) showed the same leak against HTTP-level compression of response bodies, recovering secrets such as CSRF tokens.
The attack works because compressed size leaks information about content: if the attacker can inject a guess into the request and the server reflects it in the response, and the response includes a secret, then a correct guess will compress better than a wrong guess, and the attacker sees a slightly smaller response. TLS encryption hides the content, but not the length. The lesson: never compress at the TLS record layer, and at the application layer avoid compressing responses that mix secrets with reflected input. All major TLS libraries disabled TLS-level compression after CRIME. If you're using a custom TLS stack, make sure it negotiates only the null compression method.
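The length leak is easy to demonstrate offline. The sketch below is a deliberately exaggerated illustration: the page template, the secret, and both guesses are invented, and it compares a fully correct guess with a fully wrong one, whereas a real attack refines its guess one character at a time.

```python
import zlib

# Hypothetical secret that the server embeds in every page.
SECRET = "csrf_token=9f8e7d6c5b4a39281706f5e4d3c2b1a0"


def response_size(reflected_input):
    """Compressed size of a page that echoes user input next to the secret."""
    page = (
        "<html><body>"
        f"<p>You searched for: {reflected_input}</p>"
        f"<input type='hidden' value='{SECRET}'>"
        "</body></html>"
    )
    return len(zlib.compress(page.encode("utf-8"), 6))


right = response_size("csrf_token=9f8e7d6c5b4a39281706f5e4d3c2b1a0")
wrong = response_size("csrf_token=ZYXWVUTSRQPONMLKJIHGzyxwvutsrqpo")
print("correct guess:", right, "bytes")
print("wrong guess:  ", wrong, "bytes")
```

Both guesses have the same length, so the only reason the correct one yields a smaller response is that the compressor replaced the secret with a back-reference to the reflected guess. Encryption preserves that size difference.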
Misconception 4: "Brotli is always better than gzip."
On text content (HTML, CSS, JSON, JavaScript), brotli at quality 6 typically achieves 10-20% better compression ratios than gzip-6. That's real and worth it for static assets served over a CDN. But the advantage shrinks dramatically for binary or already-compressed content: on PDFs or images with embedded text, brotli often beats gzip by less than 2%, which is not worth the added complexity.
More importantly, brotli encode speed at high quality levels is much slower than gzip. Brotli-11 is roughly 100× slower to encode than gzip-6, which is completely impractical for runtime compression. Only quality levels 4-6 are fast enough for on-the-fly HTTP responses; levels 7-11 should only be used for pre-compressed static assets. The bottom line: brotli wins on text with pre-compression; gzip wins on simplicity, speed, and universal support.
Misconception 5: "I can just enable compression and see what happens in production."
This seems harmless. It is not. Enabling gzip on a busy application server that previously served uncompressed responses can double CPU utilization under peak load, because now every response must be encoded before sending. If your servers are running at 50% CPU headroom, you've just eliminated that buffer and are headed for an incident.
The right approach: enable compression in staging, run a realistic load test (1× peak traffic), and measure CPU, p99 latency, and throughput. The most common outcome: CPU goes up 15-30% on the app server, bandwidth drops 60-70%, and downstream CDN costs drop enough to offset the extra compute. But you need to see those numbers on your stack before flipping a production switch. Also consider offloading compression to a reverse proxy (nginx, Envoy) so app server CPU is freed for application logic.
Misconception 6: "Compression ratio is the only metric that matters."
In a benchmark, ratio looks like the headline number. In production, encode and decode speed dominate most workloads. Consider: a log ingestion pipeline processing 1 GB/s of data. A compressor that achieves 4:1 ratio but only 200 MB/s encode speed is a bottleneck: the pipeline has to run 5 parallel compression threads just to keep up. A compressor with 3.5:1 ratio but 1 GB/s encode speed processes the stream in a single thread.
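The arithmetic behind that comparison fits in two helper functions. This is a back-of-the-envelope sketch: the function names are made up, 1 GB/s is taken as 1000 MB/s, and it assumes encode throughput scales linearly with threads.

```python
import math


def threads_needed(stream_mb_per_s, encode_mb_per_s_per_thread):
    """Minimum number of encoder threads needed to keep up with the stream."""
    return math.ceil(stream_mb_per_s / encode_mb_per_s_per_thread)


def output_mb_per_s(stream_mb_per_s, ratio):
    """Compressed bytes per second leaving the pipeline."""
    return stream_mb_per_s / ratio


for name, ratio, speed in [("slow, 4:1", 4.0, 200), ("fast, 3.5:1", 3.5, 1000)]:
    threads = threads_needed(1000, speed)
    out = output_mb_per_s(1000, ratio)
    print(f"{name}: {threads} thread(s), {out:.0f} MB/s written")
```

The slower codec writes about 250 MB/s instead of about 286 MB/s, a modest saving that costs five times the encoder threads.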
The trade-off depends on your bottleneck. If your system is bandwidth-bound (network saturated, S3 egress costs high), ratio matters most. If your system is CPU-bound (dense compute, Lambda with 512 MB RAM), speed matters most. If your system has fast local storage but slow network, ratio wins; if you're doing in-memory compression for a cache, decode speed dominates. The professional answer: always benchmark the metric that maps to your actual bottleneck.
Real-World Disasters & Lessons
Five incidents where compression went wrong in ways engineers didn't see coming. Each one changed how the industry thinks about a specific aspect of compression safety.
Thai Duong and Juliano Rizzo published CRIME (Compression Ratio Info-leak Made Easy) at ekoparty 2012. The attack targeted TLS's optional record compression: when TLS was configured to compress the record payload before encrypting it, an attacker who could inject data into HTTPS requests could observe the change in ciphertext length and recover session cookies byte-by-byte.
The attack was confirmed against TLS 1.0 and TLS 1.1 implementations in major browsers. Within weeks, Chrome, Firefox, and OpenSSL disabled TLS-level compression entirely. The lesson the industry learned: compression and encryption are fundamentally incompatible as neighbors. Encryption hides content, but not length, and compression leaks content through length.
BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) generalized CRIME. Instead of attacking TLS-layer compression, it attacked application-layer compression: gzip on the HTTP response body. The attack still works even when TLS is in use, because the attacker observes the size of the encrypted payload (which reveals the compressed size).
Crucially, BREACH works whenever: (1) the HTTP response is compressed, (2) the response includes a secret, and (3) the response reflects part of the request (e.g., a search term echoed back). Practical mitigations: randomize CSRF tokens per request (so they never appear verbatim in the response), avoid reflecting user input in the same compressed response as secrets, and consider disabling compression for authenticated endpoints that return sensitive data. BREACH is still largely unmitigated at the protocol level; it's a design problem, not an implementation bug.
A major social platform (details consistent with Twitter's 2014 log infrastructure incident) encountered a crafted compressed log file that decompressed to over 100 GB from a ~1 MB compressed source. The log shipper opened the file, began decompressing it into memory, and was OOM-killed, taking out the entire log pipeline for that host.
The concept is called a zip bomb: nested or cleverly structured compressed archives where each decompression layer expands the data further. Classic examples compress a single repeated byte to near-theoretical limits (1 byte × 1 billion = 1 GB from a small compressed file). The lesson: always set decompression limits. zstd has a --memory flag and an optional decompressed-size field in the frame header. gzip has no built-in limit, so you must enforce it externally (e.g., gunzip -c file.gz | head -c 100M to cap output). Never trust a compressed file to be "reasonable" in size.
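In application code the same cap can be enforced while decompressing, so the oversized output is never materialized. The following is a minimal sketch with Python's zlib module; the function name and the limits are assumptions, and it handles a single-member gzip stream only.

```python
import gzip
import zlib


def bounded_gunzip(data, max_output):
    """Decompress gzip bytes, raising ValueError if output exceeds max_output."""
    decompressor = zlib.decompressobj(wbits=31)  # 31 = 16 + 15: expect a gzip wrapper
    # Ask for at most one byte beyond the cap; zlib stops producing output there.
    out = decompressor.decompress(data, max_output + 1)
    if len(out) > max_output:
        raise ValueError(f"decompressed output exceeds {max_output} bytes")
    return out


# A tiny "bomb": 10 MB of zero bytes shrinks to a few kilobytes.
bomb = gzip.compress(b"\0" * 10_000_000)
print("compressed bomb size:", len(bomb))

try:
    bounded_gunzip(bomb, 1_000_000)
except ValueError as exc:
    print("rejected:", exc)
```

Because the limit is passed to the decompressor itself, memory use stays near the cap regardless of how large the payload claims to be or really is.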
Slack's mobile infrastructure team discovered that for API responses under approximately 200 bytes, enabling gzip compression was producing larger responses, not smaller ones. The gzip format has a fixed overhead of roughly 20 bytes (a 10-byte header plus an 8-byte checksum-and-length trailer), and Deflate's internal block encoding adds additional overhead on tiny inputs. For a 50-byte JSON response, gzip produced 70-90 bytes, a 40-80% size increase.
The fix: Slack disabled compression for responses below 200 bytes in their API gateway, using a response-size check before encoding. This is now considered standard practice; nginx's gzip_min_length directive exists precisely for this reason. The threshold recommended in most guides (256 or 1024 bytes) comes from measurements like this one.
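The overhead on tiny payloads can be checked in a few lines. This sketch uses Python's gzip module with two made-up JSON payloads (both are assumptions for illustration); the exact byte counts depend on the payload.

```python
import gzip
import json

# A tiny health-check style response and a larger, repetitive list response.
small = json.dumps({"status": "ok", "uptime": 12345}).encode("utf-8")
large = json.dumps(
    [{"userId": i, "eventType": "click", "status": "ok"} for i in range(200)]
).encode("utf-8")


def gzip_size(payload, level=6):
    """Size in bytes of the gzip-encoded payload."""
    return len(gzip.compress(payload, compresslevel=level))


for name, payload in [("small", small), ("large", large)]:
    print(f"{name}: {len(payload)} bytes raw -> {gzip_size(payload)} bytes gzipped")
```

The small payload grows because the fixed header and trailer outweigh anything Deflate can save; the large one shrinks substantially because its field names repeat.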
During the Log4j vulnerability response period, security researchers noted a class of related vulnerabilities: log processors that applied compression (or decompression) to log messages before storing or forwarding them. An attacker who could inject crafted log entries could submit small strings that, after decompression, expanded into multi-gigabyte allocations, exhausting the log processor's heap and triggering OOM or extreme GC pauses.
The fundamental lesson: every data transformation pipeline must bound its output size. This applies at each stage independently. If you decompress, then parse, then re-compress, set a maximum output size at the decompression step, at the parsing step, and at the encoding step. The decompression step is often the most dangerous because its expansion ratio can be 1000:1 or higher for crafted inputs. Validate input size before decompressing, and enforce a hard output cap during decompression (stop and error if exceeded).
Performance & Best Practices Recap
Everything from the page distilled into eight actionable rules. These are the decisions you should make by default, with the reasoning that justifies each one, so you can override them intelligently when your situation differs.
Every browser and HTTP client built in the last 15 years supports gzip. Setting Content-Encoding: gzip requires zero client-side configuration. WHY it's the right default: universal support means no negotiation failures, gzip level 6 achieves 60-70% size reduction on typical JSON/HTML, and nginx and Apache handle it with a single config line at negligible CPU overhead for typical traffic volumes.
Set up your server (or CDN) to serve Content-Encoding: br when Accept-Encoding includes br, with gzip as the fallback. WHY brotli is worth adding: on typical web page assets (HTML + CSS + JS), brotli-6 achieves 10-20% better ratios than gzip-6 with similar CPU cost. The CDN handles negotiation automatically: you pre-compress both and let the CDN pick.
zstd at level 3 is the best modern general-purpose compressor: 3-4:1 ratio on text, 430+ MB/s encode, 1200+ MB/s decode. WHY it beats gzip internally: you control both encoder and decoder so universal support isn't required; zstd's faster decode speed means reading compressed data from S3 or a database is not a bottleneck; and zstd's dictionary training feature is a first-class API, not an afterthought.
For any system sending many messages under 1 KB (API payloads, events, metrics), run zstd --train on a representative sample. WHY: a single message has too few bytes for the compressor to discover patterns; a pre-trained dictionary supplies those patterns up front, typically improving ratio from 1.3:1 to 5-10:1 on 200-500 byte JSON messages. Deploy the dictionary as a static binary file alongside your service.
Add a build step that generates app.js.br (brotli-11) and app.js.gz (gzip-9) alongside every static file. Configure nginx to serve the pre-compressed file directly with gzip_static on and brotli_static on. WHY: brotli-11 is too slow for runtime (100× slower than gzip-6) but is perfectly fine offline. Your CI runs it once; users get maximum ratio; your servers spend zero CPU encoding responses.
gzip adds ~20 bytes of header and trailer overhead before a single byte of actual data is saved. On a 50-byte health check response, you'd produce a 70-byte response, 40% larger. WHY the threshold is ~200 bytes: below that point, typical gzip compression savings are smaller than the fixed overhead. Set gzip_min_length 256 in nginx to enforce this automatically.
TLS-layer compression was disabled across the industry after the CRIME attack. Do not re-enable it. Similarly, compressing AES-encrypted blobs, bcrypt hashes, or any other pseudo-random data wastes CPU for zero size reduction. WHY: entropy is the enemy of compression. Random bytes have no patterns to exploit, and gzip headers add overhead on top of that. Check your TLS library config and confirm SSL_OP_NO_COMPRESSION or equivalent is set.
Every codepath that decompresses data must have a hard limit on output size. In Go: wrap the decompressing reader in io.LimitReader(r, maxBytes). In Java: read from the decompressing stream in a loop and abort once a byte counter passes the cap (a declared uncompressed-size field can lie, so don't rely on it alone). In Python: wrap the decompressor with a byte counter and raise if exceeded. WHY: a zip bomb compresses to a few KB but expands to gigabytes, crashing log shippers, parsers, and cache loaders. A 1000:1 compression ratio is achievable with a single repeated byte, so never trust compressed input to be "reasonable."
FAQ
The questions that come up every time someone first works with compression in a production system. Answers are concrete and skip the "it depends" whenever a clear default exists.
Should I use gzip or brotli on my website?
Both, and you don't have to choose. Most CDNs (Cloudflare, Fastly, CloudFront) negotiate automatically: they inspect the Accept-Encoding header the browser sends, serve brotli to browsers that support it (Chrome, Firefox, Edge, Safari; basically every modern browser), and fall back to gzip for anything else.
If you're self-hosting, generate both file.br and file.gz at build time and configure nginx with brotli_static on; gzip_static on;. On typical HTML + CSS + JS, brotli achieves about 10-20% better compression than gzip. That's real but not earth-shattering. The more impactful optimization is usually code splitting and tree shaking before compression, but every percent counts at scale.
Why did Chrome remove JPEG XL support?
Chrome shipped a behind-a-flag implementation of JPEG XL in 2021, announced its removal in late 2022, and dropped the code in Chrome 110 (early 2023). The Chrome team's stated reason was that browser adoption of JPEG XL was slow and the ecosystem wasn't converging on it; WebP and AVIF were already widely deployed with stronger hardware decode acceleration on mobile chipsets.
Notably, Safari added JPEG XL support in Safari 17 (2023), while Firefox has so far kept it behind a flag in Nightly builds. So JPEG XL is still alive and has partial browser support, just not in Chrome, which is the dominant desktop browser. For production image delivery, AVIF is the current best choice for both ratio and browser support; JPEG XL remains a strong option for workflows where you control the viewer (apps, desktop software, print) because it supports lossless, lossy, and HDR in a single format with superior perceptual quality at equivalent file sizes.
What's the right zstd compression level?
Three cases cover 95% of real-world use:
- Level 3 (default): general-purpose. 430 MB/s encode, good ratio. Use for log pipelines, API response compression, Kafka message compression, S3 object storage.
- Level 1: maximum speed. ~700 MB/s encode, ratio drops slightly (~10%). Use when CPU is scarce and latency is critical, e.g., real-time telemetry streams, gaming netcode.
- Levels 19-22: archival (levels 20-22 require the --ultra flag). Very slow encode (minutes for large files), maximum ratio. Use for cold storage backups, software distribution packages, datasets that are written once and read rarely.
WHY these three thresholds: the ratio-vs-speed curve for zstd is roughly flat between levels 3 and 9 (tiny ratio gains for big speed losses), then jumps again from level 19 up, where the compressor uses more memory and a more exhaustive search. Levels 4-18 are rarely the right choice: you're in neither the "fast" nor the "maximum" zone.
Can I compress at both the database layer and the application layer?
You can, but it's wasteful and sometimes counterproductive. If your application compresses a value with zstd before storing it in PostgreSQL, then PostgreSQL's TOAST compression tries to compress that value again. It will fail to improve the size (the data is already entropy-optimized) and waste CPU in the attempt.
The practical rule: pick one layer and configure it consistently. Application-layer compression gives you more control (algorithm choice, dictionary training, per-field decisions) and works with any database. Database-layer compression (PostgreSQL TOAST, MySQL row compression, Cassandra per-table compression) is transparent: you don't change application code. Choose application-layer when you have varied data types with different compression profiles, or when you need custom dictionaries. Choose database-layer when you want compression without changing existing code and the DBA team manages it centrally.
Why is my Lambda cold start slow when I use brotli?
Lambda cold starts require the service to download and unpack your deployment package (a .zip archive or a container image) and then initialize your code. Brotli enters the picture when a bundler or build step ships brotli-compressed bundles inside that package: they must be decompressed during initialization, before the first request is served. Brotli decompression is fast (comparable to gzip) for large files, but the cold start still picks up latency for two reasons: (1) brotli support may need to be loaded as a native addon, adding initialization overhead; (2) decompressing a large bundle is extra work on the cold path, which hurts most at small memory settings because Lambda allocates CPU in proportion to configured memory.
The fix: keep the deployment bundle in a format the runtime unpacks natively (a plain zip), avoid decompressing large brotli bundles during initialization, or, better, ship your static assets separately from runtime code. The actual Lambda code bundle doesn't need extreme compression: it's downloaded once and cached for the duration of the execution environment's lifetime. Save brotli-11 for assets served to browsers, not runtime packages loaded by Lambda.
Does compression help small JSON API responses?
Below about 200 bytes: typically not, and often hurts. gzip adds ~20 bytes of header overhead, and on 50โ150 bytes of JSON, the compressor may not find enough repeated patterns to offset that overhead. A 100-byte response could become a 110-byte response.
Above 1 KB: almost always yes, often dramatically. A 2 KB JSON response with repeated field names compresses to 400 to 600 bytes with gzip, a 70 to 80% reduction in bytes transferred. Above ~10 KB: gzip reliably achieves 65 to 75% reduction on typical JSON. The inflection point varies by JSON structure: responses with many repeated field names compress better than responses that are mostly unique string values. Test with your actual payload shapes using echo '{"key":"value"...}' | gzip | wc -c before setting your gzip_min_length threshold.
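The same size check can be scripted when you want to compare several payload shapes at once. The sketch below uses only Python's standard gzip and json modules; gzip_sizes and both sample payloads are hypothetical stand-ins for your real responses.

```python
import gzip
import json


def gzip_sizes(payload: dict) -> tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for a JSON-serialised payload."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw, compresslevel=6))


# Hypothetical payloads: a tiny status reply and a list of similar records.
tiny = {"ok": True}
listing = {"items": [{"id": i, "name": f"user{i}", "active": True} for i in range(200)]}

for label, payload in (("tiny", tiny), ("listing", listing)):
    raw, packed = gzip_sizes(payload)
    print(f"{label}: {raw} -> {packed} bytes")
```

The tiny payload should come out larger after gzip because the fixed framing alone exceeds its size, while the listing shrinks substantially thanks to its repeated field names. Exact numbers depend on the zlib build and compression level, so treat the output as indicative only.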
How do CDNs handle compression?
Modern CDNs (Cloudflare, Fastly, AWS CloudFront) handle compression in two ways depending on whether the asset is cached:
For cacheable static assets: the CDN stores both a .br and .gz version (or compresses on first request and caches the result). On each subsequent request, it reads the Accept-Encoding header and serves the appropriate pre-compressed version. This is why setting Vary: Accept-Encoding on your origin responses is important: it tells the CDN that the response varies by encoding and it should cache multiple versions.
For dynamic responses: the CDN typically proxies the request to your origin, which compresses the response. Alternatively, CDNs like Cloudflare can compress responses from origins that don't compress, a useful fallback if you haven't configured compression on your application server. The practical recommendation: pre-compress static assets at deploy time (brotli-11 + gzip-9), let the CDN handle dynamic responses with its default gzip, and configure Vary: Accept-Encoding on every compressible response.
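To make the negotiation step concrete, here is a simplified sketch of how a server or edge node might choose between stored variants. It is not how any particular CDN implements it: pick_encoding and response_headers are hypothetical names, and the logic applies server preference order among the encodings the client allows rather than ranking by the client's q-values.

```python
def pick_encoding(accept_encoding: str, available=("br", "gzip")) -> str:
    """Pick the first variant in `available` that the client accepts.

    `available` is ordered by server preference. Returns "identity" when
    no stored variant is acceptable, meaning the uncompressed body is served.
    """
    accepted = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # an encoding listed without a q-value is fully acceptable
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[token] = q
    for enc in available:
        # An explicit entry wins; otherwise fall back to the "*" wildcard.
        if accepted.get(enc, accepted.get("*", 0.0)) > 0:
            return enc
    return "identity"


def response_headers(accept_encoding: str) -> dict:
    """Build the encoding-related headers for one response."""
    enc = pick_encoding(accept_encoding)
    headers = {"Vary": "Accept-Encoding"}  # always set, so caches key on it
    if enc != "identity":
        headers["Content-Encoding"] = enc
    return headers
```

For example, a request carrying Accept-Encoding: gzip, deflate, br would be served the .br variant, while br;q=0, gzip;q=0.5 falls back to gzip because the client explicitly refused brotli. Vary is set even on uncompressed responses so that a cache never hands a compressed copy to a client that cannot decode it.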
Is there a "best" compressor?
For text data with a balance of speed and ratio: zstd at level 3 is the current best general-purpose answer. It beats gzip on ratio, beats brotli on encode speed, and has first-class dictionary support for small messages.
For text data where ratio is paramount and encode speed doesn't matter: zstd-19 or brotli-11 for offline/build-time compression. The two are competitive; zstd-19 is faster to encode, brotli-11 sometimes wins a few percent on HTML/CSS/JS.
For media: always use format-native compression: AVIF or WebP for images, H.264/H.265/AV1 for video, AAC/Opus for audio. General-purpose compressors cannot improve on these formats because the formats were designed with perceptual redundancy elimination, not just byte-level patterns.
The only honest answer: benchmark with your data. A "best" compressor changes based on whether you're compressing 100-byte RPC messages, 50 MB log chunks, or streaming 4K video.
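A benchmark of this kind does not need much code. The sketch below is one possible harness, limited to the codecs in Python's standard library (zlib, bz2, lzma) so that it runs without extra installs; zstd and brotli bindings are third-party packages and would be added as further entries in the table. The benchmark function, the CODECS table, and the sample data are all assumptions for illustration.

```python
import bz2
import lzma
import time
import zlib

# Standard-library codecs only, so the sketch runs anywhere. Each entry maps
# a label to an (encode, decode) pair of callables.
CODECS = {
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bz2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(sample: bytes) -> dict:
    """Return ratio and encode/decode seconds per codec for one sample."""
    results = {}
    for name, (encode, decode) in CODECS.items():
        t0 = time.perf_counter()
        packed = encode(sample)
        t1 = time.perf_counter()
        restored = decode(packed)
        t2 = time.perf_counter()
        assert restored == sample  # lossless round trip
        results[name] = {
            "ratio": len(sample) / len(packed),
            "encode_s": t1 - t0,
            "decode_s": t2 - t1,
        }
    return results


if __name__ == "__main__":
    # Hypothetical sample; replace with bytes read from your real payloads.
    sample = b'{"level":"info","msg":"request served","status":200}\n' * 2000
    for name, row in benchmark(sample).items():
        print(f"{name}: ratio {row['ratio']:.1f}x, "
              f"encode {row['encode_s'] * 1e3:.2f} ms, "
              f"decode {row['decode_s'] * 1e3:.2f} ms")
```

A single timed run like this is noisy. For a real decision, feed it representative samples at each size you care about (small RPC messages and large log chunks behave very differently) and repeat each measurement several times.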