TL;DR - The Napkin That Saved a Million Dollars
- Powers of 2 and the five numbers that let you estimate almost anything
- The reference numbers every engineer carries in their head (QPS, storage sizes, latencies)
- A 5-step estimation framework you can use in any interview or design review
- Worked examples for Twitter, YouTube, chat systems, URL shorteners, and notification platforms at scale
Back-of-envelope estimation is the ability to quickly calculate whether a design will work at the required scale, before writing a single line of code.
Here's an analogy everyone understands. You're planning a road trip. Before you get in the car, you do some quick math: "It's 500 miles. My car gets 30 miles per gallon. Gas is $3.50 a gallon. So I need about 17 gallons, which costs roughly $60." You didn't calculate it to the penny. You didn't pull out a spreadsheet. You did napkin math: just enough to know whether the trip is affordable.
That's exactly what back-of-envelope estimation is in system design. Instead of miles and gallons, you estimate requests per second (how many user actions such as page loads, API calls, and searches hit your servers every second; abbreviated QPS, for queries per second; a busy website might handle 10,000-100,000 QPS), storage in terabytes, and bandwidth (how much data flows through your network per second, measured in Mbps or Gbps; think of it as the width of a highway, where more lanes means more cars, or data, can travel at once). The goal isn't perfection. It's getting the right order of magnitude: the power-of-10 "bucket" a number falls in. Getting that bucket right, even if the exact number is off by 2-3x, is what matters in estimation. Is the answer 1 gigabyte or 1 terabyte? That 1,000x difference changes your entire architecture.
Why does this matter? Three reasons, and they're all big.
In interviews, estimation shows you understand real-world scale. Anyone can say "use a database." Only someone who's done the math can say "we need ~18TB of storage per year, so we should plan for sharding from day one." That's the difference between a junior answer and a senior answer.
In production, estimation prevents disasters. Imagine deploying a system that needs 50TB of storage when you budgeted for 5TB. Or designing a single-server architecture for a feature that actually needs to handle 100,000 requests per second. These mistakes cost real money and real time, and 30 seconds of napkin math could have caught them.
In design reviews, estimation is the fastest sanity check you have: a quick, rough calculation to make sure your design isn't wildly off. You're not looking for the exact answer; you're checking that you're in the right ballpark ("Do we need 1 server or 1,000?"). Before anyone writes a design document, before anyone provisions infrastructure, a 30-second calculation can tell you whether the approach is even feasible.
What: Back-of-envelope estimation is quick, rough math to figure out if a system design can handle the required scale. You calculate requests per second, storage needs, and bandwidth, not to the exact byte, but to the right order of magnitude.
When: At the START of any system design: in interviews, design reviews, or before provisioning infrastructure. Do the math first, build second.
Key Principle: You don't need exact numbers. You need to know if the answer is 1GB or 1TB. That 1,000x difference is what changes your architecture. If you're within 2-3x of the real answer, your estimation did its job.
The Scenario - Why Interviewers Love This
Picture this. You're in a system design interview. The interviewer says: "Design a URL shortener like bit.ly." You feel confident. You start talking about database schemas, load balancers, maybe even mention consistent hashing. The interviewer listens politely for two minutes, then interrupts:
You freeze. You have no idea. You were so busy designing the how that you never figured out the how much. And without "how much," your entire design is floating in the air with no foundation. Is one database enough or do you need ten? Should you cache aggressively or is the load light enough to skip it? You can't answer any of these questions without numbers.
Now picture a different candidate in the same interview. Same question โ "Design a URL shortener." But this candidate starts differently:
That took 30 seconds. And it told the interviewer three things: this candidate understands scale, this candidate makes data-driven decisions, and this candidate won't accidentally design a system that falls over on day one. This candidate gets the offer.
Here's the key insight that changes everything about how you approach estimation:
Think about it this way. If you estimate 5,000 QPS and the real answer is 8,000 QPS, you're fine; both numbers suggest the same architecture (a few servers with a cache layer). But if you estimate 5,000 QPS and the real answer is 5,000,000 QPS, you're going to build something that collapses on launch day. Estimation doesn't need to be precise. It needs to land in the right order of magnitude.
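This "same bucket, same architecture" idea is easy to check mechanically. Here's a minimal Python sketch of the two conversions involved (the numbers are the illustrative ones from this section, not measurements):

```python
import math

def qps(events_per_day: float) -> float:
    """Napkin conversion: events per day -> events per second (1 day ~ 1e5 seconds)."""
    return events_per_day / 100_000

def order_of_magnitude(x: float) -> int:
    """The power-of-10 bucket a number falls in."""
    return round(math.log10(x))

# 400M users x 2 timeline reads/day = 800M reads/day -> ~8,000 reads/sec average
print(qps(800_000_000))  # 8000.0

# 5,000 QPS vs 8,000 QPS: same bucket, same architecture.
print(order_of_magnitude(5_000) == order_of_magnitude(8_000))  # True
# 5,000 QPS vs 5,000,000 QPS: three buckets apart -- a completely different system.
print(order_of_magnitude(5_000_000) - order_of_magnitude(5_000))  # 3
```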
An interviewer asks you to design Twitter's home timeline. Before reading further, try to estimate: how many timeline reads per second does Twitter handle? Hint: Twitter has ~400 million monthly active users. If each active user opens the app twice a day, how many timeline reads per second is that?
400M users x 2 reads/day = 800M reads/day. Divide by ~100,000 (seconds in a day) = ~8,000 reads/sec average. But peak is 2-3x average, so ~20,000 reads/sec at peak.
The Foundation - Powers of 2 and Quick Math Tricks
Before you can estimate anything, you need some numbers burned into your brain. Not hundreds of numbers โ just a handful. These are the "multiplication tables" of system design. Once they're automatic, estimation becomes as easy as mental arithmetic.
Let's start with the single most important reference table in all of system design: the powers of 2. Because computers work in binary (base 2), all storage and memory sizes are powers of 2: a kilobyte is 2^10 bytes (1,024), a megabyte is 2^20 bytes (1,048,576), and so on. Everything in computing (memory, storage, network packets) is measured in powers of 2. If you know five of them, you can estimate almost anything.
| Power | Exact Value | Approximation | Unit | Real-World Example |
|---|---|---|---|---|
| 2^10 | 1,024 | ~1 Thousand | 1 KB | A short email or a tiny JSON response |
| 2^20 | 1,048,576 | ~1 Million | 1 MB | A high-quality photo or a minute of MP3 audio |
| 2^30 | ~1.07 Billion | ~1 Billion | 1 GB | A feature-length movie (compressed) or 1,000 photos |
| 2^40 | ~1.1 Trillion | ~1 Trillion | 1 TB | A small library's worth of books, or ~500 hours of HD video |
| 2^50 | ~1.13 Quadrillion | ~1 Quadrillion | 1 PB | Netflix's entire content library, or ~500 million photos |
The pattern is beautiful in its simplicity: every 10 powers of 2, you jump by 1,000x. KB to MB is 2^10. MB to GB is another 2^10. GB to TB? Another 2^10. So if someone says "we have 500 million records at 2KB each," you instantly know: 500M x 2KB = 1 billion KB = 1 million MB = 1,000 GB = 1 TB. Three hops up the ladder. That calculation should take you two seconds.
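The ladder is trivial to express in code. A quick Python sketch of the 500-million-record example (napkin math; the record count and row size are the hypothetical figures from the paragraph above):

```python
# Each unit is one 2^10 hop up the ladder.
KB, MB, GB, TB, PB = 2**10, 2**20, 2**30, 2**40, 2**50

records = 500_000_000
row_size = 2 * KB

total_bytes = records * row_size
print(total_bytes / TB)  # ~0.93 -- "about 1 TB", which is all the precision napkin math needs
```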
Now for the second set of numbers you need: time conversions. These come up in literally every estimation because you almost always start with "X per day" and need to convert to "Y per second" (since servers think in seconds, not days).
There are exactly 86,400 seconds in a day (60 x 60 x 24). That's annoyingly specific. For napkin math, we round it: 1 day ≈ 100,000 seconds (or 10^5). It's only off by about 15%, which is nothing for estimation purposes. This single shortcut makes every daily-to-per-second conversion trivial.
A few more shortcuts that show up constantly:
| Fact | Exact Value | Napkin Approximation | When You Use It |
|---|---|---|---|
| Seconds in a day | 86,400 | ~10^5 (100,000) | "X per day" to "Y per second" |
| Seconds in a year | 31,536,000 | ~3 x 10^7 (30 million) | Yearly storage or data growth |
| Seconds in a month | ~2,592,000 | ~2.5 x 10^6 (2.5 million) | Monthly billing or capacity |
| 80/20 rule | Pareto principle | 80% of traffic hits 20% of data | Cache sizing: you only need to cache the hot 20% |
| Peak vs average | Varies | Peak = 2-3x average | Capacity planning: don't size for average, size for peak |
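These shortcuts combine into a single helper you'll reuse in every estimation. A hedged Python sketch (the constants are the napkin approximations from the table, deliberately not the exact values):

```python
SECONDS_PER_DAY = 100_000        # napkin value for 86,400
SECONDS_PER_MONTH = 2_500_000    # napkin value for ~2.59 million
SECONDS_PER_YEAR = 30_000_000    # napkin value for ~31.5 million

def avg_and_peak_qps(events_per_day: float, peak_multiplier: float = 3.0):
    """Convert 'X per day' into average and peak per-second rates."""
    avg = events_per_day / SECONDS_PER_DAY
    return avg, avg * peak_multiplier

# 1 billion events/day -> 10K QPS average, 30K QPS at a 3x peak
print(avg_and_peak_qps(1_000_000_000))
```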
A social media app has 200 million daily active users. Each user makes an average of 5 API requests per session, and they open the app 3 times a day. How many requests per second does the system handle on average? What about at peak?
200M users x 5 requests x 3 sessions = 3 billion requests/day. Divide by 10^5 = 30,000 QPS average. Peak at 2-3x = 60,000-90,000 QPS.
The Reference Numbers - What a Single Server Can Do
Knowing how to do the math is only half the battle. You also need to know what to compare your numbers against. When your calculation says "I need 50,000 QPS," you need to instantly know: can one server handle that, or do I need fifty?
These are the benchmarks that experienced engineers carry in their heads: standard performance measurements for common technologies. They're not exact, since they depend on hardware, configuration, and query complexity. A well-tuned PostgreSQL on beefy hardware might do 50K QPS, while a poorly indexed one on a micro instance might do 500. But for estimation, you don't need exact. You need the right order of magnitude.
Compute Benchmarks (QPS: Queries Per Second)
How much work can a single instance of each technology handle? These numbers assume reasonable hardware (4-8 cores, 16-32GB RAM) and typical workloads. The ranges reflect different query complexity โ simple key lookups are fast, complex joins are slow.
| Technology | Typical QPS | Why This Number? | Use in Estimation |
|---|---|---|---|
| Web server (Node.js/Go): a single application server handling HTTP requests; both handle concurrent connections efficiently via event loops or goroutines | 10K-50K req/sec | Simple JSON responses are CPU-cheap; the bottleneck is usually the database behind it | "I need 30K QPS" = 1-3 app servers |
| MySQL / PostgreSQL: the two most popular relational databases (MySQL at Facebook and Uber, PostgreSQL at Instagram and Stripe); similar QPS characteristics | 5K-10K QPS | Disk I/O is the bottleneck; indexed reads are fast, complex joins are slow; writes need disk flush | "I need 50K reads/sec" = ~5-10 read replicas |
| Redis: an in-memory key-value store for caching, sessions, and real-time data; everything lives in RAM, so it's fast, but data must fit in (expensive) memory | 100K-200K ops/sec | Everything is in RAM, no disk reads. Simple GET/SET operations. Single-threaded but event-loop-based | "I need to cache 500K reads/sec" = 3-5 Redis nodes |
| Kafka broker: one server in Apache Kafka, a distributed message queue for event streaming, optimized for sequential append-only disk writes | 100K-500K msg/sec | Sequential disk writes (append-only) are fast; consumer reads are also sequential; batching helps hugely | "I need 1M events/sec" = 2-10 Kafka brokers |
| Elasticsearch: a search engine built on Apache Lucene for full-text search and log analytics; slower per query because of text matching and relevance scoring | 1K-5K queries/sec | Full-text search with relevance scoring is CPU-intensive; complex aggregations are slower | "I need search across 1B docs" = cluster with multiple nodes |
Storage Benchmarks (How Big Is Everything?)
When you're estimating storage, you need to know how big typical pieces of data are. These sizes include reasonable metadata (timestamps, user IDs, etc.) โ not just the raw content.
| Data Type | Typical Size | Why This Size? | Estimation Example |
|---|---|---|---|
| A text tweet / short message | ~1 KB | 280 chars of UTF-8 text (~560 bytes) + user ID, timestamp, retweet count, metadata | 500M tweets/day x 1KB = 500GB/day = ~180TB/year |
| A JSON API response | 1-10 KB | Typical REST response with a few fields; larger if it includes nested objects or lists | 10K QPS x 5KB average = 50MB/sec bandwidth |
| A compressed photo (JPEG) | 200KB - 2MB | Phone photos are 3-5MB raw; JPEG compression gets them to 200KB-2MB depending on quality | 50M photos/day x 1MB = 50TB/day (Instagram-scale) |
| 1 minute of video (720p) | 10-20 MB | Compressed H.264 at 720p is roughly 2-3 Mbps bitrate = ~15MB per minute | 500 hours uploaded/min (YouTube) = 30,000 video-minutes/min x 15MB = ~450GB/min of new content |
| 1 hour of video (1080p) | 1-3 GB | 1080p at 5-8 Mbps bitrate; Netflix encodes at multiple quality levels (adaptive bitrate) | Streaming to 10M users at 5Mbps = 50 Tbps total bandwidth |
Network Benchmarks (How Fast Does Data Travel?)
Network speed affects how you design for latency (the time for a request to travel from the user to the server and back, measured in milliseconds; a user in New York hitting a server in Virginia sees ~10ms, hitting London ~80ms, governed by the speed of light in fiber optic cable) and throughput (how much total data you can push through the network per second; a 1 Gbps link can transfer about 125 MB/sec). The speed of light is a hard constraint: no amount of engineering can make New York to London faster than ~80ms round trip.
| Network Path | Typical Latency | Bandwidth Equivalent | Design Implication |
|---|---|---|---|
| Same datacenter (rack-to-rack) | ~0.5 ms | 10-25 Gbps links | Internal service calls are nearly free; design for many small calls |
| Same region (e.g., us-east-1a to 1b) | ~1-2 ms | Up to 25 Gbps | Cross-AZ replication is fast enough for sync writes |
| Cross-region (US East to West) | ~40 ms | 1-10 Gbps | Too slow for sync calls; use async replication |
| Cross-continent (US to Europe) | ~80 ms | Varies | Need CDN or regional replicas for good user experience |
| US to Asia-Pacific | ~150-200 ms | Varies | Multi-region deployment is essential; cache aggressively at the edge |
One conversion that trips people up: 1 Gbps = 125 MB/sec. That's because 1 byte = 8 bits, so you divide by 8. If your server has a 1 Gbps network link and needs to serve 500 MB/sec of video, you need at least 4 links (or a 10 Gbps connection). This is a common gotcha in bandwidth estimation.
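The bits-to-bytes conversion is worth encoding once so you never fumble it. A small Python sketch of the rule and the video-serving example above (decimal network units, 1 Gbps = 1,000 Mbit/s):

```python
import math

BITS_PER_BYTE = 8

def gbps_to_mb_per_sec(gbps: float) -> float:
    """1 Gbps = 1,000 Mbit/s = 125 MB/sec. Divide by 8, not by 1."""
    return gbps * 1000 / BITS_PER_BYTE

def links_needed(required_mb_per_sec: float, link_gbps: float = 1.0) -> int:
    """How many network links of a given size cover a byte-rate requirement."""
    return math.ceil(required_mb_per_sec / gbps_to_mb_per_sec(link_gbps))

print(gbps_to_mb_per_sec(1))  # 125.0
print(links_needed(500))      # 4 -- serving 500 MB/sec needs four 1 Gbps links (or one 10 Gbps)
```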
Let's put these reference numbers to work with a quick example. Say you're designing a chat application and you estimate 10 million messages per day. Each message is about 1KB. How much storage per year? And can a single database handle the write load?
In 30 seconds, you went from "design a chat app" to "4TB/year, one DB handles writes, might need read replicas." That's the power of having reference numbers in your head. You didn't need a spreadsheet. You didn't need a calculator. You needed five numbers and basic arithmetic.
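That 30-second chat-app calculation, written out as Python (the message volume and size are the assumptions stated above):

```python
messages_per_day = 10_000_000
bytes_per_message = 1_000      # ~1 KB including metadata

# Storage: 10M x 1KB = 10 GB/day -> ~3.65 TB/year, call it 4 TB
storage_tb_per_year = messages_per_day * bytes_per_message * 365 / 1e12

# Write load: 10M / 1e5 seconds = 100 writes/sec -- one MySQL node handles this easily
write_qps = messages_per_day / 100_000

print(round(storage_tb_per_year, 2))  # 3.65
print(write_qps)                      # 100.0
```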
YouTube says 500 hours of video are uploaded every minute. Using the reference numbers above, estimate: (1) How much new storage per day? (2) How many Gbps of ingest bandwidth? Try it before reading on.
500 hours/min = 30,000 hours/hour = 720,000 hours/day. At ~2GB per hour (1080p), that's ~1.44 PB/day of new content. For bandwidth: 500 hours/min = 30,000 min of video/min. At 15MB per minute of video, that's 450GB/min = 7.5GB/sec = 60 Gbps ingest bandwidth.
The 5-Step Framework - How to Estimate Anything
Every estimation you'll ever do in an interview or at work follows the same five steps. It doesn't matter whether you're estimating Twitter, YouTube, or a tiny startup app โ the process is identical. Learn these five steps once, and you can estimate any system on the planet in under two minutes.
Think of it like a recipe. You wouldn't bake a cake by throwing random ingredients into an oven and hoping for the best. You follow steps: measure flour, add eggs, mix, bake. Estimation is the same โ a fixed sequence of multiplications that always produces a useful answer.
Quick Example: Running All 5 Steps
Let's run through the entire framework with a simple app: 100K DAU, 10 actions per user per day, 2 KB per action. This could be a notes app, a to-do list, or a simple social feed. Watch how the five steps chain together โ each one feeds the next.
See how fast that was? Five multiplications, and now you know exactly what infrastructure this app needs: one server, one database, done. No Kubernetes. No Redis. No microservices. The numbers told you the architecture should be dead simple, and that insight alone saves months of over-engineering.
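The five multiplications for this example, as a Python sketch (100K DAU, 10 actions/user/day, 2 KB/action, all from the setup above):

```python
dau = 100_000
actions_per_user_per_day = 10
bytes_per_action = 2_000   # 2 KB

# Steps 1-2: daily volume
actions_per_day = dau * actions_per_user_per_day        # 1M actions/day
# Step 3: traffic
avg_qps = actions_per_day / 100_000                     # 10 QPS average
peak_qps = avg_qps * 3                                  # 30 QPS at peak
# Step 4: storage
gb_per_day = actions_per_day * bytes_per_action / 1e9   # 2 GB/day
tb_per_year = gb_per_day * 365 / 1000                   # ~0.73 TB/year
# Step 5: verdict -- a single web server (10K+ QPS capacity) and one database cover this

print(avg_qps, peak_qps, gb_per_day)  # 10.0 30.0 2.0
```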
Worked Example 1 - Twitter: How Many Tweets Per Second?
Twitter (now X) is the classic example of a QPS-heavy estimation. The core question is: how many tweets are being written per second, and how many are being read per second? The gap between those two numbers reveals the entire architecture.
Before we touch any math, let's pin down our assumptions. In an interview, you'd say these out loud; the interviewer wants to hear you reason about what's realistic.
- 400 million DAU: Twitter-scale active users
- Average user tweets 2x per day: most users tweet rarely; power users tweet 20+, so 2 is a reasonable average
- Average user reads their timeline 20x per day: scrolling through the feed, checking notifications
- Each tweet: ~140 chars of text + metadata (timestamps, user ID, etc.) ≈ 1 KB. With media links and preview data: ~5 KB average
The Full Calculation Chain
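Here's the chain as a Python sketch, using the assumptions above (86,400 exact seconds/day for readability; rounding to ~10^5 lands in the same bucket):

```python
dau = 400_000_000
tweets_per_user_per_day = 2
reads_per_user_per_day = 20
bytes_per_tweet = 5_000          # ~5 KB average with media links and preview data
SECONDS_PER_DAY = 86_400

writes_per_day = dau * tweets_per_user_per_day    # 800M tweets/day
reads_per_day = dau * reads_per_user_per_day      # 8B timeline reads/day

write_qps = writes_per_day / SECONDS_PER_DAY      # ~9.3K writes/sec average
read_qps = reads_per_day / SECONDS_PER_DAY        # ~93K reads/sec average
read_write_ratio = reads_per_day / writes_per_day # 10:1 -- the number that drives the design

tweet_storage_tb_per_day = writes_per_day * bytes_per_tweet / 1e12   # ~4 TB/day of new tweets
```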
What the Numbers Tell You About Architecture
The 10:1 read-to-write ratio is the single most important number from this estimation. It tells you that reads dominate everything. That one insight drives all of these architectural decisions:
- Caching is mandatory: with 93K reads/sec, you can't hit the database for every timeline request. A Redis caching layer (an in-memory data store; data lives in RAM instead of on disk, so reads take microseconds instead of milliseconds, perfect for hot data read millions of times) absorbs 95%+ of reads.
- Read replicas: even with caching, you need multiple database copies to handle cache misses. A primary database handles writes; read replicas handle the overflow.
- Fanout-on-write vs. fanout-on-read: when a user tweets, do you pre-compute every follower's timeline (write amplification) or compute it when each follower opens the app (read amplification)? At 10:1 read:write, pre-computing timelines at write time saves 10x the read-time work.
- CDN for media: at 3.7 Gbps egress, you absolutely need a CDN (Content Delivery Network: a global network of servers that cache and serve content from locations close to users, reducing latency and offloading traffic from your origin servers) to distribute the load geographically.
Worked Example 2 - YouTube: How Much Storage Per Day?
YouTube is the storage and bandwidth monster of the internet. While Twitter deals mostly with tiny text blobs, YouTube deals with massive video files. The numbers here get staggering fast, and that's exactly why this is a great estimation exercise. It forces you to think about what happens when data sizes go from kilobytes to gigabytes.
- 2 billion MAU, ~800 million DAU
- 500 hours of video uploaded per minute: this is YouTube's actual published stat
- Average video: 1080p raw ≈ 2.5 GB/hour. After encoding/compression: ~500 MB/hour
- Each video stored in 5 resolutions (1080p, 720p, 480p, 360p, 240p): total is roughly 1.5x the 1080p encoded size
- Average user watches 40 minutes/day
Storage: The Upload Side
Let's start with what goes IN: how much raw video is YouTube receiving, and how much space does it take after processing?
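A sketch of the upload-side math in Python, using the assumptions listed above (every constant here is one of those stated assumptions, not a measured figure):

```python
hours_uploaded_per_min = 500       # YouTube's published stat
raw_gb_per_hour = 2.5              # 1080p raw
encoded_mb_per_hour = 500          # after encoding/compression
resolution_factor = 1.5            # storing 5 resolutions ~ 1.5x the 1080p encoded size

hours_per_day = hours_uploaded_per_min * 60 * 24               # 720,000 hours/day
raw_pb_per_day = hours_per_day * raw_gb_per_hour / 1e6         # ~1.8 PB/day arriving raw
stored_tb_per_day = hours_per_day * encoded_mb_per_hour * resolution_factor / 1e6
# ~540 TB/day actually kept -- encoding shrinks the raw ingest by roughly 70%
```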
Bandwidth: The Download Side
Storage is expensive, but egress bandwidth is where the real money goes. Every time someone watches a video, YouTube has to push that data to their device. With 800 million people watching 40 minutes a day, the numbers get wild.
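A sketch of the egress math under the same assumptions (800M DAU, 40 watch-minutes/day, ~500 MB per streamed hour; these are the section's assumptions, not published figures):

```python
dau = 800_000_000
watch_minutes_per_day = 40
mb_per_streamed_minute = 500 / 60    # ~8.3 MB/min at the encoded bitrate

egress_pb_per_day = dau * watch_minutes_per_day * mb_per_streamed_minute / 1e9
# ~267 PB/day pushed out -- vs only ~540 TB/day of new content stored

egress_tbps = egress_pb_per_day * 1e15 * 8 / 86_400 / 1e12
# ~25 Tbps of sustained egress bandwidth (x8 converts bytes to bits)
```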
The lesson from YouTube's estimation is clear: media changes everything. Text-based systems like Twitter need petabytes per year. Video-based systems like YouTube need petabytes per day. When you see "video" or "images" in a system design question, immediately multiply all your storage and bandwidth estimates by 100-1000x compared to text-only systems.
Worked Example 3 - URL Shortener: The Interview Classic
If there's one estimation that every system design candidate should have memorized, it's this one. The URL shortener (think bit.ly or tinyurl.com) is the single most common estimation question in interviews. Why? Because it's simple enough to do in 3-4 minutes, it covers all five estimation steps, and the numbers lead to interesting architecture decisions.
A URL shortener does one thing: you give it a long URL, and it gives you a short one. When someone clicks the short URL, it redirects them to the original long URL. That's it. But the scale of doing this billions of times reveals some surprising math.
- 100 million new URLs shortened per day: bit.ly scale
- Read:write ratio of 100:1, since URLs are created once but clicked many, many times
- Each URL mapping: short URL (7 chars) + long URL (avg 200 chars) + metadata (timestamps, user ID, click count) ≈ 500 bytes
- Retention: 5 years (short URLs should keep working for at least that long)
Traffic
Storage
Can We Generate Enough Short Codes?
Here's where the estimation gets interesting. We need 182.5 billion unique short URLs over 5 years. Each short URL is 7 characters long, using Base62 encoding: 62 characters made up of lowercase a-z (26), uppercase A-Z (26), and digits 0-9 (10), all valid in URLs without escaping. How many possible codes is that?
Caching
At 116,000 reads per second, can we reduce the database load with a cache? Absolutely. The 80/20 rule (the Pareto principle: roughly 80% of effects come from 20% of causes) tells us that about 20% of URLs receive 80% of the traffic. If we cache just the popular ones:
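The whole URL-shortener estimation fits in a few lines of Python (all inputs are the assumptions stated above):

```python
urls_per_day = 100_000_000
read_write_ratio = 100
bytes_per_mapping = 500
years = 5

write_qps = urls_per_day / 86_400            # ~1,160 new URLs/sec
read_qps = write_qps * read_write_ratio      # ~116,000 redirects/sec

total_urls = urls_per_day * 365 * years      # 182.5 billion mappings over 5 years
storage_tb = total_urls * bytes_per_mapping / 1e12   # ~91 TB -- modest for this traffic

keyspace = 62 ** 7                           # ~3.5 trillion possible 7-char Base62 codes
headroom = keyspace / total_urls             # ~19x headroom -- 7 characters is plenty
```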
Worked Example 4 - WhatsApp: Messages at Global Scale
WhatsApp is a mixed estimation: it combines QPS, storage, AND persistent connections. It's also the system where media vs. text reveals the most surprising insight: the text messages that everyone thinks of as "WhatsApp's data" are actually a rounding error compared to the photos and videos people send.
- 2 billion registered users, ~500 million DAU
- Average user sends 40 messages/day
- Average text message: 100 bytes of text + metadata (timestamps, read receipts, encryption overhead) ≈ 200 bytes total
- 10% of messages include media: photos average 200 KB, videos average 5 MB (WhatsApp compresses both aggressively)
- Messages stored on server for 30 days (end-to-end encrypted, kept until delivered to all recipients)
Message Traffic
Storage: Text vs. Media (The Surprise)
Here's where the estimation gets really interesting. Most people guess that text messages are WhatsApp's storage challenge. They're wrong, by a factor of a thousand. Let's do the math.
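A Python sketch of that math, using the assumptions above. One extra hedge: the assumptions don't say how the 10% of media messages splits between photos and videos, so this sketch assumes a 50/50 split.

```python
dau = 500_000_000
msgs_per_user_per_day = 40
text_bytes_per_msg = 200           # 100 bytes of text + metadata/encryption overhead
media_fraction = 0.10
photo_kb, video_mb = 200, 5        # WhatsApp compresses both aggressively

msgs_per_day = dau * msgs_per_user_per_day        # 20 billion messages/day
msg_qps = msgs_per_day / 86_400                   # ~231K messages/sec average

text_tb_per_day = msgs_per_day * text_bytes_per_msg / 1e12    # 4 TB/day of text

media_msgs = msgs_per_day * media_fraction        # 2 billion media messages/day
photos, videos = media_msgs / 2, media_msgs / 2   # assumed 50/50 split (not in the stats)
media_pb_per_day = (photos * photo_kb * 1e3 + videos * video_mb * 1e6) / 1e15
# ~5.2 PB/day of media -- over 1,000x the text volume
```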
Persistent Connections
WhatsApp is a real-time messaging app. That means users don't poll the server asking "any new messages?" every few seconds; instead, each device maintains a persistent connection to the server: a long-lived network connection (like a WebSocket) that stays open, so messages arrive instantly through this always-open channel instead of the phone connecting fresh every time. But maintaining millions of simultaneous connections requires its own infrastructure.
WhatsApp's estimation also reveals something about architecture: this is really two separate systems. The text/metadata system is a classic database problem (4 TB/day, 231K QPS: a sharded database with replication). The media system is an object storage problem (5.2 PB/day: S3-like blob storage with CDN delivery). Most messaging apps split these into entirely different storage backends because the requirements are so different.
Worked Example 5 - Notification System: Push at Scale
Every app sends push notifications for order updates, social activity, and flash-sale alerts. It feels simple: fire off a message to a phone. But when 200 million users each get 5 pushes a day, you're suddenly dealing with a billion outbound API calls per day and the single hardest problem in notification systems: fan-out, where one event triggers many downstream actions. A single "flash sale starts" event might generate 200 million individual push notifications.
This example is different from the previous four because the bottleneck isn't storage or read QPS; it's outbound throughput to external APIs. You're at the mercy of Apple's APNs, Google's FCM, and browser Web Push servers, each with their own rate limits and latency profiles.
Assumptions
| Parameter | Value | Why This Number |
|---|---|---|
| DAU | 200 million | Large e-commerce or social platform (think Shopify-scale or mid-tier social app) |
| Pushes per user per day | 5 | Order updates, promotions, social activity, reminders, system alerts |
| Notification payload | ~1 KB | Title, body, deep link URL, metadata, device token โ JSON-encoded |
| Device split | 60% Android, 35% iOS, 5% Web | Global average skews Android-heavy; each platform = separate API call |
| Push providers | 3 (FCM, APNs, Web Push) | Each device gets a dedicated API call to its platform's push service |
| Peak multiplier | 3x | Notifications bunch around morning (9 AM), lunch (12 PM), evening (7 PM) |
| History retention | 90 days | Users scroll through past notifications in the app's notification center |
The Math
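A Python sketch of the math, using the parameters from the assumptions table above:

```python
dau = 200_000_000
pushes_per_user_per_day = 5
payload_bytes = 1_000
peak_multiplier = 3
retention_days = 90

pushes_per_day = dau * pushes_per_user_per_day       # 1 billion pushes/day
avg_send_rate = pushes_per_day / 86_400              # ~11.6K outbound API calls/sec average
peak_send_rate = avg_send_rate * peak_multiplier     # ~35K API calls/sec at peak

history_tb = pushes_per_day * payload_bytes * retention_days / 1e12
# ~90 TB of notification history to keep for the in-app notification center
```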
The Killer Scenario: Broadcast Fan-Out
The numbers above assume normal traffic: individual notifications for individual users. But what happens when marketing decides to send a "Flash Sale Starts NOW!" push to all 200 million users at once?
That single event becomes 200 million API calls to external push providers. Let's do the math on how long that takes depending on how many worker machines you throw at it:
| Workers | Calls/sec per worker | Total throughput | Time to deliver |
|---|---|---|---|
| 1 | 10,000 | 10K/sec | 20,000 sec = 5.5 hours |
| 10 | 10,000 | 100K/sec | 2,000 sec = 33 minutes |
| 100 | 10,000 | 1M/sec | 200 sec = 3.3 minutes |
| 1,000 | 10,000 | 10M/sec | 20 sec |
With a single worker doing 10,000 API calls per second, it takes 5.5 hours to reach everyone. A "flash sale" notification arriving 5 hours late is worse than useless; it's embarrassing. Even 100 workers only get you down to 3 minutes. For truly instant broadcast, you need ~1,000 workers spun up temporarily, which is why notification systems use auto-scaling worker pools behind a message queue: groups of worker machines that automatically grow and shrink based on queue depth, scaling from ~50 workers to 1,000+ within minutes when a broadcast floods the queue with 200M messages, then scaling back down once the queue drains.
Architecture Implications
The numbers tell us exactly what architecture to build:
- Message queue (Kafka or SQS): you can't call 35,000 external APIs per second directly from your app servers. You need a buffer. The queue absorbs bursts (especially broadcast events) and lets workers drain at a sustainable rate.
- Separate queues per provider: APNs, FCM, and Web Push have different rate limits, different payload formats, and different retry behaviors. Separate queues let you tune each independently.
- Auto-scaling worker pool: normal traffic needs ~50 workers. A broadcast event needs 1,000+. Static provisioning either wastes money (1,000 idle workers 99% of the time) or fails under load (50 workers trying to send 200M notifications). Auto-scaling solves both.
- Rate limiting per provider: Apple throttles APNs if you exceed their limits. Google throttles FCM. Your workers need token-bucket rate limiters per provider to avoid getting temporarily blocked.
- Delivery tracking as a separate pipeline: don't block the send path waiting for delivery receipts. Fire-and-forget the notification, then process delivery callbacks asynchronously.
Common Mistakes - Estimation Traps Everyone Falls Into
Even after you learn the 5-step method, there are traps that can silently destroy your estimates. Each one seems small (a forgotten multiplier here, a unit confusion there) but they compound: a 2x error in assumptions times an 8x unit confusion times a 3x missing replication factor is a 48x error in your final answer. That's the difference between "we need 10 servers" and "we need 480 servers." Let's walk through the seven traps and how to dodge each one.
The trap: You spend 2 minutes calculating that you need exactly 47,392 QPS. You could have said "about 50K" in 10 seconds and been just as right. In estimation, precision past one significant figure is a waste of time, and worse, it creates a false sense of confidence. "47,392" sounds like you know the answer. You don't. Your input assumptions are guesses, so your output is a guess too.
Why it hurts: In interviews, you have about 60 seconds for estimation. Burning all of it (and then some) on long division means you never get to the architecture conclusions, which is the part the interviewer actually cares about.
The fix: Round everything aggressively. 86,400 seconds/day becomes "about 100K." 1,000,000,000 divided by 100,000 becomes "about 10,000." If rounding changes your answer by less than 2x, the rounding doesn't matter. Save your brainpower for the assumptions, not the arithmetic.
The trap: You calculate average QPS as 10,000 and provision for 10,000. Every evening at 8 PM, traffic hits 30,000 and your system falls over. Users see errors. Pages time out. Your boss asks why you didn't plan for this.
Why it hurts: Average traffic is a mathematical fiction. Nobody experiences "average"; they experience the actual moment they're using the system, which is usually during peak hours. If you size for average, you're guaranteeing failure during the hours that matter most.
The fix: Always calculate BOTH average AND peak. Peak is typically 2-3x average for social apps, 5x for e-commerce, and 10-100x for flash sales or live events. State it explicitly: "Average is 10K QPS, peak is 30K, so I'll provision for 40K with headroom." The interviewer wants to hear the word "peak."
The trap: "A tweet is 280 characters = 280 bytes." Wrong. A tweet stored in a database includes: user ID (8 bytes), tweet ID (8 bytes), timestamp (8 bytes), retweet count (4 bytes), like count (4 bytes), reply-to ID (8 bytes), language code (2 bytes), geo coordinates (16 bytes), plus JSON encoding overhead, database indexes, and row metadata. The real size is 1-5 KB, 4 to 18 times bigger than the raw text.
Why it hurts: This error is multiplicative. If you estimate 500 million tweets per day at 280 bytes, you get 140 GB. At 3 KB (more realistic), you get 1.5 TB. That's a 10x error that changes whether you need one database server or ten.
The fix: Always ask "what else gets stored alongside this data?" For any row, add: IDs (8 bytes each), timestamps (8 bytes each), counters (4 bytes each), indexes (typically 2-3x the row size), and encoding overhead (~20%). A safe rule of thumb: multiply your naive estimate by 5-10x to account for everything you're forgetting.
The trap: "I need 10 TB of storage." You stop there. But every production database runs with at least 3 replicas (one primary, two replicas for failover and read scaling). That's 30 TB, not 10 TB. Add backups (another copy), and you're at 40 TB. Add cross-region replication for disaster recovery, and you're at 60 TB.
Why it hurts: A 3-6× underestimate on storage translates directly into a 3-6× underestimate on cost. Your capacity plan says "$50K/year" but the real bill is "$300K/year." That's a career-damaging surprise.
The fix: Always multiply raw storage by a replication factor. A reasonable default: raw × 3 for replicas, then × 1.3 for indexes and metadata, then × 1.5 for backups. Quick version: raw × 5 gets you in the right ballpark for total storage cost.
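As a sketch, the multiplier chain from this fix can be wrapped in a tiny helper (the default factors are the ones suggested above; tune them for your own stack):

```python
def total_storage_tb(raw_tb: float, replicas: int = 3,
                     index_overhead: float = 1.3, backup: float = 1.5) -> float:
    """Raw data size -> total provisioned storage.

    replicas:       copies kept online (1 primary + 2 replicas)
    index_overhead: indexes and row metadata, roughly 30% extra
    backup:         backup copies, roughly 50% extra on top
    """
    return raw_tb * replicas * index_overhead * backup

# The "10 TB" plan from the trap above actually needs closer to 60 TB,
# which is why the quick "raw x 5" rule lands in the right ballpark.
print(f"{total_storage_tb(10):.1f} TB")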
The trap: You silently use "100 million DAU" in your head, do all the math, and present the final number. The interviewer has no idea where that number came from. Did you guess? Did you know? Are you making it up? Without stated assumptions, your estimate looks like a wild guess — even if the math is perfect.
Why it hurts: In interviews, the process matters more than the answer. An interviewer who can see your assumptions can evaluate your reasoning, suggest adjustments, and have a productive conversation. An interviewer who just sees a final number can only say "that seems high" or "that seems low" — and you've wasted the most valuable part of the exercise.
The fix: Say your assumptions out loud before doing any math. "I'll assume 100 million DAU, 50 messages per user per day, and about 100 bytes per message. Sound reasonable?" This takes 5 seconds and completely transforms the impression you make. It also gives the interviewer a chance to correct you ("Actually, let's assume 10 million DAU for this problem") — which is a gift, not a failure.
The trap: Network bandwidth is measured in bits per second (Mbps, Gbps). Storage is measured in bytes (MB, GB, TB). There are 8 bits in a byte. So "1 Gbps bandwidth" = 125 MB/sec, NOT 1,000 MB/sec. That's an 8× error from a single letter.
Why it hurts: Imagine you calculate that your system needs to transfer 500 MB/sec. You provision a "500 Mbps" network link thinking it's enough. It's actually only 62.5 MB/sec — you're 8× short. Your system chokes on bandwidth that's 12.5% of what you need.
The fix: Capital B = Bytes. Lowercase b = bits. Always check which one you're using. For quick conversion: divide bits by 8 to get bytes. 1 Gbps = 125 MB/sec. 10 Gbps = 1.25 GB/sec. Write the units explicitly in every step of your calculation. If an interviewer says "Gbps" and you're working in bytes, convert immediately and say it out loud.
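A couple of one-line converters remove the bits-vs-bytes guesswork entirely. Nothing here is library-specific, just the factor-of-8 arithmetic:

```python
BITS_PER_BYTE = 8

def gbps_to_mbytes_per_sec(gbps: float) -> float:
    """Network Gbps (bits) -> storage MB/sec (bytes)."""
    return gbps * 1_000 / BITS_PER_BYTE

def mbytes_per_sec_to_gbps(mb_per_sec: float) -> float:
    """Storage MB/sec (bytes) -> network Gbps (bits)."""
    return mb_per_sec * BITS_PER_BYTE / 1_000

print(gbps_to_mbytes_per_sec(1))    # 1 Gbps is only 125 MB/sec
print(gbps_to_mbytes_per_sec(10))   # 10 Gbps is 1,250 MB/sec (1.25 GB/sec)
print(mbytes_per_sec_to_gbps(500))  # a 500 MB/sec workload needs 4 Gbps, not 500 Mbps
```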
The trap: "We have 100 TB of data, so we need a 100 TB cache." No — you need to cache the hot data, not everything. In almost every system, 20% of the data handles 80% of the traffic. Your top 1,000 products get 50% of all views. Your top 10,000 users generate 40% of all content. Caching the hot 20% gives you 80% of the benefit at 20% of the cost.
Why it hurts: Over-estimating cache size means over-spending on RAM (which is 10-50× more expensive per GB than disk). Under-estimating it means your cache miss rate is too high and your database still gets hammered. Both are expensive mistakes.
The fix: For cache sizing, use the 80/20 rule: cache 20% of your dataset as a starting point. 100 TB dataset → 20 TB in cache. Then check: can you afford that much Redis? If not, cache 5-10% and accept more cache misses. State the trade-off: "20 TB of Redis at $25/GB/month = $500K/month. Or 5 TB at $125K/month with 60% cache hit rate instead of 80%."
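A minimal sketch of the 80/20 sizing rule, using the illustrative $25/GB/month RAM price from the text (real Redis pricing varies widely by provider):

```python
def cache_plan(dataset_tb: float, hot_fraction: float = 0.2,
               ram_cost_per_gb_month: float = 25.0) -> tuple[float, float]:
    """80/20 starting point: cache the hot fraction of the dataset.

    Returns (cache size in TB, monthly cost in dollars).
    """
    cache_tb = dataset_tb * hot_fraction
    monthly_cost = cache_tb * 1_000 * ram_cost_per_gb_month  # 1 TB ~ 1,000 GB
    return cache_tb, monthly_cost

size, cost = cache_plan(100)              # the 100 TB dataset from the fix above
print(f"{size:.0f} TB cache, ${cost:,.0f}/month")

# The stated trade-off: shrink to 5% hot data and accept more misses.
small_size, small_cost = cache_plan(100, hot_fraction=0.05)
print(f"{small_size:.0f} TB cache, ${small_cost:,.0f}/month")
```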
Interview Playbook — Nail Estimation Questions
You know the math now. But knowing math and performing it live — out loud, under pressure, with someone evaluating every word — are completely different skills. This section is your playbook: the exact script to follow, common questions interviewers ask, and the mental model of what the interviewer is actually scoring you on.
What the Interviewer Is Actually Scoring
Here's what most candidates don't realize: the interviewer does not care about your final number. They know it's an estimate. They know it'll be wrong. What they're scoring is your thinking process.
The 5-Step Script for Any Estimation Question
Memorize this sequence. It works for every estimation question, whether they ask about storage, bandwidth, servers, or cost. Each step takes about 10-15 seconds — the whole thing fits in about 60 seconds.
Step 1: "Let me start with the scale"
State your DAU assumption and the read/write ratio. This grounds the entire discussion.
Say out loud: "Let's assume this platform has about 100 million DAU. For a messaging app, I'd estimate a 10:1 read-to-write ratio — people read more messages than they send."
Why this matters: The interviewer now knows your scale. If they want a different number, they'll tell you. If not, you've anchored the conversation with a reasonable assumption.
Step 2: "Let me estimate traffic"
Calculate QPS from your DAU and per-user activity. Always mention peak.
Say out loud: "If each user sends 40 messages per day, that's 4 billion messages daily. Divided by 86,400 — roughly 100K — gives about 46,000 writes per second. At peak, let's say 3× that, so about 140K writes per second."
Step 3: "Let me estimate storage"
Size per item × daily volume × retention period. Mention replication.
Say out loud: "Each message is about 200 bytes for text, plus metadata — let's call it 500 bytes. 4 billion messages × 500 bytes = 2 TB per day. Over 5 years with 3× replication, that's about 11 PB."
Step 4: "Let me estimate bandwidth"
QPS × payload size. Separate ingress (writes) and egress (reads).
Say out loud: "At peak, 140K writes/sec × 500 bytes = 70 MB/sec ingress. Reads are 10× that, so 700 MB/sec egress, which is about 5.6 Gbps."
Step 5: "Let me sanity-check"
Compare your numbers to a real system. Draw the architectural conclusion.
Say out loud: "Let's compare against WhatsApp — they reportedly handle about 60 billion messages per day, well above our 4 billion. So 46K average writes/sec feels reasonable for 100M DAU. The 11 PB of storage over 5 years means we definitely need sharding — no single database holds that. I'd use consistent hashing to shard by user ID."
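All five steps are a handful of multiplications, so they are easy to script. A sketch using the same assumptions as the walkthrough (100M DAU, 40 messages/user/day, 500 bytes/message, 10:1 read ratio, 3x peak, 5 years, 3x replication):

```python
SECONDS_PER_DAY = 86_400

# Step 1: scale assumptions
dau = 100_000_000
msgs_per_user_per_day = 40
bytes_per_msg = 500
read_write_ratio = 10
peak_factor = 3

# Step 2: traffic
msgs_per_day = dau * msgs_per_user_per_day            # 4 billion messages/day
write_qps = msgs_per_day / SECONDS_PER_DAY            # ~46K writes/sec average
peak_write_qps = write_qps * peak_factor              # ~140K writes/sec at peak

# Step 3: storage over 5 years with 3x replication
tb_per_day = msgs_per_day * bytes_per_msg / 1e12      # 2 TB/day raw
pb_5yr = tb_per_day * 365 * 5 * 3 / 1_000             # ~11 PB total

# Step 4: bandwidth at peak
ingress_mb_s = peak_write_qps * bytes_per_msg / 1e6   # ~70 MB/sec in
egress_mb_s = ingress_mb_s * read_write_ratio         # ~700 MB/sec out
egress_gbps = egress_mb_s * 8 / 1_000                 # ~5.6 Gbps (bytes -> bits)

print(f"writes: {write_qps:,.0f}/sec (peak {peak_write_qps:,.0f}), "
      f"storage: {pb_5yr:.1f} PB over 5 years, egress: {egress_gbps:.1f} Gbps")
```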
Practice Question — Estimate Infrastructure for a Ride-Sharing App
Here's how a strong candidate thinks through this question in real time. Notice the structure: assumptions first, math second, implications third.
Step 1 — Scale: "Uber has about 100 million MAU. DAU is maybe 30% for ride-sharing — people don't ride every day. So 30 million DAU. Plus about 5 million active drivers at any moment. The read:write ratio is interesting here — it's closer to 1:1 because drivers constantly send GPS pings (writes) and riders constantly fetch nearby drivers (reads)."
Step 2 — Traffic: "The big number is GPS pings: 5 million drivers × 1 ping every 4 seconds = 1.25 million writes per second. That dwarfs everything else. Ride requests: 30M riders × 2 rides/day = 60M/day ÷ 86,400 ≈ 700/sec. So GPS writes dominate at 1.25M/sec."
Step 3 — Storage: "GPS pings: each is ~100 bytes (lat, lng, timestamp, driver ID). 1.25M/sec × 100B × 86,400 sec = about 10 TB/day. But we don't need to keep all GPS data forever — maybe 30 days for analytics, then aggregate. Trip records: 60M/day × 2 KB = 120 GB/day — small by comparison. Total: ~10 TB/day for 30 days = 300 TB live GPS data."
Step 4 — Bandwidth: "GPS ingress: 1.25M/sec × 100B = 125 MB/sec ≈ 1 Gbps. Map tile egress for riders viewing nearby drivers: 30M riders × 10 map refreshes × 5 KB = 1.5 TB/day — only about 17 MB/sec on average, call it 50 MB/sec at peak. GPS ingress dominates the bandwidth picture too."
Step 5 — Sanity check and implications: "1.25 million GPS writes per second is enormous. PostgreSQL maxes out at maybe 50K writes/sec. So we can't use a traditional RDBMS for location data — we need something like Redis with geospatial indexes or a time-series database. The 300 TB of location data suggests we need to aggressively TTL (expire) old pings. The trip records at 120 GB/day are manageable in a standard sharded database."
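The dominant number in this walkthrough, GPS ingestion, takes four lines to reproduce. A sketch with the same assumed inputs (5M drivers, one 100-byte ping every 4 seconds, 30-day retention):

```python
SECONDS_PER_DAY = 86_400

drivers = 5_000_000
ping_interval_s = 4          # one GPS ping per driver every 4 seconds
ping_bytes = 100             # lat, lng, timestamp, driver ID
retention_days = 30

gps_writes_per_sec = drivers / ping_interval_s                  # 1.25M writes/sec
gps_tb_per_day = gps_writes_per_sec * ping_bytes * SECONDS_PER_DAY / 1e12
live_gps_tb = gps_tb_per_day * retention_days                   # ~300 TB live data

print(f"{gps_writes_per_sec:,.0f} writes/sec, "
      f"{gps_tb_per_day:.1f} TB/day, {live_gps_tb:.0f} TB live")
```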
Common Interview Estimation Questions
Question: How much storage does Instagram need?
Key insight: Photos dominate everything. Text metadata is rounding error.
Quick math: Instagram gets ~100 million photos uploaded per day. Average photo after compression: ~2 MB. So that's 100M × 2 MB = 200 TB/day of new photos. With 3× replication across data centers, that's 600 TB/day. Per year: ~220 PB. Five years: over 1 exabyte. This is why Instagram uses a custom storage system — no off-the-shelf database can do this.
Don't forget: Multiple resolutions (thumbnail, medium, full) means 3-4× more storage. Plus videos (growing fast, 10-50 MB each). Photos alone drive the number to petabytes per year.
Question: How many servers do you need for 1 million concurrent users?
Key insight: It depends entirely on whether connections are short-lived HTTP or persistent WebSockets. A standard HTTP connection is brief — the client sends a request, the server responds, the connection closes in milliseconds to seconds — so a single server can handle thousands per second because they don't stick around. A WebSocket connection stays open for the entire user session, minutes or hours, and the server has to maintain state for each one simultaneously, which is much more memory-intensive.
HTTP (stateless): A modern server handles ~50,000 requests/second. If each request takes 20ms, only ~1,000 are truly in flight at any moment. So 1M requests/second ÷ 50K per server ≈ 20 servers.
WebSocket (persistent): Each open connection consumes ~10-50 KB of memory for the socket buffer and connection state. A server with 32 GB RAM can hold ~500K-1M connections if the workload is light. So 1M WebSocket connections ≈ 2-4 servers for connections alone, but you need more for the actual message processing.
The answer interviewers want: "It depends — let me ask whether these are long-lived or short-lived connections." That clarifying question alone is worth more than any number.
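For completeness, here is a sketch of both server counts. The 30 KB per connection and the 50% usable-RAM fraction are illustrative assumptions; real numbers depend on your runtime and kernel tuning:

```python
# HTTP case: throughput is the constraint
http_rps_per_server = 50_000
servers_http = 1_000_000 / http_rps_per_server        # 20 servers for 1M req/sec

# WebSocket case: memory is the constraint, not request rate
bytes_per_connection = 30 * 1024                      # ~30 KB of buffers/state each
server_ram = 32 * 1024**3                             # 32 GB of RAM
usable = 0.5                                          # leave half for the app itself
conns_per_server = server_ram * usable / bytes_per_connection
servers_ws = 1_000_000 / conns_per_server             # a handful of servers

print(f"HTTP: {servers_http:.0f} servers; "
      f"WebSocket: ~{conns_per_server:,.0f} conns/server -> {servers_ws:.1f} servers")
```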
Question: How much bandwidth does Netflix use at peak?
Key insight: Netflix is a bandwidth monster. Video at scale consumes more internet bandwidth than almost anything else.
Quick math: Netflix has ~250 million subscribers. At peak (evening hours), roughly 10-15% stream simultaneously — call it 30 million concurrent streams. Average bitrate: ~5 Mbps (mix of SD, HD, 4K). So: 30M × 5 Mbps = 150 Tbps (terabits per second). Netflix reportedly accounts for ~15% of all global downstream internet traffic.
In bytes: 150 Tbps ÷ 8 = ~18.75 TB/sec of video data flowing from Netflix CDN edges to users. That's why Netflix operates Open Connect — their own global CDN with boxes installed directly inside ISP networks.
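The Netflix math in code form, with the 12% concurrency and 5 Mbps bitrate as stated assumptions:

```python
subscribers = 250_000_000
concurrent_fraction = 0.12        # ~10-15% streaming at evening peak
avg_mbps = 5                      # blended SD/HD/4K bitrate

concurrent = subscribers * concurrent_fraction        # 30M simultaneous streams
total_tbps = concurrent * avg_mbps / 1e6              # Mbps -> Tbps: 150 Tbps
tb_per_sec = total_tbps / 8                           # bits -> bytes: 18.75 TB/sec

print(f"{concurrent/1e6:.0f}M streams, {total_tbps:.0f} Tbps, {tb_per_sec:.2f} TB/sec")
```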
Practice Exercises — Build Your Estimation Muscle
Reading about estimation is like reading about push-ups — it doesn't make you stronger. The only way to get good at this is to do the math yourself. Set a 5-minute timer for each exercise. Don't peek at the hints until you've tried your own approach. The goal isn't to match the exact answer — it's to be within the right order of magnitude, meaning within 10x of the real answer. If the answer is 5 TB and you got 2 TB or 15 TB, you're fine; if you got 50 GB or 500 TB, something went wrong in your reasoning.
These exercises get progressively harder. Exercises 1-2 are warm-ups that practice the basic "users x activity x size" formula. Exercises 3-4 add complexity with replication, peak traffic, and write-heavy workloads. Exercises 5-6 are full system estimations that combine multiple dimensions — the kind you'll face in real interviews.
A messaging app has 50 million DAU (daily active users — the number of unique users who open the app at least once per day, the most common starting point for any estimation). On average, each user sends 30 messages per day. Each text message is about 200 bytes (including metadata like timestamp, sender ID, and delivery status). 5% of messages include a photo attachment averaging 300 KB each.
Your tasks:
- How much total storage does this app consume per day?
- How much storage per year?
- Which component dominates — text or images? By how much?
Step 1 — Text messages:
50M users x 30 messages/day x 200 bytes = 300,000,000,000 bytes = 300 GB/day of text.
Step 2 — Image attachments:
Total messages per day: 50M x 30 = 1.5 billion messages. 5% have images: 1.5B x 0.05 = 75 million images/day. Storage: 75M x 300 KB = 22,500,000,000 KB = ~22.5 TB/day of images.
Step 3 — Total:
Text (300 GB) + Images (22.5 TB) = ~22.8 TB/day. Images dominate: 22.5 TB vs 0.3 TB is a factor of roughly 75.
Step 4 — Per year:
22.8 TB/day x 365 = ~8.3 PB/year.
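A quick way to check your own attempt is to script the solution. This sketch reproduces the numbers above and answers the dominance question directly:

```python
dau = 50_000_000
msgs_per_user = 30
text_bytes = 200
img_rate = 0.05            # 5% of messages carry a photo
img_kb = 300

text_gb_day = dau * msgs_per_user * text_bytes / 1e9         # 300 GB/day of text
images_per_day = dau * msgs_per_user * img_rate              # 75M photos/day
img_tb_day = images_per_day * img_kb * 1_000 / 1e12          # 22.5 TB/day of photos
total_tb_day = img_tb_day + text_gb_day / 1_000              # ~22.8 TB/day
pb_year = total_tb_day * 365 / 1_000                         # ~8.3 PB/year
dominance = img_tb_day / (text_gb_day / 1_000)               # images vs text, ~75x

print(f"{total_tb_day:.1f} TB/day, {pb_year:.1f} PB/yr, images are {dominance:.0f}x text")
```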
Your API serves 2 million requests per day. The average response payload is 5 KB. Assume traffic is roughly uniform across the day (no major spikes).
Your tasks:
- What's your average QPS (queries per second — the number of requests your system handles each second, the most fundamental throughput metric in system design)?
- What's the peak QPS if spikes hit 3x average?
- What's your daily egress bandwidth? (Egress is outgoing data from your servers to clients — this is what cloud providers charge you for. Ingress, incoming data, is usually free.)
- Do you need to worry about scaling?
Average QPS:
2,000,000 requests / 86,400 seconds = ~23 QPS.
Peak QPS (3x):
23 x 3 = ~70 QPS. That's nothing for a modern server.
Egress bandwidth:
23 requests/sec x 5 KB/response = 115 KB/sec = ~0.92 Mbps.
Daily egress total:
2M requests x 5 KB = 10 GB/day. On AWS at $0.09/GB, that's about $0.90/day in bandwidth costs.
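The same check for this exercise (the $0.09/GB figure is the illustrative AWS egress price used above):

```python
requests_per_day = 2_000_000
payload_kb = 5
egress_price_per_gb = 0.09   # illustrative cloud egress price

avg_qps = requests_per_day / 86_400                   # ~23 QPS average
peak_qps = avg_qps * 3                                # ~70 QPS at a 3x spike
egress_gb_day = requests_per_day * payload_kb / 1e6   # 10 GB/day out
daily_cost = egress_gb_day * egress_price_per_gb      # ~$0.90/day

print(f"{avg_qps:.0f} QPS avg, {peak_qps:.0f} peak, "
      f"{egress_gb_day:.0f} GB/day egress, ${daily_cost:.2f}/day")
```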
You're designing storage for a video surveillance system. The requirements:
- 1,000 cameras recording 24 hours a day, 7 days a week
- Each camera records at 720p, producing 1 GB/hour of compressed H.264 video
- 30-day retention — footage older than 30 days is automatically deleted
- Data must be stored with 2x replication (one backup copy) for durability
Your tasks:
- Total raw storage needed (before replication)?
- Total storage with replication?
- How many 10 TB hard drives do you need?
- What's the approximate hardware cost just for drives?
Per camera per day:
1 GB/hour x 24 hours = 24 GB/day per camera.
All cameras per day:
1,000 cameras x 24 GB = 24 TB/day.
30-day retention:
24 TB/day x 30 days = 720 TB raw storage.
With 2x replication:
720 TB x 2 = 1.44 PB (petabytes). That's 1,440 TB.
Hard drives needed:
1,440 TB / 10 TB per drive = 144 hard drives.
Hardware cost estimate:
At ~$200 per 10 TB enterprise drive: 144 x $200 = ~$28,800 for drives alone. Add servers, networking, RAID controllers, rack space, and power โ the total system cost is likely 3-5x the drive cost, so $85,000-$145,000.
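Scripting this one makes the unit conversions harder to fumble (the drive price is the assumed $200 figure from above):

```python
cameras = 1_000
gb_per_hour = 1              # 720p H.264, per the requirements
retention_days = 30
replication = 2
drive_tb = 10
drive_cost_usd = 200         # assumed price per 10 TB enterprise drive

tb_per_day = cameras * gb_per_hour * 24 / 1_000       # 24 TB/day across all cameras
raw_tb = tb_per_day * retention_days                  # 720 TB raw
total_tb = raw_tb * replication                       # 1,440 TB = 1.44 PB
drives = total_tb / drive_tb                          # 144 drives
cost = drives * drive_cost_usd                        # $28,800 in drives alone

print(f"{raw_tb:.0f} TB raw, {total_tb:.0f} TB replicated, "
      f"{drives:.0f} drives, ${cost:,.0f}")
```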
A ride-sharing app like Uber has 10 million DAU. Each user opens the app 3 times per day. During each session:
- The app sends 1 GPS location update per second for 10 minutes
- The user makes 1 ride request
- The system queries 5 nearby driver locations
Your tasks:
- How many location updates happen per day?
- What's the average location-update QPS?
- What's the peak QPS (rush hour is about 3x average)?
- What kind of database or storage system can handle this?
Total location updates per day:
10M users x 3 sessions x 10 minutes x 60 seconds x 1 update/sec = 18 billion updates/day.
Average QPS:
18,000,000,000 / 86,400 = ~208,000 updates/sec. That is a LOT.
Peak QPS (rush hour 3x):
208,000 x 3 = ~624,000 updates/sec.
But wait — users aren't evenly spread across 24 hours. Most ride-sharing usage concentrates in morning (8-9 AM) and evening (5-7 PM) rush hours. During those 4 peak hours, traffic might be 3-5x the 24-hour average.
At 624K writes/sec, no single database handles this. You need time-series optimized storage — something like Apache Kafka for ingestion (millions of writes/sec), feeding into a time-series database (TimescaleDB, InfluxDB) or geospatial index (Redis with geohashing). Traditional SQL databases top out around 10K-30K writes/sec per node. You'd need 20-60 sharded MySQL nodes just for writes — or one Kafka cluster that handles it natively.
The estimation just determined your entire storage architecture.
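A sketch of the write-load math, including the shard count implied by a hypothetical 30K writes/sec-per-node SQL ceiling:

```python
dau = 10_000_000
sessions_per_day = 3
session_minutes = 10          # 1 GPS update/sec for 10 minutes per session

updates_per_day = dau * sessions_per_day * session_minutes * 60   # 18 billion
avg_qps = updates_per_day / 86_400                                # ~208K/sec
peak_qps = avg_qps * 3                                            # ~625K/sec at rush hour

# Shards needed if each SQL node tops out at ~30K writes/sec:
sql_nodes = peak_qps / 30_000                                     # ~21 nodes

print(f"{updates_per_day/1e9:.0f}B updates/day, avg {avg_qps:,.0f}/sec, "
      f"peak {peak_qps:,.0f}/sec, ~{sql_nodes:.0f} SQL shards")
```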
Design the infrastructure for a Spotify-like music streaming service. Here are the numbers:
- 100 million DAU
- Average listening session: 60 minutes per day
- Average song: 3 minutes long, 3 MB (128 kbps encoding)
- Music catalog: 80 million songs
Calculate all four dimensions:
- Concurrent streams at peak — how many people are listening right now?
- Total catalog storage — how much space for all the music?
- Egress bandwidth at peak — how much data are you pushing out?
- CDN requirements — can you afford to serve this from a cloud provider?
(a) Concurrent streams at peak:
100M users each listen 60 minutes/day. Total listening minutes: 100M x 60 = 6 billion minutes/day. Spread over 24 hours: 6B / 1,440 min = ~4.2M concurrent listeners on average. But music has peak hours (commute time, evenings) — assume 3x peak: roughly 12-15M concurrent streams. Let's use 15M for safety.
(b) Catalog storage:
80M songs x 3 MB each = 240 TB at one quality level. But streaming services offer multiple quality tiers — say 3 (low 64kbps, medium 128kbps, high 256kbps), which together add up to about 3.5x the single-tier size: roughly ~800 TB of music files. With 3x replication across data centers: ~2.5 PB.
(c) Egress bandwidth at peak:
15M concurrent streams x 128 kbps each = 1.92 Tbps. Rounding up for overhead (metadata, API calls, album art): roughly ~2.5-3 Tbps peak egress.
(d) CDN requirements:
At cloud egress prices ($0.02/GB), this traffic is catastrophically expensive. Listeners pull about 6 PB/day (100M users x 60 MB of audio each), or ~180 PB/month — roughly $3.6M per month in egress fees alone. This is exactly WHY companies like Spotify, Netflix, and YouTube build their own CDN infrastructure (or use specialized CDN providers with volume discounts). At scale, you must own your edge network.
This exercise shows how estimation drives business decisions, not just technical ones. The bandwidth cost alone (tens of millions per year from the cloud) versus running your own CDN at a small fraction of that is a CEO-level decision that came from 60 seconds of multiplication. In an interview, walking through this reasoning — especially the cost comparison — shows you think like a senior engineer, not just a coder.
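One way to sanity-check the egress bill is to price the sustained daily volume rather than the instantaneous peak, assuming $0.02/GB cloud egress. The 15M peak-stream figure follows the "round up for safety" choice above:

```python
dau = 100_000_000
minutes_per_user = 60
mb_per_minute = 1                     # 3 MB per 3-minute song at 128 kbps

# (a) concurrent streams
total_minutes = dau * minutes_per_user            # 6 billion listening minutes/day
avg_concurrent = total_minutes / (24 * 60)        # ~4.2M listeners on average
peak_concurrent = 15_000_000                      # ~3x average, rounded up for safety

# (c) peak egress: 128 kbps (128,000 bits/sec) per stream
peak_tbps = peak_concurrent * 128_000 / 1e12      # ~1.9 Tbps of raw audio

# (d) cloud egress cost: daily volume is what drives the bill
pb_per_day = total_minutes * mb_per_minute / 1e9  # 6 PB/day
monthly_cost = pb_per_day * 30 * 1e6 * 0.02       # PB -> GB, at $0.02/GB

print(f"avg {avg_concurrent/1e6:.1f}M streams, peak {peak_tbps:.2f} Tbps, "
      f"~${monthly_cost/1e6:.1f}M/month cloud egress")
```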
An e-commerce platform is preparing for a Black Friday flash sale. Here are the baseline and expected numbers:
- Normal traffic: 50,000 QPS
- Expected Black Friday spike: 20x normal = 1,000,000 QPS for roughly 2 hours
- Average response size: 10 KB (product page with metadata)
- Database capacity: single PostgreSQL instance handling 30,000 QPS max (indexed reads)
- Redis cache: each node handles 200,000 operations/sec
Calculate:
- How many Redis cache nodes do you need?
- What cache hit rate is required to keep the database under 30K QPS? (Hit rate is the percentage of requests that find data in the cache rather than going to the database — a 97% hit rate means only 3 out of 100 requests touch the database.)
- What's the total bandwidth at 1M QPS?
- What infrastructure do you need for the load balancer layer?
(a) Redis cache nodes:
1,000,000 QPS / 200,000 per Redis node = 5 nodes minimum. But you never run at 100% capacity — add a safety margin of 50-60%: 8 Redis nodes. (If one node dies during Black Friday, you still have capacity.)
(b) Required cache hit rate:
The database maxes out at 30,000 QPS. Total incoming: 1,000,000 QPS. The cache must absorb everything above 30K — that means 970,000 out of 1,000,000 requests must be cache hits.
Cache hit rate needed: 970,000 / 1,000,000 = 97% minimum.
Is 97% realistic? For a flash sale where everyone is viewing the same few products, absolutely — the 80/20 rule (in most systems, 20% of the data handles 80% of the traffic) gets even more extreme during sales, with maybe 5% of products getting 95% of the views, which makes caching extremely effective. Pre-warm the cache with popular items before the sale starts.
(c) Total bandwidth:
1,000,000 QPS x 10 KB = 10,000,000 KB/sec = 10 GB/sec = 80 Gbps.
(d) Load balancer layer:
A single HAProxy instance (a widely used open-source load balancer) typically handles 1-2 million connections and ~40-80 Gbps of throughput depending on hardware. At 80 Gbps, you need 2-3 load balancers in an active-active setup. Cloud load balancers (AWS ALB) auto-scale but need pre-warming — tell AWS in advance about the Black Friday spike, or it takes 10-15 minutes to scale up (by which time your sale page is already down).
Mentioning the pre-warming requirement (both for cache AND load balancers) shows real production experience. Many candidates estimate the numbers correctly but forget that auto-scaling isn't instant. Flash sales require pre-provisioned capacity — you spin up the extra Redis nodes and LB capacity 30 minutes before the sale starts, not when traffic arrives.
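The capacity math for this sale fits in a few lines; the 50% safety margin on Redis nodes is the assumption suggested in the solution:

```python
import math

total_qps = 1_000_000
redis_ops_per_node = 200_000
db_max_qps = 30_000
payload_kb = 10

redis_min = math.ceil(total_qps / redis_ops_per_node)   # 5 nodes at 100% load
redis_nodes = math.ceil(redis_min * 1.5)                # 50% safety margin -> 8 nodes

hit_rate = (total_qps - db_max_qps) / total_qps         # 97% required to shield the DB
bandwidth_gbps = total_qps * payload_kb * 8 / 1e6       # KB/sec -> Gbps: 80 Gbps

print(f"{redis_nodes} Redis nodes, {hit_rate:.0%} hit rate needed, "
      f"{bandwidth_gbps:.0f} Gbps at peak")
```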
Cheat Sheet — Estimation at a Glance
Keep these cards bookmarked. They're the numbers you'll reach for in every estimation — during interviews, capacity planning, or late-night "will this scale?" calculations. Each card is one fact you should know by heart.
Connected Topics — Where to Go Next
Estimation gives you the numbers. The topics below teach you what to do with those numbers. When your estimate says "500K writes/sec," scalability tells you how to handle it. When it says "97% cache hit rate required," the caching page shows you how to achieve it. Every estimation answer points to an architecture decision — and these pages cover those decisions in depth.