TL;DR - The Napkin That Saved a Million Dollars
- Powers of 2 and the five numbers that let you estimate almost anything
- The reference numbers every engineer carries in their head (QPS, storage sizes, latencies)
- A 5-step estimation framework you can use in any interview or design review
- Worked examples for Twitter, YouTube, chat systems, URL shorteners, and notification platforms at scale
Back-of-envelope estimation is the ability to quickly calculate whether a design will work at the required scale, before writing a single line of code.
Here's an analogy everyone understands. You're planning a road trip. Before you get in the car, you do some quick math: "It's 500 miles. My car gets 30 miles per gallon. Gas is $3.50 a gallon. So I need about 17 gallons, which costs roughly $60." You didn't calculate it to the penny. You didn't pull out a spreadsheet. You did napkin math: just enough to know whether the trip is affordable.
That's exactly what back-of-envelope estimation is in system design. Instead of miles and gallons, you estimate requests per second (how many user actions such as page loads, API calls, and searches hit your servers every second; abbreviated QPS, for queries per second; a busy website might handle 10,000-100,000 QPS), storage in terabytes, and bandwidth (how much data flows through your network per second, measured in Mbps or Gbps; think of it as the width of a highway, where more lanes means more cars, or data, can travel at once). The goal isn't perfection. It's getting the right order of magnitude: the power-of-10 "bucket" a number falls in. Getting that bucket right, even if the exact number is off by 2-3x, is what matters in estimation. Is the answer 1 gigabyte or 1 terabyte? That 1,000x difference changes your entire architecture.
Why does this matter? Three reasons, and they're all big.
In interviews, estimation shows you understand real-world scale. Anyone can say "use a database." Only someone who's done the math can say "we need ~18TB of storage per year, so we should plan for sharding from day one." That's the difference between a junior answer and a senior answer.
In production, estimation prevents disasters. Imagine deploying a system that needs 50TB of storage when you budgeted for 5TB. Or designing a single-server architecture for a feature that actually needs to handle 100,000 requests per second. These mistakes cost real money and real time, and 30 seconds of napkin math could have caught them.
In design reviews, estimation is the fastest sanity check you have: a quick, rough calculation to make sure your design isn't wildly off. You're not looking for the exact answer; you're checking that you're in the right ballpark ("Do we need 1 server or 1,000?"). Before anyone writes a design document, before anyone provisions infrastructure, a 30-second calculation can tell you whether the approach is even feasible.
What: Back-of-envelope estimation is quick, rough math to figure out if a system design can handle the required scale. You calculate requests per second, storage needs, and bandwidth, not to the exact byte, but to the right order of magnitude.
When: At the START of any system design: in interviews, design reviews, or before provisioning infrastructure. Do the math first, build second.
Key Principle: You don't need exact numbers. You need to know if the answer is 1GB or 1TB. That 1,000x difference is what changes your architecture. If you're within 2-3x of the real answer, your estimation did its job.
The Scenario - Why Interviewers Love This
Picture this. You're in a system design interview. The interviewer says: "Design a URL shortener like bit.ly." You feel confident. You start talking about database schemas, load balancers, maybe even mention consistent hashing. The interviewer listens politely for two minutes, then interrupts:
You freeze. You have no idea. You were so busy designing the how that you never figured out the how much. And without "how much," your entire design is floating in the air with no foundation. Is one database enough or do you need ten? Should you cache aggressively or is the load light enough to skip it? You can't answer any of these questions without numbers.
Now picture a different candidate in the same interview. Same question โ "Design a URL shortener." But this candidate starts differently:
That took 30 seconds. And it told the interviewer three things: this candidate understands scale, this candidate makes data-driven decisions, and this candidate won't accidentally design a system that falls over on day one. This candidate gets the offer.
Here's the key insight that changes everything about how you approach estimation:
Think about it this way. If you estimate 5,000 QPS and the real answer is 8,000 QPS, you're fine; both numbers suggest the same architecture (a few servers with a cache layer). But if you estimate 5,000 QPS and the real answer is 5,000,000 QPS, you're going to build something that collapses on launch day. Estimation doesn't need to be precise. It needs to land in the right order of magnitude.
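This "same bucket, same architecture" idea is easy to check mechanically. Here's a minimal Python sketch of the two conversions involved (the numbers are the illustrative ones from this section, not measurements):

```python
import math

def qps(events_per_day: float) -> float:
    """Napkin conversion: events per day -> events per second (1 day ~ 1e5 seconds)."""
    return events_per_day / 100_000

def order_of_magnitude(x: float) -> int:
    """The power-of-10 bucket a number falls in."""
    return round(math.log10(x))

# 400M users x 2 timeline reads/day = 800M reads/day -> ~8,000 reads/sec average
print(qps(800_000_000))  # 8000.0

# 5,000 QPS vs 8,000 QPS: same bucket, same architecture.
print(order_of_magnitude(5_000) == order_of_magnitude(8_000))  # True
# 5,000 QPS vs 5,000,000 QPS: three buckets apart -- a completely different system.
print(order_of_magnitude(5_000_000) - order_of_magnitude(5_000))  # 3
```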
An interviewer asks you to design Twitter's home timeline. Before reading further, try to estimate: how many timeline reads per second does Twitter handle? Hint: Twitter has ~400 million monthly active users. If each active user opens the app twice a day, how many timeline reads per second is that?
400M users x 2 reads/day = 800M reads/day. Divide by ~100,000 (seconds in a day) = ~8,000 reads/sec average. But peak is 2-3x average, so ~20,000 reads/sec at peak.
The Foundation - Powers of 2 and Quick Math Tricks
Before you can estimate anything, you need some numbers burned into your brain. Not hundreds of numbers โ just a handful. These are the "multiplication tables" of system design. Once they're automatic, estimation becomes as easy as mental arithmetic.
Let's start with the single most important reference table in all of system design: the powers of 2. Because computers work in binary (base 2), all storage and memory sizes are powers of 2: a kilobyte is 2^10 bytes (1,024), a megabyte is 2^20 bytes (1,048,576), and so on. Everything in computing (memory, storage, network packets) is measured in powers of 2. If you know five of them, you can estimate almost anything.
| Power | Exact Value | Approximation | Unit | Real-World Example |
|---|---|---|---|---|
| 2^10 | 1,024 | ~1 Thousand | 1 KB | A short email or a tiny JSON response |
| 2^20 | 1,048,576 | ~1 Million | 1 MB | A high-quality photo or a minute of MP3 audio |
| 2^30 | ~1.07 Billion | ~1 Billion | 1 GB | A feature-length movie (compressed) or 1,000 photos |
| 2^40 | ~1.1 Trillion | ~1 Trillion | 1 TB | A small library's worth of books, or ~500 hours of HD video |
| 2^50 | ~1.13 Quadrillion | ~1 Quadrillion | 1 PB | Netflix's entire content library, or ~500 million photos |
The pattern is beautiful in its simplicity: every 10 powers of 2, you jump by 1,000x. KB to MB is 2^10. MB to GB is another 2^10. GB to TB? Another 2^10. So if someone says "we have 500 million records at 2KB each," you instantly know: 500M x 2KB = 1 billion KB = 1 million MB = 1,000 GB = 1 TB. Three hops up the ladder. That calculation should take you two seconds.
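The ladder is trivial to express in code. A quick Python sketch of the 500-million-record example (napkin math; the record count and row size are the hypothetical figures from the paragraph above):

```python
# Each unit is one 2^10 hop up the ladder.
KB, MB, GB, TB, PB = 2**10, 2**20, 2**30, 2**40, 2**50

records = 500_000_000
row_size = 2 * KB

total_bytes = records * row_size
print(total_bytes / TB)  # ~0.93 -- "about 1 TB", which is all the precision napkin math needs
```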
Now for the second set of numbers you need: time conversions. These come up in literally every estimation because you almost always start with "X per day" and need to convert to "Y per second" (since servers think in seconds, not days).
There are exactly 86,400 seconds in a day (60 x 60 x 24). That's annoyingly specific. For napkin math, we round it: 1 day ≈ 100,000 seconds (or 10^5). It's only off by about 15%, which is nothing for estimation purposes. This single shortcut makes every daily-to-per-second conversion trivial.
A few more shortcuts that show up constantly:
| Fact | Exact Value | Napkin Approximation | When You Use It |
|---|---|---|---|
| Seconds in a day | 86,400 | ~10^5 (100,000) | "X per day" to "Y per second" |
| Seconds in a year | 31,536,000 | ~3 x 10^7 (30 million) | Yearly storage or data growth |
| Seconds in a month | ~2,592,000 | ~2.5 x 10^6 (2.5 million) | Monthly billing or capacity |
| 80/20 rule | Pareto principle | 80% of traffic hits 20% of data | Cache sizing: you only need to cache the hot 20% |
| Peak vs average | Varies | Peak = 2-3x average | Capacity planning: don't size for average, size for peak |
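These shortcuts combine into a single helper you'll reuse in every estimation. A hedged Python sketch (the constants are the napkin approximations from the table, deliberately not the exact values):

```python
SECONDS_PER_DAY = 100_000        # napkin value for 86,400
SECONDS_PER_MONTH = 2_500_000    # napkin value for ~2.59 million
SECONDS_PER_YEAR = 30_000_000    # napkin value for ~31.5 million

def avg_and_peak_qps(events_per_day: float, peak_multiplier: float = 3.0):
    """Convert 'X per day' into average and peak per-second rates."""
    avg = events_per_day / SECONDS_PER_DAY
    return avg, avg * peak_multiplier

# 1 billion events/day -> 10K QPS average, 30K QPS at a 3x peak
print(avg_and_peak_qps(1_000_000_000))
```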
A social media app has 200 million daily active users. Each user makes an average of 5 API requests per session, and they open the app 3 times a day. How many requests per second does the system handle on average? What about at peak?
200M users x 5 requests x 3 sessions = 3 billion requests/day. Divide by 10^5 = 30,000 QPS average. Peak at 2-3x = 60,000-90,000 QPS.
The Reference Numbers - What a Single Server Can Do
Knowing how to do the math is only half the battle. You also need to know what to compare your numbers against. When your calculation says "I need 50,000 QPS," you need to instantly know: can one server handle that, or do I need fifty?
These are the benchmarks that experienced engineers carry in their heads: standard performance measurements for common technologies. They're not exact, since they depend on hardware, configuration, and query complexity. A well-tuned PostgreSQL on beefy hardware might do 50K QPS, while a poorly indexed one on a micro instance might do 500. But for estimation, you don't need exact. You need the right order of magnitude.
Compute Benchmarks (QPS: Queries Per Second)
How much work can a single instance of each technology handle? These numbers assume reasonable hardware (4-8 cores, 16-32GB RAM) and typical workloads. The ranges reflect different query complexity โ simple key lookups are fast, complex joins are slow.
| Technology | Typical QPS | Why This Number? | Use in Estimation |
|---|---|---|---|
| Web server (Node.js/Go): a single application server handling HTTP requests; both handle concurrent connections efficiently via event loops or goroutines | 10K-50K req/sec | Simple JSON responses are CPU-cheap; the bottleneck is usually the database behind it | "I need 30K QPS" = 1-3 app servers |
| MySQL / PostgreSQL: the two most popular relational databases (MySQL at Facebook and Uber, PostgreSQL at Instagram and Stripe); similar QPS characteristics | 5K-10K QPS | Disk I/O is the bottleneck; indexed reads are fast, complex joins are slow; writes need disk flush | "I need 50K reads/sec" = ~5-10 read replicas |
| Redis: an in-memory key-value store for caching, sessions, and real-time data; everything lives in RAM, so it's fast, but data must fit in (expensive) memory | 100K-200K ops/sec | Everything is in RAM, no disk reads. Simple GET/SET operations. Single-threaded but event-loop-based | "I need to cache 500K reads/sec" = 3-5 Redis nodes |
| Kafka broker: one server in Apache Kafka, a distributed message queue for event streaming, optimized for sequential append-only disk writes | 100K-500K msg/sec | Sequential disk writes (append-only) are fast; consumer reads are also sequential; batching helps hugely | "I need 1M events/sec" = 2-10 Kafka brokers |
| Elasticsearch: a search engine built on Apache Lucene for full-text search and log analytics; slower per query because of text matching and relevance scoring | 1K-5K queries/sec | Full-text search with relevance scoring is CPU-intensive; complex aggregations are slower | "I need search across 1B docs" = cluster with multiple nodes |
Storage Benchmarks (How Big Is Everything?)
When you're estimating storage, you need to know how big typical pieces of data are. These sizes include reasonable metadata (timestamps, user IDs, etc.) โ not just the raw content.
| Data Type | Typical Size | Why This Size? | Estimation Example |
|---|---|---|---|
| A text tweet / short message | ~1 KB | 280 chars of UTF-8 text (~560 bytes) + user ID, timestamp, retweet count, metadata | 500M tweets/day x 1KB = 500GB/day = ~180TB/year |
| A JSON API response | 1-10 KB | Typical REST response with a few fields; larger if it includes nested objects or lists | 10K QPS x 5KB average = 50MB/sec bandwidth |
| A compressed photo (JPEG) | 200KB - 2MB | Phone photos are 3-5MB raw; JPEG compression gets them to 200KB-2MB depending on quality | 50M photos/day x 1MB = 50TB/day (Instagram-scale) |
| 1 minute of video (720p) | 10-20 MB | Compressed H.264 at 720p is roughly 2-3 Mbps bitrate = ~15MB per minute | 500 hours uploaded/min (YouTube) = 30,000 video-minutes/min x 15MB = ~450GB/min of new content |
| 1 hour of video (1080p) | 1-3 GB | 1080p at 5-8 Mbps bitrate; Netflix encodes at multiple quality levels (adaptive bitrate) | Streaming to 10M users at 5Mbps = 50 Tbps total bandwidth |
Network Benchmarks (How Fast Does Data Travel?)
Network speed affects how you design for latency (the time for a request to travel from the user to the server and back, measured in milliseconds; a user in New York hitting a server in Virginia sees ~10ms, hitting London ~80ms, governed by the speed of light in fiber optic cable) and throughput (how much total data you can push through the network per second; a 1 Gbps link can transfer about 125 MB/sec). The speed of light is a hard constraint: no amount of engineering can make New York to London faster than ~80ms round trip.
| Network Path | Typical Latency | Bandwidth Equivalent | Design Implication |
|---|---|---|---|
| Same datacenter (rack-to-rack) | ~0.5 ms | 10-25 Gbps links | Internal service calls are nearly free; design for many small calls |
| Same region (e.g., us-east-1a to 1b) | ~1-2 ms | Up to 25 Gbps | Cross-AZ replication is fast enough for sync writes |
| Cross-region (US East to West) | ~40 ms | 1-10 Gbps | Too slow for sync calls; use async replication |
| Cross-continent (US to Europe) | ~80 ms | Varies | Need CDN or regional replicas for good user experience |
| US to Asia-Pacific | ~150-200 ms | Varies | Multi-region deployment is essential; cache aggressively at the edge |
One conversion that trips people up: 1 Gbps = 125 MB/sec. That's because 1 byte = 8 bits, so you divide by 8. If your server has a 1 Gbps network link and needs to serve 500 MB/sec of video, you need at least 4 links (or a 10 Gbps connection). This is a common gotcha in bandwidth estimation.
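The bits-to-bytes conversion is worth encoding once so you never fumble it. A small Python sketch of the rule and the video-serving example above (decimal network units, 1 Gbps = 1,000 Mbit/s):

```python
import math

BITS_PER_BYTE = 8

def gbps_to_mb_per_sec(gbps: float) -> float:
    """1 Gbps = 1,000 Mbit/s = 125 MB/sec. Divide by 8, not by 1."""
    return gbps * 1000 / BITS_PER_BYTE

def links_needed(required_mb_per_sec: float, link_gbps: float = 1.0) -> int:
    """How many network links of a given size cover a byte-rate requirement."""
    return math.ceil(required_mb_per_sec / gbps_to_mb_per_sec(link_gbps))

print(gbps_to_mb_per_sec(1))  # 125.0
print(links_needed(500))      # 4 -- serving 500 MB/sec needs four 1 Gbps links (or one 10 Gbps)
```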
Let's put these reference numbers to work with a quick example. Say you're designing a chat application and you estimate 10 million messages per day. Each message is about 1KB. How much storage per year? And can a single database handle the write load?
In 30 seconds, you went from "design a chat app" to "4TB/year, one DB handles writes, might need read replicas." That's the power of having reference numbers in your head. You didn't need a spreadsheet. You didn't need a calculator. You needed five numbers and basic arithmetic.
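That 30-second chat-app calculation, written out as Python (the message volume and size are the assumptions stated above):

```python
messages_per_day = 10_000_000
bytes_per_message = 1_000      # ~1 KB including metadata

# Storage: 10M x 1KB = 10 GB/day -> ~3.65 TB/year, call it 4 TB
storage_tb_per_year = messages_per_day * bytes_per_message * 365 / 1e12

# Write load: 10M / 1e5 seconds = 100 writes/sec -- one MySQL node handles this easily
write_qps = messages_per_day / 100_000

print(round(storage_tb_per_year, 2))  # 3.65
print(write_qps)                      # 100.0
```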
YouTube says 500 hours of video are uploaded every minute. Using the reference numbers above, estimate: (1) How much new storage per day? (2) How many Gbps of ingest bandwidth? Try it before reading on.
500 hours/min = 30,000 hours/hour = 720,000 hours/day. At ~2GB per hour (1080p), that's ~1.44 PB/day of new content. For bandwidth: 500 hours/min = 30,000 min of video/min. At 15MB per minute of video, that's 450GB/min = 7.5GB/sec = 60 Gbps ingest bandwidth.
The 5-Step Framework - How to Estimate Anything
Every estimation you'll ever do in an interview or at work follows the same five steps. It doesn't matter whether you're estimating Twitter, YouTube, or a tiny startup app โ the process is identical. Learn these five steps once, and you can estimate any system on the planet in under two minutes.
Think of it like a recipe. You wouldn't bake a cake by throwing random ingredients into an oven and hoping for the best. You follow steps: measure flour, add eggs, mix, bake. Estimation is the same โ a fixed sequence of multiplications that always produces a useful answer.
Quick Example: Running All 5 Steps
Let's run through the entire framework with a simple app: 100K DAU, 10 actions per user per day, 2 KB per action. This could be a notes app, a to-do list, or a simple social feed. Watch how the five steps chain together โ each one feeds the next.
See how fast that was? Five multiplications, and now you know exactly what infrastructure this app needs: one server, one database, done. No Kubernetes. No Redis. No microservices. The numbers told you the architecture should be dead simple, and that insight alone saves months of over-engineering.
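The five multiplications for this example, as a Python sketch (100K DAU, 10 actions/user/day, 2 KB/action, all from the setup above):

```python
dau = 100_000
actions_per_user_per_day = 10
bytes_per_action = 2_000   # 2 KB

# Steps 1-2: daily volume
actions_per_day = dau * actions_per_user_per_day        # 1M actions/day
# Step 3: traffic
avg_qps = actions_per_day / 100_000                     # 10 QPS average
peak_qps = avg_qps * 3                                  # 30 QPS at peak
# Step 4: storage
gb_per_day = actions_per_day * bytes_per_action / 1e9   # 2 GB/day
tb_per_year = gb_per_day * 365 / 1000                   # ~0.73 TB/year
# Step 5: verdict -- a single web server (10K+ QPS capacity) and one database cover this

print(avg_qps, peak_qps, gb_per_day)  # 10.0 30.0 2.0
```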
Worked Example 1 - Twitter: How Many Tweets Per Second?
Twitter (now X) is the classic example of a QPS-heavy estimation. The core question is: how many tweets are being written per second, and how many are being read per second? The gap between those two numbers reveals the entire architecture.
Before we touch any math, let's pin down our assumptions. In an interview, you'd say these out loud; the interviewer wants to hear you reason about what's realistic.
- 400 million DAU: Twitter-scale active users
- Average user tweets 2x per day: most users tweet rarely; power users tweet 20+, so 2 is a reasonable average
- Average user reads their timeline 20x per day: scrolling through the feed, checking notifications
- Each tweet: ~140 chars of text + metadata (timestamps, user ID, etc.) ≈ 1 KB. With media links and preview data: ~5 KB average
The Full Calculation Chain
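Here's the chain as a Python sketch, using the assumptions above (86,400 exact seconds/day for readability; rounding to ~10^5 lands in the same bucket):

```python
dau = 400_000_000
tweets_per_user_per_day = 2
reads_per_user_per_day = 20
bytes_per_tweet = 5_000          # ~5 KB average with media links and preview data
SECONDS_PER_DAY = 86_400

writes_per_day = dau * tweets_per_user_per_day    # 800M tweets/day
reads_per_day = dau * reads_per_user_per_day      # 8B timeline reads/day

write_qps = writes_per_day / SECONDS_PER_DAY      # ~9.3K writes/sec average
read_qps = reads_per_day / SECONDS_PER_DAY        # ~93K reads/sec average
read_write_ratio = reads_per_day / writes_per_day # 10:1 -- the number that drives the design

tweet_storage_tb_per_day = writes_per_day * bytes_per_tweet / 1e12   # ~4 TB/day of new tweets
```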
What the Numbers Tell You About Architecture
The 10:1 read-to-write ratio is the single most important number from this estimation. It tells you that reads dominate everything. That one insight drives all of these architectural decisions:
- Caching is mandatory: with 93K reads/sec, you can't hit the database for every timeline request. A Redis caching layer (an in-memory data store; data lives in RAM instead of on disk, so reads take microseconds instead of milliseconds, perfect for hot data read millions of times) absorbs 95%+ of reads.
- Read replicas: even with caching, you need multiple database copies to handle cache misses. A primary database handles writes; read replicas handle the overflow.
- Fanout-on-write vs. fanout-on-read: when a user tweets, do you pre-compute every follower's timeline (write amplification) or compute it when each follower opens the app (read amplification)? At 10:1 read:write, pre-computing timelines at write time saves 10x the read-time work.
- CDN for media: at 3.7 Gbps egress, you absolutely need a CDN (Content Delivery Network: a global network of servers that cache and serve content from locations close to users, reducing latency and offloading traffic from your origin servers) to distribute the load geographically.
Worked Example 2 - YouTube: How Much Storage Per Day?
YouTube is the storage and bandwidth monster of the internet. While Twitter deals mostly with tiny text blobs, YouTube deals with massive video files. The numbers here get staggering fast, and that's exactly why this is a great estimation exercise. It forces you to think about what happens when data sizes go from kilobytes to gigabytes.
- 2 billion MAU, ~800 million DAU
- 500 hours of video uploaded per minute: this is YouTube's actual published stat
- Average video: 1080p raw ≈ 2.5 GB/hour. After encoding/compression: ~500 MB/hour
- Each video stored in 5 resolutions (1080p, 720p, 480p, 360p, 240p): total is roughly 1.5x the 1080p encoded size
- Average user watches 40 minutes/day
Storage: The Upload Side
Let's start with what goes IN: how much raw video is YouTube receiving, and how much space does it take after processing?
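A sketch of the upload-side math in Python, using the assumptions listed above (every constant here is one of those stated assumptions, not a measured figure):

```python
hours_uploaded_per_min = 500       # YouTube's published stat
raw_gb_per_hour = 2.5              # 1080p raw
encoded_mb_per_hour = 500          # after encoding/compression
resolution_factor = 1.5            # storing 5 resolutions ~ 1.5x the 1080p encoded size

hours_per_day = hours_uploaded_per_min * 60 * 24               # 720,000 hours/day
raw_pb_per_day = hours_per_day * raw_gb_per_hour / 1e6         # ~1.8 PB/day arriving raw
stored_tb_per_day = hours_per_day * encoded_mb_per_hour * resolution_factor / 1e6
# ~540 TB/day actually kept -- encoding shrinks the raw ingest by roughly 70%
```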
Bandwidth: The Download Side
Storage is expensive, but egress bandwidth is where the real money goes. Every time someone watches a video, YouTube has to push that data to their device. With 800 million people watching 40 minutes a day, the numbers get wild.
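A sketch of the egress math under the same assumptions (800M DAU, 40 watch-minutes/day, ~500 MB per streamed hour; these are the section's assumptions, not published figures):

```python
dau = 800_000_000
watch_minutes_per_day = 40
mb_per_streamed_minute = 500 / 60    # ~8.3 MB/min at the encoded bitrate

egress_pb_per_day = dau * watch_minutes_per_day * mb_per_streamed_minute / 1e9
# ~267 PB/day pushed out -- vs only ~540 TB/day of new content stored

egress_tbps = egress_pb_per_day * 1e15 * 8 / 86_400 / 1e12
# ~25 Tbps of sustained egress bandwidth (x8 converts bytes to bits)
```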
The lesson from YouTube's estimation is clear: media changes everything. Text-based systems like Twitter need petabytes per year. Video-based systems like YouTube need petabytes per day. When you see "video" or "images" in a system design question, immediately multiply all your storage and bandwidth estimates by 100-1000x compared to text-only systems.
Worked Example 3 - URL Shortener: The Interview Classic
If there's one estimation that every system design candidate should have memorized, it's this one. The URL shortener (think bit.ly or tinyurl.com) is the single most common estimation question in interviews. Why? Because it's simple enough to do in 3-4 minutes, it covers all five estimation steps, and the numbers lead to interesting architecture decisions.
A URL shortener does one thing: you give it a long URL, and it gives you a short one. When someone clicks the short URL, it redirects them to the original long URL. That's it. But the scale of doing this billions of times reveals some surprising math.
- 100 million new URLs shortened per day: bit.ly scale
- Read:write ratio of 100:1, since URLs are created once but clicked many, many times
- Each URL mapping: short URL (7 chars) + long URL (avg 200 chars) + metadata (timestamps, user ID, click count) ≈ 500 bytes
- Retention: 5 years (short URLs should keep working for at least that long)
Traffic
Storage
Can We Generate Enough Short Codes?
Here's where the estimation gets interesting. We need 182.5 billion unique short URLs over 5 years. Each short URL is 7 characters long, using Base62 encoding: 62 characters made up of lowercase a-z (26), uppercase A-Z (26), and digits 0-9 (10), all valid in URLs without escaping. How many possible codes is that?
Caching
At 116,000 reads per second, can we reduce the database load with a cache? Absolutely. The 80/20 rule (the Pareto principle: roughly 80% of effects come from 20% of causes) tells us that about 20% of URLs receive 80% of the traffic. If we cache just the popular ones:
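The whole URL-shortener estimation fits in a few lines of Python (all inputs are the assumptions stated above):

```python
urls_per_day = 100_000_000
read_write_ratio = 100
bytes_per_mapping = 500
years = 5

write_qps = urls_per_day / 86_400            # ~1,160 new URLs/sec
read_qps = write_qps * read_write_ratio      # ~116,000 redirects/sec

total_urls = urls_per_day * 365 * years      # 182.5 billion mappings over 5 years
storage_tb = total_urls * bytes_per_mapping / 1e12   # ~91 TB -- modest for this traffic

keyspace = 62 ** 7                           # ~3.5 trillion possible 7-char Base62 codes
headroom = keyspace / total_urls             # ~19x headroom -- 7 characters is plenty
```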
Worked Example 4 - WhatsApp: Messages at Global Scale
WhatsApp is a mixed estimation: it combines QPS, storage, AND persistent connections. It's also the system where media vs. text reveals the most surprising insight: the text messages that everyone thinks of as "WhatsApp's data" are actually a rounding error compared to the photos and videos people send.
- 2 billion registered users, ~500 million DAU
- Average user sends 40 messages/day
- Average text message: 100 bytes of text + metadata (timestamps, read receipts, encryption overhead) ≈ 200 bytes total
- 10% of messages include media: photos average 200 KB, videos average 5 MB (WhatsApp compresses both aggressively)
- Messages stored on server for 30 days (end-to-end encrypted, kept until delivered to all recipients)
Message Traffic
Storage: Text vs. Media (The Surprise)
Here's where the estimation gets really interesting. Most people guess that text messages are WhatsApp's storage challenge. They're wrong, by a factor of a thousand. Let's do the math.
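A Python sketch of that math, using the assumptions above. One extra hedge: the assumptions don't say how the 10% of media messages splits between photos and videos, so this sketch assumes a 50/50 split.

```python
dau = 500_000_000
msgs_per_user_per_day = 40
text_bytes_per_msg = 200           # 100 bytes of text + metadata/encryption overhead
media_fraction = 0.10
photo_kb, video_mb = 200, 5        # WhatsApp compresses both aggressively

msgs_per_day = dau * msgs_per_user_per_day        # 20 billion messages/day
msg_qps = msgs_per_day / 86_400                   # ~231K messages/sec average

text_tb_per_day = msgs_per_day * text_bytes_per_msg / 1e12    # 4 TB/day of text

media_msgs = msgs_per_day * media_fraction        # 2 billion media messages/day
photos, videos = media_msgs / 2, media_msgs / 2   # assumed 50/50 split (not in the stats)
media_pb_per_day = (photos * photo_kb * 1e3 + videos * video_mb * 1e6) / 1e15
# ~5.2 PB/day of media -- over 1,000x the text volume
```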
Persistent Connections
WhatsApp is a real-time messaging app. That means users don't poll the server asking "any new messages?" every few seconds; instead, each device maintains a persistent connection to the server: a long-lived network connection (like a WebSocket) that stays open, so messages arrive instantly through this always-open channel instead of the phone connecting fresh every time. But maintaining millions of simultaneous connections requires its own infrastructure.
WhatsApp's estimation also reveals something about architecture: this is really two separate systems. The text/metadata system is a classic database problem (4 TB/day, 231K QPS: a sharded database with replication). The media system is an object storage problem (5.2 PB/day: S3-like blob storage with CDN delivery). Most messaging apps split these into entirely different storage backends because the requirements are so different.
Worked Example 5 - Notification System: Push at Scale
Every app sends push notifications for order updates, social activity, and flash-sale alerts. It feels simple: fire off a message to a phone. But when 200 million users each get 5 pushes a day, you're suddenly dealing with a billion outbound API calls per day and the single hardest problem in notification systems: fan-out, where one event triggers many downstream actions. A single "flash sale starts" event might generate 200 million individual push notifications.
This example is different from the previous four because the bottleneck isn't storage or read QPS; it's outbound throughput to external APIs. You're at the mercy of Apple's APNs, Google's FCM, and browser Web Push servers, each with their own rate limits and latency profiles.
Assumptions
| Parameter | Value | Why This Number |
|---|---|---|
| DAU | 200 million | Large e-commerce or social platform (think Shopify-scale or mid-tier social app) |
| Pushes per user per day | 5 | Order updates, promotions, social activity, reminders, system alerts |
| Notification payload | ~1 KB | Title, body, deep link URL, metadata, device token โ JSON-encoded |
| Device split | 60% Android, 35% iOS, 5% Web | Global average skews Android-heavy; each platform = separate API call |
| Push providers | 3 (FCM, APNs, Web Push) | Each device gets a dedicated API call to its platform's push service |
| Peak multiplier | 3x | Notifications bunch around morning (9 AM), lunch (12 PM), evening (7 PM) |
| History retention | 90 days | Users scroll through past notifications in the app's notification center |
The Math
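A Python sketch of the math, using the parameters from the assumptions table above:

```python
dau = 200_000_000
pushes_per_user_per_day = 5
payload_bytes = 1_000
peak_multiplier = 3
retention_days = 90

pushes_per_day = dau * pushes_per_user_per_day       # 1 billion pushes/day
avg_send_rate = pushes_per_day / 86_400              # ~11.6K outbound API calls/sec average
peak_send_rate = avg_send_rate * peak_multiplier     # ~35K API calls/sec at peak

history_tb = pushes_per_day * payload_bytes * retention_days / 1e12
# ~90 TB of notification history to keep for the in-app notification center
```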
The Killer Scenario: Broadcast Fan-Out
The numbers above assume normal traffic: individual notifications for individual users. But what happens when marketing decides to send a "Flash Sale Starts NOW!" push to all 200 million users at once?
That single event becomes 200 million API calls to external push providers. Let's do the math on how long that takes depending on how many worker machines you throw at it:
| Workers | Calls/sec per worker | Total throughput | Time to deliver |
|---|---|---|---|
| 1 | 10,000 | 10K/sec | 20,000 sec = 5.5 hours |
| 10 | 10,000 | 100K/sec | 2,000 sec = 33 minutes |
| 100 | 10,000 | 1M/sec | 200 sec = 3.3 minutes |
| 1,000 | 10,000 | 10M/sec | 20 sec |
With a single worker doing 10,000 API calls per second, it takes 5.5 hours to reach everyone. A "flash sale" notification arriving 5 hours late is worse than useless; it's embarrassing. Even 100 workers only get you down to 3 minutes. For truly instant broadcast, you need ~1,000 workers spun up temporarily, which is why notification systems use auto-scaling worker pools behind a message queue: groups of worker machines that automatically grow and shrink based on queue depth, scaling from ~50 workers to 1,000+ within minutes when a broadcast floods the queue with 200M messages, then scaling back down once the queue drains.
Architecture Implications
The numbers tell us exactly what architecture to build:
- Message queue (Kafka or SQS): you can't call 35,000 external APIs per second directly from your app servers. You need a buffer. The queue absorbs bursts (especially broadcast events) and lets workers drain at a sustainable rate.
- Separate queues per provider: APNs, FCM, and Web Push have different rate limits, different payload formats, and different retry behaviors. Separate queues let you tune each independently.
- Auto-scaling worker pool: normal traffic needs ~50 workers. A broadcast event needs 1,000+. Static provisioning either wastes money (1,000 idle workers 99% of the time) or fails under load (50 workers trying to send 200M notifications). Auto-scaling solves both.
- Rate limiting per provider: Apple throttles APNs if you exceed their limits. Google throttles FCM. Your workers need token-bucket rate limiters per provider to avoid getting temporarily blocked.
- Delivery tracking as a separate pipeline: don't block the send path waiting for delivery receipts. Fire-and-forget the notification, then process delivery callbacks asynchronously.
Common Mistakes - Estimation Traps Everyone Falls Into
Even after you learn the 5-step method, there are traps that can silently destroy your estimates. Each one seems small (a forgotten multiplier here, a unit confusion there) but they compound: a 2x error in assumptions times an 8x unit confusion times a 3x missing replication factor is a 48x error in your final answer. That's the difference between "we need 10 servers" and "we need 480 servers." Let's walk through the seven traps and how to dodge each one.
The trap: You spend 2 minutes calculating that you need exactly 47,392 QPS. You could have said "about 50K" in 10 seconds and been just as right. In estimation, precision past one significant figure is a waste of time, and worse, it creates a false sense of confidence. "47,392" sounds like you know the answer. You don't. Your input assumptions are guesses, so your output is a guess too.
Why it hurts: In interviews, you have about 60 seconds for estimation. Burning all of it (and then some) on long division means you never get to the architecture conclusions, which is the part the interviewer actually cares about.
The fix: Round everything aggressively. 86,400 seconds/day becomes "about 100K." 1,000,000,000 divided by 100,000 becomes "about 10,000." If rounding changes your answer by less than 2x, the rounding doesn't matter. Save your brainpower for the assumptions, not the arithmetic.
The trap: You calculate average QPS as 10,000 and provision for 10,000. Every evening at 8 PM, traffic hits 30,000 and your system falls over. Users see errors. Pages time out. Your boss asks why you didn't plan for this.
Why it hurts: Average traffic is a mathematical fiction. Nobody experiences "average"; they experience the actual moment they're using the system, which is usually during peak hours. If you size for average, you're guaranteeing failure during the hours that matter most.
The fix: Always calculate BOTH average AND peak. Peak is typically 2-3x average for social apps, 5x for e-commerce, and 10-100x for flash sales or live events. State it explicitly: "Average is 10K QPS, peak is 30K, so I'll provision for 40K with headroom." The interviewer wants to hear the word "peak."
The trap: "A tweet is 280 characters = 280 bytes." Wrong. A tweet stored in a database includes: user ID (8 bytes), tweet ID (8 bytes), timestamp (8 bytes), retweet count (4 bytes), like count (4 bytes), reply-to ID (8 bytes), language code (2 bytes), geo coordinates (16 bytes), plus JSON encoding overhead, database indexes, and row metadata. The real size is 1-5 KB, 4 to 18 times bigger than the raw text.
Why it hurts: This error is multiplicative. If you estimate 500 million tweets per day at 280 bytes, you get 140 GB. At 3 KB (more realistic), you get 1.5 TB. That's a 10x error that changes whether you need one database server or ten.
The fix: Always ask "what else gets stored alongside this data?" For any row, add: IDs (8 bytes each), timestamps (8 bytes each), counters (4 bytes each), indexes (typically 2-3x the row size), and encoding overhead (~20%). A safe rule of thumb: multiply your naive estimate by 5-10x to account for everything you're forgetting.
The trap: "I need 10 TB of storage." You stop there. But every production database runs with at least 3 replicas (one primary, two replicas for failover and read scaling). That's 30 TB, not 10 TB. Add backups (another copy), and you're at 40 TB. Add cross-region replication for disaster recovery, and you're at 60 TB.
Why it hurts: A 3-6× underestimate on storage translates directly into a 3-6× underestimate on cost. Your capacity plan says "$50K/year" but the real bill is "$300K/year." That's a career-damaging surprise.
The fix: Always multiply raw storage by a replication factor. A reasonable default: raw × 3 for replicas, then × 1.3 for indexes and metadata, then × 1.5 for backups. Quick version: raw × 5 gets you in the right ballpark for total storage cost.
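As a sketch, the multiplier chain from this fix can be wrapped in a tiny helper (the default factors are the ones suggested above; tune them for your own stack):

```python
def total_storage_tb(raw_tb: float, replicas: int = 3,
                     index_overhead: float = 1.3, backup: float = 1.5) -> float:
    """Raw data size -> total provisioned storage.

    replicas:       copies kept online (1 primary + 2 replicas)
    index_overhead: indexes and row metadata, roughly 30% extra
    backup:         backup copies, roughly 50% extra on top
    """
    return raw_tb * replicas * index_overhead * backup

# The "10 TB" plan from the trap above actually needs closer to 60 TB,
# which is why the quick "raw x 5" rule lands in the right ballpark.
print(f"{total_storage_tb(10):.1f} TB")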
The trap: You silently use "100 million DAU" in your head, do all the math, and present the final number. The interviewer has no idea where that number came from. Did you guess? Did you know? Are you making it up? Without stated assumptions, your estimate looks like a wild guess — even if the math is perfect.
Why it hurts: In interviews, the process matters more than the answer. An interviewer who can see your assumptions can evaluate your reasoning, suggest adjustments, and have a productive conversation. An interviewer who just sees a final number can only say "that seems high" or "that seems low" — and you've wasted the most valuable part of the exercise.
The fix: Say your assumptions out loud before doing any math. "I'll assume 100 million DAU, 50 messages per user per day, and about 100 bytes per message. Sound reasonable?" This takes 5 seconds and completely transforms the impression you make. It also gives the interviewer a chance to correct you ("Actually, let's assume 10 million DAU for this problem") — which is a gift, not a failure.
The trap: Network bandwidth is measured in bits per second (Mbps, Gbps). Storage is measured in bytes (MB, GB, TB). There are 8 bits in a byte. So "1 Gbps bandwidth" = 125 MB/sec, NOT 1,000 MB/sec. That's an 8× error from a single letter.
Why it hurts: Imagine you calculate that your system needs to transfer 500 MB/sec. You provision a "500 Mbps" network link thinking it's enough. It's actually only 62.5 MB/sec — you're 8× short. Your system chokes on bandwidth that's 12.5% of what you need.
The fix: Capital B = Bytes. Lowercase b = bits. Always check which one you're using. For quick conversion: divide bits by 8 to get bytes. 1 Gbps = 125 MB/sec. 10 Gbps = 1.25 GB/sec. Write the units explicitly in every step of your calculation. If an interviewer says "Gbps" and you're working in bytes, convert immediately and say it out loud.
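A couple of one-line converters remove the bits-vs-bytes guesswork entirely. Nothing here is library-specific, just the factor-of-8 arithmetic:

```python
BITS_PER_BYTE = 8

def gbps_to_mbytes_per_sec(gbps: float) -> float:
    """Network Gbps (bits) -> storage MB/sec (bytes)."""
    return gbps * 1_000 / BITS_PER_BYTE

def mbytes_per_sec_to_gbps(mb_per_sec: float) -> float:
    """Storage MB/sec (bytes) -> network Gbps (bits)."""
    return mb_per_sec * BITS_PER_BYTE / 1_000

print(gbps_to_mbytes_per_sec(1))    # 1 Gbps is only 125 MB/sec
print(gbps_to_mbytes_per_sec(10))   # 10 Gbps is 1,250 MB/sec (1.25 GB/sec)
print(mbytes_per_sec_to_gbps(500))  # a 500 MB/sec workload needs 4 Gbps, not 500 Mbps
```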
The trap: "We have 100 TB of data, so we need a 100 TB cache." No — you need to cache the hot data, not everything. In almost every system, 20% of the data handles 80% of the traffic. Your top 1,000 products get 50% of all views. Your top 10,000 users generate 40% of all content. Caching the hot 20% gives you 80% of the benefit at 20% of the cost.
Why it hurts: Over-estimating cache size means over-spending on RAM (which is 10-50× more expensive per GB than disk). Under-estimating it means your cache miss rate is too high and your database still gets hammered. Both are expensive mistakes.
The fix: For cache sizing, use the 80/20 rule: cache 20% of your dataset as a starting point. 100 TB dataset → 20 TB in cache. Then check: can you afford that much Redis? If not, cache 5-10% and accept more cache misses. State the trade-off: "20 TB of Redis at $25/GB/month = $500K/month. Or 5 TB at $125K/month with 60% cache hit rate instead of 80%."
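A minimal sketch of the 80/20 sizing rule, using the illustrative $25/GB/month RAM price from the text (real Redis pricing varies widely by provider):

```python
def cache_plan(dataset_tb: float, hot_fraction: float = 0.2,
               ram_cost_per_gb_month: float = 25.0) -> tuple[float, float]:
    """80/20 starting point: cache the hot fraction of the dataset.

    Returns (cache size in TB, monthly cost in dollars).
    """
    cache_tb = dataset_tb * hot_fraction
    monthly_cost = cache_tb * 1_000 * ram_cost_per_gb_month  # 1 TB ~ 1,000 GB
    return cache_tb, monthly_cost

size, cost = cache_plan(100)              # the 100 TB dataset from the fix above
print(f"{size:.0f} TB cache, ${cost:,.0f}/month")

# The stated trade-off: shrink to 5% hot data and accept more misses.
small_size, small_cost = cache_plan(100, hot_fraction=0.05)
print(f"{small_size:.0f} TB cache, ${small_cost:,.0f}/month")
```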
Interview Playbook — Nail Estimation Questions
You know the math now. But knowing math and performing it live — out loud, under pressure, with someone evaluating every word — are completely different skills. This section is your playbook: the exact script to follow, common questions interviewers ask, and the mental model of what the interviewer is actually scoring you on.
What the Interviewer Is Actually Scoring
Here's what most candidates don't realize: the interviewer does not care about your final number. They know it's an estimate. They know it'll be wrong. What they're scoring is your thinking process.
The 5-Step Script for Any Estimation Question
Memorize this sequence. It works for every estimation question, whether they ask about storage, bandwidth, servers, or cost. Each step takes about 10-15 seconds — the whole thing fits in about 60 seconds.
Step 1: "Let me start with the scale"
State your DAU assumption and the read/write ratio. This grounds the entire discussion.
Say out loud: "Let's assume this platform has about 100 million DAU. For a messaging app, I'd estimate a 10:1 read-to-write ratio — people read more messages than they send."
Why this matters: The interviewer now knows your scale. If they want a different number, they'll tell you. If not, you've anchored the conversation with a reasonable assumption.
Step 2: "Let me estimate traffic"
Calculate QPS from your DAU and per-user activity. Always mention peak.
Say out loud: "If each user sends 40 messages per day, that's 4 billion messages daily. Divided by 86,400 — roughly 100K — gives about 46,000 writes per second. At peak, let's say 3× that, so about 140K writes per second."
Step 3: "Let me estimate storage"
Size per item × daily volume × retention period. Mention replication.
Say out loud: "Each message is about 200 bytes for text, plus metadata — let's call it 500 bytes. 4 billion messages × 500 bytes = 2 TB per day. Over 5 years with 3× replication, that's about 11 PB."
Step 4: "Let me estimate bandwidth"
QPS × payload size. Separate ingress (writes) and egress (reads).
Say out loud: "At peak, 140K writes/sec × 500 bytes = 70 MB/sec ingress. Reads are 10× that, so 700 MB/sec egress, which is about 5.6 Gbps."
Step 5: "Let me sanity-check"
Compare your numbers to a real system. Draw the architectural conclusion.
Say out loud: "Let's compare against WhatsApp — they reportedly handle about 60 billion messages per day, well above our 4 billion. So 46K average writes/sec feels reasonable for 100M DAU. The 11 PB of storage over 5 years means we definitely need sharding — no single database holds that. I'd use consistent hashing to shard by user ID."
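All five steps are a handful of multiplications, so they are easy to script. A sketch using the same assumptions as the walkthrough (100M DAU, 40 messages/user/day, 500 bytes/message, 10:1 read ratio, 3x peak, 5 years, 3x replication):

```python
SECONDS_PER_DAY = 86_400

# Step 1: scale assumptions
dau = 100_000_000
msgs_per_user_per_day = 40
bytes_per_msg = 500
read_write_ratio = 10
peak_factor = 3

# Step 2: traffic
msgs_per_day = dau * msgs_per_user_per_day            # 4 billion messages/day
write_qps = msgs_per_day / SECONDS_PER_DAY            # ~46K writes/sec average
peak_write_qps = write_qps * peak_factor              # ~140K writes/sec at peak

# Step 3: storage over 5 years with 3x replication
tb_per_day = msgs_per_day * bytes_per_msg / 1e12      # 2 TB/day raw
pb_5yr = tb_per_day * 365 * 5 * 3 / 1_000             # ~11 PB total

# Step 4: bandwidth at peak
ingress_mb_s = peak_write_qps * bytes_per_msg / 1e6   # ~70 MB/sec in
egress_mb_s = ingress_mb_s * read_write_ratio         # ~700 MB/sec out
egress_gbps = egress_mb_s * 8 / 1_000                 # ~5.6 Gbps (bytes -> bits)

print(f"writes: {write_qps:,.0f}/sec (peak {peak_write_qps:,.0f}), "
      f"storage: {pb_5yr:.1f} PB over 5 years, egress: {egress_gbps:.1f} Gbps")
```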
Practice Question — Estimate Infrastructure for a Ride-Sharing App
Here's how a strong candidate thinks through this question in real time. Notice the structure: assumptions first, math second, implications third.
Step 1 — Scale: "Uber has about 100 million MAU. DAU is maybe 30% for ride-sharing — people don't ride every day. So 30 million DAU. Plus about 5 million active drivers at any moment. The read:write ratio is interesting here — it's closer to 1:1 because drivers constantly send GPS pings (writes) and riders constantly fetch nearby drivers (reads)."
Step 2 — Traffic: "The big number is GPS pings: 5 million drivers × 1 ping every 4 seconds = 1.25 million writes per second. That dwarfs everything else. Ride requests: 30M riders × 2 rides/day = 60M/day ÷ 86,400 ≈ 700/sec. So GPS writes dominate at 1.25M/sec."
Step 3 — Storage: "GPS pings: each is ~100 bytes (lat, lng, timestamp, driver ID). 1.25M/sec × 100B × 86,400 sec = about 10 TB/day. But we don't need to keep all GPS data forever — maybe 30 days for analytics, then aggregate. Trip records: 60M/day × 2 KB = 120 GB/day — small by comparison. Total: ~10 TB/day for 30 days = 300 TB live GPS data."
Step 4 — Bandwidth: "GPS ingress: 1.25M/sec × 100B = 125 MB/sec ≈ 1 Gbps. Map tile egress for riders viewing nearby drivers: 30M riders × 10 map refreshes × 5 KB = 1.5 TB/day — only about 17 MB/sec on average, call it 50 MB/sec at peak. GPS ingress dominates the bandwidth picture too."
Step 5 — Sanity check and implications: "1.25 million GPS writes per second is enormous. PostgreSQL maxes out at maybe 50K writes/sec. So we can't use a traditional RDBMS for location data — we need something like Redis with geospatial indexes or a time-series database. The 300 TB of location data suggests we need to aggressively TTL (expire) old pings. The trip records at 120 GB/day are manageable in a standard sharded database."
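The dominant number in this walkthrough, GPS ingestion, takes four lines to reproduce. A sketch with the same assumed inputs (5M drivers, one 100-byte ping every 4 seconds, 30-day retention):

```python
SECONDS_PER_DAY = 86_400

drivers = 5_000_000
ping_interval_s = 4          # one GPS ping per driver every 4 seconds
ping_bytes = 100             # lat, lng, timestamp, driver ID
retention_days = 30

gps_writes_per_sec = drivers / ping_interval_s                  # 1.25M writes/sec
gps_tb_per_day = gps_writes_per_sec * ping_bytes * SECONDS_PER_DAY / 1e12
live_gps_tb = gps_tb_per_day * retention_days                   # ~300 TB live data

print(f"{gps_writes_per_sec:,.0f} writes/sec, "
      f"{gps_tb_per_day:.1f} TB/day, {live_gps_tb:.0f} TB live")
```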
Common Interview Estimation Questions
Question: How much storage does Instagram need?
Key insight: Photos dominate everything. Text metadata is rounding error.
Quick math: Instagram gets ~100 million photos uploaded per day. Average photo after compression: ~2 MB. So that's 100M × 2 MB = 200 TB/day of new photos. With 3× replication across data centers, that's 600 TB/day. Per year: ~220 PB. Five years: over 1 exabyte. This is why Instagram uses a custom storage system — no off-the-shelf database can do this.
Don't forget: Multiple resolutions (thumbnail, medium, full) means 3-4× more storage. Plus videos (growing fast, 10-50 MB each). Photos alone drive the number to petabytes per year.
Question: How many servers do you need for 1 million concurrent users?
Key insight: It depends entirely on whether connections are short-lived HTTP or persistent WebSockets. A standard HTTP connection is brief — the client sends a request, the server responds, the connection closes in milliseconds to seconds — so a single server can handle thousands per second because they don't stick around. A WebSocket connection stays open for the entire user session, minutes or hours, and the server has to maintain state for each one simultaneously, which is much more memory-intensive.
HTTP (stateless): A modern server handles ~50,000 requests/second. If each request takes 20ms, only ~1,000 are truly in flight at any moment. So 1M requests/second ÷ 50K per server ≈ 20 servers.
WebSocket (persistent): Each open connection consumes ~10-50 KB of memory for the socket buffer and connection state. A server with 32 GB RAM can hold ~500K-1M connections if the workload is light. So 1M WebSocket connections ≈ 2-4 servers for connections alone, but you need more for the actual message processing.
The answer interviewers want: "It depends — let me ask whether these are long-lived or short-lived connections." That clarifying question alone is worth more than any number.
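For completeness, here is a sketch of both server counts. The 30 KB per connection and the 50% usable-RAM fraction are illustrative assumptions; real numbers depend on your runtime and kernel tuning:

```python
# HTTP case: throughput is the constraint
http_rps_per_server = 50_000
servers_http = 1_000_000 / http_rps_per_server        # 20 servers for 1M req/sec

# WebSocket case: memory is the constraint, not request rate
bytes_per_connection = 30 * 1024                      # ~30 KB of buffers/state each
server_ram = 32 * 1024**3                             # 32 GB of RAM
usable = 0.5                                          # leave half for the app itself
conns_per_server = server_ram * usable / bytes_per_connection
servers_ws = 1_000_000 / conns_per_server             # a handful of servers

print(f"HTTP: {servers_http:.0f} servers; "
      f"WebSocket: ~{conns_per_server:,.0f} conns/server -> {servers_ws:.1f} servers")
```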
Question: How much bandwidth does Netflix use at peak?
Key insight: Netflix is a bandwidth monster. Video at scale consumes more internet bandwidth than almost anything else.
Quick math: Netflix has ~250 million subscribers. At peak (evening hours), roughly 10-15% stream simultaneously — call it 30 million concurrent streams. Average bitrate: ~5 Mbps (mix of SD, HD, 4K). So: 30M × 5 Mbps = 150 Tbps (terabits per second). Netflix reportedly accounts for ~15% of all global downstream internet traffic.
In bytes: 150 Tbps ÷ 8 = ~18.75 TB/sec of video data flowing from Netflix CDN edges to users. That's why Netflix operates Open Connect — their own global CDN with boxes installed directly inside ISP networks.
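The Netflix math in code form, with the 12% concurrency and 5 Mbps bitrate as stated assumptions:

```python
subscribers = 250_000_000
concurrent_fraction = 0.12        # ~10-15% streaming at evening peak
avg_mbps = 5                      # blended SD/HD/4K bitrate

concurrent = subscribers * concurrent_fraction        # 30M simultaneous streams
total_tbps = concurrent * avg_mbps / 1e6              # Mbps -> Tbps: 150 Tbps
tb_per_sec = total_tbps / 8                           # bits -> bytes: 18.75 TB/sec

print(f"{concurrent/1e6:.0f}M streams, {total_tbps:.0f} Tbps, {tb_per_sec:.2f} TB/sec")
```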
Practice Exercises — Build Your Estimation Muscle
Reading about estimation is like reading about push-ups — it doesn't make you stronger. The only way to get good at this is to do the math yourself. Set a 5-minute timer for each exercise. Don't peek at the hints until you've tried your own approach. The goal isn't to match the exact answer — it's to be within the right order of magnitude, meaning within 10x of the real answer. If the answer is 5 TB and you got 2 TB or 15 TB, you're fine; if you got 50 GB or 500 TB, something went wrong in your reasoning.
These exercises get progressively harder. Exercises 1-2 are warm-ups that practice the basic "users x activity x size" formula. Exercises 3-4 add complexity with replication, peak traffic, and write-heavy workloads. Exercises 5-6 are full system estimations that combine multiple dimensions — the kind you'll face in real interviews.
A messaging app has 50 million DAU (daily active users — the number of unique users who open the app at least once per day, the most common starting point for any estimation). On average, each user sends 30 messages per day. Each text message is about 200 bytes (including metadata like timestamp, sender ID, and delivery status). 5% of messages include a photo attachment averaging 300 KB each.
Your tasks:
- How much total storage does this app consume per day?
- How much storage per year?
- Which component dominates — text or images? By how much?
Step 1 — Text messages:
50M users x 30 messages/day x 200 bytes = 300,000,000,000 bytes = 300 GB/day of text.
Step 2 — Image attachments:
Total messages per day: 50M x 30 = 1.5 billion messages. 5% have images: 1.5B x 0.05 = 75 million images/day. Storage: 75M x 300 KB = 22,500,000,000 KB = ~22.5 TB/day of images.
Step 3 — Total:
Text (300 GB) + Images (22.5 TB) = ~22.8 TB/day. Images dominate: 22.5 TB vs 0.3 TB is a factor of roughly 75.
Step 4 — Per year:
22.8 TB/day x 365 = ~8.3 PB/year.
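A quick way to check your own attempt is to script the solution. This sketch reproduces the numbers above and answers the dominance question directly:

```python
dau = 50_000_000
msgs_per_user = 30
text_bytes = 200
img_rate = 0.05            # 5% of messages carry a photo
img_kb = 300

text_gb_day = dau * msgs_per_user * text_bytes / 1e9         # 300 GB/day of text
images_per_day = dau * msgs_per_user * img_rate              # 75M photos/day
img_tb_day = images_per_day * img_kb * 1_000 / 1e12          # 22.5 TB/day of photos
total_tb_day = img_tb_day + text_gb_day / 1_000              # ~22.8 TB/day
pb_year = total_tb_day * 365 / 1_000                         # ~8.3 PB/year
dominance = img_tb_day / (text_gb_day / 1_000)               # images vs text, ~75x

print(f"{total_tb_day:.1f} TB/day, {pb_year:.1f} PB/yr, images are {dominance:.0f}x text")
```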
Your API serves 2 million requests per day. The average response payload is 5 KB. Assume traffic is roughly uniform across the day (no major spikes).
Your tasks:
- What's your average QPS (queries per second — the number of requests your system handles each second, the most fundamental throughput metric in system design)?
- What's the peak QPS if spikes hit 3x average?
- What's your daily egress bandwidth? (Egress is outgoing data from your servers to clients — this is what cloud providers charge you for. Ingress, incoming data, is usually free.)
- Do you need to worry about scaling?
Average QPS:
2,000,000 requests / 86,400 seconds = ~23 QPS.
Peak QPS (3x):
23 x 3 = ~70 QPS. That's nothing for a modern server.
Egress bandwidth:
23 requests/sec x 5 KB/response = 115 KB/sec = ~0.92 Mbps.
Daily egress total:
2M requests x 5 KB = 10 GB/day. On AWS at $0.09/GB, that's about $0.90/day in bandwidth costs.
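The same check for this exercise (the $0.09/GB figure is the illustrative AWS egress price used above):

```python
requests_per_day = 2_000_000
payload_kb = 5
egress_price_per_gb = 0.09   # illustrative cloud egress price

avg_qps = requests_per_day / 86_400                   # ~23 QPS average
peak_qps = avg_qps * 3                                # ~70 QPS at a 3x spike
egress_gb_day = requests_per_day * payload_kb / 1e6   # 10 GB/day out
daily_cost = egress_gb_day * egress_price_per_gb      # ~$0.90/day

print(f"{avg_qps:.0f} QPS avg, {peak_qps:.0f} peak, "
      f"{egress_gb_day:.0f} GB/day egress, ${daily_cost:.2f}/day")
```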
You're designing storage for a video surveillance system. The requirements:
- 1,000 cameras recording 24 hours a day, 7 days a week
- Each camera records at 720p, producing 1 GB/hour of compressed H.264 video
- 30-day retention — footage older than 30 days is automatically deleted
- Data must be stored with 2x replication (one backup copy) for durability
Your tasks:
- Total raw storage needed (before replication)?
- Total storage with replication?
- How many 10 TB hard drives do you need?
- What's the approximate hardware cost just for drives?
Per camera per day:
1 GB/hour x 24 hours = 24 GB/day per camera.
All cameras per day:
1,000 cameras x 24 GB = 24 TB/day.
30-day retention:
24 TB/day x 30 days = 720 TB raw storage.
With 2x replication:
720 TB x 2 = 1.44 PB (petabytes). That's 1,440 TB.
Hard drives needed:
1,440 TB / 10 TB per drive = 144 hard drives.
Hardware cost estimate:
At ~$200 per 10 TB enterprise drive: 144 x $200 = ~$28,800 for drives alone. Add servers, networking, RAID controllers, rack space, and power โ the total system cost is likely 3-5x the drive cost, so $85,000-$145,000.
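Scripting this one makes the unit conversions harder to fumble (the drive price is the assumed $200 figure from above):

```python
cameras = 1_000
gb_per_hour = 1              # 720p H.264, per the requirements
retention_days = 30
replication = 2
drive_tb = 10
drive_cost_usd = 200         # assumed price per 10 TB enterprise drive

tb_per_day = cameras * gb_per_hour * 24 / 1_000       # 24 TB/day across all cameras
raw_tb = tb_per_day * retention_days                  # 720 TB raw
total_tb = raw_tb * replication                       # 1,440 TB = 1.44 PB
drives = total_tb / drive_tb                          # 144 drives
cost = drives * drive_cost_usd                        # $28,800 in drives alone

print(f"{raw_tb:.0f} TB raw, {total_tb:.0f} TB replicated, "
      f"{drives:.0f} drives, ${cost:,.0f}")
```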
A ride-sharing app like Uber has 10 million DAU. Each user opens the app 3 times per day. During each session:
- The app sends 1 GPS location update per second for 10 minutes
- The user makes 1 ride request
- The system queries 5 nearby driver locations
Your tasks:
- How many location updates happen per day?
- What's the average location-update QPS?
- What's the peak QPS (rush hour is about 3x average)?
- What kind of database or storage system can handle this?
Total location updates per day:
10M users x 3 sessions x 10 minutes x 60 seconds x 1 update/sec = 18 billion updates/day.
Average QPS:
18,000,000,000 / 86,400 = ~208,000 updates/sec. That is a LOT.
Peak QPS (rush hour 3x):
208,000 x 3 = ~624,000 updates/sec.
But wait — users aren't evenly spread across 24 hours. Most ride-sharing usage concentrates in morning (8-9 AM) and evening (5-7 PM) rush hours. During those 4 peak hours, traffic might be 3-5x the 24-hour average.
At 624K writes/sec, no single database handles this. You need time-series optimized storage — something like Apache Kafka for ingestion (millions of writes/sec), feeding into a time-series database (TimescaleDB, InfluxDB) or geospatial index (Redis with geohashing). Traditional SQL databases top out around 10K-30K writes/sec per node. You'd need 20-60 sharded MySQL nodes just for writes — or one Kafka cluster that handles it natively.
The estimation just determined your entire storage architecture.
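A sketch of the write-load math, including the shard count implied by a hypothetical 30K writes/sec-per-node SQL ceiling:

```python
dau = 10_000_000
sessions_per_day = 3
session_minutes = 10          # 1 GPS update/sec for 10 minutes per session

updates_per_day = dau * sessions_per_day * session_minutes * 60   # 18 billion
avg_qps = updates_per_day / 86_400                                # ~208K/sec
peak_qps = avg_qps * 3                                            # ~625K/sec at rush hour

# Shards needed if each SQL node tops out at ~30K writes/sec:
sql_nodes = peak_qps / 30_000                                     # ~21 nodes

print(f"{updates_per_day/1e9:.0f}B updates/day, avg {avg_qps:,.0f}/sec, "
      f"peak {peak_qps:,.0f}/sec, ~{sql_nodes:.0f} SQL shards")
```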
Design the infrastructure for a Spotify-like music streaming service. Here are the numbers:
- 100 million DAU
- Average listening session: 60 minutes per day
- Average song: 3 minutes long, 3 MB (128 kbps encoding)
- Music catalog: 80 million songs
Calculate all four dimensions:
- Concurrent streams at peak — how many people are listening right now?
- Total catalog storage — how much space for all the music?
- Egress bandwidth at peak — how much data are you pushing out?
- CDN requirements — can you afford to serve this from a cloud provider?
(a) Concurrent streams at peak:
100M users each listen 60 minutes/day. Total listening minutes: 100M x 60 = 6 billion minutes/day. Spread over 24 hours: 6B / 1,440 min = ~4.2M concurrent listeners on average. But music has peak hours (commute time, evenings) — assume 3x peak: roughly 12-15M concurrent streams. Let's use 15M for safety.
(b) Catalog storage:
80M songs x 3 MB each = 240 TB at one quality level. But streaming services offer multiple quality tiers — say 3 (low 64kbps, medium 128kbps, high 256kbps), which together add up to about 3.5x the single-tier size: roughly ~800 TB of music files. With 3x replication across data centers: ~2.5 PB.
(c) Egress bandwidth at peak:
15M concurrent streams x 128 kbps each = 1.92 Tbps. Rounding up for overhead (metadata, API calls, album art): roughly ~2.5-3 Tbps peak egress.
(d) CDN requirements:
At cloud egress prices ($0.02/GB), this traffic is catastrophically expensive. Listeners pull about 6 PB/day (100M users x 60 MB of audio each), or ~180 PB/month — roughly $3.6M per month in egress fees alone. This is exactly WHY companies like Spotify, Netflix, and YouTube build their own CDN infrastructure (or use specialized CDN providers with volume discounts). At scale, you must own your edge network.
This exercise shows how estimation drives business decisions, not just technical ones. The bandwidth cost alone (tens of millions per year from the cloud) versus running your own CDN at a small fraction of that is a CEO-level decision that came from 60 seconds of multiplication. In an interview, walking through this reasoning — especially the cost comparison — shows you think like a senior engineer, not just a coder.
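One way to sanity-check the egress bill is to price the sustained daily volume rather than the instantaneous peak, assuming $0.02/GB cloud egress. The 15M peak-stream figure follows the "round up for safety" choice above:

```python
dau = 100_000_000
minutes_per_user = 60
mb_per_minute = 1                     # 3 MB per 3-minute song at 128 kbps

# (a) concurrent streams
total_minutes = dau * minutes_per_user            # 6 billion listening minutes/day
avg_concurrent = total_minutes / (24 * 60)        # ~4.2M listeners on average
peak_concurrent = 15_000_000                      # ~3x average, rounded up for safety

# (c) peak egress: 128 kbps (128,000 bits/sec) per stream
peak_tbps = peak_concurrent * 128_000 / 1e12      # ~1.9 Tbps of raw audio

# (d) cloud egress cost: daily volume is what drives the bill
pb_per_day = total_minutes * mb_per_minute / 1e9  # 6 PB/day
monthly_cost = pb_per_day * 30 * 1e6 * 0.02       # PB -> GB, at $0.02/GB

print(f"avg {avg_concurrent/1e6:.1f}M streams, peak {peak_tbps:.2f} Tbps, "
      f"~${monthly_cost/1e6:.1f}M/month cloud egress")
```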
An e-commerce platform is preparing for a Black Friday flash sale. Here are the baseline and expected numbers:
- Normal traffic: 50,000 QPS
- Expected Black Friday spike: 20x normal = 1,000,000 QPS for roughly 2 hours
- Average response size: 10 KB (product page with metadata)
- Database capacity: single PostgreSQL instance handling 30,000 QPS max (indexed reads)
- Redis cache: each node handles 200,000 operations/sec
Calculate:
- How many Redis cache nodes do you need?
- What cache hit rate is required to keep the database under 30K QPS? (Hit rate is the percentage of requests that find data in the cache rather than going to the database — a 97% hit rate means only 3 out of 100 requests touch the database.)
- What's the total bandwidth at 1M QPS?
- What infrastructure do you need for the load balancer layer?
(a) Redis cache nodes:
1,000,000 QPS / 200,000 per Redis node = 5 nodes minimum. But you never run at 100% capacity — add a safety margin of 50-60%: 8 Redis nodes. (If one node dies during Black Friday, you still have capacity.)
(b) Required cache hit rate:
The database maxes out at 30,000 QPS. Total incoming: 1,000,000 QPS. The cache must absorb everything above 30K — that means 970,000 out of 1,000,000 requests must be cache hits.
Cache hit rate needed: 970,000 / 1,000,000 = 97% minimum.
Is 97% realistic? For a flash sale where everyone is viewing the same few products, absolutely — the 80/20 rule (in most systems, 20% of the data handles 80% of the traffic) gets even more extreme during sales, with maybe 5% of products getting 95% of the views, which makes caching extremely effective. Pre-warm the cache with popular items before the sale starts.
(c) Total bandwidth:
1,000,000 QPS x 10 KB = 10,000,000 KB/sec = 10 GB/sec = 80 Gbps.
(d) Load balancer layer:
A single HAProxy instance (a widely used open-source load balancer) typically handles 1-2 million connections and ~40-80 Gbps of throughput depending on hardware. At 80 Gbps, you need 2-3 load balancers in an active-active setup. Cloud load balancers (AWS ALB) auto-scale but need pre-warming — tell AWS in advance about the Black Friday spike, or it takes 10-15 minutes to scale up (by which time your sale page is already down).
Mentioning the pre-warming requirement (both for cache AND load balancers) shows real production experience. Many candidates estimate the numbers correctly but forget that auto-scaling isn't instant. Flash sales require pre-provisioned capacity — you spin up the extra Redis nodes and LB capacity 30 minutes before the sale starts, not when traffic arrives.
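The capacity math for this sale fits in a few lines; the 50% safety margin on Redis nodes is the assumption suggested in the solution:

```python
import math

total_qps = 1_000_000
redis_ops_per_node = 200_000
db_max_qps = 30_000
payload_kb = 10

redis_min = math.ceil(total_qps / redis_ops_per_node)   # 5 nodes at 100% load
redis_nodes = math.ceil(redis_min * 1.5)                # 50% safety margin -> 8 nodes

hit_rate = (total_qps - db_max_qps) / total_qps         # 97% required to shield the DB
bandwidth_gbps = total_qps * payload_kb * 8 / 1e6       # KB/sec -> Gbps: 80 Gbps

print(f"{redis_nodes} Redis nodes, {hit_rate:.0%} hit rate needed, "
      f"{bandwidth_gbps:.0f} Gbps at peak")
```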
Cheat Sheet — Estimation at a Glance
Keep these cards bookmarked. They're the numbers you'll reach for in every estimation — during interviews, capacity planning, or late-night "will this scale?" calculations. Each card is one fact you should know by heart.
Connected Topics — Where to Go Next
Estimation gives you the numbers. The topics below teach you what to do with those numbers. When your estimate says "500K writes/sec," scalability tells you how to handle it. When it says "97% cache hit rate required," the caching page shows you how to achieve it. Every estimation answer points to an architecture decision — and these pages cover those decisions in depth.