TL;DR — The One-Minute Version
- The 5-step Baby Steps Framework for ANY system design problem
- Why "just draw boxes" fails — and what to do instead
- Real interview stories from Google, Amazon, Meta, and startups
Every system design question you'll ever face — "design Instagram," "design a rate limiter," "design a chat app," "design Netflix" — follows the same five steps. The system changes every time. The steps never do.
Here's the entire framework you'll learn on this page, condensed into a quick walkthrough. Here is what each step involves:
Don't touch the whiteboard yet. Spend the first 5 minutes asking questions. What does the system actually need to do? How many users? What's more important — speed or accuracy? Read-heavy or write-heavy?
This is the step most people skip — and it's the step that separates senior engineers from juniors. You can't design a solution if you don't understand the problem.
Example: "Design Instagram" — before drawing anything, ask: Are we designing the feed? The upload pipeline? Stories? DMs? All of them? What scale — 1 million users or 1 billion?
Divide and conquer. Take the scoped problem and split it into 3-5 independent pieces. Each piece handles one job. For Instagram's feed: one piece ingests photos, one stores them, one builds the feed, one delivers it to users.
Think of it like planning a wedding — you don't do everything at once. You split it into: venue, food, music, invitations. Each can be planned separately.
Now zoom in. For each component, define: What data does it store? What API does it expose? How does data flow in and out?
This is where you start drawing actual boxes and arrows — but now each box has a reason to exist. You're not guessing; you're building on a foundation of understanding.
Four decisions drive every system: What database? How will it scale? What should be asynchronous? How do components talk to each other?
Every decision must have a "because." Not "I'll use Redis" — but "I'll use Redis because the feed needs sub-10ms reads, and Redis serves from memory at 0.1ms vs. PostgreSQL's 5ms from disk."
Still vague? Split again. If any component still feels too big or hand-wavy, repeat Steps 2-4 on just that piece. The "Photo Upload Service" might split into: image validation, thumbnail generation, CDN distribution, metadata storage.
This is why it's called "baby steps" — you never try to solve a big problem. You keep splitting until every piece is small enough to explain clearly.
That's the whole framework. Five steps. Works for any system, any scale, any company.
The rest of this page teaches you how to execute each step with real examples, real math, and real interviewer expectations.
The Scenario — "Design Instagram"
Let's put you in the room. Not hypothetically — really in the room.
You're sitting in a Meta interview room. The walls are covered in old whiteboard ink that never fully erased. Your interviewer — a staff engineer who's been building distributed systems for 12 years — opens their laptop, pulls up a blank Excalidraw canvas, and gives you the prompt: "Design Instagram."
Your stomach drops. Your mind immediately starts racing through everything Instagram does: photo uploads, stories that vanish after 24 hours, reels with an algorithmic feed, direct messages, the explore page, search, notifications, ads, the recommendation engine, content moderation, live streaming... and the clock is already ticking.
Here's the math that makes this terrifying:
At 3 minutes per feature, you can't design any of them well. A proper deep dive into just the news feed — the data model, fan-out strategy, ranking algorithm, caching layer — takes 15-20 minutes. If you try to touch all 15 features, you'll spend 3 minutes on each and impress nobody. If you don't try, you feel like you're missing something important.
This is the trap. And every single candidate's brain falls into it.
The analysis paralysis hits because your brain thinks it needs to know everything before it can say anything. And since you can't possibly know everything about a system as complex as Instagram, you freeze.
The interviewer doesn't want you to cover all 15 features. They want to see you pick 3-4 core features, explain why those are the right ones to focus on, and then go deep with clear reasoning. The ability to scope — to decide what to leave out — is itself a senior engineering skill.
Most people, when they're honest with themselves, fall into one of four traps. We'll dissect each of them in the sections that follow.
The First Attempt — "I'll Just Start Drawing Boxes"
Let's watch what most candidates actually do in that Meta interview room. The silence after "Design Instagram" feels unbearable — five seconds feels like five minutes — so you grab the marker (or click on Excalidraw) and start drawing.
A box labeled "Server." An arrow to a box labeled "Database." Another box: "Cache." Another: "Load Balancer." Maybe a "Message Queue" because you read about Kafka last week. Within 3 minutes, you have a diagram with 6 boxes and 8 arrows. It looks... impressive? Technical? Like a real architecture?
Then the interviewer asks a single, devastating question: "Why do you need the message queue?"
And you can't answer. Because you drew it before you understood the problem. You don't know if any operation in this system needs to be asynchronous. You don't know the traffic patterns. You don't know the read/write ratio. You drew a message queue because architecture diagrams on Medium articles have message queues, not because this system needs one.
This is the most common mistake in system design interviews: drawing before thinking. It feels productive — look, I'm making an architecture diagram! — but it's actually the fastest way to fail. Every box you draw is a technology decision. And every technology decision made before you understand the problem is a guess.
Think about it this way: if someone asked you to build a house, would you start pouring concrete before knowing how many rooms they need, what climate they live in, or what their budget is? Of course not. But that's exactly what "start drawing boxes" does — it's pouring concrete without a blueprint.
A Real Contrast: "Design a Rate Limiter" at Stripe
Let's see the same trap from a different angle. A candidate at Stripe gets this question: "Design a rate limiter." They immediately jump to the token bucket algorithm and start writing pseudocode.
The interviewer stops them: they wanted to hear questions first — questions that would have changed the entire design.
This candidate jumped straight to an algorithm. A stronger candidate spends the first 90 seconds asking questions — and in those 90 seconds demonstrates more engineering judgment than a page of premature pseudocode ever could.
Here are the five questions the Stripe interviewer was hoping to hear:
- What are we rate limiting? API calls? Login attempts? Payment transactions? Each has completely different requirements. Login rate limiting needs strict accuracy (to prevent brute force attacks). API rate limiting can tolerate some fuzziness.
- Per user? Per IP? Per API key? This changes the storage strategy entirely. Rate limiting 1,000 merchants? Everything fits in memory on one machine. Rate limiting 100 million IPs? You need distributed storage.
- What happens when the limit is hit? Return a 429 status code? Queue the request for later? Silently drop it? Show a CAPTCHA? Each response strategy has different complexity.
- Does the limit need to be exact or approximate? Exact counting across distributed servers is hard — you need some form of coordination (consensus, locks, or atomic operations). Approximate counting (allowing 5% overshoot) is 10x simpler to build and 100x easier to scale.
- How many rate-limited entities? Rate limiting 1,000 entities is trivial (everything in memory). Rate limiting 100 million requires distributed counters, and the choice between algorithms like sliding window vs. token bucket suddenly matters.
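For reference, the token-bucket algorithm the candidate jumped to is only a few lines; the hard part is everything the five questions above surface (scope, distribution, accuracy). Here's a minimal single-machine sketch in Python — illustrative only, not Stripe's implementation:

```python
import time

class TokenBucket:
    """Single-machine token bucket: allows bursts up to `capacity`,
    refilling at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Note what this sketch does not answer: it lives in one process's memory, so "per user across 100 million IPs" or "exact across distributed servers" immediately breaks it — which is exactly why the questions come first.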
Here are the two approaches side-by-side — "Drawing First" versus "Asking First" — so the contrast is crystal clear:

| | Drawing First | Asking First |
|---|---|---|
| First 5 minutes | 6 boxes and 8 arrows, no requirements gathered | A handful of scoping questions and a scale estimate |
| Technology choices | Pattern-matched from articles ("diagrams have message queues") | Each choice has a "because" tied to a requirement |
| Interviewer's impression | "They drew it before they understood the problem" | "They think like a senior engineer" |
Where It Breaks — Why "Just Draw Boxes" Fails
Drawing boxes on a whiteboard feels productive. You're moving the marker, creating shapes, writing technology names — it looks like engineering. But here's the uncomfortable truth: most of those boxes are wrong. Not because the technology is bad, but because you chose the technology before you understood the problem.
Let's look at four real failure modes. These aren't hypothetical — they happen in interviews every single day. Each one seems different on the surface, but pay attention: they all share the same root cause.
The Scenario
A candidate is asked: "Design a URL shortener for an internal company tool — about 10 million URLs total, used by 500 employees." They nod, pick up the marker, and immediately start designing for global internet scale: consistent hashing to distribute keys across 20 shards, a multi-region CDN, and a custom ID generation service inspired by Twitter's Snowflake.
What the Interviewer Was Thinking
"They're not listening. I specifically said 10 million URLs and 500 employees. This is a tiny system. They're designing for a problem that's 1,000x bigger than what I asked. Either they don't understand scale, or they didn't bother to listen to the requirements."
The Numbers That Prove It Was Wrong

| | What they designed for | What was actually asked |
|---|---|---|
| Scale | Global internet traffic, 20 shards | 500 employees |
| Storage | Multi-region, consistent hashing | 10M URLs × 250 bytes = 2.5 GB — fits in RAM |
| ID generation | Custom Snowflake-style service | A simple counter or hash function |
The Fix
Before drawing anything, do the math. 10 million URLs at 250 bytes each is 2.5 GB — that's less than the RAM on your laptop. A single PostgreSQL database with a B+ tree index can serve lookups in under 1 millisecond. The entire system is: one server, one database, one simple hash function. Done. That's a 5-minute design, and it's the correct one for this problem.
The Scenario
A candidate is designing a social media platform. They say: "I'll use MongoDB because it scales horizontally." The interviewer nods and asks them to sketch the data model. The candidate draws: Users, Posts, Comments, Likes, Followers, and Notifications. Then they need to show the feed — "get all posts from people I follow, sorted by time, with like counts and top 3 comments." Suddenly, they're stuck. That query requires joining 4 collections — the kind of multi-way join relational databases do natively and MongoDB makes painful (its `$lookup` aggregation exists, but chaining it across several collections is slow and awkward).
What the Interviewer Was Thinking
"They chose the database before they understood the data model. A social network is a graph of relationships — users follow users, users like posts, posts have comments. That's relational data. They picked a document store for relational data because they memorized 'NoSQL scales' without understanding what scaling actually means."
Why This Was Wrong
The candidate confused scaling the database with choosing the right database. These are two completely different decisions. Scaling is about handling more load. The data model is about how your data relates to itself. A social network has deeply interconnected data — users, posts, likes, follows, comments all reference each other. That's exactly what relational databases are built for.
Could you still use MongoDB? Technically yes — but you'd end up denormalizing everything (duplicating data across documents) and writing application-level joins. You'd be fighting the database instead of letting it help you. That's a design smell.
The Fix
Before choosing a database, draw the data model first. List the entities (Users, Posts, Comments) and the relationships between them (User → follows → User, User → likes → Post). If you see lots of many-to-many relationships, you probably want a relational database. If your data is mostly independent documents with no cross-references (like product catalog entries), then a document store makes sense. The data model tells you the database — not the other way around.
The Scenario
A candidate designing a news feed says: "I'll add a Redis cache in front of the database." The interviewer asks: "Why Redis specifically? Why not Memcached? Why not an application-level cache? Actually — why do you need a cache at all? What's the current latency without one?"
The candidate freezes. They can't answer any of these questions. They added Redis because every system design article they read said "add a cache" — it was a cargo-cult decision, not an engineering one.
What the Interviewer Was Thinking
"They're copy-pasting from a template, not thinking. 'Add Redis' isn't a design decision — it's a reflex. I need to see that they understand WHEN caching helps, WHY it helps, and HOW MUCH it helps. Quantify it: what's the DB latency without cache? What's the cache hit ratio? What's the memory cost? Without those numbers, 'add Redis' is meaningless."
The Numbers That Would Have Saved Them

| Scale | Feed latency from PostgreSQL | With a Redis cache | Verdict |
|---|---|---|---|
| 10K concurrent users | ~250 ms | ~5 ms | Saves 245 ms but costs ~$200/month plus operational complexity — not needed yet |
| 10M concurrent users | Database overloaded | Required | The cache is mandatory |
See the trade-off? At 10K concurrent users, PostgreSQL handles the feed in 250ms — totally fine. The cache saves 245ms but costs $200/month and adds operational complexity (cache invalidation, consistency, another system to monitor). At 10 million concurrent users, the math changes completely — now the database can't handle the load and you absolutely need a cache. The right answer depends on the scale, which you can only know if you did Step 1.
The Fix
Every technology choice needs a "because" that includes numbers. Not "I'll use Redis because it's fast" — that's a slogan. Instead: "I'll use Redis because at our expected 10M daily active users, the database would need to serve 50,000 read queries per second for the feed. PostgreSQL maxes out around 10K-20K QPS on a single instance, so we need a caching layer. Redis fits because the feed data is key-value (userId → feed), it supports sorted sets for time-ordered feeds, and the 64 GB memory footprint is affordable."
The Scenario
A candidate is designing a task management app (think: a simple Trello clone for a startup with 5,000 users). Their architecture includes: Kafka for event streaming, Kubernetes for container orchestration, a service mesh for inter-service communication, CQRS to separate reads from writes, and an event sourcing pattern for audit logging.
What the Interviewer Was Thinking
"This is a system that handles ~17 requests per second at peak. A single Node.js server and a PostgreSQL database could run this on a $20/month VPS. They just designed an architecture that costs $50,000/month, requires a team of 6 engineers to maintain, and takes 3 months to build. The product needs to ship next week. They have no sense of proportionality."
The Numbers That Prove It Was Wrong

| | Their design | What the problem needs |
|---|---|---|
| Load | Built for internet scale | ~17 requests/second at peak |
| Stack | Kafka, Kubernetes, service mesh, CQRS, event sourcing | One Node.js server + PostgreSQL |
| Cost | ~$50,000/month, a team of 6 to maintain | ~$20/month VPS |
| Build time | ~3 months | Ships next week |
The Fix
Start with the simplest architecture that works: one monolith, one database, one server. Then ask: "At what scale does this break?" If the answer is "at 10 million users" and you have 5,000 — your monolith is fine for the next 3 years. Design for today's problem with a clear migration path for tomorrow's. Don't build tomorrow's infrastructure today.
The Breakthrough — The Baby Steps Framework
Now you've seen the problem: jumping straight to architecture produces wrong scope, wrong tech, no justification, and over-engineering. But what's the alternative? If "just draw boxes" doesn't work, what does?
The answer is surprisingly simple. Instead of trying to solve the entire problem at once (which is impossible — your brain literally can't hold a billion-user distributed system in working memory), you break it into baby steps. Each step is small enough to think about clearly. Each step builds on the previous one. And if any step still feels too big? You break that step into smaller steps. Recursively. Until everything is small enough to reason about.
This isn't just a study technique — it's how real engineers at Google, Amazon, and Netflix actually design systems at work. The only difference between a junior engineer and a principal engineer isn't that the principal knows more technologies. It's that the principal is better at breaking big problems into small pieces.
Here's the framework. Five steps. Works for any system, any scale, any interview.
What you do: Ask questions. Define scope. Estimate scale. This is the step that prevents every failure mode from the previous section.
Why this step exists: A system that serves 1,000 users is fundamentally different from one that serves 1 billion users. A system that needs sub-50ms responses is different from one where 5-second responses are fine. You can't make any technology decision intelligently until you know the constraints. Questions you should always ask:
- Who are the users? — Internal employees? Global consumers? Developers via an API?
- How many users? — This determines whether you need 1 server or 1,000.
- What are the key features? — "Design Instagram" doesn't mean build ALL of Instagram. Pick 2-3 core features.
- What are the non-functional requirements? — Latency (how fast?), availability (how reliable?), consistency (can the data be slightly stale?).
- Read-heavy or write-heavy? — This single question changes your entire database and caching strategy.
What you do: Take the scoped problem from Step 1 and split it into 3-5 independent pieces. Each piece has one clear job.
Why this step exists: Your brain can hold about 4 things in working memory at once. A system with 15 moving parts overflows your brain. But if you split it into 4 components, you can think about each one clearly.
How to identify components: Look for natural boundaries. Different data? Separate component. Different access patterns? Separate component. Different scaling needs? Separate component. For a chat app, the messaging service (real-time, write-heavy) is clearly separate from the user profile service (read-heavy, rarely changes).
What you do: For each component, define three things: its API (what can you ask it to do?), its data model (what does it store?), and its data flow (how does data move in and out?).
Why this step exists: A box labeled "Feed Service" on a whiteboard is meaningless until you specify what it actually does. What endpoints does it expose? What data does it read/write? How does it communicate with other components? This is where hand-waving turns into real engineering.
For each component, answer:
- API: What endpoints does this expose? What parameters? What does it return?
- Data model: What tables or documents? What fields? What indexes?
- Data flow: Where does data come from? Where does it go? Is it synchronous (wait for response) or asynchronous (fire and forget)?
What you do: For each component (or the system as a whole), make four critical decisions. These four choices determine 80% of your architecture:
- Database choice: SQL or NoSQL? Which specific engine? Why?
- Scaling strategy: Vertical (bigger machine) or horizontal (more machines)? At what point?
- Sync vs. Async: Which operations need immediate responses? Which can be processed in the background?
- Communication protocol: REST, gRPC, WebSockets, or message queues?
Why these 4 specifically? Because they're the decisions that most affect cost, performance, reliability, and complexity. Get these right, and the rest of the design flows naturally. Get them wrong, and you'll be fighting your own architecture the entire time.
What you do: Look at each component from Step 3. If any of them still feels vague or too big, treat that component as its own mini-system and apply Steps 1-4 again. Recursively.
Why this step exists: Some components are simple enough after one pass — a URL shortener's redirect service is just a key-value lookup. But some components are mini-systems themselves — a "Notification Delivery Service" might need its own API, its own queue, its own rate limiter, and its own retry logic. When that happens, you zoom into that component and repeat the framework.
When to stop: Stop decomposing when every component can be explained in 2-3 sentences with a clear API, a clear data model, and clear technology choices. If you can't explain it simply, split it further. If you can, you're done.
The framework isn't a one-pass process — it's a recursive loop. You start with the big problem, break it down, and for any piece that's still too complex, you apply the same framework again at a smaller scale. It's like zooming into a map: first you see the continent, then the country, then the city, then the street. Each zoom level uses the same process — just at a different scale.
The 5-Step Framework in Detail — Two Full Walkthroughs
Reading about a framework is one thing. Seeing it work is another. In this section, we'll apply the Baby Steps Framework to two real systems — one simple, one complex. Same five steps, different problems, same structured approach.
Pay attention to how the process stays identical even though the problems are very different. That's the whole point of a framework: it works regardless of the specific system you're designing.
Walkthrough 1: Design a URL Shortener
Before drawing a single box, we ask questions. Here's what a great Step 1 looks like:
| Question | Interviewer's Answer | Why It Matters |
|---|---|---|
| How many URLs per day? | 100 million new short URLs/day | Determines write throughput and storage growth |
| Read-to-write ratio? | 100:1 (mostly redirects) | Massively read-heavy → caching is critical |
| How long do URLs live? | 5 years default | Total storage = 100M/day × 365 × 5 = 182.5 billion URLs |
| Custom short URLs? | Nice to have, not core | Deprioritize — focus on auto-generated keys |
| Analytics needed? | Basic click counts | Need a counter per URL, not a full analytics pipeline |
| Latency requirement? | Redirect under 50ms | Sub-50ms means the hot data MUST be in memory (cache) |
Now we have enough information to do back-of-envelope estimation:
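The estimates can be sanity-checked in a few lines of Python (the ~100 bytes per stored record is an assumption, chosen because it matches the 18 TB total used later in this walkthrough):

```python
SECONDS_PER_DAY = 24 * 60 * 60                            # 86,400

new_urls_per_day = 100_000_000
writes_per_sec = new_urls_per_day / SECONDS_PER_DAY       # ≈ 1,160 writes/sec
reads_per_sec = writes_per_sec * 100                      # 100:1 ratio → ≈ 116,000 reads/sec
total_urls = new_urls_per_day * 365 * 5                   # 182.5 billion over 5 years
storage_tb = total_urls * 100 / 1e12                      # ~100 bytes/record → ≈ 18 TB
```

These four numbers — 1,160 writes/sec, 116,000 reads/sec, 182.5 billion rows, ~18 TB — drive every decision that follows.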
A URL shortener has a beautifully simple component breakdown. Even at massive scale, there are really only three pieces:
- URL Creation Service — Takes a long URL, generates a unique short key, stores the mapping. Handles the 1,160 writes/sec.
- URL Redirect Service — Takes a short key, looks up the long URL, returns an HTTP 301/302 redirect. Handles the 116,000 reads/sec.
- Analytics Service — Counts clicks per short URL. Read access patterns are different (batch queries, not real-time).
Notice how each component has a single job and a clear reason to exist. The creation service is write-heavy. The redirect service is read-heavy. They have completely different performance profiles, which is why they're separate — you might scale them independently.
Now we zoom into each component and define exactly what it does. Let's start with the two core APIs:
API Design
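The exact endpoint spec isn't reproduced here, so below is one plausible shape of the two core operations, sketched as plain Python functions — the names, the in-memory `store`, and the local ID counter are all illustrative stand-ins:

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_letters  # Base62 alphabet

def _to_base62(n: int) -> str:
    s = ""
    while True:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
        if n == 0:
            return s

_ids = itertools.count(1)  # stand-in for the distributed ID generator (Step 4)

def create_short_url(long_url: str, store: dict) -> str:
    """POST /shorten — generate a key, store the mapping, return the key."""
    key = _to_base62(next(_ids))
    store[key] = long_url
    return key

def redirect(short_key: str, store: dict):
    """GET /{short_key} — returns a (status, location) pair, like an HTTP 301."""
    long_url = store.get(short_key)
    return (301, long_url) if long_url is not None else (404, None)
```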
Data Model
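The schema itself isn't shown here; one plausible record shape, mirroring the CHAR(7) key discussed next, sketched as a Python dataclass (field names and the 5-year expiry default are illustrative):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UrlMapping:
    short_key: str        # CHAR(7), Base62-encoded — the primary key
    long_url: str         # the original URL
    created_at: datetime
    expires_at: datetime  # created_at + 5 years by default
    click_count: int = 0  # incremented asynchronously by the analytics worker
```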
Why CHAR(7) for the short key? Let's do the math. We use Base62 encoding (a-z, A-Z, 0-9 = 62 characters). A 7-character key gives us 62⁷ ≈ 3.5 trillion possible combinations. We need 182.5 billion. So 7 characters gives us roughly a 19x safety margin — plenty of room.
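To make that arithmetic concrete, here's a minimal Base62 encoder plus a check of the key-space math (a sketch, not production code):

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols

def encode_base62(n: int) -> str:
    """Encode a non-negative integer in Base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

KEYSPACE = 62 ** 7          # 3,521,614,606,208 ≈ 3.5 trillion
NEEDED = 182_500_000_000    # 182.5 billion URLs over 5 years
# KEYSPACE / NEEDED ≈ 19 — the safety margin from the text
```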
Request Flow: Creating a Short URL
The app server gets the next unique ID from the ID Generation Service, Base62-encodes it into a 7-character key, writes the (short_key → long_url) row to the database, and returns the short URL to the caller.
Request Flow: Redirecting
The server checks the Redis cache first (~0.1ms); on a hit, it returns the 301 redirect immediately.
On a cache miss, the server reads from the database (~5ms) and populates the cache for next time. With a 100:1 read/write ratio, most popular URLs stay hot in the cache, giving us a cache hit rate above 90%.
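The read path just described is the classic cache-aside pattern. A minimal sketch, with plain dicts standing in for Redis and the database:

```python
def get_long_url(short_key: str, cache: dict, db: dict):
    """Cache-aside read: try the cache (~0.1 ms); on a miss,
    read the database (~5 ms) and backfill the cache."""
    long_url = cache.get(short_key)
    if long_url is None:                 # cache miss
        long_url = db.get(short_key)
        if long_url is not None:
            cache[short_key] = long_url  # populate for next time
    return long_url
```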
Decision 1: Database — SQL vs NoSQL?
Verdict: Both work. A URL shortener is essentially a key-value store — simple enough that the database choice barely matters. If you're comfortable with PostgreSQL, use it with range-based sharding on the short key. If you want zero-ops horizontal scaling, DynamoDB is excellent for this use case. The important thing is the "because": "I chose DynamoDB because the data model is pure key-value, we need automatic horizontal scaling to 18 TB, and we don't need transactions or joins."
Decision 2: Key Generation — Base62 vs MD5 Hash?
Verdict: Use a distributed counter approach. Pre-generate ranges (Server 1 gets IDs 1-1M, Server 2 gets 1M-2M, etc.) to avoid the centralized bottleneck. Each server encodes its IDs as Base62 independently. No collisions, no coordination after initial range assignment. This is simpler and more reliable than hash-based approaches.
Decision 3: Caching Strategy
At 116K reads/sec, we absolutely need a cache. Redis is the right choice here because: (1) the data is key-value (short_key → long_url), which is Redis's sweet spot, (2) we need sub-2ms reads for the 50ms latency target, and (3) Redis's LRU eviction naturally keeps popular URLs hot. With a 20 GB Redis instance, we can cache the top ~200 million most popular URLs in memory.
Decision 4: Sync vs Async
URL creation and redirect are both synchronous — the user needs an immediate response. Analytics counting, however, can be asynchronous: on every redirect, we fire a message to a simple queue, and a background worker increments the click counter. This keeps the redirect path fast (no extra database write in the critical path).
For a URL shortener, most components are already simple enough. But there's one piece worth decomposing further: the ID Generation Service.
If we use a single auto-incrementing counter, it becomes a single point of failure. So let's apply the framework recursively:
- Step 1 (for ID Gen): We need 1,160 unique IDs per second, globally unique, no collisions.
- Step 2 (for ID Gen): Two approaches: (a) a ZooKeeper-based range allocator, or (b) multiple independent counters with different offsets.
- Step 3 (for ID Gen): Range allocator: each server requests a block of 1 million IDs. It uses those locally until exhausted, then requests another block.
- Step 4 (for ID Gen): ZooKeeper for range allocation because we need strong consistency (two servers must never get the same range), and ZooKeeper is built for exactly this kind of coordination.
See? Same 5 steps, applied to a sub-problem. The ID Generation Service went from a vague "generate unique IDs" to a concrete, justified design with specific technology choices.
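The range-allocation scheme can be sketched in a few lines. Here a simple locked in-process allocator stands in for ZooKeeper — the real problem ZooKeeper solves is making `allocate()` safe across machines:

```python
import threading

class RangeAllocator:
    """Hands out non-overlapping ID blocks of `block_size` (stand-in for ZooKeeper)."""

    def __init__(self, block_size: int = 1_000_000):
        self.block_size = block_size
        self.next_start = 1
        self.lock = threading.Lock()

    def allocate(self):
        with self.lock:  # in production: an atomic ZooKeeper update
            start = self.next_start
            self.next_start += self.block_size
            return start, start + self.block_size  # half-open range [start, end)

class IdGenerator:
    """Each app server draws IDs from its local block, refilling when exhausted."""

    def __init__(self, allocator: RangeAllocator):
        self.allocator = allocator
        self.current, self.end = allocator.allocate()

    def next_id(self) -> int:
        if self.current >= self.end:
            self.current, self.end = self.allocator.allocate()
        n = self.current
        self.current += 1
        return n
```

No coordination is needed after a block is assigned, so the allocator is off the hot path: at 1,160 IDs/sec, a 1-million-ID block lasts about 15 minutes per server.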
Walkthrough 2: Design a Notification System
Now let's see the same framework handle a much more complex system. A notification system has multiple delivery channels (push, email, SMS), rate limiting, user preferences, templates, and retry logic. It's a different beast from a URL shortener — but the steps are identical.
Here are the questions that define this system:
| Question | Interviewer's Answer | Why It Matters |
|---|---|---|
| What types of notifications? | Push (iOS/Android), Email, SMS | Three completely different delivery pipelines |
| How many notifications/day? | 10 million push, 5 million email, 1 million SMS | Total: 16M/day = ~185 per second average |
| Latency requirement? | Push: under 1 second. Email: under 30 seconds. SMS: under 5 seconds | Different channels have different SLAs — separate queues needed |
| Can users opt out? | Yes, per channel and per notification type | Need a preferences service that's checked on every send |
| What about rate limiting? | Max 5 push per hour, 3 email per day, 1 SMS per day per user | Need a rate limiter per user per channel |
| Retry on failure? | Yes, up to 3 retries with exponential backoff | Need a retry queue with delay capabilities |
| Templates? | Yes, notifications use templates with variable substitution | Need a template storage and rendering service |
This system has more natural boundaries than the URL shortener. Each component handles a distinct concern:
- API Gateway — Receives notification requests from other services (e.g., "Send a password reset email to user 12345")
- Notification Orchestrator — The brain. Receives requests, checks preferences, applies rate limits, renders templates, and routes to the right channel
- Delivery Services — Three separate services: Push (talks to APNs/FCM), Email (talks to SMTP/SendGrid), SMS (talks to Twilio)
- User Preferences Service — Stores and serves per-user, per-channel opt-in/opt-out settings
- Rate Limiter — Enforces per-user, per-channel sending limits
A message queue (Kafka) sits between the orchestrator and the delivery services, with a separate queue per channel. This is critical: if the SMS provider is slow, it doesn't slow down push notifications. Each channel consumes from its own queue independently.
API Design
The send endpoint responds with 202 Accepted, not 200 OK. This tells the caller: "I've received your request and queued it for processing, but it hasn't been delivered yet." This is the asynchronous pattern in action — the API responds instantly, and the actual delivery happens in the background.
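A minimal sketch of that pattern — check preferences synchronously, enqueue, return 202 immediately. A Python list stands in for the Kafka topic, and the names and error status code are placeholders:

```python
def send_notification(request: dict, queue: list, prefs: dict) -> dict:
    """Handle a send request: respect opt-outs, enqueue, return 202 Accepted."""
    user, channel = request["user_id"], request["channel"]
    if not prefs.get((user, channel), True):   # opt-out check MUST happen before queuing
        return {"status": 422, "body": "user opted out of this channel"}
    queue.append(request)                       # stands in for a Kafka produce
    return {"status": 202, "body": "accepted: queued for delivery"}
```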
Core Notification Flow
API Gateway → Orchestrator (preference check, rate limit, template render) → the channel's Kafka topic → Delivery Service → APNs/FCM, SendGrid, or Twilio.
The caller gets a response in 12ms. The actual push notification arrives at the user's phone ~250ms later. The caller doesn't wait for delivery — that's the beauty of async processing with a message queue.
Data Models
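The schemas aren't reproduced here; one plausible shape for the two core records, sketched as Python dataclasses (field names are illustrative; the status values follow the lifecycle described later — queued, sent, delivered, failed):

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    QUEUED = "queued"
    SENT = "sent"
    DELIVERED = "delivered"
    FAILED = "failed"

@dataclass
class Notification:
    id: str
    user_id: str
    channel: str               # "push" | "email" | "sms"
    template_id: str
    variables: dict = field(default_factory=dict)  # substituted at render time
    status: Status = Status.QUEUED
    attempts: int = 0          # bumped on each retry, max 3

@dataclass
class ChannelPreference:
    user_id: str
    channel: str
    notification_type: str     # e.g. "security", "marketing"
    opted_in: bool = True
```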
Decision 1: Why Kafka for the Message Queue?
We have three delivery channels with different speeds and different failure rates. Kafka gives us:
- Separate topics per channel — Push, email, and SMS each have their own topic. If the SMS provider goes down, push notifications keep flowing.
- Persistence — Messages are stored on disk. If a delivery service crashes, it picks up where it left off when it restarts. No messages lost.
- Replay capability — If we deploy a buggy push service that drops messages, we can replay the Kafka topic to re-send them.
Why not RabbitMQ? At ~185 messages/sec average (call it ~1,850 at a 10x peak), either would work. But Kafka's replay capability is valuable for a notification system — if a delivery fails, we can retry from the same message without re-generating it. RabbitMQ deletes messages after consumption, so you'd need a separate retry mechanism.
Decision 2: Why PostgreSQL for Preferences and Templates?
User preferences are relational: a user has preferences across multiple channels and notification types. Templates are structured documents with version history. Both benefit from:
- ACID transactions — When a user toggles "email OFF," that change must be immediately consistent. We can't risk sending an email 1 second after the user opted out.
- Rich queries — "Show me all users who opted out of marketing emails" is a single SQL query. Useful for compliance reporting.
The read volume is low (~185 preference lookups/sec at average load). A single PostgreSQL instance handles this trivially.
Decision 3: Why Redis for Rate Limiting?
Rate limiting needs two properties: it must be fast (in the critical path of every notification) and it must be atomic (two concurrent requests shouldn't both pass when only one should). Redis gives us:
- Sub-millisecond reads/writes — In the critical path, we can't afford 5ms database calls.
- Atomic INCR + EXPIRE — Redis's `INCR` command is atomic, so two concurrent notifications won't both read "4 out of 5" and both pass.
- Built-in TTL — Counters auto-reset when the window expires. No cleanup job needed.
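The `INCR` + `EXPIRE` pattern amounts to a fixed-window counter. A pure-Python simulation — a dict stands in for Redis, and the window bucket baked into the key plays the role of the TTL:

```python
def allow(counters: dict, user: str, channel: str,
          limit: int, window_s: int, now: float) -> bool:
    """Fixed-window limiter mirroring Redis INCR + EXPIRE semantics."""
    bucket = int(now // window_s)              # counter "expires" when the window rolls over
    key = (user, channel, bucket)
    counters[key] = counters.get(key, 0) + 1   # a single atomic INCR in Redis
    return counters[key] <= limit
```

In real Redis this is one `INCR` followed by an `EXPIRE` on first increment, so two concurrent requests can never both observe the same pre-increment count.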
Decision 4: Async Everywhere Except Preferences
The orchestrator checks preferences synchronously (we must respect opt-outs before queuing). Everything after that is asynchronous: the actual delivery happens via Kafka consumers. This keeps the API response fast (~12ms) while delivery happens in the background (100ms-30s depending on channel).
The Push Delivery Service still feels complex. It needs to: consume from Kafka, call APNs/FCM, handle failures, retry with backoff, and track delivery status. Let's apply the framework recursively:
Step 1 (for Push Service): Requirements
- Consume 10M messages/day from Kafka push topic (~116/sec average, ~1,160/sec peak)
- Call APNs for iOS and FCM for Android (two different external APIs)
- Retry up to 3 times with exponential backoff (1s, 4s, 16s)
- Track delivery status (queued, sent, delivered, failed)
Step 2 (for Push Service): Sub-components
- Kafka Consumer — Reads messages from the push topic
- Platform Router — Routes to APNs (iOS) or FCM (Android) based on device token
- Retry Queue — Failed messages go here with a scheduled retry time
- Status Tracker — Updates notification status in the database
Step 3 (for Push Service): Data Flow
On the happy path, the Kafka consumer reads a message, the platform router sends it through APNs (iOS) or FCM (Android), and the status tracker marks it "sent." On failure, the message goes to the retry queue instead, tagged with its scheduled retry time.
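The retry schedule (1 s, 4 s, 16 s) and the give-up-after-3-attempts rule can be sketched as (a list stands in for the delayed retry queue; field names are illustrative):

```python
def backoff_delay(attempt: int) -> int:
    """Exponential backoff: attempt 0 → 1s, 1 → 4s, 2 → 16s."""
    return 4 ** attempt

def schedule_retry(message: dict, retry_queue: list, now: float) -> bool:
    """Re-queue a failed message with its next retry time; give up after 3 attempts."""
    attempt = message.get("attempts", 0)
    if attempt >= 3:
        message["status"] = "failed"   # permanent failure — status tracker records it
        return False
    message["attempts"] = attempt + 1
    retry_queue.append((now + backoff_delay(attempt), message))  # (due_time, message)
    return True
```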
This is the recursive power of baby steps. The "Push Service" went from a vague box on a whiteboard to a fully specified sub-system with its own consumer, router, retry logic, and status tracking. And we got there using the exact same 5 steps.
The Evaluation Checklist — Is Your Design Good?
You've scoped the problem, broken it into components, defined APIs, picked technologies, and drawn the final architecture. You're feeling good. But here's the question nobody asks themselves: how do you know this design is actually good?
A surprising number of candidates finish their design, lean back, and say "that's it." The interviewer nods slowly, then starts poking holes. "What happens when your database goes down?" "Can this handle 10x the traffic next year?" "Why do you need Kafka here?" Suddenly, the design unravels — not because it was wrong, but because the candidate never stress-tested their own work.
Great engineers don't wait for someone else to find problems. They evaluate their own designs before anyone else does. Here are the four criteria every system design must pass.
Criterion 1: Does It Handle the Load?
This is pure math. Go back to the numbers you calculated in Step 1 — QPS, storage, bandwidth — and trace them through your architecture. Does every component handle the load you promised?
Example: You said the system handles 10,000 reads/second. Your design has a single PostgreSQL instance. Can PostgreSQL handle 10K reads/sec? With the right indexes and connection pooling, yes — a well-tuned Postgres handles 20K+ simple reads/sec. But if you said 500,000 reads/sec and still have one database, that's a gap. The math doesn't lie.
Check: For every number in your requirements, point to the component that handles it. If you can't point, you have a gap.
Criterion 2: What Happens When Things Fail?
Everything fails. Servers crash, networks split, disks fill up. The question isn't "will it fail?" but "what happens WHEN it fails?" Walk through each critical component and imagine it disappearing.
Example: Your cache (Redis) goes down. Does the whole system crash? Or does it gracefully fall back to the database — slower, but still working? If losing one component kills the entire system, that component is a single point of failure, and your design has a serious gap.
Check: Pick your 3 most important components. For each one, answer: "If this dies, what happens to the user?" If the answer is "everything breaks," you need a fallback plan.
Criterion 3: Can It Evolve?
Systems are never "done." Today it's a URL shortener. Next quarter, product wants analytics — how many clicks per link, from which countries, at what times. The quarter after that, they want custom domains. Can your architecture absorb these features without a rewrite?
Example: You stored short URLs in a simple key-value table (short_code → long_url). Adding click analytics means you need a separate clicks table. If your services are cleanly separated (Redirect Service vs. Analytics Service), adding this is easy — just a new service that reads from a click event stream. If everything is jammed into one monolith, adding analytics means rewriting the whole thing.
Check: Name one realistic feature that might come next. Can you add it by adding a new component, or does it require changing existing ones? Adding is easy. Changing is expensive.
Criterion 4: Is Anything Unnecessary?
Every component must earn its place. This is the opposite of the first three checks — instead of asking "is it enough?" you're asking "is it too much?" For every box in your diagram, you should be able to say: "This exists because [specific math or requirement]. Without it, [specific thing breaks or becomes unacceptable]."
Example: You added a message queue (Kafka) between your API and your database. Why? "Because the write volume is 50,000/second and the database can only sustain 10,000 writes/second — the queue absorbs the burst and lets the database consume at its own pace." That's justified. But if your write volume is 100/second and the database handles 10,000 — Kafka is overhead with no benefit.
Check: For every component, complete this sentence: "Without this, _____ would break." If you can't fill in the blank, remove the component.
Different Interview Formats — Same Framework, Different Pacing
The five-step framework works everywhere. But "everywhere" doesn't look the same. A 45-minute Google round feels completely different from a 30-minute startup screen, even though you're using the same thinking process. The difference isn't what you do — it's how fast you do it and where you spend your depth.
Think of it like cooking the same recipe with different time limits. You can make a great pasta in 15 minutes or 60 minutes — the ingredients are the same, but what you can do with them changes dramatically. A 60-minute version might have a slow-simmered sauce with layers of flavor. A 15-minute version needs to be fast, focused, and impressive with simplicity. Neither is "better" — they're just different calibrations of the same skill.
Whatever the format, whether a 45-minute big-tech round or a 30-minute startup screen, the allocation principle is the same: the shorter the session, the more aggressively you compress requirements and math to protect time for the design itself.
Now let's look at what interviewers expect from different experience levels within each format. A junior engineer isn't expected to nail the same things as a senior — but juniors often don't know that, which leads to unnecessary panic.
What Juniors Are Expected to Show
If you're interviewing for a junior or mid-junior role, breathe. Nobody expects you to design Google-scale systems from scratch. Here's what interviewers actually look for at your level:
- Clear thinking process — You ask clarifying questions before jumping to solutions. You don't try to boil the ocean.
- Reasonable component breakdown — You can split a system into 3-5 sensible pieces (API, database, cache) even if you can't explain the internals of each one.
- Basic trade-offs — You know that SQL is good for relational data and NoSQL is good for flexible schemas. You don't need to know the internal page structure of B+ trees.
- Honesty about gaps — "I'm not sure how Kafka works internally, but I know it's a message queue that decouples producers from consumers" is a great answer at this level.
What Mid-Level Engineers Are Expected to Show
At this level, the bar goes up. You're expected to not just break the problem down, but to make and defend real technical decisions:
- Justified technology choices — "I'm using PostgreSQL because the data is relational and the scale (2M rows) doesn't need sharding." Not just "I'll use Postgres."
- Back-of-envelope math — You can estimate QPS, storage, and bandwidth. You don't need to be exact, but you should be in the right ballpark and use the numbers to drive decisions.
- Failure awareness — You proactively mention what happens if a component goes down. "If Redis crashes, we fall back to the database — latency goes from 5ms to 50ms, which is acceptable."
- API and data model design — You can sketch a clean REST API with proper endpoints and define a reasonable database schema.
What Senior+ Engineers Are Expected to Show
At the senior level, you're not just solving the problem — you're owning it. The interviewer expects you to drive the entire conversation with minimal prompting:
- You drive the conversation — The interviewer barely needs to speak. You scope, prioritize, explain trade-offs, and proactively address concerns before they're raised.
- Deep trade-off analysis — Not just "SQL vs NoSQL" but "PostgreSQL with Citus sharding vs. DynamoDB: Citus gives us SQL JOIN capability but adds operational overhead; DynamoDB is managed but forces us to denormalize. Given our read pattern (mostly point lookups, rare joins), DynamoDB wins here."
- Production-grade thinking — You talk about observability, deployment strategies, rollback plans, and SLAs.
- Elegant simplicity — Paradoxically, the most senior engineers often produce the simplest designs. They've seen over-engineered systems fail and know that simplicity is a feature. They add complexity only when the math demands it.
Real Interview Experiences — What Actually Happens in the Room
Theory is great. The framework is solid. But what does a system design interview actually feel like when you're sitting in the chair, the timer is running, and the interviewer is watching you think?
Here are four real stories from engineers who interviewed at top companies. Names and some details are changed, but the dynamics — the questions, the mistakes, the turning points — are real. Pay attention to the lessons at the end of each story. They're more valuable than any textbook.
Story 1: The Google Interview (URL Shortener)
The Question
"Design a URL shortener like bit.ly."
What the Candidate Did
Instead of jumping to the architecture, she spent the first 7 minutes asking clarifying questions: "How many URLs per day? Do short links expire? Do we need analytics (click counts)? Is this public-facing or internal? What's the expected read-to-write ratio?" The interviewer answered each one, and by the end, she had a crisp scope: 100M new URLs/day, links never expire, basic analytics (click count only), 100:1 read-to-write ratio.
Then she did the math on the whiteboard: 100M writes/day = ~1,150 writes/sec. At a 100:1 ratio, that means ~115,000 reads/sec. Each record is ~500 bytes, so 100M/day × 500B × 365 days = ~18 TB/year. She mapped every number to a technology choice: "115K reads/sec needs a cache — Redis can handle 500K+ reads/sec from memory, so one Redis instance is enough. 18 TB/year means we need to partition the database, but the lookups are simple key-value, so DynamoDB handles this natively."
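Her back-of-envelope arithmetic is easy to reproduce as checkable code:

```python
# The story's numbers, reproduced as order-of-magnitude arithmetic.
SECONDS_PER_DAY = 86_400

writes_per_day = 100_000_000                              # 100M new URLs/day
writes_per_sec = writes_per_day / SECONDS_PER_DAY         # ~1,157 (she rounded to 1,150)
reads_per_sec = writes_per_sec * 100                      # 100:1 read ratio -> ~115,700

record_bytes = 500
storage_per_year_tb = writes_per_day * record_bytes * 365 / 1e12   # ~18.25 TB

print(f"{writes_per_sec:,.0f} writes/s, {reads_per_sec:,.0f} reads/s, "
      f"{storage_per_year_tb:.1f} TB/year")
```

Notice that every output maps to a decision: the read rate justifies the cache, the storage figure justifies partitioning.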
When the interviewer pushed back — "What if the ID generation becomes a bottleneck?" — she had an answer ready because she'd thought about it during scoping: "We can use a distributed ID approach with range-based allocation — each server pre-fetches a block of 10,000 IDs, so there's no central bottleneck."
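Range-based ID allocation can be sketched as follows. The block size of 10,000 matches her answer; the "central counter" here is an in-process stand-in for whatever durable sequence (a database row updated under a lock, for instance) a real deployment would use.

```python
import itertools
import threading

class IdBlockAllocator:
    """Each app server leases a block of 10,000 IDs from a central sequence,
    then hands them out locally with no further coordination."""

    BLOCK_SIZE = 10_000

    def __init__(self, central_counter):
        self._central = central_counter
        self._lock = threading.Lock()
        self._next = 0
        self._ceiling = 0   # first ID past the leased block

    def next_id(self):
        with self._lock:
            if self._next >= self._ceiling:
                # One central round-trip buys BLOCK_SIZE local IDs.
                self._next = next(self._central) * self.BLOCK_SIZE
                self._ceiling = self._next + self.BLOCK_SIZE
            nid = self._next
            self._next += 1
            return nid

central = itertools.count(1)          # shared sequence: 1, 2, 3, ...
server_a = IdBlockAllocator(central)
server_b = IdBlockAllocator(central)
# server_a leases [10000, 20000); server_b leases [20000, 30000)
```

The trade-off she could name if pushed further: a crashed server wastes the rest of its leased block, which is fine because IDs are cheap.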
Interviewer Feedback
"She drove the entire conversation. I barely had to prompt her. Every technology choice had a number behind it, and when I pushed back, she had already considered the alternative. This is what L5 looks like."
The Lesson
Scope first, then let the numbers drive every choice. The seven minutes she "lost" to questions bought her a design she could defend under pressure, because every component traced back to a requirement and a calculation.
Story 2: The Amazon Interview (Ride-Sharing)
The Question
"Design a ride-sharing system like Uber."
What the Candidate Did
He had clearly studied this problem before. His architecture was solid: a matching service that pairs riders with nearby drivers, a location service that tracks driver positions in real-time using geospatial indexes, a pricing service that calculates surge pricing, and a notification service for ride updates.
The design was technically strong. But at Amazon, technical competence is the floor, not the ceiling. The interviewer kept steering toward leadership principles — "How would you decide between building the matching algorithm in-house versus buying a third-party solution?" "How would you handle a situation where the pricing team wants to change the surge algorithm but the matching team says it'll break their system?" "What trade-offs would you make if you had to ship the MVP in 3 months?"
Each time, the candidate gave a purely technical answer. "I'd benchmark both solutions and pick the faster one." "I'd add a versioned API between the services." "I'd cut the analytics dashboard." He never talked about customer impact, team dynamics, or business trade-offs.
Interviewer Feedback
"Strong technical design, genuinely. But at the SDE3 level, we need someone who thinks beyond the code. When I asked about build vs. buy, I wanted to hear about customer obsession — which option gets value to riders faster? When I asked about the team conflict, I wanted to hear about earning trust and disagree-and-commit. He answered like an engineer. We needed an engineering leader."
The Lesson
At senior levels, technical competence is the floor. Frame build-vs-buy, cross-team conflict, and scope cuts in terms of customer impact and business trade-offs, especially at companies like Amazon that score leadership principles explicitly.
Story 3: The Meta Interview (News Feed)
The Question
"Design the Facebook News Feed."
What the Candidate Did
This is one of the most intimidating questions because the real News Feed is incredibly complex — ML ranking models, social graph traversal, real-time updates, and thousands of engineers working on it. The candidate's genius move was starting simple and evolving.
He began with the simplest possible feed: "Let's start with a chronological feed — when you open the app, you see the 50 most recent posts from your friends, newest first." He drew three components: a Post Storage service, a Friend Graph service, and a Feed Builder that queries both. He estimated: "If a user has 500 friends and each friend posts once per day, that's 500 posts to sort — trivial for a single query."
Then the interviewer said: "OK, now you have 2 billion users." The candidate didn't panic. He calmly said: "That changes the math significantly. Let me re-estimate." He calculated: 2B users, maybe 500M daily active, each loading the feed ~10 times/day = 5B feed loads/day = ~58K feeds/second. "We can't compute each feed in real-time at this scale. We need to pre-compute." He introduced a fan-out-on-write approach for regular users and a fan-out-on-read hybrid for celebrities.
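The hybrid he described can be captured in a minimal sketch. The follower-count threshold for who counts as a "celebrity" is an illustrative assumption, and all names here are invented for the sketch:

```python
CELEBRITY_THRESHOLD = 10_000   # illustrative cutoff for switching strategies

followers = {"alice": {"u1", "u2"},
             "celeb": {f"u{i}" for i in range(10_000)}}
feeds = {}            # user -> precomputed inbox of post ids
celebrity_posts = {}  # celebrity -> recent post ids, merged at read time

def publish(author, post_id):
    fans = followers.get(author, set())
    if len(fans) >= CELEBRITY_THRESHOLD:
        # Fan-out-on-read: store once, merge when followers load their feed.
        celebrity_posts.setdefault(author, []).append(post_id)
    else:
        # Fan-out-on-write: push into every follower's precomputed feed now.
        for fan in fans:
            feeds.setdefault(fan, []).append(post_id)

def load_feed(user, following):
    merged = list(feeds.get(user, []))                   # cheap: precomputed
    for author in following:
        merged.extend(celebrity_posts.get(author, []))   # merge the few celebrities
    return merged

publish("alice", "post-1")   # fanned out to u1 and u2 immediately
publish("celeb", "post-2")   # stored once; merged lazily at read time
```

The asymmetry is the whole point: a celebrity post would otherwise trigger millions of writes, while a regular user's post triggers a few hundred.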
Interviewer Feedback
"What impressed me was the evolution. He didn't try to design for 2B users from the start. He built something simple, proved it worked at small scale, then scaled it up with specific reasoning. That's how we actually build things here — start simple, measure, then optimize."
The Lesson
Start with the simplest design that satisfies the stated scale, then evolve it when the constraints change. Scaling a working design with fresh math impresses far more than opening with a design for two billion users.
Story 4: The Startup Interview (B2B Chat)
The Question
"Design a real-time chat system for our B2B product. We have about 10,000 companies using us, with maybe 50 people per company."
What the Candidate Did
Here's what made this interview special: the candidate matched the complexity of the design to the complexity of the company. She didn't design for WhatsApp scale. She designed for this startup's scale.
She did quick math: 10,000 companies × 50 users = 500,000 total users. Even if all are online at once (they won't be), that's 500K WebSocket connections. A single server can handle ~65K connections, so she'd need about 8 servers. But realistically, maybe 10% are online at peak — so 50K connections, which is one well-configured server.
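Her capacity math, reproduced as checkable arithmetic (the 65K-connections-per-server figure is her stated assumption):

```python
import math

total_users = 10_000 * 50                  # 500,000 accounts across all companies
conns_per_server = 65_000                  # rough WebSocket ceiling per box (her assumption)

worst_case_servers = math.ceil(total_users / conns_per_server)   # everyone online at once
peak_online = total_users // 10            # realistic: ~10% concurrent at peak
realistic_servers = math.ceil(peak_online / conns_per_server)    # one well-configured box
```

The gap between worst case (8 servers) and realistic peak (1 server) is exactly the kind of slack that justifies starting simple.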
"For a startup at this scale, I'd keep it simple: one WebSocket server behind a load balancer, PostgreSQL for message storage, Redis for presence tracking (who's online). No Kafka, no sharding, no microservices. When you hit 100K concurrent users, we revisit."
The CTO pushed: "What about when we grow 10x?" She responded: "At 5 million users, I'd introduce message queues for async delivery, shard the database by company_id (natural partition), and add a second WebSocket server pool. But I wouldn't build that today because YAGNI — the engineering cost of building it now is real, and the scale that justifies it is hypothetical."
Interviewer Feedback
"She understood our stage. At a startup, the right architecture is the simplest one that works today and can evolve tomorrow. She didn't try to impress me with Kafka and Kubernetes — she impressed me by knowing when NOT to use them. That's maturity."
The Lesson
Match the architecture to the company's stage. Knowing when not to reach for Kafka, sharding, or microservices, and naming the threshold at which you would, is a stronger signal than any buzzword.
Anti-Lessons — Three Ways to Guarantee Failure
We've talked about what to do. Now let's talk about what NOT to do. These three anti-patterns come up in interviews over and over again. They're so common that interviewers have names for them. And here's the frustrating part: the engineers who fall into these traps often know the material well. They fail not because they lack knowledge, but because they present it wrong.
Understanding these failure modes is just as important as understanding the framework itself — because you can't fix a mistake you don't know you're making.
Anti-Pattern 1: The Memorized Answer
What Happened
A candidate was asked to design a URL shortener. Within 30 seconds, they started drawing a very specific architecture: a hash function that generates Base62 IDs, a ZooKeeper cluster for distributed ID generation, DynamoDB for storage, CloudFront CDN for caching, and a separate analytics pipeline with Kafka and Spark. It looked impressive — detailed, confident, fast.
Then the interviewer asked: "Why ZooKeeper instead of a simpler approach like a database sequence?" The candidate paused. "Because... ZooKeeper is what you use for distributed coordination." The interviewer pushed further: "But do you need distributed coordination? How many URL-generation servers do you have?" Long pause. "I... hadn't really thought about that specific point."
The interview went downhill from there. Every follow-up question revealed the same pattern: the candidate had memorized an architecture but didn't understand why each piece existed. They couldn't adapt when the interviewer changed the constraints.
WHY This Is Wrong
Memorized answers are fragile. They work perfectly for exactly one version of the question and break the moment anything changes. Real interviews are conversations, not recitations. The interviewer will always push you off the "standard" path — that's the whole point. They want to see how you think, not how well you memorize.
The deeper problem is that memorizing blocks understanding. When you memorize "use ZooKeeper for ID generation," you skip the reasoning chain: "I need unique IDs → how many servers generate IDs? → if just one, a database sequence works → if multiple, I need coordination → ZooKeeper is one option for coordination, but so is range-based allocation." Without that chain, you can't adapt.
What to Do Instead
Instead of memorizing "URL shortener uses Base62 + DynamoDB + ZooKeeper," learn the forces: unique ID generation (several approaches, each with trade-offs), mapping storage (depends on scale and query patterns), and efficient redirects (depends on read volume). Understand the forces, and you can handle any variant.
Anti-Pattern 2: The Over-Engineer
What Happened
A candidate was asked to design a task management app (like a simple TODO list) for a startup with 5,000 users. Their design included: a Kubernetes cluster with 12 microservices, Kafka for event streaming, Elasticsearch for search, Redis for caching, a service mesh (Istio) for inter-service communication, and a separate ML pipeline for "smart task prioritization."
The interviewer asked a simple question: "How many writes per second does this system handle?" The candidate paused, did the math — 5,000 users, maybe 10 tasks per user per day = 50,000 tasks/day = 0.6 writes/second. Less than one write per second. For a system with 12 microservices, Kafka, and Kubernetes.
WHY This Is Wrong
This is over-engineering, and it's driven by insecurity. The candidate thought complexity = impressive. In reality, complexity without justification = poor judgment. Every component you add is a component that can break, needs monitoring, needs maintenance, and costs money. Adding Kafka for 0.6 writes/second is like hiring a full-time chauffeur because you drive to the grocery store once a week.
The math tells you what you need. 0.6 writes/second? A single PostgreSQL instance handles 10,000+ writes/second. You don't need caching (the database is already 10,000x faster than your load requires). You don't need a message queue (there's no write burst to absorb). You don't need Kubernetes (a single server running a monolith handles this with 99.99% of its capacity idle).
What to Do Instead
Always do the math first, then choose technology. The architecture should be the minimum viable system that meets the actual requirements. For a 5,000-user TODO app: one server, one PostgreSQL database, one simple web framework. Done. If the interviewer asks "what about 10x growth?" — great, 50K users is still only 6 writes/second. Still one server. Maybe add Redis for session caching at that point, but probably not even that.
The simplest architecture that meets the requirements isn't a "beginner" answer — it's the correct answer. Companies waste millions of dollars every year on over-engineered systems. Showing that you can resist that temptation is a sign of real engineering maturity.
Anti-Pattern 3: The Silent Architect
What Happened
A candidate was asked to design a notification system. They nodded, turned to the whiteboard, and started drawing. For 25 minutes straight, they drew boxes, arrows, databases, queues, and services. The architecture was actually quite good. The problem? They didn't say a single word while drawing it.
The interviewer couldn't follow the reasoning. Why is this arrow going from Service A to Service B? Why Redis and not Memcached? Why three database shards instead of two? The candidate had reasons for all of these choices — good reasons — but they were all locked inside their head. The interviewer saw boxes and arrows. They didn't see the thinking.
At the 25-minute mark, the interviewer interrupted: "Can you walk me through your reasoning?" The candidate looked at their diagram and said: "So... this is the notification service... and it talks to this queue... and then..." They were reverse-engineering their own design, trying to explain decisions they'd already made without remembering the reasoning chain.
WHY This Is Wrong
System design interviews aren't about the final diagram. They're about watching you think. The interviewer needs to see your reasoning process in real time: "I'm choosing Redis here because we need sub-millisecond reads and the working set fits in 64 GB of RAM." If you draw in silence, the interviewer has to guess why you made each choice — and they won't guess charitably.
There's a second problem: when you design silently, you don't get real-time feedback. The interviewer might have nudged you toward a better approach at minute 5 — but they couldn't, because they didn't know what you were thinking. By minute 25, you've gone too far down a path to pivot. The collaborative part of the interview never happened.
What to Do Instead
Think out loud. Constantly. Narrate every decision as you make it: "I'm going to start with the write path because it's simpler and will help me establish the data model. The user sends a notification request to our API gateway. I'm putting an API gateway here because we'll need rate limiting — one bad client shouldn't be able to flood our system with notifications."
This feels awkward at first, especially if you're naturally quiet. Practice by designing systems in front of a mirror or recording yourself on video. The goal is to make thinking out loud feel as natural as breathing. In an interview, your spoken reasoning is more important than the diagram itself.
Common Mistakes — Six Traps That Kill Good Candidates
The anti-patterns in Section 10 are big, obvious failures. The mistakes in this section are more subtle — and more dangerous because of it. These are the traps that catch good candidates. People who know the material, who've practiced, who can draw reasonable architectures — but who lose marks on things they didn't realize mattered.
We surveyed dozens of interview debriefs from engineers at Google, Amazon, Meta, and startups. These six mistakes appeared again and again, and notably, the most common one is also the simplest to fix.
Mistake #1: Skipping Requirements Entirely
The Mistake
The interviewer says: "Design a URL shortener." The candidate immediately starts drawing: hash function, database, cache, load balancer. Within 2 minutes, they have boxes on the whiteboard. But they never asked: How many URLs? Do they expire? Who uses this — internal or public? Do we need analytics? What's the read-to-write ratio?
Real Example
A candidate designed an incredibly sophisticated URL shortener with consistent hashing, multi-region replication, and a bloom filter for collision detection. Impressive engineering. Then the interviewer revealed: "This is for an internal team of 200 people generating maybe 100 links per day." The entire architecture was 100x too complex. The candidate wasted 35 minutes building the wrong thing.
How to Fix It
Force yourself to ask at least 5 questions before touching the whiteboard. Make it a rule. Write the questions and answers at the top of the whiteboard so they stay visible throughout the interview. The answers will constrain your design and prevent you from building something the problem doesn't need.
If you can't think of questions, use this starter list: (1) What's the expected scale — users, requests/sec? (2) What are the core features vs. nice-to-haves? (3) Is it read-heavy or write-heavy? (4) What are the latency requirements? (5) Is consistency or availability more important?
Mistake #2: Designing Without Numbers
The Mistake
The candidate says things like "this should handle the load" or "we'll need to scale this" without ever calculating what "the load" actually is. They make technology choices based on vibes instead of numbers. "I'll use Redis for caching" — but they never calculated whether the database alone could handle the read volume.
Real Example
A candidate designing a photo-sharing app said: "We'll shard the database across 10 nodes." The interviewer asked: "Why 10?" Long silence. "It's... a good number?" The actual math: 50M photos/year × 2MB average = 100 TB/year. Each database node can handle ~10 TB comfortably. So you need 10 nodes in year one, growing to 30+ by year three. The candidate got the right number by accident, but couldn't explain it — which means they couldn't adjust it when the interviewer changed the assumptions.
How to Fix It
Calculate three numbers for every system: QPS (queries per second), storage (GB or TB), and bandwidth (MB/s). These three numbers drive 90% of architecture decisions. Write them on the whiteboard and reference them constantly. "I'm adding a cache because the database handles 5,000 reads/sec but we need 50,000 — so the cache absorbs the extra 45,000." Now every decision has a number behind it.
Don't worry about precision. Back-of-envelope math is about order-of-magnitude accuracy. Saying "about 50,000 reads/sec" is just as useful as saying "52,417 reads/sec." The goal is to land in the right ballpark so your architecture matches the actual scale.
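For practice runs, the three numbers can be wrapped in a tiny helper. The function and its parameters are illustrative; the 86,400 seconds per day and the unit conversions are the only fixed facts here:

```python
SECONDS_PER_DAY = 86_400

def envelope(daily_users, requests_per_user, bytes_per_record, records_per_day):
    """Return (QPS, storage GB/year, bandwidth MB/s) -- order of magnitude only."""
    qps = daily_users * requests_per_user / SECONDS_PER_DAY
    storage_gb_year = records_per_day * bytes_per_record * 365 / 1e9
    bandwidth_mb_s = qps * bytes_per_record / 1e6
    return round(qps), round(storage_gb_year), round(bandwidth_mb_s, 2)

# Example: 1M daily users, 20 requests each, 1 KB records, 200K new records/day
qps, storage, bandwidth = envelope(1_000_000, 20, 1_000, 200_000)
```

Run it with your own guesses before an interview; the habit of converting "users" into QPS and terabytes is what you're training.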
Mistake #3: Ignoring Failure Modes
The Mistake
The candidate designs for the happy path only. Everything works perfectly in their design: servers never crash, networks never partition, disks never fill up. They treat their architecture like a textbook diagram instead of a real system running on real hardware that breaks in real ways.
Real Example
A candidate designed an e-commerce checkout system with a single payment gateway integration. The interviewer asked: "What happens if Stripe goes down for 30 minutes during Black Friday?" The candidate froze. They hadn't considered this. Their system had no fallback payment processor, no queuing mechanism for retrying payments, and no way to hold orders while payments were unavailable. During the busiest shopping day of the year, their system would lose every sale.
How to Fix It
After drawing your architecture, do a "failure walk." Point to each critical component and say: "If this dies, here's what happens." For each one, you need either redundancy (a replica takes over), graceful degradation (the system works but with reduced capability), or explicit acknowledgment ("this is a calculated risk because adding redundancy here costs X and the probability of failure is Y").
You don't need redundancy for everything — that's over-engineering (Mistake #4). But you do need to know what happens when each component fails and have a deliberate plan for the critical ones.
Mistake #4: Over-Engineering for Imaginary Scale
The Mistake
The candidate hears "design X" and immediately designs for Google-scale traffic, regardless of the actual requirements. They add sharding, distributed caching, multi-region replication, and auto-scaling groups for a system that serves 1,000 users.
Real Example
A startup interview asked: "Design the backend for our internal employee dashboard — 300 employees, maybe 50 concurrent users at peak." The candidate designed a multi-AZ deployment with read replicas, ElastiCache, and an API gateway with rate limiting. The CTO's response: "We just need a Django app on one EC2 instance. This design would cost us $3,000/month for something that could run on a $20/month server."
How to Fix It
Let the numbers tell you what you need. 50 concurrent users at peak? That's maybe 10 requests/second. A single server running any modern web framework handles 1,000+ requests/second. You don't need caching (the database is 100x faster than your load). You don't need sharding (the data fits in RAM). You don't need multi-region (your users are all in one office).
Start with the simplest possible architecture that meets the requirements. Then, only add complexity when the math tells you the simple approach won't work. "I'm adding a cache because the database can handle 5K reads/sec but we need 50K" — that's justified complexity. "I'm adding a cache because best practices say so" — that's over-engineering.
Mistake #5: Name-Dropping Technologies
The Mistake
The candidate rattles off technology names like a shopping list: "I'll use Kafka for messaging, Redis for caching, Elasticsearch for search, DynamoDB for storage, and Kubernetes for orchestration." It sounds impressive. But when the interviewer asks "Why Kafka and not RabbitMQ?" or "Why DynamoDB and not PostgreSQL?" the candidate can't answer — because they chose the technology by name recognition, not by matching requirements to capabilities.
Real Example
A candidate said: "I'll use Elasticsearch for the search feature." The interviewer asked: "The search is just looking up users by exact username. Do you need Elasticsearch for that?" The candidate realized that a simple database index with a WHERE username = ? query would handle this perfectly. Elasticsearch is for full-text search (fuzzy matching, relevance scoring, synonym handling) — not for exact key lookups. Using Elasticsearch for exact username search is like using a chainsaw to cut butter.
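To make the point concrete, here is the exact-match lookup done with a plain database index, using SQLite as a stand-in for whatever relational store the system already has. The table and index names are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
db.execute("CREATE UNIQUE INDEX idx_users_username ON users(username)")
db.executemany("INSERT INTO users (username) VALUES (?)",
               [("ada",), ("grace",), ("linus",)])

# Exact-match lookup: the B-tree index makes this a logarithmic seek,
# no search engine required.
row = db.execute("SELECT id FROM users WHERE username = ?", ("grace",)).fetchone()

# The query plan confirms the index is used rather than a full table scan.
plan = db.execute("EXPLAIN QUERY PLAN SELECT id FROM users WHERE username = ?",
                  ("grace",)).fetchall()
```

Elasticsearch earns its keep when you need fuzzy matching, relevance scoring, or tokenized full-text queries; for `WHERE username = ?`, the index above is the whole solution.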
How to Fix It
For every technology you name, complete this sentence: "I'm choosing [X] over [Y] because [specific reason]." If you can't complete the sentence, don't name the technology. Say what you need instead: "I need a way to handle 50K messages/sec with replay capability" — and then work out which technology provides that.
"I need a message queue here. Kafka and RabbitMQ are both options. Kafka gives us message replay and higher throughput, while RabbitMQ is simpler to operate. Given our requirement for 50K messages/sec and the need to replay failed messages, Kafka is the better fit." That's a decision. "I'll use Kafka" without that reasoning is just a brand name.
Mistake #6: Hiding the Trade-offs
The Mistake
The candidate presents their design as if every choice is perfect and has no downsides. "I'll use DynamoDB — it scales infinitely." They never acknowledge that DynamoDB doesn't support JOINs, has limited query flexibility, costs more than PostgreSQL at moderate scale, and requires careful partition key design to avoid hot spots.
Real Example
A candidate chose an eventually consistent database for an e-commerce inventory system. The interviewer asked: "What happens if two people try to buy the last item at the same time?" Under eventual consistency, both might see the item as available, both might complete the purchase, and the company would oversell. The candidate hadn't thought about this because they presented eventual consistency as purely beneficial without acknowledging its limitations.
How to Fix It
After every technology choice, proactively say: "The trade-off here is..." Every engineering decision has a downside. Caching reduces latency but introduces cache invalidation complexity. Sharding enables horizontal scaling but makes cross-shard queries hard. Microservices improve team independence but add network latency and operational overhead.
Acknowledging trade-offs doesn't make your design weaker — it makes it credible. An engineer who says "the trade-off is X, and here's why it's acceptable in this context" sounds like someone who's built real systems. An engineer who presents everything as perfect sounds like someone who learned from slides.
The Interview Playbook — Minute by Minute
You've learned the framework. You've seen the mistakes. But when you're sitting in an actual interview with a timer ticking, knowing what to do isn't enough — you need to know when to do it. This section gives you a concrete, minute-by-minute plan for a standard 45-minute system design interview.
The biggest reason candidates fail isn't lack of knowledge — it's running out of time because they spent 20 minutes on requirements and had 5 minutes left for the actual design.
Block 1: Requirements and Scope (roughly the first 5 minutes)
What You Do
Ask clarifying questions. Define what's in scope and what's out. Establish the scale. This is not a formality — it's the foundation everything else rests on. A shaky foundation means a shaky design.
Example Phrases to Use
- "Before I start designing, let me make sure I understand the problem. What are the core use cases we need to support?"
- "What's the expected scale? Are we talking thousands of users or hundreds of millions?"
- "Is this read-heavy or write-heavy? That'll drive the database and caching decisions."
- "Should I focus on the feed generation, the upload pipeline, or both?"
- "What are the latency requirements? Sub-100ms for reads?"
What the Interviewer Is Evaluating
Communication and ambiguity management. Can you take a vague problem and turn it into something concrete? Senior engineers don't build what they think is needed — they confirm what's actually needed. Every question you ask signals maturity.
Common Mistakes in This Block
- Spending zero time here — jumping straight to drawing boxes. This is Mistake #1 from Section 11.
- Asking too many questions — spending 10+ minutes on requirements. Keep it to 5-7 questions max. You need enough to estimate, not a full product spec.
- Not writing anything down — always jot the key numbers (users, QPS, storage) where the interviewer can see them. You'll reference these numbers throughout.
Block 2 — Back-of-Envelope Estimation
What You Do
Take the numbers from Step 1 and do quick math. How many requests per second? How much storage over 5 years? How much bandwidth? The point isn't precision — it's order-of-magnitude thinking.
Example Phrases to Use
- "Let me do some quick math on the scale. 100M daily users, each making about 20 requests — that's roughly 2 billion requests a day, or about 23K QPS."
- "For storage: if each record is about 1KB and we're adding 10M records per day, that's 10GB/day, or about 3.6TB per year."
- "At 23K QPS, a single server handling 1K QPS means we need about 25 application servers behind a load balancer."
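The arithmetic behind these phrases can be sketched as a tiny helper. This is a minimal sketch of whiteboard math, not a standard tool; the function names and the 25% headroom factor are illustrative assumptions.

```python
import math

SECONDS_PER_DAY = 86_400  # round to ~100K for whiteboard math

def qps(daily_requests: int) -> float:
    """Average queries per second from a daily request count."""
    return daily_requests / SECONDS_PER_DAY

def servers_needed(avg_qps: float, qps_per_server: float) -> int:
    """Round up, then add ~25% headroom for traffic spikes (illustrative)."""
    return math.ceil(avg_qps / qps_per_server * 1.25)

daily = 100_000_000 * 20                  # 100M users x 20 requests = 2B/day
print(round(qps(daily)))                  # 23148, i.e. roughly 23K QPS
print(servers_needed(qps(daily), 1_000))  # 29 servers once headroom is added
```

In the interview you would do this rounding in your head, but the structure is the same: a daily count becomes a rate, and the rate becomes a server count you can defend.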
What the Interviewer Is Evaluating
Quantitative reasoning. Can you translate requirements into numbers that drive architecture decisions? The math doesn't need to be exact. It needs to be in the right ballpark and it needs to inform your choices. If your math says 500 QPS but you design for 5 million QPS, the interviewer notices.
Common Mistakes in This Block
- Skipping estimation entirely — "I'll just use a scalable architecture." Without numbers, you can't justify any design choice.
- Getting lost in precision — spending 8 minutes calculating exact bytes. Round aggressively: 86,400 seconds/day ≈ 100K. Close enough.
- Not connecting math to decisions — calculating 2K QPS but never saying "...which means a single PostgreSQL instance can handle this without sharding."
Block 3 — High-Level Design
What You Do
Draw the big picture. 4-6 boxes, arrows between them, clear labels. This is your high-level architecture. The interviewer should be able to look at your diagram and understand the entire data flow in 10 seconds.
Example Phrases to Use
- "Here's my high-level design. Users hit a load balancer, which routes to the API layer. The API layer talks to our main database for writes and a cache layer for reads."
- "I'm separating the write path from the read path because our estimation showed a 100:1 read-to-write ratio. This lets us scale reads independently."
- "I'll use an async message queue between the upload service and the processing service because transcoding is CPU-heavy and we don't want it blocking the upload response."
What the Interviewer Is Evaluating
Architecture thinking. Can you decompose a system into logical, independent components? Do the arrows make sense? Does the data flow have a clear path from input to output? Most importantly: does your design match the numbers from your estimation?
Common Mistakes in This Block
- Too many boxes — drawing 15 components before explaining any of them. Start with 4-6 and add more during the deep dive.
- No data flow direction — boxes connected by lines but no arrows. The interviewer can't tell which direction data moves.
- Premature technology choices — labeling boxes "Kafka," "Redis," "DynamoDB" before explaining why each component exists.
What You Do
This is where the interview is won or lost. Pick the 2-3 most interesting or challenging components from your high-level design and go deep. Define the API contracts. Choose the database schema. Explain the data flow step by step. Discuss trade-offs.
You have 20 minutes for this — the biggest single block. Use it wisely. Don't spread yourself thin across every component; go deep on a few rather than shallow on all of them.
Example Phrases to Use
- "Let me dive deeper into the feed generation service, since that's the most complex part. There are two approaches: fan-out-on-write and fan-out-on-read..."
- "For the database, I'm choosing PostgreSQL over DynamoDB here because we need JOIN queries for the friend-of-friend feature, and the data model is relational."
- "The trade-off with this caching approach is cache invalidation complexity. When a user updates their profile, we need to invalidate every cached feed that includes their posts."
- "I realize this creates a single point of failure. Let me add a replica with automatic failover..."
What the Interviewer Is Evaluating
Technical depth and trade-off analysis. Can you go from "a box labeled Database" to "PostgreSQL with a composite index on (user_id, created_at DESC), read replicas for the feed queries, and a connection pool sized for our 5K concurrent connections"? Can you explain why each choice is better than the alternatives for this specific problem?
How to Choose What to Deep-Dive
Pick the components that are most unique to this problem. A load balancer works the same way in every system — don't deep-dive on that. But the feed ranking algorithm for a social network? The real-time matching algorithm for a ride-sharing app? The consistency model for a collaborative editor? Those are gold. The interviewer is trying to see how you handle the hard parts, not the generic infrastructure.
Block 5 — Failure Modes and Bottlenecks
What You Do
Walk through what happens when things break. This is where you show you've built (or at least thought about) real systems. Production systems crash, networks partition, disks fill up, third-party APIs go down. A design that only works when everything is perfect isn't a design — it's a wish.
Example Phrases to Use
- "What happens if the primary database goes down? I've set up a read replica with automatic failover. There's a brief period of read-only mode, maybe 10-30 seconds, while the replica is promoted."
- "If the cache goes down, all traffic hits the database directly. At 23K QPS, the database can handle about 5K QPS, so we'd need a circuit breaker to shed load gracefully while the cache recovers."
- "For the payment service, I'd add idempotency keys so that if a request is retried due to a timeout, we don't charge the user twice."
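The idempotency-key idea in that last phrase is worth being able to sketch. Here is a minimal single-process illustration; a real service would keep the key-to-result mapping in a shared store such as Redis or a database table with a TTL, not an in-memory dict, and the names here are hypothetical.

```python
# idempotency_key -> result of the first successful attempt
_processed: dict[str, str] = {}

def charge(idempotency_key: str, user_id: str, amount_cents: int) -> str:
    """Charge at most once per key; a retried request returns the original result."""
    if idempotency_key in _processed:
        return _processed[idempotency_key]         # retry path: no double charge
    result = f"charged {user_id} {amount_cents}c"  # stand-in for the real payment call
    _processed[idempotency_key] = result
    return result

first = charge("req-42", "alice", 999)
retry = charge("req-42", "alice", 999)  # client timed out and retried
assert first == retry                   # the user was charged exactly once
```

The client generates the key once per logical operation (not per HTTP attempt), so any number of network-level retries collapse into a single charge.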
What the Interviewer Is Evaluating
Production readiness and operational maturity. Anyone can design a system that works when everything goes right. Senior engineers design systems that degrade gracefully when things go wrong. This is the section that separates "I studied system design" from "I've actually operated systems."
Key Failure Modes to Consider
- Server crashes — Do you have replicas? How long does failover take?
- Network partitions — Can your system keep working if one data center can't talk to another?
- Hot spots — What if one user (a celebrity, a viral post) gets 10,000x the normal traffic?
- Data corruption — What if bad data makes it into the database? Can you detect and recover?
- Thundering herd — What happens when the cache expires and 10,000 requests simultaneously hit the database?
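The thundering-herd item is the one most easily demonstrated in a few lines. This is a single-process sketch of stampede protection using a lock with a double-check; a distributed system would instead use a lock in Redis (the `SET` command's `NX` option) or probabilistic early expiry.

```python
import threading

cache: dict[str, str] = {}
lock = threading.Lock()
db_hits = 0

def load_from_db(key: str) -> str:
    global db_hits
    db_hits += 1                 # the expensive call we want to run only once
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:             # fast path: a cache hit takes no lock
        return cache[key]
    with lock:                   # only one thread rebuilds the entry
        if key not in cache:     # re-check: another thread may have won the race
            cache[key] = load_from_db(key)
        return cache[key]

# Fifty concurrent requests all miss the cold cache at once.
threads = [threading.Thread(target=get, args=("feed:42",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(db_hits)  # 1 — fifty simultaneous misses, a single database query
```

Without the lock and the re-check, all fifty misses would hit the database simultaneously, which is exactly the scenario the checklist item warns about.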
Block 6 — Wrap-Up and Questions
What You Do
Summarize your design in 60 seconds. Restate the key decisions and trade-offs. Mention what you'd do differently with more time. Then ask the interviewer questions — this is your time to show curiosity.
Example Phrases to Use
- "Let me summarize: we have a write path through the API to PostgreSQL, a read path through Redis cache with a 100:1 read-write ratio, and async processing via a message queue for heavy operations like transcoding."
- "The main trade-off I made was choosing eventual consistency for the feed to optimize read latency. If consistency was more critical, I'd use a different approach."
- "Given more time, I'd want to explore the monitoring and alerting strategy, and think more carefully about the data migration plan for when we need to shard."
What the Interviewer Is Evaluating
Self-awareness and growth mindset. Can you identify the weaknesses in your own design? Do you know what you don't know? Saying "I chose X but I recognize that Y is a trade-off I'd want to revisit" is much stronger than pretending your design is perfect.
Closing Questions to Ask
- "What challenges does your team currently face with this kind of system?"
- "Is there a part of the design you'd like me to go deeper on?"
- "How does your team handle [specific technical challenge] in production?"
Hands-On Exercises — Practice the Framework
Reading about system design is like reading about swimming — you understand the concept, but you can't do it until you practice. These five exercises are progressive: each one builds on the skills from the previous one, adding new challenges like fan-out, geospatial indexing, and distributed consistency.
For each exercise: set a 45-minute timer, use a whiteboard or blank paper, and follow the framework from Section 5. Only check the hints and solutions after you've tried it yourself.
Exercise 1 — URL Shortener
Focus areas: Key generation (how do you create unique short codes?), storage estimation (how many TB over 5 years?), caching strategy (what gets cached and for how long?).
Step 1 — Scope: Core features are shorten (write) and redirect (read). Analytics is secondary.
Step 2 — Estimation: 100M/day ≈ 1,160 writes/sec. 100:1 ratio = 116K reads/sec. Storage over 5 years: 182B URLs × ~100 bytes = ~18 TB.
Step 3 — Components: Load balancer → API service → Key generation service → Database (URL mappings) → Cache (hot URLs).
Step 4 — Key decisions: Base62 encoding of an auto-increment ID or a hash function? Base62 gives 7-character keys with 62^7 = 3.5 trillion possible URLs — more than enough. Cache the top 20% of URLs (Pareto principle) in Redis for sub-1ms reads.
Step 5 — Deep dive: What about hash collisions? What about custom short URLs? How do you handle the cache stampede when a viral URL's cache entry expires?
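The base62 choice from Step 4 is small enough to write out. A minimal sketch, assuming the auto-increment-ID approach (the alphabet ordering and function names are arbitrary conventions, not a standard):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Turn an auto-increment row ID into a short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode(code: str) -> int:
    """Invert encode() when resolving a redirect."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))             # "21"
print(len(encode(62**7 - 1)))  # 7: seven characters cover ~3.5 trillion IDs
```

Because the ID is unique by construction, this approach sidesteps hash collisions entirely; the trade-off is that sequential codes are guessable, which is why some designs add a random offset or use a pre-generated key pool.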
Exercise 2 — Twitter-Style News Feed
Focus areas: The fan-out problem (push vs. pull model), celebrity user handling, feed ranking, timeline pagination.
The core challenge: fan-out. When a user with 50 million followers tweets, should you immediately write that tweet into 50 million timelines (fan-out-on-write / push model)? Or should each user's timeline be assembled on-the-fly when they open the app (fan-out-on-read / pull model)?
Push model: Fast reads (timeline is pre-built), but writes are expensive. A celebrity's tweet triggers 50M writes. Most timelines are pre-computed, so opening the app is instant.
Pull model: Fast writes (just store the tweet once), but reads are expensive. To show a timeline, you query all 200 followed accounts, sort by time, and return the top N. Slow at scale.
Hybrid approach (what Twitter actually does): Push for normal users (under 10K followers). Pull for celebrities (over 10K followers). When a user opens their timeline, merge the pre-built feed with fresh celebrity tweets. This gives you fast reads for most content and avoids the 50M-write problem for celebrities.
Storage: Timeline cache in Redis (user_id → list of tweet_ids). Each entry is ~8 bytes. 500M users × 800 tweet_ids = ~3.2 TB of cache. Fits in a Redis cluster.
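The read path of the hybrid model is essentially a merge of two sorted lists. A minimal sketch with illustrative data (the tuples and IDs are made up; a real feed would carry full tweet records and ranking scores, not just timestamps):

```python
import heapq

# Pre-built timeline for a normal user (fan-out-on-write), newest first.
prebuilt = [(104, "t9"), (101, "t5"), (99, "t2")]

# Fresh tweets pulled live from followed celebrities (fan-out-on-read).
celebrity = [(105, "t11"), (100, "t4")]

def merged_timeline(prebuilt, celebrity, limit=5):
    """Merge two newest-first (timestamp, tweet_id) lists into one timeline."""
    merged = heapq.merge(prebuilt, celebrity, reverse=True)
    return [tweet_id for _, tweet_id in merged][:limit]

print(merged_timeline(prebuilt, celebrity))
# ['t11', 't9', 't5', 't4', 't2']
```

`heapq.merge` streams the result lazily, so the service can stop after the first page of tweets instead of materializing the full merged list.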
Exercise 3 — Ride-Sharing Service
Focus areas: Geospatial indexing (how do you efficiently find "drivers near me"?), real-time location tracking, matching algorithm, handling supply-demand imbalance (surge pricing).
Geospatial indexing: Divide the world into grid cells using geohashing. Each cell is ~1km². Store active drivers in a hash map: geohash → list of driver_ids. To find nearby drivers, query the target cell plus its 8 adjacent cells.
Location updates: 1M active drivers × 1 update every 4 seconds = 250K writes/second. Too many for a relational database. Use an in-memory store (Redis) with geohash keys. Each update: remove driver from old cell, add to new cell.
Matching algorithm: When a rider requests: (1) Find all drivers in nearby cells, (2) Filter by availability, (3) Calculate ETA for each using road distance (not straight-line), (4) Rank by ETA, (5) Send request to top driver, (6) If declined within 10s, move to next.
Scaling across cities: Each city operates independently — a ride in New York doesn't need driver data from London. Partition by city/region. This gives you natural sharding with no cross-shard queries.
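The grid-cell index and the 9-cell query can be sketched concretely. This toy version uses integer (x, y) squares standing in for geohash strings, with coordinates already expressed in kilometers; a real system would use geohash, S2, or H3 cells, and every name here is illustrative.

```python
from collections import defaultdict

CELL_KM = 1.0  # each cell is ~1km on a side
index: dict[tuple[int, int], set[str]] = defaultdict(set)

def cell(x_km: float, y_km: float) -> tuple[int, int]:
    return (int(x_km // CELL_KM), int(y_km // CELL_KM))

def update_location(driver_id, old, new):
    """The write path: move a driver from its old cell to its new one."""
    if old is not None:
        index[cell(*old)].discard(driver_id)
    index[cell(*new)].add(driver_id)

def nearby_drivers(rider_pos):
    """The read path: query the rider's cell plus its 8 neighbours."""
    cx, cy = cell(*rider_pos)
    found = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found |= index.get((cx + dx, cy + dy), set())
    return found

update_location("d1", None, (10.2, 20.7))  # same cell as the rider
update_location("d2", None, (11.4, 20.1))  # adjacent cell
update_location("d3", None, (50.0, 50.0))  # far away
print(nearby_drivers((10.5, 20.5)))        # d1 and d2, but not d3
```

The candidates returned here would then flow into the matching steps above: filter by availability, rank by road-distance ETA, and offer the ride to the top driver.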
Exercise 4 — Video Streaming Platform
Focus areas: Video transcoding pipeline (converting to multiple formats/resolutions), CDN distribution strategy, adaptive bitrate streaming, storage optimization (hot/warm/cold tiers), recommendation feed.
Upload pipeline: Client uploads to the nearest edge server (not the origin). The edge acknowledges the upload quickly, then asynchronously transfers to the origin. This gives the user a fast "upload complete" experience even for large files.
Transcoding: Each video is encoded into multiple resolutions (360p, 480p, 720p, 1080p, 4K) and formats (H.264, VP9, AV1). A 10-minute video generates ~15 output files. 500 hours/minute × 15 outputs = 7,500 encoding jobs per minute. Use a job queue (like SQS) with an auto-scaling worker fleet.
Storage tiering: Hot videos (uploaded in the last 7 days or trending) on SSD-backed storage with CDN caching. Warm videos (viewed in the last 90 days) on HDD-backed storage with on-demand CDN loading. Cold videos (rarely accessed) on archival storage like S3 Glacier.
Streaming: Use adaptive bitrate streaming (HLS/DASH). The video is split into 4-second segments at each quality level. The player requests segments one at a time, switching quality based on measured bandwidth. This eliminates buffering for most viewers.
CDN strategy: Pre-populate CDN edge servers with trending/viral content. For long-tail content, use a tiered CDN: edge → regional cache → origin. 90% of views hit the edge (cache hit), 9% hit regional, 1% goes to origin.
Exercise 5 — Distributed Rate Limiter
Focus areas: Distributed counting (how do you count requests across 5 data centers?), sliding window vs. token bucket algorithms, consistency vs. availability trade-off, handling clock skew between data centers.
Algorithm choice: The token bucket algorithm is the best fit here. Each user has a bucket with a max capacity (burst limit) and a refill rate (sustained limit). Memory per user: ~16 bytes (token count + last refill timestamp). For 100M users: ~1.6 GB — fits in Redis.
The distributed challenge: With 5 data centers, a user's requests could hit any of them. If each data center counts independently, a user with a 100/min limit could make 500 requests/min (100 per data center). Two approaches:
Approach A — Centralized counter: All data centers check a single Redis instance. Accurate, but adds cross-region latency (50-200ms) per request. Violates the 5ms latency requirement.
Approach B — Local counters with sync: Each data center keeps a local counter and syncs with others periodically (every 5-10 seconds). Divide the limit: each data center gets 100/5 = 20 requests/min locally. Over-limit tolerance is ~5% due to sync delay. This meets both the latency and accuracy requirements.
Edge case — traffic imbalance: If a user mostly hits one data center, the 20/min local limit is too restrictive. Solution: use a "borrowing" mechanism where a data center can request extra quota from others. Or use a weighted split based on historical traffic patterns.
Quick Reference — Cheat Cards
Tear these out (mentally) and keep them next to your whiteboard during practice. Each card distills one critical aspect of the framework into something you can glance at in 5 seconds.
The Five Steps
- Understand — Ask questions, define scope
- Estimate — QPS, storage, bandwidth
- Decompose — 4-6 boxes + arrows
- Deep Dive — DB, API, data flow, trade-offs
- Recurse — Split further if still vague
Clarifying Questions
- What are the core use cases?
- How many users / DAU?
- Read-heavy or write-heavy?
- Latency requirements?
- Consistency vs. availability?
- What can we deprioritize?
Estimation Numbers
- 1 day ≈ 100K seconds
- 1 year ≈ 30M seconds
- 1 char = 1 byte (ASCII)
- 1 image ≈ 300 KB
- 1 video minute ≈ 50 MB
- 80/20 rule for cache sizing
Database Quick Picks
- SQL (PostgreSQL): relationships, JOINs, ACID
- NoSQL doc (MongoDB): flexible schema, fast reads
- Wide-column (Cassandra): massive writes, time-series
- Cache (Redis): sub-ms reads, sessions, counters
- Search (Elastic): full-text, fuzzy matching
Failure Checklist
- □ Single point of failure?
- □ What if the DB goes down?
- □ What if the cache is cold?
- □ Hot spots / thundering herd?
- □ Network partition handling?
- □ Data corruption / recovery?
Phrases That Work
- "Let me clarify the requirements first..."
- "The math tells us we need..."
- "I'm choosing X over Y because..."
- "The trade-off here is..."
- "If this component fails, here's what happens..."
- "Given more time, I would also explore..."
Connected Topics — What to Study Next
The framework you've learned is the skeleton of system design. To fill in the muscles and organs, you need deep knowledge of the building blocks: databases, caching, networking, consistency models, and more. Each of these foundation topics directly maps to one or more steps in the framework.
Glossary
Every technical term used on this page, defined in plain English. These also power the tooltips you see throughout the text.