HLD Foundations

Async Patterns

Why your code shouldn't stand around waiting, and every technique engineers use to keep things moving while the slow stuff catches up.

Section 1

TL;DR

  • What "asynchronous" actually means in plain English, and the one physical reason it has to exist
  • The four main async patterns in order of evolution (callbacks, promises/futures, async/await, and reactive streams) and what problem each one solved that the previous one couldn't
  • When to use a callback vs a promise vs async/await vs a message queue, and how to tell them apart at a glance
  • The messaging patterns that underpin real distributed systems: async messaging, sagas, and event-driven choreography
  • How the conceptual shift from "wait for the answer" to "tell me when you have it" changes everything about how you design systems at scale
  • The four things you gain from async (responsiveness, throughput, resilience, scalability) and the two trade-offs you always make (complexity, harder debugging)

Async patterns are the art of starting a slow job without freezing everything else. Every technique from callbacks to message queues is just a different answer to the same question: "What should my code do while it's waiting?"

The one-liner: Synchronous code says "I'll wait here until you're done." Asynchronous code says "Start that, tell me when it's ready, and I'll go do something useful in the meantime." Everything on this page (callbacks, promises, async/await, message queues, sagas) is a variation on that second sentence.

What: Asynchronous patterns are a family of programming and system-design techniques for decoupling "I need this result" from "I'll wait here until I have it." They let a program or service initiate a slow operation (a network call, a disk read, a database query) and continue doing other work instead of blocking the current thread. Instead of blocking (stalling everything while a slow operation runs), async code hands off the work, registers what to do when it finishes, and gets on with life. The four main patterns (callback → promise → async/await → reactive streams) solve the same fundamental problem with increasing elegance.

When: Use async patterns whenever your code touches anything that crosses a process boundary: a network request, a database query, a file read, a message queue. Synchronous code is fine for pure in-memory computation. The moment you're waiting for something external, sync code wastes threads, burns memory, and caps your throughput. For distributed systems: use async messaging (queues, pub/sub) when you need to decouple services across machine boundaries. Use the saga pattern when a multi-step workflow spans multiple services and you need coordinated rollback if any step fails. A saga manages a long-running business transaction across multiple services without a global lock: each step publishes an event or sends a command, and if a step fails, compensating transactions undo the prior steps. It has two variants: choreography (each service reacts to events) and orchestration (a central saga coordinator drives the steps).

The evolution in one sentence each: Callbacks: "when done, call this function." Promises: "give me an object I can attach handlers to later." Async/await: "write it like sync code but don't block the thread." Reactive streams: "treat a sequence of future values like a collection you can filter, map, and merge."

The core trade-off: You gain responsiveness (the caller isn't frozen), throughput (one thread handles thousands of in-flight requests), resilience (failures are isolated), and scalability (async messaging lets services scale independently). You trade away readability (execution is no longer top-to-bottom) and debuggability (stack traces fragment; errors surface in unexpected places).

Quick Example (JavaScript async/await):

    // SYNC: blocks the thread until the DB responds
    function showProfileSync(userId) {
      const user = db.findUser(userId);            // thread frozen here
      const orders = db.findOrders(user.id);       // thread frozen again
      renderProfile(user, orders);
    }

    // ASYNC (async/await): thread is free while the DB is thinking
    async function showProfile(userId) {
      const user = await db.findUser(userId);      // "start this; resume me when done"
      const orders = await db.findOrders(user.id);
      renderProfile(user, orders);
    }
    // Reads like sync. Works like async. The thread handles other requests in between.

Async patterns let code hand off slow operations and continue working instead of blocking. The family runs from raw callbacks → promises → async/await → reactive streams for single-process code, and from fire-and-forget messaging → pub/sub → sagas for distributed systems. You gain responsiveness, throughput, and scalability. You pay with added complexity and fragmented stack traces.
Section 2

Why You Should Care: The Problem It Solves

Let's start with a story. It's Black Friday. Millions of people are hitting your checkout endpoint at once. Your synchronous code is about to teach you an expensive lesson.

The Synchronous Checkout That Fell Over

Imagine a simple checkout flow: a user clicks "Pay." Your server receives the request and starts doing work: charge the card (200ms), update inventory in the database (80ms), send a confirmation email via a third-party API (350ms), notify the warehouse system (120ms). Total: about 750ms of waiting.

That's fine with one user. But here's the problem: while your server thread is waiting for the payment processor to reply, that thread is completely frozen. It can't handle another request. It's just sitting there, doing nothing, burning memory, occupying a slot in your thread pool.

What happens under load: Your server has 200 threads. Each thread gets stuck for 750ms waiting for slow I/O. That means at peak, you can handle 200 ÷ 0.75s ≈ 267 requests per second. On Black Friday you're getting 2,000 requests per second. Your thread pool exhausts instantly. New requests queue up, then time out with 503 errors. The checkout page goes down, not because your CPU is overwhelmed, but because all your threads are waiting, not working.

[Figure: Synchronous threads under load. Threads T1 to T4 are mostly blocked on I/O (payment API 200ms + DB 80ms + email 350ms + warehouse 120ms) with tiny slices of CPU work. Once the thread pool is exhausted, requests #5, #6, #7 queue, time out, and return 503. Legend: CPU work; blocked / waiting.]

Look at those thread bars. The red sections, where the thread is frozen waiting for an external system, dwarf the green CPU-work sections. Your server hardware isn't the bottleneck. The problem is that synchronous code ties one OS thread to one in-flight request, and threads are expensive (each consumes roughly 1 to 8 MB of stack memory depending on the OS; Windows defaults to ~1 MB, Linux to ~8 MB). You can't just keep adding threads forever.
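
The capacity arithmetic above can be checked in a few lines. This is a back-of-the-envelope sketch, not a load model: the function names are made up for illustration, and it assumes every request holds one thread for the full 750ms.

```javascript
// Hypothetical helpers for the capacity estimate in the text.
// Assumption: each request occupies one thread for its whole duration.

// Maximum sustainable requests per second for a blocking thread pool.
function maxThroughput(threadCount, secondsPerRequest) {
  return threadCount / secondsPerRequest;
}

// Threads needed to sustain a target rate (Little's law: L = lambda * W).
function threadsNeeded(requestsPerSecond, secondsPerRequest) {
  return Math.ceil(requestsPerSecond * secondsPerRequest);
}

const capacity = maxThroughput(200, 0.75);
const needed = threadsNeeded(2000, 0.75);

console.log(`200 threads sustain about ${Math.round(capacity)} req/s`);
console.log(`2,000 req/s would need about ${needed} blocked threads`);
```

Under these assumptions the Black Friday load would need about 1,500 threads, which at roughly 1 MB of stack each would reserve on the order of 1.5 GB for stacks alone.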

The Async Checkout That Scales

Now imagine the same checkout flow, but async. The server receives the request and starts the payment charge. But instead of waiting around, it tells the runtime: "when the payment API responds, resume here." The thread is immediately released back to the pool, free to handle another incoming request right now. When the payment API replies (200ms later), the runtime picks up any available thread and resumes exactly where it left off. This is called non-blocking I/O: a style of I/O where a thread initiates an operation (file read, network call) and immediately returns. The OS notifies the program when the result is ready rather than keeping the thread frozen waiting, so the same thread can handle other work in the gap.

[Figure: Async non-blocking I/O. A single-threaded event loop initiates calls to the payment API, database, and email API without blocking, and resumes each request when its response arrives. Meanwhile the same thread accepts requests #2, #3, #4 while request #1 waits on I/O. Result: one thread handles thousands of concurrent requests, and throughput scales with I/O count, not thread count.]

The async version uses the same hardware but handles far more concurrent requests, because the thread is never just sitting there waiting. It initiates a request, hands it off to the OS, and immediately starts handling request #2. When the OS signals "I/O is done," the thread resumes the first request's continuation. This is why Node.js can handle tens of thousands of concurrent connections on a single thread, and why async/await in C# or Python can multiply a server's throughput dramatically with no extra hardware.
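
To make the "one thread, many in-flight requests" claim concrete, here is a minimal sketch that uses timers as a stand-in for real I/O. The names `fakeIo` and `handleMany` are invented for this example; a real server would be waiting on sockets rather than `setTimeout`.

```javascript
// Simulates a slow I/O call with a timer. No thread sits blocked while
// the timer is pending; the event loop is free to start more work.
function fakeIo(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Start `count` simulated requests at once, then wait for all of them.
async function handleMany(count, ioMs) {
  const started = Date.now();
  const pending = [];
  for (let i = 0; i < count; i++) {
    pending.push(fakeIo(ioMs, i)); // initiate, do not wait
  }
  const results = await Promise.all(pending);
  return { handled: results.length, elapsedMs: Date.now() - started };
}

handleMany(1000, 50).then(({ handled, elapsedMs }) => {
  // Elapsed time stays close to a single wait, not 1000 waits back to back.
  console.log(`${handled} simulated requests finished in ${elapsedMs}ms`);
});
```

Run sequentially, 1,000 waits of 50ms would take about 50 seconds; overlapped on one thread they finish in roughly the time of a single wait.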

The real insight: Most server work isn't CPU-intensive; it's waiting-intensive. An I/O-bound handler typically spends the large majority of its time (figures like 95% are often quoted) blocked on the network or disk. Synchronous code uses one thread per wait. Async code reuses the same thread for all the waits happening simultaneously. That difference in thread utilization is the entire reason async patterns exist.

It's Not Just About Single Servers: It's About Entire Systems

Zoom out further. In a microservices architecture, "async" takes on a different shape. A checkout service doesn't just need to not block its own thread; it needs to not block waiting for downstream services to respond. If the email service is slow, a synchronous checkout waits for the email service. If the inventory service goes down, checkout fails entirely, even though the payment succeeded.

This is where async messaging enters: a communication style where a service sends a message to a queue or topic and immediately continues without waiting for the recipient to process it. The recipient picks up and processes the message independently, at its own pace (examples: RabbitMQ, SQS, Kafka). Instead of the checkout service calling the email service directly, it drops a message on a queue and moves on. The email service picks it up whenever it's ready. Now a slow or downed email service can't break checkout. That's the async mindset applied to entire systems, not just individual threads.
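
The shape of that hand-off can be sketched in a single process. Everything here is an assumption made for illustration: `InMemoryQueue`, `checkout`, and `emailWorker` are hypothetical names, and the array-backed queue stands in for a durable broker, which it does not replace.

```javascript
// In-memory stand-in for a message broker. A real system would use a
// durable queue such as RabbitMQ, SQS, or Kafka; this only shows the shape.
class InMemoryQueue {
  constructor() {
    this.messages = [];
  }
  publish(message) {
    this.messages.push(message); // returns immediately
  }
  // Hand each queued message to the handler, one at a time.
  async drain(handler) {
    while (this.messages.length > 0) {
      await handler(this.messages.shift());
    }
  }
}

const emailQueue = new InMemoryQueue();
const sentEmails = [];

// Producer: checkout finishes without waiting for the email to be sent.
function checkout(orderId) {
  emailQueue.publish({ type: "order-confirmed", orderId });
  return { orderId, status: "paid" };
}

// Consumer: the (hypothetical) email service works at its own pace.
async function emailWorker(message) {
  await new Promise((resolve) => setTimeout(resolve, 10)); // slow send
  sentEmails.push(message.orderId);
}

const receipt = checkout(1001);
console.log(receipt.status, "| emails sent so far:", sentEmails.length);
emailQueue.drain(emailWorker).then(() => {
  console.log("emails sent after drain:", sentEmails.length);
});
```

The point of the sketch is the ordering: `checkout` returns "paid" before any email has been sent, so a slow consumer delays the email but not the checkout.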

Synchronous code wastes threads by freezing them while waiting for I/O. Under load, thread pools exhaust and services collapse, not from CPU pressure, but from waiting. Async patterns free threads immediately and resume them when results arrive, multiplying throughput from the same hardware. At the system level, async messaging decouples services so a slow downstream can't take down an upstream.
Section 3

Real-World Analogies

Before we look at any code, let's wire in the right mental picture. These analogies map directly to the four patterns you'll learn. Once you feel them in real life, the code just makes sense.

You walk into a busy restaurant at lunchtime. There are no empty tables. You have two choices about how the host handles this.

The synchronous version: The host makes you stand at the podium. You can't move, you can't sit, you can't even look at your phone. You just wait there, blocking the entrance, until a table opens. The host can't greet the next customer because you're in the way. This is synchronous code: the caller freezes and nothing else moves until the operation completes.

The async version: The host hands you a buzzer (a small plastic pager) and says "We'll buzz you when your table is ready. Go wait at the bar, browse your phone, get a drink." You walk away. The host immediately greets the next customer. When a table opens up, your buzzer vibrates. You stop whatever you were doing and go to your table. The host's job (and your time) wasn't blocked waiting for a table. That buzzer is the async "token": it's the promise that the result will arrive later.

Now map this to the four patterns. The host saying "I'll call you" is a callback. The buzzer object itself, something you can hold and attach logic to ("when it vibrates, go to table 12"), is a Promise. The phrase "wait for the buzz but keep doing things in between" is what async/await gives you syntactically. And if the restaurant were streaming you a live feed of current table availability so you could make decisions reactively ("table 7 opened but it's only a 2-seater, skip; table 12 opened, it's a 4-seater, accept"), that's a reactive stream.

Restaurant moment | What it represents | Async pattern
Standing frozen at the podium | Thread blocked waiting for I/O | Synchronous (the problem)
Host says "I'll call you when ready" | Register a function to run on completion | Callback
The buzzer you hold in your hand | An object representing a future result | Promise / Future
"Go wait at the bar; I'll buzz you" | Start the job, continue other work, resume on completion | async / await
Live feed of table availability changes | A continuous stream of values you can react to over time | Reactive stream / Observable
Buzzer going off while you're mid-drink | The callback/continuation fires at an unexpected moment | Async continuation

[Figure: The Restaurant Buzzer, an async pattern timeline. (1) Customer arrives. (2) Buzzer handed over = Promise created. The customer waits at the bar (thread handles other requests) while the host seats the next customer (I/O in flight, thread is free). (3) Table opens = I/O completes. (4) Buzzer vibrates = callback / await resumes. (5) Customer seated = result consumed. Sync alternative: the customer stands frozen at the podium from step 1 to step 5, the host can't greet anyone else, and one person's wait time becomes the entire system's blocked time.]

The timeline above shows why async wins. Between steps 2 and 4, the synchronous customer is just frozen at the podium, doing nothing and blocking everyone. The async customer is at the bar being productive (or at least out of the way). From the system's point of view, the host (thread) is serving other customers the entire time. When the table (I/O) is ready, the buzzer fires and the customer resumes exactly where they need to be. No thread wasted. No blocking. Same result.

The "aha" moment: The buzzer is a physical Promise. It's an object you hold that represents a future result. You can attach behavior to it ("when it buzzes, go to table 12"), which is `.then()` on a Promise, or the code after `await`. The result isn't here yet, but the token for it is, and you can plan around it immediately.
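
The buzzer translates almost literally into code. The sketch below is illustrative only: `requestTable`, `buzzer`, and `seat` are names invented for the analogy, not part of any library.

```javascript
// The "buzzer": a Promise handed out now and settled later by the host.
function requestTable() {
  let seat;
  const buzzer = new Promise((resolve) => {
    seat = resolve; // the host keeps this to fire the buzzer later
  });
  return { buzzer, seat };
}

const log = [];
const { buzzer, seat } = requestTable();

// Attach behaviour before the result exists: "when it buzzes, go to the table".
const seated = buzzer.then((table) => {
  log.push(`seated at table ${table}`);
  return table;
});

log.push("waiting at the bar"); // the customer is free in the meantime
seat(12);                       // a table opens; the host fires the buzzer

seated.then(() => console.log(log.join(" -> ")));
// prints "waiting at the bar -> seated at table 12"
```

The handler is attached while the Promise is still pending, and it runs only after the current synchronous code has finished, which is why "waiting at the bar" is logged first.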
The restaurant buzzer maps async patterns directly to lived experience: the buzzer = Promise (a token for a future result), walking to the bar = the thread handling other work, the buzz = callback/await resuming. Amazon tracking maps to Promise chaining and push-vs-pull. The doctor's text maps to async messaging between services. The water tap maps to reactive backpressure. Internalize these images and the code mechanics will feel obvious.
Section 4

The Mental Model: Sync vs Async Timeline

Here's the clearest way to understand all four patterns at once: as a timeline of what the calling thread is doing while it waits for a slow operation. Look at the white/colored space in each row; that's where the real difference lives.

[Figure: What is the calling thread doing while the slow I/O runs? Four timelines. Synchronous: call(), then the thread is blocked for ~700ms with no other work possible, then result and continue. Callback: doThing(cb), thread free to handle other requests, then cb(result) fires and the continuation runs inside the callback, so nesting gets ugly. Promise: p = doThing().then(fn), thread free, then fn(result) fires on resolve; chaining replaces nesting, but code is still split across .then() handlers. async/await: await doThing(), thread free (same mechanism as Promise), then execution resumes on the next line; the code is linear. Legend: CPU work / user code; thread blocked (sync); thread free (async, green).]

The most important thing that diagram shows: callback, promise, and async/await all have the same green "thread free" window. They are all built on the same underlying mechanism: the event loop or scheduler detects I/O completion and resumes the continuation. The difference between them is entirely about how you write the code that runs after the result arrives, not about any difference in performance or thread behavior.

Synchronous: The Baseline (and Why It's Fine for CPU Work)

Synchronous code isn't bad; it's the wrong tool when you're waiting for I/O. For pure computation (sorting an array, calculating a hash, validating input), synchronous code is correct and simpler. There's no I/O to wait for, so the CPU is busy the whole time. Async would add overhead with no benefit. The rule: go async when you're waiting; stay sync when you're computing.

Callbacks: The Original Solution (and Its Famous Flaw)

The first answer to "what should my code do while waiting" was: register a function to call when the result arrives. That function is a callback: a function you pass as an argument to another function, to be called later when an operation completes. It is the most primitive form of async programming. For example, in setTimeout(fn, 1000), fn is a callback that fires 1 second later. Callbacks work great one level deep. The problem emerges when callbacks call other things that need callbacks: you end up with functions inside functions inside functions, indented halfway across your screen. Developers call this callback hell: the deeply nested, pyramid-shaped code that results from chaining multiple async callbacks together. Error handling is scattered, the code is hard to read, and the logic flow is non-obvious. Promises and async/await were designed to solve it.

Promises: Flattening the Pyramid

A Promise is an object that represents the eventual completion (or failure) of an asynchronous operation and its resulting value. It is always in one of three states: pending (operation not yet complete), fulfilled (completed successfully with a value), or rejected (failed with an error). Instead of passing a callback into the function, the function gives you back a Promise, and you attach your "what to do next" handler to the Promise object with .then(). The key gain: you can chain .then() calls instead of nesting functions. Flat chains replace deep pyramids. Error handling centralizes in one .catch() at the end of the chain.
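
As a small illustration of moving from callbacks to Promises, the sketch below wraps a callback-style function by hand. `loadUser` is a made-up stand-in for a legacy API; in Node.js the built-in `util.promisify` does the same job for functions that follow the `(err, data)` convention.

```javascript
// A callback-style function in the Node convention: callback(err, data).
// `loadUser` is a made-up stand-in for any legacy callback API.
function loadUser(id, callback) {
  setTimeout(() => {
    if (id <= 0) callback(new Error("invalid id"));
    else callback(null, { id, name: `user-${id}` });
  }, 10);
}

// Wrap it so it returns a Promise instead of taking a callback.
function loadUserAsync(id) {
  return new Promise((resolve, reject) => {
    loadUser(id, (err, user) => (err ? reject(err) : resolve(user)));
  });
}

// Flat chain; one .catch() covers every step above it.
const outcome = loadUserAsync(7)
  .then((user) => user.name.toUpperCase())
  .then((name) => loadUserAsync(-1).then(() => name)) // this step fails
  .catch((err) => `recovered: ${err.message}`);

outcome.then((value) => console.log(value)); // prints "recovered: invalid id"
```

The failure happens in the second step, yet the single `.catch()` at the end of the chain handles it, which is the centralised error handling described above.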

Async/Await: Promises with a Human Face

async/await doesn't introduce new runtime behavior; it's a layer of syntactic sugar over Promises. Syntactic sugar is language syntax that makes code easier to read and write without introducing new capabilities. When you write const result = await doThing(), the compiler or runtime treats the rest of the function as the continuation of that Promise, much as if you had written the equivalent .then() chain yourself. But from your perspective, the code reads line-by-line like synchronous code, with no .then() nesting and no split handlers. This is why async/await became the dominant style: same async power, readable as sync.
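
The equivalence can be seen by writing the same steps both ways. This is a sketch of observable behaviour only; engines do not literally rewrite `await` into `.then()` calls, and `double`, `viaThen`, and `viaAwait` are names chosen for the example.

```javascript
// A tiny async operation: resolves with n * 2 after a short delay.
function double(n) {
  return new Promise((resolve) => setTimeout(() => resolve(n * 2), 5));
}

// Promise-chain form.
function viaThen(n) {
  return double(n)
    .then((a) => double(a))
    .then((b) => b + 1);
}

// async/await form of the same steps.
async function viaAwait(n) {
  const a = await double(n);
  const b = await double(a);
  return b + 1;
}

Promise.all([viaThen(3), viaAwait(3)]).then(([x, y]) => {
  console.log(x, y); // prints "13 13"
});

// An async function always hands back a Promise, even before it finishes.
console.log(viaAwait(3) instanceof Promise); // prints "true"
```

Both functions return a Promise that fulfils with the same value; only the way the continuation is written differs.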

Reactive Streams: Async Over Time, Not Just Once

Callbacks, Promises, and async/await all handle a single future value: one request, one result, done. But what if the "result" is actually a continuous stream of values, such as user input events, stock price ticks, or a Kafka topic with millions of records? A reactive stream is a sequence of values that arrive asynchronously over time, processed using functional operators (map, filter, merge, debounce, etc.). Implementations include RxJS (JavaScript), Reactor (Java/Kotlin), RxDart (Dart), and .NET's System.Reactive. A reactive stream treats that sequence like a lazy collection you can transform, filter, merge, and throttle, all while remaining non-blocking. The key extra concept reactive adds is backpressure: the consumer can signal the producer to slow down, preventing buffer overflow.
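
The idea can be approximated in plain JavaScript with async generators. To be clear about the substitution: this is not RxJS or any reactive library, and it shows only pull-based flow control, the simplest form of backpressure. The names `priceTicks`, `filter`, `map`, and `consume` are invented for the sketch.

```javascript
// Producer: an endless source. The generator body only advances when the
// consumer asks for the next value, so it cannot run ahead.
async function* priceTicks(stats) {
  for (let tick = 1; ; tick++) {
    stats.produced++;
    yield { tick, price: 100 + tick };
  }
}

// Operators in the style of filter/map, written over async iterables.
async function* filter(source, predicate) {
  for await (const item of source) {
    if (predicate(item)) yield item;
  }
}

async function* map(source, fn) {
  for await (const item of source) {
    yield fn(item);
  }
}

// Slow consumer: takes `maxItems` even-numbered ticks, then stops pulling.
async function consume(maxItems) {
  const stats = { produced: 0 };
  const seen = [];
  const evenPrices = map(
    filter(priceTicks(stats), (t) => t.tick % 2 === 0),
    (t) => t.price
  );
  for await (const price of evenPrices) {
    await new Promise((resolve) => setTimeout(resolve, 5)); // slow work
    seen.push(price);
    if (seen.length === maxItems) break; // stop pulling; the source stops too
  }
  return { seen, produced: stats.produced };
}

consume(3).then((result) => console.log(result));
```

Asking for three even ticks yields the prices 102, 104, and 106, and the endless producer generates only the six ticks needed to supply them. Push-based libraries need explicit operators or a demand protocol to get the same guarantee.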

Synchronous code blocks the thread for the full duration of I/O, which is fine for pure computation and catastrophic under I/O-heavy load. Callbacks free the thread but produce deeply nested code. Promises flatten the nesting into chainable handlers. Async/await adds syntactic sugar so the code reads linearly while behaving asynchronously underneath. Reactive streams extend the model to sequences of values over time and add backpressure for flow control. All three single-value async variants (callback, promise, async/await) share the same underlying mechanism; they differ only in syntax.
Section 5

Minimal Working Example: Three Ways to Fetch a User Profile

Same problem, three solutions. We need to fetch a user's profile from a remote API, then use the result. Watch how the code evolves: the observable runtime behavior is identical in all three; only the code shape changes.

The Scenario

You're building a dashboard. When the page loads, you need to call GET /api/users/:id and render the returned profile data. The network call takes ~200ms. You want the browser (or server) to stay responsive while waiting.

Callback style:

    // The function takes the callback as its last argument.
    // When the data arrives, callback(error, data) is invoked.
    function fetchUser(userId, callback) {
      fetch(`/api/users/${userId}`)
        .then(res => res.json())
        .then(
          data => callback(null, data),   // success: (null, data)
          err => callback(err, null)      // failure: (err, null)
        );
      // Two-argument .then() is used so that an exception thrown inside
      // the callback cannot trigger a second call with (err, null).
    }

    // Caller: registers what to do when fetchUser finishes
    fetchUser(42, function onUserFetched(err, user) {
      if (err) { console.error("Failed to load user:", err); return; }

      // Now fetch their orders too, but look what happens:
      fetchOrders(user.id, function onOrdersFetched(err, orders) {
        if (err) { console.error("Failed to load orders:", err); return; }

        // One more level: render
        renderDashboard(user, orders, function onRenderDone(err) {
          if (err) console.error("Render failed:", err);
          // ← we're now 3 levels deep. More steps = more nesting.
          // This is "callback hell", the pyramid of doom.
        });
      });
    });

Promise style:

    // Functions return a Promise instead of accepting a callback.
    // Chain .then() calls to sequence steps flatly.
    function fetchUser(userId) {
      return fetch(`/api/users/${userId}`).then(res => res.json());
    }

    function fetchOrders(userId) {
      return fetch(`/api/orders?userId=${userId}`).then(res => res.json());
    }

    // Caller: flat chain instead of nested callbacks
    let savedUser;
    fetchUser(42)
      .then(user => {
        savedUser = user;              // save for later steps in the chain
        return fetchOrders(user.id);   // return next promise to chain it
      })
      .then(orders => {
        return renderDashboard(savedUser, orders);
      })
      .then(() => {
        console.log("Dashboard rendered");
      })
      .catch(err => {
        // ONE error handler for all three async steps above
        console.error("Dashboard failed:", err);
      });
    // Still a shape problem: we had to store `savedUser` in the outer scope
    // because .then() callbacks don't share scope. async/await solves this.

async/await style:

    // Reads like synchronous code. Built on Promises under the hood.
    // Variables stay in scope naturally. No nesting. No saved references.
    async function loadDashboard(userId) {
      try {
        const user = await fetchUser(userId);        // wait, then continue
        const orders = await fetchOrders(user.id);   // wait, then continue
        await renderDashboard(user, orders);         // wait, then continue
        console.log("Dashboard rendered");
      } catch (err) {
        // All three awaits share this single try/catch, same as Promise .catch()
        console.error("Dashboard failed:", err);
      }
    }

    // Call it: async functions always return a Promise
    loadDashboard(42);

    // Bonus: parallel awaits
    // If user and orders are independent, fetch them in parallel:
    async function loadDashboardFast(userId) {
      const [user, orders] = await Promise.all([
        fetchUser(userId),
        fetchOrders(userId)   // starts at the same time as fetchUser, saving ~200ms
      ]);
      await renderDashboard(user, orders);
    }

[Figure: Code shape comparison, same runtime result with three different structures. Callback: fetchUser(cb1), inside cb1 fetchOrders(cb2), inside cb2 render(cb3), a pyramid. Promise: fetchUser().then(fn1).then(fn2).then(fn3).catch(), flat. async/await: const user = await fetchUser(id); const orders = await fetchOrders(user.id); await render(user, orders); linear. All three have the same thread behavior, network calls, and timing; only the code structure differs.]

Code Walkthrough: What Each Piece Does

The callback version passes a function (onUserFetched) into fetchUser. When the fetch completes, fetchUser calls that function with (error, data). The problem is that every next step must live inside the previous callback. Three sequential async steps means three levels of nesting, and each level needs its own error check. With ten steps, the code is unreadable.

The Promise version returns a Promise object from each async function. You attach .then(handler) to describe what to do when it resolves. Chains stay flat because each .then() can return another Promise, which the chain automatically waits for. One .catch() at the end covers all failure modes. The remaining awkwardness: variables from one .then() aren't automatically visible in later ones, so you sometimes need a shared outer variable, as shown with savedUser.

The async/await version is the same Promise chain, but the language writes the .then() plumbing for you. Each await expression suspends the async function (freeing the thread) and resumes it when the Promise resolves. Variables declared before one await are naturally visible after the next await, with no scope gymnastics. One try/catch wraps everything. It reads exactly like synchronous code while running exactly like a Promise chain. The bonus: Promise.all() lets you run independent operations in parallel. The loadDashboardFast version saves ~200ms by fetching user and orders simultaneously.

Rule of thumb: Use callbacks only when working with older APIs that require them (legacy Node.js, some browser APIs). Use Promises when you need fine-grained chaining control or when library APIs return Promises. Use async/await everywhere else; it's the most readable and the standard style in modern JavaScript, Python, Kotlin, Swift, and C#.

Three patterns, one problem, identical runtime behavior. Callbacks pass a function to call later, which is simple but leads to deeply nested "pyramid" code. Promises return an object representing a future value: chains replace nesting, and one .catch() handles all errors. Async/await is syntactic sugar over Promises that reads like synchronous code. Use async/await by default; reach for Promise.all() when operations are independent and can run in parallel.
Section 6

Junior vs Senior: How They Think About Async

Problem Statement

You're building a REST API endpoint: GET /dashboard/:userId. It needs to fetch the user's profile, their recent orders, and their notification count, which means three separate database queries. The endpoint must respond as quickly as possible, and your team expects production traffic of ~5,000 requests per second.

How does a junior approach this? How does a senior think about it differently? And where does async fit into the gap?

How a Junior Thinks

A junior engineer who's new to async typically thinks in terms of "steps that happen in order." They write the code the way they'd describe the process out loud: "First get the user, then get the orders, then get the notifications, then respond." Synchronous, sequential, readable. It works perfectly in development, where the test database is fast and there's no load. The problems only show up in production.

    // Junior: sequential sync-style async. Correct but slow.
    app.get('/dashboard/:userId', async (req, res) => {
      const userId = req.params.userId;

      // Each await pauses this handler until the DB responds before starting the next
      const user = await db.getUser(userId);                    // ~50ms
      const orders = await db.getOrders(userId);                // ~60ms
      const notifications = await db.getNotifications(userId);  // ~40ms

      // Total: 150ms sequential, but all three queries are INDEPENDENT.
      // They don't need each other. We're waiting for no reason.
      res.json({ user, orders, notifications });
    });

Problems

Problem 1: Unnecessary Sequential Waiting

All three queries are independent: none of them needs the result of the others. But the code waits for each one to finish before starting the next. At 5,000 req/s, the roughly 90ms of needless waiting per request (150ms sequential versus ~60ms in parallel) means slower responses and far more requests in flight at any moment.

Problem 2: No Timeout or Cancellation

If the notifications service hangs for 30 seconds, this endpoint hangs for 30 seconds. There's no timeout, no cancellation, and no fallback. The request, its connection, and any database pool slot it holds stay open until the call finally returns. Under load, these stuck requests pile up and exhaust connection pools and memory.

Problem 3: All-or-Nothing Response

If getNotifications() throws, the entire endpoint fails with a 500. But should a missing notification count really break the whole dashboard? A senior would degrade gracefully: return the user and orders even if notifications fail.

The junior trap: Writing await before every async call feels like "I'm doing async properly", and it is asynchronous. But three sequential awaits for independent operations is the async equivalent of making three synchronous calls. The thread isn't blocked (good), but the latency still adds up sequentially (bad). Async isn't just about not blocking; it's about using the free time wisely.
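
The latency difference is easy to measure with simulated queries. This is a sketch under stated assumptions: `fakeQuery` uses timers in place of a database, and the 50/60/40ms delays mirror the figures quoted for the three queries.

```javascript
// Timer-based stand-in for a database query (assumed latencies).
function fakeQuery(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

// Runs an async function and reports its result plus elapsed time.
async function timed(fn) {
  const start = Date.now();
  const value = await fn();
  return { value, ms: Date.now() - start };
}

// Sequential: each query starts only after the previous one finishes.
async function sequential() {
  const user = await fakeQuery("user", 50);
  const orders = await fakeQuery("orders", 60);
  const notifications = await fakeQuery("notifications", 40);
  return [user, orders, notifications];
}

// Parallel: all three start immediately; we wait for the slowest.
function parallel() {
  return Promise.all([
    fakeQuery("user", 50),
    fakeQuery("orders", 60),
    fakeQuery("notifications", 40),
  ]);
}

(async () => {
  const seq = await timed(sequential);
  const par = await timed(parallel);
  console.log(`sequential ~${seq.ms}ms, parallel ~${par.ms}ms`);
})();
```

Both versions return the same three values; the sequential one takes roughly the sum of the delays and the parallel one roughly the longest single delay.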

How a Senior Thinks

A senior engineer sees this problem as a dependency graph, not a sequential list of steps. They ask: "Which operations depend on each other? Which are truly independent?" Independent operations run in parallel. Dependent operations run in sequence. They also think about failure modes upfront: "What happens if one of these fails? Should the whole response fail, or should I degrade gracefully?"

    // Senior: parallel fetches + graceful degradation + timeout
    app.get('/dashboard/:userId', async (req, res) => {
      const userId = req.params.userId;
      const TIMEOUT_MS = 3000;

      // 1. Identify truly independent operations → run in parallel
      //    Promise.allSettled: runs all, resolves when all are DONE
      //    (even if some fail, unlike Promise.all, which rejects on first failure)
      const [userResult, ordersResult, notifResult] = await Promise.allSettled([
        withTimeout(db.getUser(userId), TIMEOUT_MS),
        withTimeout(db.getOrders(userId), TIMEOUT_MS),
        withTimeout(db.getNotifications(userId), TIMEOUT_MS),
      ]);
      // Network time = max(~50ms, ~60ms, ~40ms) ≈ 60ms, not 150ms

      // 2. Fail fast on critical data; degrade gracefully on non-critical
      if (userResult.status === 'rejected') {
        // A rejection means the query failed or timed out, not that the user is missing
        return res.status(500).json({ error: 'Failed to load user' });
      }

      const user = userResult.value;
      const orders = ordersResult.status === 'fulfilled' ? ordersResult.value : [];
      const notifications = notifResult.status === 'fulfilled' ? notifResult.value : 0;
      // Orders and notifications degrade to empty/zero if their queries fail.
      // User is critical, so we do fail on that one.

      res.json({ user, orders, notifications });
    });

    // 3. Utility: wrap any promise with a timeout
    function withTimeout(promise, ms) {
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
      });
      // Promise.race settles with whichever Promise settles first.
      // Clear the timer afterwards so it doesn't linger after a fast query.
      return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
    }

Design Decisions

Promise.allSettled vs Promise.all

Promise.all rejects as soon as any Promise rejects, which is useful when all results are required. Promise.allSettled waits for all to complete regardless of failure and gives you each result's status individually. Use allSettled when partial success is acceptable (dashboard); use all when all-or-nothing is the right behavior (a checkout that requires both payment and inventory).
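The difference is easiest to see side by side. The following is a minimal sketch, assuming two hypothetical stand-in queries (`ok` and `fail` are illustration helpers, not part of any library): one succeeds and one fails, and the same pair is handed to both combinators.

```javascript
// Hypothetical stand-ins for real queries: one succeeds, one fails.
const ok = (value, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));
const fail = (message, ms) =>
  new Promise((_, reject) => setTimeout(() => reject(new Error(message)), ms));

async function compare() {
  // Promise.all: the first rejection rejects the combined promise.
  let allOutcome;
  try {
    allOutcome = await Promise.all([ok('user', 10), fail('orders down', 5)]);
  } catch (err) {
    allOutcome = `rejected: ${err.message}`;
  }

  // Promise.allSettled: never rejects; each entry reports its own status.
  const settled = await Promise.allSettled([ok('user', 10), fail('orders down', 5)]);
  const statuses = settled.map((result) => result.status);

  return { allOutcome, statuses };
}

compare().then(console.log);
// { allOutcome: 'rejected: orders down', statuses: [ 'fulfilled', 'rejected' ] }
```

With Promise.all the successful user result is discarded along with the failure; with allSettled it survives, which is what makes partial rendering possible.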

Promise.race for Timeouts

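Promise.race settles with the first Promise to settle, whether that is a fulfillment or a rejection. By racing your actual query against a Promise that rejects after a deadline, you get automatic deadline enforcement. If the DB takes longer than 3 seconds, the race rejects with the timeout error and the handler can respond instead of hanging. Note that losing the race does not cancel the query itself: it keeps running until it finishes or the driver gives up, unless you also cancel it (for example with an AbortSignal, where the client supports one). Deadlines like this are one of the building blocks that circuit-breaker patterns in production systems rely on.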

Dependency Graph Thinking

Before writing any async code, draw the dependency graph. If operation B needs the output of operation A, they must be sequential. If A and B are independent, they should run in parallel. Getting this analysis right is the difference between 150ms and 60ms latency for the example above. At 5,000 req/s, the 90ms saved per request adds up to roughly 10,800 hours of cumulative user wait time per day (0.09 s × 5,000 req/s × 86,400 s ≈ 38.9 million seconds).
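A graph with both kinds of edges might look like the sketch below. It is a variation on the dashboard example, not the code above: the helpers `getUser`, `getOrders`, and `getRecommendations` and their latencies are assumptions made up for illustration, and here the second-level calls are assumed to need the loaded user.

```javascript
// Hypothetical data-access helpers; each call simulates a network round trip.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));
const getUser = (id) => delay(50, { id, name: 'Ada' });
const getOrders = (userId) => delay(60, [{ orderId: 1, userId }]);
const getRecommendations = (userId) => delay(40, ['book', 'lamp']);

async function loadDashboard(id) {
  // Level 1: everything below needs the user, so this await is unavoidable.
  const user = await getUser(id);

  // Level 2: orders and recommendations both depend on `user`
  // but not on each other, so they run concurrently.
  const [orders, recommendations] = await Promise.all([
    getOrders(user.id),
    getRecommendations(user.id),
  ]);

  return { user, orders, recommendations };
}

loadDashboard(42).then((dashboard) => console.log(dashboard.orders.length));
```

Under these assumed latencies the total is about 50ms + max(60ms, 40ms) = 110ms, compared with 150ms if all three calls were awaited one after another.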

Bottom Line

Dimension | Junior (sequential awaits) | Senior (parallel + resilient)
Latency | Sum of all query times (~150ms) | Max of all query times (~60ms)
One query fails | Entire endpoint fails (500) | Degrades gracefully (partial data)
One query hangs | Endpoint hangs indefinitely | Times out at 3s, frees the slot
Thread usage | Free during waits (good) but waits are serial (bad) | Free during waits, waits are parallel (great)
Code reads as | Simple top-to-bottom, easy to follow | Slightly more complex but production-safe
The senior's mantra: "Use await when operations are dependent. Use Promise.all / Promise.allSettled when they're independent. Always set a timeout. Always decide: is this a critical failure or a graceful degradation?"

Ready to go deeper? Sections 7–11 cover async at the distributed systems level: message queues, sagas, event-driven choreography, and how these same async principles scale from one server to hundreds of microservices.

A junior writes sequential awaits, which is correct but unnecessarily slow for independent operations. A senior maps dependencies first, runs independent operations with Promise.all/allSettled, adds timeouts via Promise.race, and designs graceful degradation so partial failures don't become total failures. The gap isn't about knowing async syntax; it's about modeling the dependency graph before writing a single line.

Evolution: How Async Programming Got Here

Five eras from hand-rolled state machines to structured concurrency, each one fixing the exact flaw the previous era created.

Async programming didn't arrive as a finished idea. It was discovered, one painful bug at a time. Each era below was triggered by a specific real-world crisis: usually a server that couldn't scale, a codebase no one could read, or a bug no one could track down. Once you see the chain of cause-and-effect, every modern async feature starts to feel inevitable rather than arbitrary.

[Figure: Async evolution timeline. Era 1 (1960s–70s): cooperative multitasking → Era 2 (1990s–2000s): thread-per-request → Era 3 (2009): Node.js + event loop → Era 4 (2012–2015): Promises + async/await → Era 5 (2018+): reactive streams + structured concurrency. Each era solved the previous era's biggest pain point and introduced the next one.]

Let's walk each era: the problem it faced, the solution it invented, and the new flaw that solution exposed.

Era 1: Cooperative multitasking (1960s–70s)

The problem: Early computers were expensive and couldn't be left idle. If a program needed to read from a tape drive (which took seconds), burning CPU cycles waiting was pure waste. Engineers wanted the CPU to stay useful while I/O was in flight.

The solution: Programs voluntarily yielded control, a model called cooperative multitasking. When a program hit an I/O operation, it would save its own state (what variables it had, where to return to), tell the scheduler "I'm waiting, run someone else," and wait to be woken up when the I/O completed. This was programmed by hand: developers maintained explicit state machines that tracked exactly where in the program's execution they were.
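The yield-and-resume idea can be imitated in a few lines with generators. This is only a toy sketch of the concept, not how any historical system was written: each `yield` stands in for "I'm waiting, run someone else", and the names `task` and `runCooperatively` are made up for the illustration.

```javascript
// Each task is a generator; `yield` is the voluntary hand-off to the scheduler.
function* task(name, steps, log) {
  for (let i = 1; i <= steps; i++) {
    log.push(`${name}:${i}`);
    yield; // give control back; the generator object remembers where we were
  }
}

// Round-robin scheduler: resume each task until its next yield, drop finished ones.
function runCooperatively(tasks) {
  const queue = [...tasks];
  while (queue.length > 0) {
    const current = queue.shift();
    const { done } = current.next();
    if (!done) queue.push(current);
  }
}

const log = [];
runCooperatively([task('A', 2, log), task('B', 3, log)]);
console.log(log.join(' ')); // A:1 B:1 A:2 B:2 B:3
```

The scheduler only regains control when a task yields. A task containing a loop with no `yield` would keep `current.next()` from ever returning, and every other task would starve.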

Why this worked: CPU stayed busy. Multiple I/O operations could be in flight simultaneously. For the hardware of the time, this was impressively efficient.

The flaw it left behind: "Cooperative" means the program has to be nice and yield. If one program went into an infinite loop or forgot to yield, whether intentionally or through a bug, it froze the entire system. Nobody else got CPU time. The system was as reliable as its least-well-behaved program. That was untenable for multi-user systems.

Era 2: Thread-per-request (1990s–2000s)

The solution to Era 1: Operating systems added preemptive multitasking, in which the OS itself forcibly context-switches programs on a timer so that no one program can hog the CPU. For web servers, the popular model became thread-per-request: each incoming HTTP request gets its own OS thread. Apache popularized this. Java Servlets were built on it. It worked beautifully at small scale.

The C10K problem (1999): As the web grew, engineers tried to push servers to handle 10,000 simultaneous connections, known as "C10K." Thread-per-request collapsed. Each OS thread reserves stack memory (typically around 1 MB by default on Windows, 8 MB virtual on Linux, paged in lazily as the stack grows), plus OS scheduler overhead. Ten thousand threads meant gigabytes of stack space reserved just for threads that were mostly sleeping, waiting for network I/O. The server would thrash on context switching before running out of memory. The thread was the unit of concurrency, and threads were too expensive.

The new insight: Most of those threads weren't doing anything; they were waiting. Waiting for a database query to return. Waiting for the client to send the next byte. The CPU was idle, but the thread existed and consumed memory anyway. The fix had to decouple "a unit of work" from "an OS thread."

Era 3: Node.js and the event loop (2009)

Ryan Dahl's insight: In 2009, Ryan Dahl launched Node.js with a radical design: a JavaScript runtime built on a single-threaded, non-blocking event loop. The core idea: instead of blocking a thread on every I/O call, all I/O operations are non-blocking by default. You initiate an operation and pass a callback function; when the OS signals completion, the event loop calls your callback. One thread handles thousands of in-flight requests because it never sleeps waiting for any of them.

libuv is the engine underneath Node.js that implements this. It wraps the OS's asynchronous I/O APIs (epoll on Linux, kqueue on macOS, IOCP on Windows) and exposes them through a unified event loop. When Node.js code reads from a socket, libuv registers the request with the OS and immediately gives control back to the event loop. The OS and hardware do the actual transfer (using DMA, direct memory access, where hardware copies data without burning CPU cycles), then the OS signals libuv, which queues the callback. File system calls such as fs.readFile() take a slightly different route: regular files can't be polled this way, so libuv runs the blocking read on a small worker thread pool (four threads by default) and queues the callback when it finishes. Either way, the JavaScript thread was never blocked.

The flaw it created, callback hell: With everything async, all your "what to do next" code had to live inside callbacks. Callbacks called functions that took more callbacks. Four async steps deep and your code looked like this:

readFile(path, function(err, data) {            // level 1
  parseJSON(data, function(err, parsed) {       // level 2
    saveToDb(parsed, function(err, record) {    // level 3
      sendConfirmation(record, function(err) {  // level 4: pyramid of doom
        if (err) {
          // where does this error go? who handles it?
          // error handling is now scattered across 4 levels
        }
      });
    });
  });
});

Error handling was particularly painful: each level needed its own if (err) check. Forgetting one was a latent bug. The flow of the program was impossible to follow visually. "Callback hell" became the term for this problem, and fixing it was the entire motivation for Era 4.

Era 4: Promises and async/await (2012–2015)

The fix for callback hell: Promises (sometimes called Futures) represent a value that doesn't exist yet. Instead of passing a callback into a function, the function returns a Promise object. You can chain .then() handlers onto it, and crucially, you can chain them flat instead of nested. Three sequential async operations became three .then() calls in one flat chain, not three levels of nesting.
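Here is a sketch of how the four-level pyramid flattens once each step returns a Promise. The four functions are hypothetical promise-returning stand-ins with canned results, written only to keep the example self-contained.

```javascript
// Hypothetical promise-returning versions of the four callback-style steps.
const readFile = (path) =>
  Promise.resolve(path.endsWith('.json') ? '{"id": 7}' : 'not json');
const parseJSON = (text) => Promise.resolve().then(() => JSON.parse(text));
const saveToDb = (parsed) => Promise.resolve({ ...parsed, saved: true });
const sendConfirmation = (record) => Promise.resolve(`confirmed ${record.id}`);

function processFile(path) {
  return readFile(path)
    .then(parseJSON)
    .then(saveToDb)
    .then(sendConfirmation)
    // One handler covers a failure in any of the four steps.
    .catch((err) => `failed: ${err.message}`);
}

processFile('data.json').then(console.log); // prints "confirmed 7"
```

A rejection or thrown error anywhere in the chain skips the remaining .then() steps and lands in the single .catch(), which replaces the four scattered if (err) checks.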

Key milestones: ECMAScript 6 (2015) standardized Promises in JavaScript. Java 8 (2014) added CompletableFuture with a similar composable model. C# 5.0 (2012) introduced async/await, which went further by making Task-based asynchronous code read like synchronous code. Python 3.4 (2014) added asyncio, and JavaScript gained its own async/await in ES2017. The language ecosystem converged on the same idea within a few years, because the problem (callback hell) was universal.

async/await specifically: The C# team realized that even flat .then() chains were mentally harder than linear code. async/await is a compiler transformation: you write code that looks synchronous, and the compiler rewrites it into a state machine that suspends and resumes at each await. The thread is free between suspension points, but your code reads top-to-bottom. This was arguably the biggest ergonomic leap in async programming history.

The remaining flaw: async/await is great for a single value arriving in the future. It's awkward for sequences: a stream of WebSocket messages, a Kafka topic, a sensor feed sending a hundred events per second. That's where Era 5 stepped in.

Era 5: Reactive streams and structured concurrency (2018+)

Reactive streams: Libraries like RxJS, Project Reactor (Java/Kotlin), and Python's asyncio async generators extended the async model to sequences. Instead of awaiting a single future value, you subscribe to an Observable, a stream of values that arrive over time. Each value passes through a pipeline of operators (map, filter, debounce, merge) before reaching your handler. The key addition: backpressure, the ability for a slow consumer to tell a fast producer "slow down." Without backpressure, a fast producer fills memory buffers and eventually crashes the consumer.
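The simplest form of backpressure is pull-based: the producer computes the next value only when the consumer asks for it. The sketch below shows that with a plain JavaScript async generator rather than a reactive library (push-based libraries such as RxJS handle the same concern with dedicated operators). `sensorReadings` and `consume` are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Producer: an async generator runs only when the consumer requests the next value.
async function* sensorReadings(count, events) {
  for (let i = 1; i <= count; i++) {
    events.push(`produce ${i}`);
    yield i;
  }
}

// Consumer: deliberately slow. The producer cannot run ahead of it.
async function consume(count) {
  const events = [];
  for await (const reading of sensorReadings(count, events)) {
    await sleep(5); // simulate slow processing
    events.push(`consume ${reading}`);
  }
  return events;
}

consume(3).then((events) => console.log(events.join(', ')));
// produce 1, consume 1, produce 2, consume 2, produce 3, consume 3
```

The strict interleaving in the output is the point: no buffer builds up between producer and consumer, because `for await` requests the next value only after the loop body has finished.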

Kotlin coroutines (2018): Kotlin took a different approach. Instead of Promises or Observables, it added coroutines: lightweight threads managed by the Kotlin runtime, not the OS. A coroutine can suspend (like await) without blocking a real OS thread, can be created by the millions (unlike OS threads), and uses structured concurrency, meaning every coroutine has a defined scope, and when that scope ends, all child coroutines are cancelled automatically. No orphaned async work.

Structured concurrency, the big idea: All prior async models suffered from "fire and forget" leaks: async work was launched and if the parent context ended (a request completed, a test finished), the async work kept running in the background, using resources, potentially writing to stale state. Structured concurrency makes async work hierarchical: a child task cannot outlive its parent. This makes cancellation, error propagation, and resource cleanup finally predictable. Java 21 (2023) added structured concurrency as a preview feature. Swift added it in Swift 5.5.
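JavaScript has no built-in structured concurrency, but the parent-child rule can be approximated with an AbortController. The sketch below is an emulation under that assumption, not a language feature: `withScope` and `cancellableSleep` are hypothetical helpers, and children are only cancelled because they cooperate by listening to the signal.

```javascript
// Hypothetical child task that honors cancellation through an AbortSignal.
function cancellableSleep(ms, signal, label, log) {
  return new Promise((resolve, reject) => {
    const onAbort = () => {
      clearTimeout(timer);
      log.push(`${label} cancelled`);
      reject(new Error(`${label} cancelled`));
    };
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      log.push(`${label} finished`);
      resolve(label);
    }, ms);
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

// Scope helper: if any child fails, cancel the siblings and wait for them
// to stop before the error leaves the scope. No child outlives the scope.
async function withScope(startChildren) {
  const controller = new AbortController();
  const children = startChildren(controller.signal);
  try {
    return await Promise.all(children);
  } catch (err) {
    controller.abort();                 // cancel siblings still in flight
    await Promise.allSettled(children); // wait until every child has stopped
    throw err;
  }
}

async function demo() {
  const log = [];
  try {
    await withScope((signal) => [
      cancellableSleep(1000, signal, 'slow child', log),
      new Promise((_, reject) => setTimeout(() => reject(new Error('boom')), 10)),
    ]);
  } catch (err) {
    log.push(`scope failed: ${err.message}`);
  }
  return log;
}

demo().then((log) => console.log(log));
// [ 'slow child cancelled', 'scope failed: boom' ]
```

Languages with real structured concurrency enforce this rule for every task; here it holds only for children that check the signal, which is the gap the language-level feature closes.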

Era | Year | Solved | Introduced
Cooperative multitasking | 1960s–70s | CPU idle on I/O | Program must yield; one bad actor freezes all
Thread-per-request | 1990s–2000s | Bad-actor problem (preemption) | Memory & scheduler explosion at C10K scale
Event loop (Node.js) | 2009 | C10K: threads too expensive | Callback hell: code unreadable, errors scattered
Promises / async/await | 2012–2015 | Callback hell: flat chains, linear syntax | Awkward for streams; orphaned async work leaks
Reactive + structured concurrency | 2018+ | Streams, backpressure, lifecycle leaks | Steeper learning curve; operator overload

Async programming evolved through five eras, each solving the previous era's dominant pain point. Cooperative multitasking solved CPU idle time but required programs to be well-behaved. Thread-per-request solved that but collapsed at C10K scale due to memory cost. Node.js's event loop solved scale but created callback hell. Promises and async/await flattened callbacks into readable linear code but left stream handling and lifecycle leaks unsolved. Reactive streams and structured concurrency solved those, at the cost of a steeper learning curve.

Internals: How async/await Actually Works

It's not magic; it's a compiler-generated state machine. Here's what really happens when you write await.

Most developers use async/await for years without knowing how it works under the hood. That's fine, until something goes wrong. When you get a cryptic stack trace that jumps across threads, or you wonder why your await inside a loop is slow, or why catching an exception from an async void method doesn't work, you need the mental model. Let's build it.

The Compiler's Secret: Every async Function Becomes a State Machine

When you write an async function with await expressions, the compiler does something surprising: it transforms your linear-looking code into a state machine class. Each await point becomes a numbered state. When execution hits an await, the state machine saves its current state (local variables, which state it's in, where to resume) and returns control to the caller. When the awaited operation completes, the state machine resumes from exactly the saved state.

[Figure: async/await state machine transformation. Linear async code becomes a state machine where each await is a state transition.]

What you write:

async function getProfile(userId) {
  // State 0: runs synchronously until the first await
  const user = await fetchUser(userId);
  // State 1: resumes here when fetchUser resolves
  const orders = await fetchOrders(user.id);
  // State 2: resumes here when fetchOrders resolves
  return { user, orders };
}

What the runtime sees:

State 0: run sync code, start fetchUser(). The thread is free during the I/O.
State 1 (fetchUser resolved): restore local user, start fetchOrders(). The thread is free again.
State 2 (fetchOrders resolved, done): restore user and orders, resolve the outer Promise.

Key insight: the thread is FREE between states. State 0 → suspend → [thread handles other requests] → I/O done → State 1 → suspend → [thread handles other requests] → I/O done → State 2. Local variables (user, orders) are stored in the state machine's heap object, not on the thread's call stack, so the thread can leave and come back safely.

The Event Loop: The Scheduler That Makes It All Run

The state machine above tells you what the compiled code looks like. But who decides when to run State 1? That's the event loop's job.

The event loop is a continuous cycle: check if any pending I/O operations have completed (the OS signals this via epoll/IOCP/kqueue); if yes, pick up the corresponding callback or state machine continuation and run it until it hits the next await; repeat. The event loop never blocks on your behalf; it only runs code that's ready right now. If no I/O has completed and no timers have fired, it sleeps (cheaply, at the OS level) until something becomes ready.

The "aha" moment on event loops: An event loop is just a fancy while(true) loop that asks the OS "did anything finish?" on each iteration. The OS does the actual waiting, efficiently, via hardware interrupts. The event loop just processes completions as they arrive. Your await keyword is how you tell the event loop "I'm waiting for this; go process something else, and come back to me when it's done."
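A small experiment makes the "only runs code that's ready" rule visible. This sketch assumes Node.js or a browser, where promise continuations run as microtasks right after the current synchronous code and before any timer callback. It records the order in which three pieces of code actually run.

```javascript
const order = [];

// Queued for a later loop iteration, even with a 0ms delay.
setTimeout(() => order.push('timer callback'), 0);

// Queued as a microtask: runs as soon as the current synchronous code finishes.
Promise.resolve().then(() => order.push('promise continuation'));

// Runs right now, before either of the above.
order.push('synchronous code');

setTimeout(() => console.log(order.join(' -> ')), 10);
// synchronous code -> promise continuation -> timer callback
```

Nothing here waits in place: each call registers work and returns, and the loop picks the pieces up once the currently running code has finished.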

What await Actually Compiles To (Simplified)

Here's the mental model in plain terms. When the compiler sees const user = await fetchUser(userId) inside an async function, it generates roughly this logic:

// What the compiler generates (simplified, conceptual):
function getProfile_StateMachine(userId, resolve, reject) {
  let state = 0;
  let user, orders; // locals live in a heap-allocated closure, not on the stack

  function resume(value, err) {
    if (err) { reject(err); return; }
    switch (state) {
      case 0:
        // Run synchronous code up to the first await
        state = 1;
        // Start the async operation; when it finishes, call resume(result)
        fetchUser(userId).then(val => resume(val, null), err => resume(null, err));
        // ← RETURN immediately. Thread is free. Event loop moves on.
        return;
      case 1:
        // fetchUser resolved; value is the user object
        user = value; // restore local variable from saved state
        state = 2;
        fetchOrders(user.id).then(val => resume(val, null), err => resume(null, err));
        return; // ← RETURN again. Thread is free again.
      case 2:
        // fetchOrders resolved; value is orders
        orders = value;
        resolve({ user, orders }); // complete the outer Promise
        return;
    }
  }

  resume(undefined, null); // kick off State 0
}

// The async function itself becomes a thin wrapper that returns the outer Promise:
function getProfile(userId) {
  return new Promise((resolve, reject) => getProfile_StateMachine(userId, resolve, reject));
}

The key line in each case is the final return. The function returns immediately after starting the async operation; it doesn't wait. The thread that called it is free to do other work. When the I/O completes, the event loop calls resume again, which runs the next state. This is the entire mechanism: no magic, just a loop, a switch, and a callback chain.
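The same suspend-and-resume mechanism can be built by hand from generators, which is roughly how async functions were emulated in JavaScript before the syntax existed. The sketch below is a minimal driver under that framing: `run` is a hypothetical helper, each `yield` plays the role of `await`, and the two fetchers are stand-ins for real I/O.

```javascript
// Drives a generator the way an async function is driven.
function run(generatorFn, ...args) {
  return new Promise((resolve, reject) => {
    const machine = generatorFn(...args);

    function step(method, input) {
      let result;
      try {
        result = machine[method](input); // resume until the next yield (or the end)
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      // Suspend: register continuations and return. The thread is free here.
      Promise.resolve(result.value).then(
        (value) => step('next', value),
        (err) => step('throw', err),
      );
    }

    step('next', undefined);
  });
}

// Hypothetical fetchers standing in for real I/O.
const fetchUser = (id) => Promise.resolve({ id, name: 'Ada' });
const fetchOrders = (userId) => Promise.resolve([{ orderId: 1, userId }]);

function* getProfile(userId) {
  const user = yield fetchUser(userId);
  const orders = yield fetchOrders(user.id);
  return { user, orders };
}

run(getProfile, 42).then((profile) => console.log(profile.orders.length)); // prints 1
```

The generator object holds the saved locals and the resume position, so `step` only has to decide when to call it again. A rejected promise is re-thrown into the generator with `throw`, which is why try/catch works around an `await`.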

Why Stack Traces Look Broken in Async Code

When you get an error inside an async function, the stack trace can be far less useful than a synchronous one: it may show event loop internals rather than your code's call chain. Now you know why: by the time State 2 runs, the original caller is long gone from the call stack. The thread that's running State 2 might have handled dozens of other requests between State 0 and State 2. The continuation is resumed by the event loop, not by the original caller, so there's no physical stack to trace back through. Modern engines soften this (V8 reconstructs async frames for async/await code, enabled by default since Node.js 12), but callback-style code and calls that cross service boundaries still lose the chain. This is why distributed tracing tools like OpenTelemetry propagate a trace context object through async calls: it's how the logical call chain gets reconstructed across suspension points.
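Context propagation can be sketched with Node's built-in AsyncLocalStorage. This is not OpenTelemetry's API, only an illustration of the underlying idea; `handleRequest`, `queryDatabase`, and the `traceId` field are hypothetical names.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const traceContext = new AsyncLocalStorage();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Deep in the call chain, after several suspension points,
// the trace id is still reachable without passing it as an argument.
async function queryDatabase() {
  await sleep(5);
  return `query ran under trace ${traceContext.getStore().traceId}`;
}

async function handleRequest(traceId) {
  // Everything started inside run(), including continuations after an await,
  // sees this store.
  return traceContext.run({ traceId }, async () => {
    await sleep(5);
    return queryDatabase();
  });
}

Promise.all([handleRequest('req-1'), handleRequest('req-2')]).then(console.log);
// [ 'query ran under trace req-1', 'query ran under trace req-2' ]
```

The two requests interleave on the same thread, yet each continuation sees its own trace id, which the physical call stack can no longer provide.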

async/await is compiler syntactic sugar over Promises. The compiler transforms each async function into a state machine class, where each await expression is a state transition. When execution hits an await, the state machine saves its locals to the heap and returns control to the event loop, freeing the thread. When the I/O completes, the event loop resumes the state machine at the next state. The thread is free between every pair of states, which is why one thread can handle thousands of simultaneous in-flight requests. Broken stack traces in async code happen because the original call stack has already unwound by the time the continuation resumes (and in multi-threaded runtimes the resuming thread may not even be the originating one), so there's no linear call stack to show.

When To Use Async, and When Not To

Async isn't always the answer. Here's the decision tree that separates I/O-bound wins from CPU-bound mistakes.

Async programming is a tool, not a default. Using it where it helps is a superpower. Using it where it doesn't is just added complexity with no gain. The single most useful question you can ask is: "Is my code waiting for something external, or is it actually computing?" That answer almost always tells you which way to go.

[Figure: Decision flowchart for async patterns vs synchronous code vs threads. Is your operation waiting for external I/O? If yes: with high concurrency or throughput needs, use async/await or reactive streams; otherwise sync is fine (simple scripts). If no (CPU-bound): when you need true parallelism, use threads, a worker pool, or a process pool (async here is the wrong tool); otherwise keep it sync: simple, readable, no overhead.]

Async ≠ faster for CPU work. Adding await to a CPU-bound task does not parallelize it: the computation still occupies the thread for the same CPU time, and on a single-threaded event loop it holds up every other request while it runs. True parallelism (splitting CPU work across cores) needs threads or worker processes, not async/await.

The one heuristic that covers 90% of decisions: Ask "Does this operation cross a process boundary?" Network call → yes. Database query → yes. File read → yes. Redis call → yes. Parsing a JSON string in memory → no. Computing a hash → no. If it crosses a boundary, go async. If it's pure in-memory computation, stay sync and keep it simple.

Use async for I/O-bound work where you want the thread free during the wait: network calls, database queries, file I/O, fan-out requests. Async does not help CPU-bound computation: adding await to a hash function doesn't speed it up; it just adds state machine overhead. For CPU parallelism you need real threads or worker processes. Skip async for simple scripts, linear transactions, or anywhere the added complexity outweighs the throughput gain. The deciding question: "Is my code waiting for something external?"
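For the CPU-bound branch in Node.js, the usual tool is a worker thread. The following is a minimal sketch, assuming a toy computation (`sumTo`) in place of real work such as hashing or image processing. The worker source is passed inline via the `eval` option only to keep the example in one file.

```javascript
const { Worker } = require('node:worker_threads');

// CPU-bound work: awaiting it on the main thread would still block the event loop.
function sumTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}

// Run the same function on a separate thread and expose the result as a Promise.
function sumToInWorker(n) {
  const source = `
    const { parentPort, workerData } = require('node:worker_threads');
    ${sumTo.toString()}
    parentPort.postMessage(sumTo(workerData));
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

sumToInWorker(1e6).then((total) => console.log(total)); // 500000500000
```

The caller still uses a Promise, so async and threads combine here: the worker supplies the parallelism and the Promise supplies the non-blocking hand-off. A production service would more likely reuse a pool of workers than spawn one per call.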

Async Patterns vs Alternatives: Comparisons

Callbacks, threads, reactive streams, coroutines, sagas: each solves a different version of the same problem. Here's how to tell them apart.

The async space is full of overlapping options. async/await, callbacks, threads, reactive streams, coroutines: they all keep code from stalling on slow work, but they solve different shapes of the problem. Understanding when one is the right tool over another is the difference between junior "I'll just use async/await everywhere" and senior "here's why reactive streams are the right choice for this specific pipeline."

Async Approaches at a Glance

Approach | Readability | Streams | True parallelism | Backpressure | Complexity
async/await | ✓ Linear | ~ Awkward | ✗ No | ✗ No | Low
Callbacks | ✗ Callback hell | ~ With effort | ✗ No | ✗ No | Med (error mgmt)
Threads / thread pool | ✓ Sync-style | ✗ Not natural | ✓ Yes | ✗ Manual | High (sync, locks)
Reactive streams | ~ Operator chains | ✓ First-class | ~ Schedulers | ✓ Built-in | High (operators)
Coroutines / goroutines | ✓ Linear | ✓ Channels | ✓ Yes (Go/Kotlin) | ~ Channel buffers | Med

✓ = strong / ✗ = weak or absent / ~ = partial or with workarounds

Pick async/await when you're writing application-level business logic with multiple sequential async steps. Pick callbacks only when you're at the lowest level of a library API that needs maximum compatibility (Node.js EventEmitter, setTimeout) or when a single non-nested callback genuinely reads cleaner.

The single clearest separator: I/O-bound → async/await wins. CPU-bound → threads win. A common mistake is using async/await for image processing or video encoding and wondering why it's not faster. Async doesn't add CPU parallelism; it just frees the thread during waits.

Reactive streams shine when your data source never stops producing: a live sensor feed, a Kafka topic, a real-time analytics pipeline. async/await is better when you're asking for one thing and waiting for the answer. The Saga pattern fits neither: it's not about single values or streams, it's about coordinating a multi-step business workflow across services.

Coroutines vs async/await, closer than they look: Kotlin coroutines and Go goroutines pursue the same goals as async/await but lean more heavily on the runtime. Go goroutines are lightweight threads scheduled by the Go runtime: you write blocking-looking code and Go multiplexes thousands of goroutines onto a small OS thread pool. Kotlin coroutines compile to state machines (like async/await) but add structured concurrency: child coroutines are cancelled when their parent scope ends, preventing the "fire and forget" leaks that plague raw async/await.

async/await is the right default for sequential I/O-bound work. Callbacks are for legacy compatibility or single-level non-nested use. Threads are for CPU-bound parallel work, where async doesn't help. Reactive streams are for continuous event feeds that need backpressure and composable pipeline operators. Coroutines (Kotlin/Go) blend async and parallel capabilities with structured lifecycle management. The Saga pattern operates at a different level entirely: it coordinates multi-step business workflows across services with compensating transactions, not individual I/O latency.

Real Companies: Async at Scale

How Netflix, Discord, WhatsApp, Cloudflare, and Uber chose their async models, and why each choice matched their specific shape of problem.

Theory is useful, but seeing how five different companies solved five different async scaling problems, each one choosing a different tool for a different reason, is how the mental model really locks in. What makes these examples instructive is that each company's async choice was directly driven by the specific shape of their concurrency problem, not just because the technology was popular.

Netflix's architecture involves composing results from dozens of microservices into a single API response. When a user opens the Netflix app, the client makes one call to the API gateway, which then fans out to, potentially, services for user preferences, continue-watching state, top picks, recent titles, device capabilities, A/B test assignments, and more. Each of those is a separate network call to a separate service.

The problem with sequential async: if each downstream call takes ~50ms and there are ten of them, sequential await means ~500ms latency. But most of those calls are independent: the "top picks" service doesn't need to wait for the "continue watching" service to finish. You want all ten running in parallel, then combining results when all (or enough) have returned.

Netflix's engineering teams built heavily on RxJava (a close relative of Project Reactor in the JVM's reactive streams ecosystem) to express this fan-out and merge pattern. A reactive pipeline says: initiate all ten calls simultaneously, merge their results as they arrive, apply transformations, and handle timeouts per service (if "top picks" takes more than 100ms, use a cached fallback rather than failing the entire response). This pattern, scatter-gather with per-leg timeouts and fallbacks, is where reactive streams tend to read more cleanly than plain async/await, because it's a continuous composition problem, not just a sequential one.
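The shape of scatter-gather with per-leg fallbacks can be sketched with plain Promises. This is not RxJava and not Netflix's code; the service names, latencies, and the `withFallback` helper are all assumptions chosen for illustration.

```javascript
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Resolve with the call's result, or with `fallback` if it fails or exceeds `ms`.
function withFallback(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise.catch(() => fallback), timeout])
    .finally(() => clearTimeout(timer));
}

// Hypothetical downstream services with different behaviors.
const services = {
  topPicks: () => delay(200, ['fresh picks']),       // too slow for its budget
  continueWatching: () => delay(20, ['episode 4']),  // healthy
  abTests: () => Promise.reject(new Error('service down')),
};

async function buildHomepage() {
  // All legs start at once; each one has its own deadline and fallback.
  const [topPicks, continueWatching, abTests] = await Promise.all([
    withFallback(services.topPicks(), 50, ['cached picks']),
    withFallback(services.continueWatching(), 50, []),
    withFallback(services.abTests(), 50, {}),
  ]);
  return { topPicks, continueWatching, abTests };
}

buildHomepage().then(console.log);
// { topPicks: [ 'cached picks' ], continueWatching: [ 'episode 4' ], abTests: {} }
```

Because every leg is guaranteed to resolve, Promise.all is safe here. A reactive library expresses the same policy declaratively with operators and adds streaming of partial results, which this sketch does not attempt.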

The Hystrix → Resilience4j transition: Netflix open-sourced Hystrix as a library for circuit breaking and bulkhead isolation in async service calls. Hystrix was built on RxJava. When reactive streams became more standardized, the ecosystem moved toward Resilience4j, which integrates with Project Reactor. The underlying insight stayed the same: each downstream service call should have an independent timeout, retry, and fallback, and reactive composition makes expressing that clean.

Discord's core challenge is maintaining persistent WebSocket connections for every user who is currently online: potentially millions of simultaneous connections, each one representing a user who may receive real-time messages at any moment. The connection must be kept alive, with heartbeats, state tracking per user, and fast message delivery.

This is exactly the scenario where the Erlang/Elixir BEAM runtime was designed to shine. The BEAM virtual machine implements lightweight processes (different from OS processes; they're BEAM-level green threads), each with its own heap, message queue, and garbage collector. The scheduler multiplexes millions of these tiny processes onto a small pool of OS threads. Creating a new BEAM process is extremely cheap, and they are isolated: a crash in one process doesn't affect others. Discord runs one BEAM process per WebSocket connection.

The design produces a natural fit: each user connection is modeled as an independent, isolated concurrent unit. If one connection gets a malformed packet and the handling process crashes, the supervisor tree restarts just that process, and the other millions of connections are completely unaffected. This fault isolation by design is the Erlang/Elixir selling point, and it's what makes BEAM-based systems genuinely different from Node.js (single-threaded), Java threads (expensive), or Go goroutines (good, but without the same built-in supervisor tree isolation).

WhatsApp became famous in the engineering community for running on a surprisingly small server footprint while handling a massive user base. Their primary backend was (and to a significant extent still is) built on Erlang, which runs on the same BEAM runtime that powers Discord's WebSocket connections.

The core async mechanism is the same: BEAM lightweight processes, actor model, message passing. What made WhatsApp's engineering notable was how aggressively they tuned both the Erlang runtime and the underlying FreeBSD/Linux kernel networking stack to push connection density per physical machine as high as possible. They published that individual servers were handling over two million simultaneous connections, a figure that's only possible with a runtime that doesn't tie one OS thread to one connection.

The WHY that matters here: TCP keep-alive connections are mostly idle. A WhatsApp user connected to a server isn't sending messages every millisecond; they might be idle for minutes at a time, then send a burst of messages. A thread-per-connection model would mean millions of threads sitting idle, burning memory. The BEAM model means millions of processes sitting idle, each using only the memory for its own message queue and state. That is dramatically cheaper, and the runtime only schedules a process when it has actual work to do (a message arrived).

Cloudflare Workers is a serverless platform that runs JavaScript (and WebAssembly) on Cloudflare's edge network, on servers physically close to users around the world. The programming model is: you write an async JavaScript function that handles an HTTP request, and Cloudflare runs it on whichever edge node is nearest to the request's origin.

The async model here is intentional and enforced: Workers must be non-blocking. If your Worker code tries to perform a synchronous, blocking system call, there's simply no API for it: every I/O operation (fetching another URL, reading from KV storage, querying a D1 database) returns a Promise. The entire programming model forces async as the only option.

The reason Cloudflare chose this architecture is isolation and density. V8 isolates are extremely lightweight JavaScript execution contexts: lighter than containers, lighter than VMs. Thousands of isolates can run on a single edge server simultaneously. But isolates work only if they never block; a blocked isolate would hold its CPU slice, preventing other isolates from running. The async-only constraint is what makes the density possible. It's the event loop model taken to its logical extreme: the entire platform is built around the assumption that all code is non-blocking, by construction.

Uber's backend handles ride-matching, pricing, driver location updates, ETA calculations, and payment, all under real-time latency requirements. The architecture is deeply microservices-based, which means a single user-facing operation (a rider requests a ride) triggers a fan-out of RPC calls to multiple backend services simultaneously.

Go's goroutines fit this pattern well for a specific reason: Go makes spawning a goroutine per downstream RPC call extremely cheap. You dispatch to the pricing service, the driver location service, and the surge-calculation service by launching three goroutines simultaneously and collecting their results with channels. The Go scheduler multiplexes all those goroutines onto a thread pool automatically. You write code that looks sequential per goroutine, but all three are in flight at once.

The key difference from JavaScript async/await: Go goroutines support true parallelism across multiple CPU cores (the Go scheduler will run goroutines on multiple OS threads), while Node.js async/await is single-threaded. For a service that needs both high concurrency (many in-flight RPCs) and some CPU work per request (fare calculation, ETA algorithms), Go's model is a better fit than a single-threaded event loop. The goroutine fan-out pattern (launch one per downstream call, collect with WaitGroup or channels) is idiomatic Go and maps naturally to the scatter-gather problem.

Netflix uses reactive streams (RxJava/Reactor) to express fan-out, per-leg timeouts, and fallbacks across dozens of simultaneous microservice calls, a pattern reactive composition handles more cleanly than sequential awaits. Discord and WhatsApp both use Erlang/Elixir BEAM lightweight processes to maintain millions of persistent WebSocket/TCP connections: each connection is one isolated BEAM process, and the BEAM scheduler runs only processes with actual work. Cloudflare Workers enforces async-only V8 isolates to enable extreme density on edge nodes. Uber uses Go goroutines for fan-out RPC dispatch, benefiting from Go's ability to run goroutines in parallel across multiple cores, unlike Node.js's single-threaded event loop.

Production Bugs: Async Case Studies

Three real categories of async failure that have taken down production services, and exactly what to do differently.

Async bugs are especially painful because they're often silent: the code runs, returns a value, and you never know something went wrong. Or they're intermittent: they only appear under load, making them hard to reproduce. The three bugs below represent the most common failure categories. Understanding the root cause of each one is more valuable than memorizing the fix, because the same root cause appears in dozens of different forms.

Incident

A background worker pulls incoming jobs from a queue. Under normal load, jobs process fine. Under high load, the worker starts crashing and restarting repeatedly, every few minutes, with no application-level error logs. Job throughput collapses to zero during the restart window. The queue depth climbs. The on-call engineer sees: UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 127.0.0.1:5432, followed immediately by process exit. The database was temporarily overloaded and rejecting connections. Instead of queuing a retry, the worker process is dying.

What Went Wrong

The bug is about when the rejection handler is attached relative to when the rejection happens. In Node.js (and browsers), a Promise that rejects with no .catch() handler attached becomes an unhandled rejection. In older Node.js versions, unhandled rejections printed a warning. Starting in Node.js 15, the default behavior changed: an unhandled promise rejection terminates the process, the same as an uncaught synchronous exception.

The subtle version of this bug: the rejection handler is attached asynchronously, in a setTimeout, or after an await that completes after the rejection fires. The rejection fires at tick N, the handler is attached at tick N+1, so the runtime sees an unhandled rejection at tick N and terminates. This happens particularly with fire-and-forget patterns: you launch an async function without awaiting it and without attaching a .catch() to the returned Promise. If that function ever rejects, there is no handler.

[Figure: Bug 1, the rejection fires before the handler is attached. Timeline: at tick 0, processJob() is launched without await and with no .catch() attached; at tick 1, the DB rejects the connection and the promise rejects with no handler found; Node.js terminates the process on the unhandled rejection; at tick 3, where .catch() would have been attached, the process is already gone.]

Fix: always await or attach .catch() in the SAME tick you launch the Promise. Never fire-and-forget without error handling.

    // ❌ BUGGY: fire-and-forget with no error handling
    // processJob() returns a Promise, but we don't await it
    // and we don't attach a .catch(). If it rejects → process dies.
    async function drainQueue(queue) {
      while (true) {
        const job = await queue.dequeue();
        if (!job) break;
        processJob(job); // ← Returns a Promise. We IGNORE it.
        // If processJob rejects (e.g. DB down),
        // Node.js sees an unhandled rejection → process exit.
      }
    }

    async function processJob(job) {
      const conn = await db.connect(); // ← this can throw if DB is overloaded
      try {
        await conn.query(`INSERT INTO results ...`);
      } finally {
        conn.release(); // release the connection even if the query fails
      }
      console.log(`Job ${job.id} done`);
    }

    // ✅ FIX 1: Await the job, so errors propagate to the drainQueue handler
    async function drainQueue(queue) {
      while (true) {
        const job = await queue.dequeue();
        if (!job) break;
        try {
          await processJob(job); // ← await means any rejection is caught here
        } catch (err) {
          // Log, increment failure counter, maybe re-queue
          console.error(`Job ${job.id} failed:`, err.message);
          await queue.nack(job); // put job back for retry
        }
      }
    }

    // ✅ FIX 2: If you truly want fire-and-forget (parallel jobs),
    // attach .catch() in the SAME tick you launch:
    function drainQueueParallel(queue, jobs) {
      for (const job of jobs) {
        processJob(job) // launch immediately
          .catch(err => { // ← .catch() attached synchronously
            console.error(`Job ${job.id} failed:`, err.message);
            return queue.nack(job); // return it so a failed nack is also caught
          })
          .catch(nackErr => {
            console.error(`Re-queue of job ${job.id} failed:`, nackErr.message);
          });
      }
    }

    // ✅ FIX 3: Global safety net (catches any missed rejections)
    // Add this to every Node.js process so missed rejections are logged
    process.on('unhandledRejection', (reason, promise) => {
      console.error('Unhandled rejection at:', promise, 'reason:', reason);
      // In production: alert + graceful shutdown, not silent ignore
    });

Lesson: Every Promise that can reject must have a rejection handler attached in the same synchronous tick it was created. Awaiting is the cleanest form. If you fire-and-forget for parallelism, chain .catch() immediately. Add a global process.on('unhandledRejection') handler as a last-resort safety net that alerts before any crash.

How to Spot: Search your codebase for any async function call that is not prefixed with await and not assigned to a variable that gets .catch() chained on it. Run node --unhandled-rejections=throw in tests to surface these early. Add a process-level unhandledRejection listener to every worker process in production.
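Fix 2 can be packaged as a small reusable helper. The sketch below is one possible shape, not an established API: the name `fireAndForget` and its `onError` parameter are hypothetical, and the failing job is a stand-in for real work.

```javascript
// Hypothetical helper: launch async work without awaiting it, but attach
// the rejection handler synchronously so the rejection is never unhandled.
function fireAndForget(
  task,
  onError = (err) => console.error("Background task failed:", err)
) {
  let promise;
  try {
    // Promise.resolve() also covers tasks that return a plain value.
    promise = Promise.resolve(task());
  } catch (err) {
    // A non-async task can throw synchronously before returning a promise.
    promise = Promise.reject(err);
  }
  // .catch() is chained in the same tick the promise is created.
  return promise.catch(onError);
}

// Usage with a stand-in job that fails the way the incident did (assumption):
const failures = [];
const done = fireAndForget(
  async () => {
    throw new Error("connect ECONNREFUSED");
  },
  (err) => failures.push(err.message)
);
```

The returned promise never rejects as long as `onError` itself does not throw, so callers may ignore it safely or await it in tests.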
Incident

An API endpoint fetches a user's dashboard, loading data for their ten most recent orders. In development (1-2 orders per test account), the endpoint responds in ~80ms. In production, with real users having 10+ orders, the endpoint regularly takes 800ms to 1200ms. The database team confirms the per-query latency is fine (~80ms each). Nobody can explain why the endpoint is 10× slower in production than in testing.

What Went Wrong

The classic async loop mistake: an await inside a for...of loop makes the loop execute sequentially, so the next iteration doesn't start until the previous await resolves. (This is not the same construct as for await...of, which iterates an async iterable.) Ten independent database queries that could run in parallel are instead running one after the other. Total latency = sum of all ten queries (~80ms × 10 = ~800ms) instead of the time of the slowest single query (~80ms). This is not a database performance issue; it's a parallelism issue. The database could handle all ten queries simultaneously; the code just never asked it to.

[Figure: Sequential await in a loop vs Promise.all in parallel, for 10 independent queries. Sequential: q1 through q10 run one after the other, total ~800ms. Parallel: all 10 queries are in flight at once and the code waits only for the slowest one, total ~80ms.]

The database can handle all 10 queries simultaneously. The sequential loop never asked it to. Promise.all / Task.WhenAll initiates all queries at once and waits for the last one to complete.

    // ❌ BUGGY: await inside a for...of loop (sequential, 10× slower than necessary)
    // Each query waits for the previous one to FINISH before starting.
    // Total latency = sum of all 10 query times.
    async function getDashboard(userId) {
      const recentOrderIds = await db.getRecentOrderIds(userId, 10);
      const orders = [];
      for (const orderId of recentOrderIds) {
        const order = await db.getOrderDetails(orderId); // ← waits for each one!
        orders.push(order);
      }
      // If each query takes 80ms: 10 × 80ms = 800ms total
      return orders;
    }

    // ✅ FIX: Promise.all puts all queries in flight simultaneously
    // All 10 db.getOrderDetails() calls are initiated at the same moment.
    // We wait once for the slowest one. Total latency ≈ max(all queries) ≈ 80ms.
    async function getDashboard(userId) {
      const recentOrderIds = await db.getRecentOrderIds(userId, 10);
      // Map each ID to a Promise, which initiates all 10 DB calls immediately
      const orderPromises = recentOrderIds.map(id => db.getOrderDetails(id));
      // Wait for all of them to complete (or any to fail)
      const orders = await Promise.all(orderPromises);
      // If each takes 80ms: total ≈ 80ms (not 800ms)
      return orders;
    }

    // BONUS: If one failure should NOT abort the rest, use allSettled:
    async function getDashboardSafe(userId) {
      const recentOrderIds = await db.getRecentOrderIds(userId, 10);
      const results = await Promise.allSettled(
        recentOrderIds.map(id => db.getOrderDetails(id))
      );
      // results[i].status === 'fulfilled' | 'rejected'
      return results
        .filter(r => r.status === 'fulfilled')
        .map(r => r.value);
    }

Lesson: An await inside a for...of loop is sequential by design. Use it only when each iteration must finish before the next starts (e.g., you're paginating and each page depends on a cursor from the previous one). For independent parallel work, collect Promises with .map() and then use Promise.all(). For safety when partial results are acceptable, use Promise.allSettled().

How to Spot: In code review, flag any for...of loop with an await inside that is iterating over independent items (order IDs, user IDs, file names). Ask: "Does iteration N depend on the result of iteration N-1?" If no, it should be Promise.all. Performance profiling will show the endpoint latency is roughly N × per-query time.
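One caveat with the allSettled variant above is that filtering to fulfilled results drops the failures without a trace. A minimal sketch of a variant that keeps both sides follows; `settleAll` and `fakeGetOrderDetails` are hypothetical names used only for illustration, with the fake standing in for db.getOrderDetails.

```javascript
// Hypothetical helper: run `fn` over every item in parallel and split the
// outcomes, so failures are reported instead of silently filtered out.
async function settleAll(items, fn) {
  // The async wrapper starts every call immediately and turns a
  // synchronous throw inside `fn` into a rejected promise.
  const results = await Promise.allSettled(
    items.map(async (item) => fn(item))
  );
  const values = [];
  const errors = [];
  results.forEach((result, index) => {
    if (result.status === "fulfilled") {
      values.push(result.value);
    } else {
      // Keep the failing input next to its error for logging or retry.
      errors.push({ item: items[index], reason: result.reason });
    }
  });
  return { values, errors };
}

// Stand-in for db.getOrderDetails (assumption): order 3 always fails.
async function fakeGetOrderDetails(orderId) {
  if (orderId === 3) throw new Error("order 3 unavailable");
  return { orderId };
}
```

A caller can then return `values` to the user while logging or retrying everything in `errors`, which keeps partial results from hiding a failing dependency.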
Incident

A Node.js API service handles product search. A new feature is shipped: when the search result set is large, the service builds a summary object by deeply cloning and transforming the result JSON in memory before returning. The transformation uses JSON.parse(JSON.stringify(data)) to deep-clone, followed by a custom traversal. The endpoint is fast for small result sets. For large searches (hundreds of products), all other API endpoints, completely unrelated to search, start experiencing latency spikes of 300 to 800ms during peak traffic. The symptom looks like "search is slow" but the actual effect is that every endpoint suffers simultaneously.

What Went Wrong

Node.js runs JavaScript on a single thread: the event loop thread. This thread does everything: accepts new connections, reads HTTP request bytes, executes your JavaScript, calls I/O APIs, runs async callbacks. While this thread is running JavaScript, it cannot do anything else. When you do synchronous CPU-heavy work on the event loop thread, even something that looks innocent like parsing a large JSON string or traversing a deep object tree, the thread is busy the entire time that computation runs. No other request can be accepted. No pending I/O callbacks can fire. The event loop is blocked.

The confusing part for developers: the code that causes the blockage doesn't look "async" at all. It's synchronous code, which feels safe. But in Node.js, "synchronous" on the main thread means the entire server is frozen for that duration. A 400ms synchronous computation on the event loop thread = 400ms of zero response to all other requests, regardless of how many are in flight.

[Figure: Bug 3, CPU work blocks the event loop and all requests stall. Normal callbacks run until CPU-heavy sync work (JSON.parse(JSON.stringify(largeData)) plus traversal, ~400ms) occupies the event loop thread; requests #2 through #5 are queued, all of them (even unrelated ones) stall for the full ~400ms, and then the queued requests flush.]

Fix: move CPU-heavy work off the event loop thread. Use Node.js worker_threads, stream/chunk the processing, or restructure to avoid large in-memory transformations. The event loop should only run code that completes in single-digit milliseconds per turn.

    // ❌ BUGGY: heavy synchronous work on the event loop thread
    // While this runs, NO other request can be handled. The server freezes.
    app.get('/search', async (req, res) => {
      const rawResults = await db.searchProducts(req.query.q, { limit: 500 });
      // SYNC CPU work: looks harmless, but blocks the event loop for ~400ms
      // if rawResults is large. JSON.stringify + JSON.parse are synchronous
      // operations that run on the main thread, not in libuv's thread pool.
      const deep = JSON.parse(JSON.stringify(rawResults)); // ← BLOCKS
      // Custom traversal also synchronous, O(n) over 500 deep objects:
      const summary = buildSummary(deep); // ← BLOCKS
      res.json(summary);
    });

    function buildSummary(products) {
      // Deeply iterates 500 product objects: hundreds of ms at scale
      return products.reduce((acc, p) => {
        acc[p.category] = acc[p.category] || [];
        acc[p.category].push({ id: p.id, name: p.name, price: p.price });
        return acc;
      }, {});
    }

    // ✅ FIX 1: Offload CPU work to a Worker Thread
    // worker_threads run on a separate OS thread, so the event loop stays free.
    const { Worker } = require('worker_threads');

    app.get('/search', async (req, res) => {
      const rawResults = await db.searchProducts(req.query.q, { limit: 500 });
      // Offload the CPU-heavy transformation to a worker thread
      const summary = await runInWorker('./build-summary-worker.js', rawResults);
      res.json(summary);
    });

    function runInWorker(scriptPath, data) {
      return new Promise((resolve, reject) => {
        const worker = new Worker(scriptPath, { workerData: data });
        worker.on('message', resolve);
        worker.on('error', reject);
        // Settle even if the worker exits without posting a result
        worker.on('exit', (code) => {
          if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
        });
      });
    }

    // build-summary-worker.js (runs in its own thread):
    // const { workerData, parentPort } = require('worker_threads');
    // const summary = buildSummary(workerData); // blocks only this thread
    // parentPort.postMessage(summary);

    // ✅ FIX 2: Avoid the deep clone entirely (the simpler fix)
    // Deep cloning is often unnecessary: if you're only reading, don't clone.
    // Restructure to query only what you need from the DB.
    app.get('/search', async (req, res) => {
      // Query only the fields needed for the summary, so no deep clone is needed
      const summary = await db.searchProductSummary(req.query.q, {
        limit: 500,
        fields: ['id', 'name', 'price', 'category'] // server-side projection
      });
      res.json(groupByCategory(summary)); // now O(n) on small objects: fast
    });

    // ✅ FIX 3: Chunk processing to yield to the event loop periodically
    // setImmediate yields control back to the event loop after each chunk
    async function buildSummaryChunked(products, chunkSize = 50) {
      const result = {};
      for (let i = 0; i < products.length; i += chunkSize) {
        const chunk = products.slice(i, i + chunkSize);
        for (const p of chunk) {
          result[p.category] = result[p.category] || [];
          result[p.category].push({ id: p.id, name: p.name, price: p.price });
        }
        // Yield to event loop after each chunk so other requests can run
        if (i + chunkSize < products.length) {
          await new Promise(resolve => setImmediate(resolve));
        }
      }
      return result;
    }

Lesson: The event loop thread must never be blocked for more than a few milliseconds per turn. Any synchronous operation that takes more than ~10ms (large JSON parse, complex in-memory computation, crypto, image processing) must be moved to a worker thread, offloaded to a child process, or chunked with setImmediate to yield control periodically. The async keyword does not protect you from this: async only helps for I/O waits; synchronous computation still runs on the thread that calls it, which by default is the main thread.

How to Spot: Use Node.js's built-in --inspect flag with the Chrome DevTools performance profiler to spot long synchronous tasks on the main thread. Event loop delay, which you can sample with perf_hooks.monitorEventLoopDelay() (it returns a histogram), shows how long the event loop is being held. A healthy Node.js server should have event loop lag well under 100ms; anything higher under load indicates synchronous blocking. Also watch for: large JSON.parse/JSON.stringify, Array.sort() on large arrays, deep recursive traversals, and synchronous crypto operations.
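To make the event loop delay check concrete, here is a minimal sketch built on perf_hooks.monitorEventLoopDelay(). The helper name `readLoopDelayMs`, the 5-second reporting interval, and the 100ms warning threshold are assumptions for illustration; treat the absolute numbers as approximate, since they depend on the sampling resolution.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Sample event loop delay; the histogram reports values in nanoseconds.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

// Hypothetical helper: convert the current histogram to milliseconds and
// reset it so each report covers only the latest interval.
function readLoopDelayMs() {
  const snapshot = {
    meanMs: histogram.mean / 1e6,
    p99Ms: histogram.percentile(99) / 1e6,
    maxMs: histogram.max / 1e6,
  };
  histogram.reset();
  return snapshot;
}

// Example reporting loop (interval and threshold are assumptions).
const reporter = setInterval(() => {
  const { p99Ms, maxMs } = readLoopDelayMs();
  if (maxMs > 100) {
    console.warn(
      `Event loop blocked: max ${maxMs.toFixed(1)}ms, p99 ${p99Ms.toFixed(1)}ms`
    );
  }
}, 5000);
reporter.unref(); // do not keep the process alive just for reporting
```

In a real service the snapshot would more likely be exported to a metrics system than logged; call `histogram.disable()` during shutdown to stop sampling.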
Three production-grade async bugs: (1) Unhandled promise rejection crashes a Node.js worker process. Fix by always awaiting or immediately chaining .catch(), and adding a global unhandledRejection safety net. (2) Await-in-a-loop makes independent operations sequential. Fix by using Promise.all() to parallelize independent async work, reducing latency from N×query-time to max(query-time). (3) Synchronous CPU-heavy work on the Node.js event loop thread blocks all other request handling. Fix by offloading to worker_threads, avoiding unnecessary deep clones, or chunking work with setImmediate. All three share a common thread: async code requires intentional error handling, intentional parallelism, and intentional thread management. None of these are automatic.

Pitfalls & Anti-Patterns

Five async mistakes that look harmless until you're debugging a production incident at midnight: what went wrong, why it always goes wrong, and the exact fix.

Async code has a special property: its bugs are often invisible. Synchronous code fails loudly: an exception unwinds the stack, the caller sees it, the log catches it. Async bugs tend to fail silently: a promise settles in the wrong order, an error disappears into the void, a chain of microtasks slowly leaks memory. Each of the five pitfalls below has burned teams in real codebases. None of them is exotic; they're beginner traps dressed in professional clothing.

The mistake: You have an array of IDs and you need to fetch data for each one. The natural loop instinct is to write for (const id of ids) { const result = await fetch(id); }. It works. But it's secretly slow: each iteration waits for the previous one to finish before starting the next.

Why it's bad: If each fetch takes 500ms and you have six IDs, your loop takes 3,000ms, three full seconds. But those six fetches have no dependency on each other. You could fire all six simultaneously and be done in ~500ms instead. The sequential loop throws away the biggest benefit of async: the ability to have many slow operations in flight at the same time. The CPU sits idle while the thread waits for each response, one by one.

[Figure: Await in loop (sequential) vs Promise.all (parallel). Sequential takes N×latency: fetch1 → fetch2 → fetch3 → fetch4 → fetch5 → fetch6, one at a time, 3,000ms total. Parallel takes 1×latency: all 6 in flight simultaneously, 500ms total.]

Fix: Use Promise.all() to fire all independent operations simultaneously and await the whole batch. If one failure should not discard the other results, use Promise.allSettled(), which never rejects and gives you each result or error individually (it still waits for every promise to settle). If you need to process results as they arrive instead of waiting for all of them, attach a handler to each promise or consume an async generator with for await...of. Only use a sequential loop when each step genuinely depends on the previous step's result.

    // BAD: sequential, each fetch waits for the previous one to finish
    // 6 fetches × 500ms each = 3,000ms total wall-clock time
    async function getUserProfiles(userIds) {
      const profiles = [];
      for (const id of userIds) {
        const profile = await fetchProfile(id); // blocks here until each resolves
        profiles.push(profile);
      }
      return profiles; // done in ~3,000ms instead of ~500ms
    }

    // GOOD: parallel, all fetches fire simultaneously with Promise.all
    // 6 fetches × 500ms each = ~500ms total (only as slow as the slowest one)
    async function getUserProfiles(userIds) {
      const profiles = await Promise.all(
        userIds.map(id => fetchProfile(id)) // all kicked off at once
      );
      return profiles;
    }

    // If you need individual error handling (one failure ≠ total failure):
    async function getUserProfilesSafe(userIds) {
      const results = await Promise.allSettled(
        userIds.map(id => fetchProfile(id))
      );
      return results
        .filter(r => r.status === "fulfilled")
        .map(r => r.value);
    }

The mistake: You call an async function without await and without .catch(). The call returns a promise, but you don't hold a reference to it. The promise floats free. When it eventually rejects (because the network is down, because a field is undefined, because any of a hundred things go wrong) there's nobody home to catch it. The error vanishes.

Why it's bad: In Node.js, an unhandled promise rejection used to log a warning and move on. Starting with Node 15, it crashes the process, which is closer to correct, but still surprising in production. In browsers, the same silent rejection hides bugs that only surface as "that feature just doesn't work sometimes." The worst part: the code looks right. A missing await doesn't generate a syntax error or a lint warning by default. It fails quietly, at runtime, under specific conditions. These bugs are the hardest kind to reproduce.

[Figure: Floating promise vs awaited promise. BAD, floating promise: the call site has no await, the promise rejects, and the error disappears with no log and no crash (or a crash with no context). GOOD, awaited or .catch attached: the promise rejects into the catch block and the error surfaces immediately with a full stack trace.]

Fix: Every async function call must be either awaited (inside an async context), chained with .catch(), or explicitly documented as fire-and-forget with a deliberate .catch(err => logger.error(err)) guard. Enable the ESLint rule @typescript-eslint/no-floating-promises: with type-aware linting configured, it flags exactly this mistake at lint time, before it reaches production. The rule is enforced by ESLint using TypeScript's type information, not by the TypeScript compiler itself.

    // BAD: floating promise, saveAuditLog returns a Promise but nobody awaits it
    async function processOrder(order) {
      await chargeCard(order.payment); // awaited: errors surface here ✓
      saveAuditLog(order); // 💀 NOT awaited: if this rejects, nobody knows
      return { success: true };
    }

    // Also bad: .then() without .catch()
    fetchUserData(userId).then(data => updateUI(data)); // 💀 rejects silently

    // GOOD: either await it, or attach an explicit .catch() guard
    async function processOrder(order) {
      await chargeCard(order.payment);
      await saveAuditLog(order); // awaited: rejection surfaces to the caller ✓
      return { success: true };
    }

    // If fire-and-forget is genuinely intentional, document it and add error guard:
    async function processOrder(order) {
      await chargeCard(order.payment);
      saveAuditLog(order).catch(err => {
        logger.error("Audit log failed (non-fatal):", err); // explicit, visible
      });
      return { success: true };
    }

    // Or use the ESLint rule to catch this at lint time:
    // "@typescript-eslint/no-floating-promises": "error"

The mistake: You have a class that needs to do some async work when it's created: load configuration from a file, open a database connection, fetch an initial state from an API. Since constructors are synchronous in JavaScript, Java, C#, and most languages, you try to start the async work inside the constructor and hope it finishes before anyone uses the object.

Why it's bad: Constructors can't be async and they can't be awaited. When you call new MyService(), the constructor returns immediately while your async work is still in flight. Any code that calls methods on the returned object might run before the initialization is finished. In JavaScript you get race conditions that depend on timing; in other languages you get NullReferenceException or uninitialized state accessed mid-construction. The object appears ready but isn't. This is a subtle bug because it usually works fine in unit tests (which are fast) and only fails in production (where startup paths are slower).

[Figure: Async constructor vs static async factory. BAD, async work in constructor: new Service() returns an object that is not ready while async init is still running, so caller.method() accesses uninitialized state. GOOD, static async factory: await Service.create() calls new Service(), awaits this.init(), then returns the ready instance, so the caller receives a fully initialized object every time.]

Fix: Use the static async factory method pattern. Keep the constructor synchronous and private (or at least minimal). Add a static method like Service.create() that is async, calls new Service() internally, awaits all initialization work, then returns the fully-ready instance. Callers write const svc = await Service.create(): they see a clean async call, and they know the returned object is ready to use.

    // BAD: async work started in the constructor; object returned before it finishes
    class DatabaseService {
      constructor(connectionString) {
        this.conn = null;
        // 💀 no way to await this: constructor can't be async
        this.connect(connectionString); // floats as a promise; this.conn is null until it resolves
      }

      async connect(cs) {
        this.conn = await openConnection(cs); // finishes AFTER constructor returns
      }

      async query(sql) {
        return this.conn.execute(sql); // 💥 this.conn might still be null here
      }
    }

    // Caller
    const db = new DatabaseService(connStr); // looks ready, isn't
    await db.query("SELECT 1"); // race condition: might fail

    // GOOD: static async factory, caller awaits initialization explicitly
    class DatabaseService {
      constructor() {
        this.conn = null;
      }

      // Private async init, called only by the factory
      async #initialize(connectionString) {
        this.conn = await openConnection(connectionString);
      }

      // Public factory: awaits all setup before returning the instance
      static async create(connectionString) {
        const svc = new DatabaseService();
        await svc.#initialize(connectionString);
        return svc; // guaranteed ready: conn is set before caller gets this
      }

      async query(sql) {
        return this.conn.execute(sql); // always safe: factory guarantees initialization
      }
    }

    // Caller
    const db = await DatabaseService.create(connStr); // explicit, auditable
    await db.query("SELECT 1"); // always safe ✓

The mistake: You wrap an async operation in try/catch (which is correct!) but inside the catch block you either do nothing at all, or you log the error to a variable that's never checked, or you silently return a default value without recording that anything went wrong. The exception is caught. And immediately discarded.

Why it's bad: A swallowed error is worse than no error handling at all. Without a try/catch, the error at least propagates up the call stack where something (a framework, a process-level handler, a test) might catch it and alert on it. With a silent catch, the error is permanently gone. The function returns as if it succeeded. The system continues operating in a degraded or corrupted state while every monitoring tool reports "all green." These bugs create ghost failures: the product isn't working, but your dashboards say it is. Incidents caused by swallowed errors are the hardest to debug because there's no error trail to follow.

[Figure: Swallowed error vs proper error handling. BAD, catch block does nothing: the async op fails, catch(err) { } discards the error, and the system continues in a degraded state while dashboards show green ✓ and the product is broken ✗. GOOD, log + re-throw (or typed Result): the async op fails, catch(err) { log; throw } propagates the error, the on-call engineer gets an alert, and the stack trace shows exactly what broke.]

Fix: A catch block has three acceptable behaviors: (1) log the error with context and re-throw so callers know something failed, (2) return a typed Result object (like { success: false, error }) so the caller can make an informed decision, or (3) handle a specific, expected error case (like ENOENT on file-not-found) and let everything else propagate. An empty catch { } or catch { return null; } with no logging is almost never correct in production code.

    // BAD: empty catch and silent return, nobody ever finds out this failed
    async function syncUserToAnalytics(userId) {
      try {
        const user = await db.findUser(userId);
        await analytics.track("user_sync", user);
      } catch (err) {
        // 💀 error swallowed: caller gets undefined, thinks it succeeded
        // analytics data is silently missing, nobody gets paged
      }
    }

    // Also bad: assigning to a dead variable (snippet from another try/catch)
    // ...
    // } catch (error) {
    //   const ignored = error; // 💀 "ignored" is never read, lint won't catch this
    // }

    // GOOD: log with context and re-throw so the error is visible
    async function syncUserToAnalytics(userId) {
      try {
        const user = await db.findUser(userId);
        await analytics.track("user_sync", user);
      } catch (err) {
        // Log BEFORE re-throwing so the context (userId, operation name) is captured
        logger.error("syncUserToAnalytics failed", { userId, err });
        throw err; // caller decides whether to swallow, retry, or surface to the user
      }
    }

    // Alternative: typed Result (no exception propagation, caller handles explicitly)
    async function syncUserToAnalytics(userId) {
      try {
        const user = await db.findUser(userId);
        await analytics.track("user_sync", user);
        return { ok: true };
      } catch (err) {
        logger.error("syncUserToAnalytics failed", { userId, err });
        return { ok: false, error: err }; // caller MUST check .ok: no silent failure
      }
    }

The mistake: A promise is created (maybe as a timeout wrapper, maybe as a "wait for this condition" helper) and the code that was supposed to resolve or reject it never runs. Or: your code builds promise chains dynamically (waiting for event X, chaining more work onto it, waiting for event Y, chaining again) without any bound on how long the chain can grow. Either way, the promise objects, and everything they captured in their closures, stay in memory forever.

Why it's bad: A pending promise is not a leak by itself; the runtime can collect it once nothing references it. The trouble is that something long-lived almost always does: an event listener, a timer, or a pending-request map holds the resolve function, and that closure keeps the promise, and everything it captured, reachable. A promise that never resolves or rejects while its listener stays registered is effectively a permanent memory leak. In a long-lived server process that creates thousands of these per minute (for example, a WebSocket handler that creates a "this connection is alive" promise per connection, but the rejection path has a bug) heap memory grows steadily over hours or days until the process OOMs and restarts. These leaks are hard to spot because they don't show up as large single objects; they look like slow, steady growth of small objects that the profiler reports as "anonymous" or "closure."

[Figure: Never-resolving promises leak memory. BAD: promises never settle, so the heap of pending promises grows steadily from t=0 to t=hours until OOM. GOOD: every promise settles via timeout or abort, so the heap stays flat from t=0 to t=hours as promises settle and are GC'd.]

Fix: Every promise you create must have a guaranteed path to resolution: either it resolves naturally, or it rejects on error, or you add a timeout that forces it to settle after a deadline. In browser and Node.js code, pair long-lived promises with an AbortController or a Promise.race() against a timeout sentinel. For event-based promises (waiting for a user action, a WebSocket message, a condition flag), always register a cleanup function (an AbortSignal, an EventEmitter.removeListener, or a cancel token) so that if the surrounding context is torn down, the promise settles and can be collected.

    // BAD: promise that might never settle, no timeout, no cleanup on disconnect
    function waitForAck(socket, messageId) {
      return new Promise((resolve) => {
        // 💀 if the ack never arrives (client disconnected, message lost),
        // this promise hangs forever. The listener keeps resolve (and the
        // promise) reachable for as long as the socket lives.
        socket.on("ack", (id) => {
          if (id === messageId) resolve(id);
        });
      });
    }

    // Called thousands of times per hour: each failed call leaks one promise + closure
    const ack = await waitForAck(socket, msg.id);

    // GOOD: timeout guarantees the promise always settles; cleanup removes listeners
    function waitForAck(socket, messageId, timeoutMs = 5000) {
      return new Promise((resolve, reject) => {
        const timer = setTimeout(() => {
          socket.off("ack", onAck); // remove listener: no more closure retention
          reject(new Error(`Ack timeout for message ${messageId} after ${timeoutMs}ms`));
        }, timeoutMs);

        function onAck(id) {
          if (id !== messageId) return;
          clearTimeout(timer); // cancel timeout sentinel
          socket.off("ack", onAck); // clean up listener
          resolve(id); // promise settles → can be GC'd
        }

        socket.on("ack", onAck);
      });
    }

    // Now: every call either resolves (ack received) or rejects (timeout), never leaks
    try {
      await waitForAck(socket, msg.id);
    } catch (err) {
      logger.warn("Ack not received, retrying:", err.message);
    }
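The Promise.race() approach mentioned in the fix can be sketched as a generic wrapper. This is a minimal illustration; `withTimeout` and its `label` parameter are hypothetical names, not a built-in API.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins. Always clear the timer so it cannot
  // keep the process alive or fire after the result is already known.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (stand-in promises for illustration):
// await withTimeout(fetchSomething(), 5000, "fetchSomething");
```

Note that racing only stops the caller from waiting; it does not cancel the underlying operation or remove its listeners. When the operation supports it, pass an AbortSignal as well so the work itself is torn down on timeout.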
Five async pitfalls that reliably show up in production code. Awaiting inside a loop serializes independent operations: use Promise.all for anything that doesn't have a data dependency. Floating promises silently discard errors: every async call needs await or .catch. The async constructor anti-pattern gives callers an object that isn't ready yet: use a static async factory method instead. Swallowing errors in empty catch blocks hides failures while dashboards show green: always log and re-throw, or return a typed Result. Never-resolving promises leak memory because whatever is supposed to settle them (an event listener, a timer, a pending-request map) keeps the promise and its closures reachable, so the garbage collector can't reclaim them: always pair a promise with a timeout or abort path that guarantees it eventually settles and releases those references.

Testing Async Code

Why async tests fail in surprising ways, and the four tools (fake clocks, promise resolvers, fault injection, and parallelism probes) that make them deterministic.

Testing async code has a reputation for being annoying. Tests that pass locally fail in CI. Tests that pass on a fast machine fail on a slow one. A test for a race condition works on the first run and fails randomly on the tenth. None of this is accidental: asynchronous code is time-dependent by nature, and most test runners were designed for synchronous code. The tools below make async tests deterministic again by giving you control over time itself.

The Async Test Harness: Four Tools You Need

Figure: Async test harness with four control surfaces around the code under test (async functions, promise chains).
Fake Clock: controls setTimeout / setInterval; advance time manually (tick(5000)) to test timeouts without waiting. Jest: useFakeTimers(), advanceTimersByTime().
Promise Resolver: inject controlled resolve/reject from outside the async function to test retry / backoff logic. Jest: jest.fn().mockResolvedValueOnce() / mockRejectedValueOnce().
Fault Injector: simulate network errors, timeouts, delayed resolution, and partial failures to verify DLQ / fallback paths. Tools: nock / msw / custom AbortController.
Parallel Probe: measure wall-clock time to verify Promise.all fired in parallel and catch accidental re-serialization. performance.now() before/after, then assert elapsed < N × singleLatency.

Testing Promises: await + assert

The simplest async test pattern is also the most underused: just await the function under test and assert on the result. Jest and Mocha both support async test functions natively. If an awaited promise rejects inside an async test, the test fails automatically. This handles the happy path and the basic error path with no extra tooling.

Testing Race Conditions: Fake Timers

Fake timers are the most powerful tool in the async testing toolkit. They replace the real setTimeout/setInterval/Date implementations with fake versions you control. You call advanceTimersByTime(5000) and the test behaves as if 5 real seconds passed, instantly and deterministically. This is how you test timeout logic, retry backoff, and debounce/throttle behavior without your test suite taking 30 minutes to run.

Testing Error Paths: Controlled Rejections

Mock the dependency to reject exactly once, then succeed. Assert that your retry logic fires. Assert that the error is logged. Assert that on the Nth rejection the function throws or routes to a fallback. This tests behavior that would be nearly impossible to trigger reliably against real infrastructure.

Testing Parallel vs Sequential: Wall-Clock Timing

The easiest way to verify that Promise.all is actually running in parallel: make each mock take a fixed delay (say, 100ms), run the function, and measure elapsed time. If the total is close to 100ms (one slot), the calls were parallel. If it's close to N×100ms, the calls were sequential. This catches accidental re-serialization (someone added an await inside the map callback) that a purely logical assertion would miss.

// --- Testing async happy path + error path ---
describe("fetchUserProfile", () => {
  it("resolves with the user when the API succeeds", async () => {
    // Arrange: mock the fetch dependency to resolve with fixture data
    mockFetch.mockResolvedValueOnce({ id: "u1", name: "Alex" });
    // Act: await the async function under test
    const result = await fetchUserProfile("u1");
    // Assert: synchronously, because we awaited the result
    expect(result.name).toBe("Alex");
  });

  it("throws when the API rejects", async () => {
    // Arrange: inject a controlled failure
    mockFetch.mockRejectedValueOnce(new Error("Network error"));
    // Act + Assert: expect the async function to throw
    await expect(fetchUserProfile("u1")).rejects.toThrow("Network error");
    // ^ Jest understands Promise rejections: the test fails if nothing is thrown
  });
});

// --- Testing retry logic with fake timers ---
describe("fetchWithRetry", () => {
  beforeEach(() => jest.useFakeTimers());
  afterEach(() => jest.useRealTimers());

  it("retries after exponential backoff on transient failure", async () => {
    mockFetch
      .mockRejectedValueOnce(new Error("503")) // fail on attempt 1
      .mockRejectedValueOnce(new Error("503")) // fail on attempt 2
      .mockResolvedValueOnce({ data: "ok" });  // succeed on attempt 3

    const promise = fetchWithRetry("/api/data"); // start, but don't await yet

    // Advance fake clock to trigger first retry (1s backoff)
    await jest.advanceTimersByTimeAsync(1000);
    // Advance fake clock to trigger second retry (2s backoff)
    await jest.advanceTimersByTimeAsync(2000);

    const result = await promise;
    expect(result.data).toBe("ok");
    expect(mockFetch).toHaveBeenCalledTimes(3);
  });
});

// --- Testing parallel vs sequential ---
// Helper used by the mocks below: resolves after `ms` real milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

describe("loadDashboardData", () => {
  it("fetches all data sources in parallel, not sequentially", async () => {
    // Each mock takes about 100ms
    mockFetchUsers.mockImplementation(() => delay(100).then(() => []));
    mockFetchOrders.mockImplementation(() => delay(100).then(() => []));
    mockFetchMetrics.mockImplementation(() => delay(100).then(() => ({})));

    const start = performance.now();
    await loadDashboardData();
    const elapsed = performance.now() - start;

    // If parallel: ~100ms. If sequential: ~300ms.
    // Allow up to 200ms to account for test runner overhead.
    expect(elapsed).toBeLessThan(200);
  });
});

The golden rule for async tests: Every async test must either return the promise or await it. If you forget both, the test function returns before the promise has settled, any assertion that depends on it never gets a chance to fail the test, and the test passes even when the logic is broken. This is the single most common source of "false green" async tests.

Async tests fail unpredictably because they depend on timing, and most test runners were designed for synchronous code. Four tools make them deterministic: fake timers replace real setTimeout so you can advance time without waiting; controlled mock rejections let you test error paths and retry logic precisely; fault injection tests the dead-letter and fallback paths that only trigger under failure; wall-clock timing tests verify that Promise.all actually runs in parallel. Always return or await the promise in a test: a forgotten await produces a false-green test that passes even when the logic is wrong.

Observability: Async-Specific

Synchronous services fail loudly. Async services fail quietly, in the gaps between your metrics. Here's what to actually watch.

If you monitor an async-heavy service the same way you'd monitor a synchronous REST API (CPU, memory, request rate, error rate), you'll miss most of what can go wrong. Async failures manifest differently: the event loop doesn't crash, it just gets slow. Promises don't throw, they accumulate. Callbacks don't error, they never get called. You need a different set of instruments.

The Five Async Signals Worth Watching

Figure: Async observability dashboard with five panels: event loop lag (42ms; warn >10ms, alert >100ms), in-flight promises (count over time with a spike; now 3,241), async stack depth (depth buckets, deep chains in red), stuck async calls (2 promises >30s with no progress; 0 expected in normal operation; potential deadlock / leak), and p99 callback latency (creeping; now 18ms, target <10ms). Alert thresholds: loop lag >100ms · in-flight >10k · stuck calls >0 · p99 latency >2× baseline.

Event Loop Lag

The event loop is the engine that drives all async work in a single-threaded runtime. When something blocks it (a synchronous computation that runs too long, a tight loop that never yields, a slow JSON parse on a huge payload), every other callback waiting to run is delayed. You measure this with a trick: schedule a timer to fire in 1ms, measure how long it actually takes. The excess is the lag. Anything above 10ms is worth investigating. Above 100ms, your server starts missing SLAs. Libraries like clinic.js and the Node.js perf_hooks module (using monitorEventLoopDelay) measure this continuously.
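As a rough sketch of continuous lag measurement, the built-in perf_hooks histogram can be wrapped like this. The wrapper name startLagMonitor and the shape of the snapshot object are assumptions for illustration; monitorEventLoopDelay itself is the real Node.js API and reports values in nanoseconds.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Hypothetical wrapper around Node's built-in event-loop delay histogram.
function startLagMonitor(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  const toMs = (ns) => ns / 1e6; // histogram values are in nanoseconds

  return {
    // Read the current window, then reset so the next read starts fresh.
    snapshot() {
      const stats = {
        meanMs: toMs(histogram.mean),
        p99Ms: toMs(histogram.percentile(99)),
        maxMs: toMs(histogram.max),
      };
      histogram.reset();
      return stats;
    },
    stop() {
      histogram.disable();
    },
  };
}
```

A service might call snapshot() every few seconds and export p99Ms as a gauge. It is worth comparing the numbers against an idle baseline on your Node.js version before wiring up alerts, rather than treating the raw values as absolute.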

In-Flight Promise Count

Track how many async operations are currently in progress at any given moment. A healthy service has a relatively stable count. A sudden spike means a burst of traffic or a cascade of retries. Steady growth over time, without a corresponding traffic increase, means you have a leak: promises being created faster than they settle. Instrument this with a simple counter: increment on promise creation, decrement on settlement. Alert when the count exceeds a threshold or grows for more than a few minutes.
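The increment-on-creation, decrement-on-settlement counter might look like the following. This is a minimal sketch: createInFlightTracker and its method names are hypothetical, and a real service would export the count to its metrics system instead of reading it directly.

```javascript
// Hypothetical tracker: counts operations between start and settlement.
function createInFlightTracker() {
  let inFlight = 0;
  let peak = 0;
  return {
    track(promise) {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      // finally() runs on both fulfilment and rejection, so the counter
      // cannot drift upward when operations fail.
      return Promise.resolve(promise).finally(() => {
        inFlight -= 1;
      });
    },
    count: () => inFlight,
    peak: () => peak,
  };
}
```

Wrapping calls at the boundary (for example, tracker.track(db.query(sql))) keeps the instrumentation in one place. A promise that never settles never decrements the counter, which is exactly the steady growth the paragraph above tells you to alert on.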

Async Stack Traces

Stack traces from async code are famously unhelpful. By the time an error surfaces, the call site that started the operation is often gone from the stack: the runtime shows the frames where the promise rejected, not necessarily where the work was kicked off. V8's async stack traces (the --async-stack-traces flag, enabled by default since Node.js 12) recover the chain of awaiting async functions, but they work best with async/await and generally cannot see through plain callbacks or event emitters. For request-scoped context, Node.js provides AsyncLocalStorage in the async_hooks module. In production, use a distributed tracing library (OpenTelemetry, Datadog APM) that propagates trace context across async boundaries so you can reconstruct the full call path after the fact.
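To make "propagating context across async boundaries" concrete, here is a small sketch using Node's built-in AsyncLocalStorage. The helper names (withTrace, currentTraceId, handleRequest) are invented for the example, and tracing libraries ship their own context managers, so treat this as the underlying idea and not as how any particular library is wired.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

const traceStore = new AsyncLocalStorage();

// Run `fn` with a trace ID that stays visible across every await inside it.
function withTrace(traceId, fn) {
  return traceStore.run({ traceId }, fn);
}

// Read the trace ID of whichever request the current callback belongs to.
function currentTraceId() {
  const store = traceStore.getStore();
  return store ? store.traceId : undefined;
}

// Hypothetical handler: the ID is still available after a real async hop.
async function handleRequest() {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return currentTraceId();
}
```

Two requests started with withTrace("req-a", handleRequest) and withTrace("req-b", handleRequest) each see their own ID even though their callbacks interleave on the same thread, so a logger can stamp every line with currentTraceId() without threading it through every function signature.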

Deadlock Detection: Stuck Async Calls

A deadlock in async code isn't the same as a thread deadlock. It's subtler: two async operations each waiting for the other to finish, or a promise that depends on an event that will never fire because the code that fires it is itself waiting on the first promise. The detection strategy is simple: track the start time of every long-running async operation. If an operation hasn't progressed in more than N seconds (pick a threshold based on your p99 baseline), it's probably stuck. Alert on it. Log the operation name, the context, and ideally the async stack trace captured at creation time.
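The start-time tracking strategy can be sketched with a small registry. Everything here is illustrative: createOperationRegistry, its methods, and the injectable clock are assumptions, and the sketch approximates "no progress" by the operation's age, since it has no way to see intermediate progress.

```javascript
// Hypothetical registry: remembers when each operation started.
function createOperationRegistry(now = () => Date.now()) {
  const active = new Map(); // id -> { name, startedAt }
  let nextId = 1;
  return {
    start(name) {
      const id = nextId++;
      active.set(id, { name, startedAt: now() });
      return id;
    },
    finish(id) {
      active.delete(id);
    },
    // Everything still active after `thresholdMs` is a stuck-call candidate.
    findStuck(thresholdMs) {
      const cutoff = now() - thresholdMs;
      return [...active.values()]
        .filter((op) => op.startedAt <= cutoff)
        .map((op) => ({ name: op.name, ageMs: now() - op.startedAt }));
    },
  };
}
```

A periodic job can call findStuck(30000) and alert on any non-empty result. Passing the clock in as a parameter keeps the registry testable without real waiting.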

p99 Callback Latency

This is the time between when an async operation completes (the I/O is done, the promise is resolved) and when your callback actually runs. In a healthy event loop, this is microseconds. When the loop is saturated with microtasks, the gap grows: resolved promises pile up in the microtask queue behind dozens of other callbacks. Monitoring p99 callback latency (not just p50) catches the tail cases where users experience slow responses even though average latency looks fine.

The Async Golden Signals: traditional RED metrics (Rate, Errors, Duration) are necessary but not sufficient for async services. Add these five:

Concurrency: how many async operations are in flight right now (watch for steady growth = leak).
Loop Saturation: event loop lag in ms (watch for spikes above 10ms = CPU-bound work blocking I/O).
Silent Errors: unhandled promise rejection count (watch for any non-zero value = floating promises).
Tail Latency: p99 callback wait time, not just average (watch for creep = queue saturation).
Lifecycle Leaks: count of promises older than your SLA timeout (watch for non-zero = never-resolving promises).

Async services require a different observability model than synchronous ones. Event loop lag reveals when the single-threaded runtime is being blocked by CPU work; anything above 10ms deserves investigation. In-flight promise count tracks concurrency and catches memory leaks from promises that are created faster than they settle. Async stack traces need dedicated tooling (OpenTelemetry, async context propagation) because ordinary stack frames often lose the originating call site across async boundaries. Deadlock detection catches the rare but catastrophic case where two async operations wait on each other forever. p99 callback latency, not the average, is what users actually experience when the microtask queue is saturated. Together these five signals cover the failure modes that standard CPU/memory/request-rate metrics completely miss.

Capacity & Concurrency Math

How to reason about what your async system can actually handle, before it falls over in production.

Most developers write async code correctly and then have no idea what throughput it can actually sustain. They guess, push to production, and find out the hard way at 2 a.m. This section gives you the math to estimate limits before you ship.

First, Get the Vocabulary Right: Concurrency vs Parallelism

These two words are often used interchangeably, yet they mean different things. Concurrency is about structure: multiple tasks are in flight at the same time, but they may not be running simultaneously. Parallelism is about execution: multiple tasks are literally executing at the same instant on multiple CPU cores. You can have concurrency without parallelism, and that's exactly what Node.js does.

Figure: Concurrency vs parallelism. Concurrency (1 core, many tasks): tasks A, B, and C interleave on one core, so the CPU does not sit idle while their I/O is in flight. Parallelism (3 cores, 3 tasks): each task runs start to finish on its own core at the same instant; best for CPU-bound work such as image encoding, matrix math, and video processing, and not useful for I/O-bound work because the cores still wait for the network.

Why does this distinction matter? Because the right tool depends on which bottleneck you have. I/O-bound work (network requests, DB queries, file reads) is blocked by waiting, and the CPU is idle. Concurrency with a single thread fixes this: while one request waits for the database, the event loop handles the next one. CPU-bound work (image processing, cryptography, JSON compression) burns actual CPU cycles, so adding more concurrency on one core doesn't help because there are no idle gaps to fill. You need parallelism (multiple cores) for that.

The Worked Example: Event Loop Throughput Math

Let's work through a concrete Node.js scenario so the math becomes tangible.

Scenario: You have one Node.js process. Each incoming request spends about 4ms of CPU time on the event loop (parsing, logic, serialization) and waits 50ms for I/O (a database query). What is the maximum throughput this process can handle?

Step 1: The event-loop ceiling. The event loop is single-threaded. If every request needs 4ms of CPU on the loop, the absolute ceiling is:

1,000ms ÷ 4ms per request = 250 requests/second (the event-loop ceiling)

This is the hard upper bound. No matter what, a single Node.js process running this code cannot exceed 250 RPS. Why? Because the event loop can only run one callback at a time, and you've budgeted 4ms per callback. Exceeding 250 RPS means callbacks queue up faster than they're processed: you get event-loop lag, latency spikes, and eventually timeouts.

Step 2: Concurrency during I/O wait. The 50ms I/O wait is where the magic happens. While request #1 is awaiting the database, the event loop is free to start requests #2, #3, #4... up to about #12 or #13 (50ms ÷ 4ms = 12.5 requests can be started during the I/O wait of one request). So at steady state near the ceiling, roughly 12 to 13 requests are in flight simultaneously. This doesn't raise your ceiling (still 250 RPS), but it means latency per request stays near 54ms (4ms CPU + 50ms I/O) as long as you stay comfortably below the ceiling, because the event loop isn't idle between I/O completions. Without async, a synchronous server would need 12+ threads to achieve the same concurrency.

Step 3: What breaks the ceiling. If a request does heavy CPU work on the event loop, say a 30ms JSON compression, that single callback blocks every other callback for 30ms. If every request did this, your event-loop capacity would drop from 250 RPS to 1,000ms ÷ 30ms = ~33 RPS, and even an occasional one adds up to 30ms of latency to everything queued behind it. One slow synchronous operation can tank the whole process. This is why you never do CPU-heavy work on the Node.js main thread.
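The three steps above reduce to two small formulas, sketched here so you can plug in your own numbers. The function names are made up for the example, and the in-flight estimate uses Little's law (average in-flight = arrival rate × time in system), which is an addition for illustration and not something the scenario states explicitly.

```javascript
// Upper bound on requests/second when each request needs this much
// CPU time on the single event-loop thread.
function eventLoopCeiling(cpuMsPerRequest) {
  return 1000 / cpuMsPerRequest;
}

// Little's law: average in-flight = arrival rate x time in system.
function averageInFlight(requestsPerSecond, latencyMs) {
  return requestsPerSecond * (latencyMs / 1000);
}

const ceiling = eventLoopCeiling(4);            // 250 requests/second
const inFlight = averageInFlight(ceiling, 54);  // about 13.5 at the ceiling
const blocked = eventLoopCeiling(30);           // about 33.3 requests/second

console.log({ ceiling, inFlight, blocked });
```

The Little's-law figure comes out slightly above the 12 to 13 estimate from Step 2 because it counts each request's 4ms of CPU time as well as its 50ms of I/O wait.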

Escaping the Single-Thread Limit

Once you've hit the event-loop ceiling, you have three options:

Option 1: the libuv thread pool. Node.js ships with a libuv thread pool (default size: 4 threads) that handles inherently blocking operations: certain file system calls, DNS lookups, and native crypto. These operations run on a pool thread so they don't block the event loop. You can expand this pool via the UV_THREADPOOL_SIZE environment variable (max 1024, practical useful range around 8 to 16 for I/O-bound tasks). But this is a band-aid for operations that can't be made truly async at the OS level, and it doesn't help with JavaScript CPU work.

Why it's limited: Pool threads are OS threads. Each thread uses stack memory (typically around 1 to 8 MB depending on OS defaults). At 128 threads you're burning substantial memory just for sleeping threads. Beyond the I/O ceiling, this approach doesn't scale.

Option 2: Worker Threads. Node.js Worker Threads (introduced as experimental in Node.js 10.5, stable since Node.js 12) let you run JavaScript in a separate OS thread with its own event loop and V8 isolate. Use them for CPU-bound tasks that would otherwise block the main event loop: image resizing, PDF generation, heavy JSON transformation, cryptography. Workers communicate with the main thread via message passing (postMessage/on('message')), similar to web workers in the browser.

When to reach for worker threads: If profiling shows a specific operation consuming more than ~5ms of CPU per request at your target load, move it to a worker. Never use worker threads as a default: the message-passing overhead and thread creation cost make them slower than the event loop for tiny fast tasks.
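A minimal sketch of offloading CPU-bound work to a worker thread follows. The helper name fibInWorker and the Fibonacci workload are placeholders for whatever heavy computation you have; the worker source is inlined as a string (using the eval option) only so the example fits in one file.

```javascript
const { Worker } = require("node:worker_threads");

// Source of the worker, inlined as a string so the sketch is one file.
// In a real project this would normally live in its own module.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  // Deliberately slow recursive Fibonacci stands in for CPU-heavy work.
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Hypothetical helper: runs the CPU-bound job off the main event loop.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

This version starts a fresh worker per call, which is the thread creation cost the paragraph above warns about. For sustained load you would more likely keep a small pool of long-lived workers and send jobs to them with postMessage.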

Option 3: cluster mode. The simplest path to parallelism in Node.js is the built-in cluster module (or PM2 cluster mode): spawn N worker processes, one per CPU core. Each process has its own event loop and handles its share of requests. By default the primary process accepts incoming connections and hands them to the workers round-robin (on Windows, distribution is left to the operating system). With 8 cores, you multiply your event-loop ceiling by ~8: 250 RPS × 8 = ~2,000 RPS on one machine.

Why it works: Each process is independent, with no shared memory, no locks, and no coordination overhead. If one worker crashes, the others keep running. The downside: if your application holds in-process state (in-memory caches, WebSocket connection maps), each of the N processes now holds its own separate copy, and those copies can diverge. Stateless services cluster perfectly; stateful services need external shared state (Redis, a database) before clustering.

Figure: Scaling decision tree for when you've hit a throughput limit. Event-loop lag / slow callbacks → move CPU work to Worker Threads. RPS ceiling on one process → cluster mode, one process per core. High latency with low CPU → I/O is the bottleneck; Node is fine, fix the DB / network. Rule of thumb: profile first, classify as CPU-bound or I/O-bound, then pick the right lever.

Concurrency means tasks are in flight simultaneously on one core (I/O-bound); parallelism means multiple cores run tasks at the same instant (CPU-bound). A Node.js event loop with 4ms CPU overhead per request can handle ~250 RPS, and that's the hard ceiling. During 50ms I/O waits, 12+ requests can be in flight simultaneously, keeping the core busy. To scale past the ceiling: use the libuv thread pool for inherently blocking I/O, worker threads for CPU-heavy tasks, and cluster mode (one process per core) for multiplying RPS on a single machine.

Q&A β€” Interview Style

Eight questions you'll actually be asked, with the reasoning chain that separates a solid answer from a great one.

These are the questions that separate engineers who've used async/await from engineers who understand it. For each one: think through your own answer first, then read.

Easy Q1 What is the difference between asynchronous and parallel?
Think first: is a Node.js server that handles 10,000 connections on one core running things in parallel?

Asynchronous is about structure: you start a task and don't block while it runs. The same thread can start another task in the meantime. Parallel is about execution: multiple tasks are literally computing at the same physical moment on multiple CPU cores.

A Node.js server handling 10,000 connections on one core is highly concurrent but completely sequential at any given microsecond, because only one callback runs at a time. It wins through careful scheduling: while connection #1 is waiting for a DB reply, the loop handles connection #2. No parallelism is needed because the CPU isn't the bottleneck; network I/O is. Parallel execution would only help if requests were CPU-bound (e.g., each request compresses a video). Then you'd need multiple cores, because "while one core waits" doesn't apply: the core is burning cycles, not waiting.

Figure: Async vs parallel are independent axes, not the same thing. Sync/async runs along one axis and sequential/parallel along the other, giving four quadrants: sync + sequential (one request at a time), async + sequential (Node.js event loop), sync + parallel (batch CPU on all cores), async + parallel (async worker pool).
Medium Q2 When would you NOT use async/await?
Think first: async/await is a beautiful API. Is there ever a case where it actually makes things worse?

Four real situations where async/await is the wrong choice:

1. Fire-and-forget side effects you truly don't care about. If you want to log to a remote service and you do await logger.send(), you've now made your critical path wait for a logging side effect. Use a message queue or just call without await, but only if you genuinely don't care whether it succeeds.

2. CPU-bound tight loops. await-ing inside a tight loop introduces state-machine overhead on every iteration. If you're doing 50,000 iterations of a pure math loop, the overhead of suspending and resuming the state machine on each iteration outweighs any benefit. Keep tight CPU loops synchronous and move the entire loop to a worker thread if needed.

3. Synchronous helper functions with no async dependencies. Don't mark a function async just because its caller is async. It forces the return value into a Promise wrapper, adds overhead, and misleads readers into thinking I/O is happening. Only mark a function async if it contains await.

4. High-frequency streaming values. If you have a sensor sending 500 values/second, modeling each value as an await-able operation is awkward and wasteful. That's exactly the use case for reactive streams (RxJS Observable, Node.js Readable stream), which are designed for sequences, not single values.

Medium Q3 Why does Node.js beat Apache on I/O-bound workloads with just one thread?
Think first: Apache uses multiple threads. Why would fewer threads be faster?

Apache's thread-per-request model means each concurrent connection holds an OS thread. An OS thread costs roughly 1 to 8 MB of stack memory and requires the OS scheduler to context-switch between threads: saving registers, disturbing CPU caches, loading the next thread's state. At a few hundred concurrent connections this overhead is manageable. At thousands of simultaneous connections, the server can spend more time context-switching than actually handling requests.

Node.js's event loop sidesteps this entirely. There's one thread. When a request needs to wait for I/O, Node.js registers a callback with the OS (using non-blocking system calls: epoll on Linux, kqueue on macOS, IOCP on Windows) and immediately picks up the next request. No context switch. No stack memory per connection. The OS does the I/O in the background using DMA (the hardware copies data directly to memory without burning CPU cycles) and notifies the event loop when it's done. One thread can have thousands of I/O operations in-flight simultaneously because "in-flight" just means "registered with the OS, waiting for hardware."

Figure: Memory at 1,000 concurrent connections. Apache (1,000 threads): ~1 to 8 GB of thread stack memory plus scheduler context-switch overhead, with most threads sleeping while they wait for I/O. Node.js (1 thread + 1,000 callbacks): a few MB, because callbacks are just JS objects in the heap; one thread, one run loop, no per-connection context switches, and the OS handles I/O asynchronously via epoll/kqueue. Node wins on I/O-bound work because connections waiting on I/O cost almost nothing.
Medium Q4 How does async/await work under the hood, and what's the state machine?
Think first: if async/await is just syntax sugar, what does the compiler actually generate?

When you write async function getUser(id) { const row = await db.query(id); return row; }, the compiler or runtime treats it as a state machine (C# compilers, for example, literally generate one; JavaScript engines achieve the same effect by suspending and resuming the function). The function's local variables are stored with the state machine object (so they survive across suspension points). Each await defines a state boundary: State 0 runs synchronously until the first await, then saves state and returns a Promise to the caller. When the awaited Promise resolves, the scheduler calls back into the state machine at State 1. From the caller's perspective, they got a Promise immediately and nothing blocked. From the state machine's perspective, it just woke up exactly where it left off.

This is why you can have thousands of suspended async functions in memory simultaneously: each one is just a small object on the heap holding a state index and a few local variables. No OS threads, no stacks. The cost per suspended function is proportional to the number of local variables it holds across the await point, typically a few hundred bytes.

Figure: async/await state machine. State 0: sync code runs, starts the I/O, saves state. When the I/O resolves → State 1: resume, use the result, run code until the next await. When return is reached → State 2: resolve the outer Promise and discard the state machine. Between states the thread is free; the state machine object just sits in the heap, waiting.
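To make the state boundaries tangible, here is the getUser example next to a hand-written equivalent. This is a conceptual sketch only: real engines do not emit this code, the db object is passed in as a parameter purely so the example is self-contained, and getUserStateMachine is a made-up name.

```javascript
// What you write:
async function getUser(db, id) {
  const row = await db.query(id);
  return row;
}

// Roughly what it means, written by hand as an explicit state machine.
function getUserStateMachine(db, id) {
  return new Promise((resolve, reject) => {
    let state = 0; // which segment of the function runs next
    let row;       // locals live on the closure, not on a thread stack

    function step(resumedValue) {
      try {
        if (state === 0) {
          state = 1;
          // Suspend: hand `step` to the awaited promise and return.
          Promise.resolve(db.query(id)).then(step, reject);
          return;
        }
        if (state === 1) {
          row = resumedValue;
          state = 2;
          resolve(row); // function body finished: settle the outer promise
        }
      } catch (err) {
        reject(err);
      }
    }

    step(); // State 0 runs synchronously, just like a real async function
  });
}
```

Between the call to db.query and the later call to step, nothing is running and no thread is parked. All that exists is the closure holding state and row, which is the small heap object the answer describes.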
Hard Q5 What is a floating promise and why is it dangerous?
Think first: what happens if you call an async function but don't await it and don't chain a .catch()?

A floating promise is a Promise that is neither awaited nor stored anywhere. It's just created and dropped. Example: sendEmail(user); inside an async function, where sendEmail returns a Promise. The Promise is created, starts executing, and then the reference is lost. The calling code moves on.

This is dangerous for three reasons. First, errors bypass your error handling. If sendEmail rejects, there's no .catch() and no await to surface the error where you could deal with it. In Node.js, this fires an unhandledRejection event, which by default (since Node.js 15) crashes the process. Before Node.js 15, the default was only to print a warning, so in practice the failure was easy to miss entirely. Second, you lose the ability to know when it finished. If your test ends before the floating promise completes, it may write to the database after the test teardown runs, corrupting the next test. Third, resource cleanup is impossible. If you're using structured concurrency, a floating promise is an orphaned task that doesn't belong to any scope: it can outlive a request, hold connections open, or write to stale state.

The fix: always either await the promise, .catch() it explicitly, or pass it to a "fire-and-forget" utility that at minimum logs rejections. The linting rule @typescript-eslint/no-floating-promises catches these statically.
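The "fire-and-forget utility" mentioned above could be as small as this. It is a sketch under stated assumptions: the name fireAndForget, the label argument, and the log message format are all invented for illustration.

```javascript
// Hypothetical utility: intentionally un-awaited work that still reports failures.
function fireAndForget(promise, label, log = console.error) {
  // Promise.resolve() also accepts non-promise values, and the catch()
  // guarantees the rejection is handled, so no unhandledRejection fires.
  Promise.resolve(promise).catch((err) => {
    const reason = err && err.message ? err.message : String(err);
    log(`[fire-and-forget] ${label} failed: ${reason}`);
  });
}
```

Calling fireAndForget(sendEmail(user), "sendEmail") makes the decision not to wait explicit at the call site, which also gives reviewers and the linter something visible to check.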

Hard Q6 async/await vs reactive streams: when do you use which?
Think first: what's different about a WebSocket stream vs a single HTTP request that would make async/await awkward?

The fundamental difference is cardinality: async/await models a single value in the future. A reactive stream (Observable, Node Readable, AsyncGenerator) models a sequence of values over time. Use the right abstraction for the right cardinality.

Reach for async/await when: you're asking a question and expecting one answer (HTTP request → response, DB query → result set, file read → buffer). The flow is linear: start, wait, continue.

Reach for reactive streams when: the source emits multiple values and you need to process them as they arrive (WebSocket messages, Kafka topic consumption, mouse events, real-time sensor data, paginated API cursor you're walking). Reactive streams also add backpressure: the ability to signal to the producer "slow down, I'm not keeping up." Without backpressure, a fast producer fills memory and eventually OOMs the consumer. async/await has no concept of backpressure on its own.

A common mistake: using async/await in a loop to simulate streaming. for await...of (AsyncGenerator) is the bridge: it lets you write async/await-style code over an asynchronous sequence while naturally handling backpressure by only requesting the next value when your loop body is ready for it.
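A short sketch of the for await...of bridge, using the paginated-cursor case from the list above. The names walkPages and collectAll, and the page shape ({ items, nextCursor }), are assumptions chosen for the example.

```javascript
// Hypothetical paginated source: yields one page per request, on demand.
async function* walkPages(fetchPage) {
  let cursor = 0;
  while (cursor !== null) {
    const page = await fetchPage(cursor); // runs only when the consumer asks
    yield page.items;
    cursor = page.nextCursor;
  }
}

// Consumer: the generator stays paused while this loop body is busy,
// so at most one page is buffered at a time.
async function collectAll(fetchPage) {
  const all = [];
  for await (const items of walkPages(fetchPage)) {
    all.push(...items);
  }
  return all;
}
```

Because the generator only advances when the loop asks for the next value, a slow consumer automatically slows the fetching. That is the pull-based form of backpressure the answer refers to.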

Medium Q7 How do you handle errors when one promise in Promise.all rejects?
Think first: if you fire off 5 API calls in parallel and one fails, what happens to the other 4?

Promise.all implements fail-fast semantics: the moment any input promise rejects, the returned promise rejects immediately with that error. The other in-flight promises are not cancelled (JavaScript has no built-in cancellation for plain Promises). They still run to completion, but their results are ignored. You need to handle this explicitly if those operations have side effects (like database writes) that should be rolled back.

Option 1: Catch individual promises before passing to Promise.all. Wrap each promise with .catch(err => ({ error: err })) so none of them ever reject. Promise.all resolves with an array that may contain error objects; you inspect each entry. This is the "settled" pattern without the API sugar.

Option 2: Promise.allSettled. Returns a promise that resolves with an array of outcome objects ({ status: 'fulfilled', value: ... } or { status: 'rejected', reason: ... }) when all input promises have settled (resolved or rejected). Use this when you want the result of every operation regardless of which ones failed, for example sending emails to a list where some addresses bounce.

// Rejects immediately on first failure
try {
  const [users, orders, inventory] = await Promise.all([
    fetchUsers(),
    fetchOrders(),    // if this rejects...
    fetchInventory(), // this still runs, result is ignored
  ]);
} catch (err) {
  // Only get here once, on the FIRST rejection
  console.error("At least one failed:", err);
}

// Waits for all; never rejects
const results = await Promise.allSettled([
  fetchUsers(),
  fetchOrders(),
  fetchInventory(),
]);

for (const result of results) {
  if (result.status === "fulfilled") {
    // Named handleResult (not process) so it cannot be confused with Node's global `process`
    handleResult(result.value);
  } else {
    logError(result.reason); // handle each failure individually
  }
}
Hard Q8 What does setTimeout(fn, 0) actually do, and where does it fit in the event loop?
Think first: if the delay is 0ms, does setTimeout fire immediately? Before or after a resolved Promise?

setTimeout(fn, 0) does NOT run immediately. It schedules fn as a macrotask (timer callbacks are one kind of macrotask, alongside I/O callbacks and UI events). In the simplified model used here, the event loop processes one macrotask per iteration. Before it picks up the next macrotask, it drains the entire microtask queue, and Promise.resolve().then() callbacks live in the microtask queue.

This means: all resolved Promise callbacks run before the setTimeout(fn, 0) callback, even if the Promise resolved after the setTimeout was registered. The microtask queue always has higher priority than the macrotask queue within a single event-loop tick.

[Diagram: one event loop tick. (1) Run one macrotask (a setTimeout or I/O callback). (2) Drain ALL microtasks (Promise.then / queueMicrotask). (3) Optionally render. (4) Run the next macrotask (the next setTimeout or I/O callback), and the loop repeats. setTimeout(fn, 0) schedules a macrotask; Promise.then schedules a microtask. Microtasks always run BEFORE the next setTimeout fires.]

Practical consequence: if a resolved Promise's callback queues another microtask, and that one queues another, the event loop won't touch the setTimeout callback until the entire microtask chain is drained. An infinite microtask loop (a Promise that immediately resolves another Promise) will starve the macrotask queue and freeze the event loop, just like an infinite synchronous loop.
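The starvation effect can be observed safely with a bounded chain instead of an infinite one. This is a small sketch, assuming a Node.js or browser runtime; the chain function and the count of 1,000 are arbitrary choices for illustration.

```javascript
// Records the order in which callbacks run.
const order = [];

// Macrotask, registered FIRST.
setTimeout(() => order.push("timeout"), 0);

// A chain of 1,000 microtasks: each callback schedules the next one.
function chain(remaining) {
  if (remaining === 0) {
    order.push("microtasks done");
    return;
  }
  queueMicrotask(() => chain(remaining - 1));
}
chain(1000);

order.push("sync done");

// Inspect the result from a later timer, after everything above has run.
// Prints: [ 'sync done', 'microtasks done', 'timeout' ]
setTimeout(() => console.log(order), 10);
```

The zero-delay timer was registered before any microtask existed, yet it runs last. Remove the remaining === 0 exit and the timer never runs at all.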

Eight core interview questions on async patterns, from the async-vs-parallel distinction (concurrency = structure, parallelism = simultaneous execution) through the state machine behind async/await, floating promise dangers, Promise.all error semantics, and the event loop's microtask-before-macrotask ordering. Each answer includes a WHY chain so you can reason through novel variations rather than reciting memorized answers.

Practice Exercises

Four hands-on challenges: spot bugs, trace outputs, make architecture calls, and build a real utility.

Reading about async patterns is one thing; using them under pressure is another. These exercises are designed so that each one catches a mistake developers actually make in production. Try to answer each one before reading the solution.

The following function fetches product details for a shopping cart. It works correctly but is painfully slow. What's wrong and how would you fix it?

    async function fetchCartProducts(cartItemIds) {
      const products = [];
      for (const id of cartItemIds) {
        const product = await fetchProduct(id); // one at a time
        products.push(product);
      }
      return products;
    }

How many network round-trips does this code make if the cart has 10 items? Could they happen simultaneously instead of sequentially?

The bug: Sequential await inside a for loop forces each fetch to complete before the next one starts. For 10 items each taking 80ms of network latency, this takes about 800ms. The right fix depends on whether you need to cap concurrency.

    // Best when all IDs are available upfront and the server can absorb the burst
    async function fetchCartProducts(cartItemIds) {
      // Fire all fetches simultaneously: about 80ms total instead of 800ms
      return Promise.all(cartItemIds.map(id => fetchProduct(id)));
    }
    // WHY this works: Promise.all returns a single promise that resolves
    // when ALL input promises resolve, with results in input order.
    // All fetches run concurrently because nothing awaits them one at a time.

    // Better when the server has rate limits: cap concurrency to N
    async function fetchCartProducts(cartItemIds, concurrency = 5) {
      const results = [];
      for (let i = 0; i < cartItemIds.length; i += concurrency) {
        const batch = cartItemIds.slice(i, i + concurrency);
        const batchResults = await Promise.all(batch.map(id => fetchProduct(id)));
        results.push(...batchResults);
      }
      return results;
    }
    // WHY: firing 500 simultaneous requests can overwhelm a server or
    // hit rate limits. Batching caps in-flight requests at `concurrency`
    // while still parallelizing within each batch.

Predict the exact output order of the following code. Write down your answer before running it.

    console.log("A");
    setTimeout(() => console.log("B"), 0);
    Promise.resolve()
      .then(() => console.log("C"))
      .then(() => console.log("D"));
    console.log("E");

Which statements are synchronous? Which ones schedule work? What's the difference between where setTimeout callbacks land vs Promise callbacks?

Output: A, E, C, D, B

Here's the reasoning step by step:

  • A: synchronous, runs immediately.
  • setTimeout(fn, 0): schedules B as a macrotask. Does not run yet.
  • Promise.resolve().then(...): schedules C as a microtask. Does not run yet.
  • E: synchronous, runs immediately.
  • Current call stack is now empty. Event loop checks: are there microtasks? Yes.
  • C: microtask runs. When it finishes, the chained .then() schedules D as another microtask.
  • D: microtask runs (the microtask queue is drained before macrotasks).
  • Microtask queue empty. Event loop picks up the next macrotask.
  • B: setTimeout callback runs as a macrotask.

The key rule: microtasks (Promise callbacks) always drain completely before the next macrotask (setTimeout callback) runs, no matter how many microtasks are chained.

For each scenario, decide which pattern best fits: (A) async/await, (B) reactive streams (Observable / AsyncGenerator), or (C) worker threads. Explain your reasoning.

  1. Your API endpoint receives a user ID, fetches their profile from a database, and returns a JSON response.
  2. Your app receives live GPS coordinates from a connected vehicle at 10 updates per second and displays them on a map while also throttling to avoid re-rendering more than twice per second.
  3. Your server must generate a PDF report from 50 pages of data. Users report the API times out when many users request reports simultaneously.
For scenario 2, think about whether async/await can express "give me values as they arrive" and "only process two per second." For scenario 3, what's the bottleneck: I/O or CPU?

Scenario 1 → A (async/await). Classic single-request/single-response pattern. One future value, linear flow. const user = await db.findById(id) is exactly what async/await is made for. No streaming, no CPU pressure.

Scenario 2 → B (reactive streams). The source emits a sequence (10 GPS updates/second) and you need to apply an operator (throttle to 2/second). Reactive streams handle both naturally: gpsStream$.pipe(throttleTime(500)).subscribe(renderOnMap). Doing this with async/await alone means building your own throttle logic on top of a manual async generator, which amounts to reinventing the reactive operator model.
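To make the "reinventing the operator" point concrete, here is roughly what a hand-rolled throttle over an async iterable looks like. It is a sketch, not a replacement for throttleTime: the throttle function, its injectable now clock, and the gpsUpdates and renderOnMap names in the usage comment are all assumptions made for illustration.

```javascript
// Passes a value through only if at least `intervalMs` has elapsed since the
// last value that was passed through (leading-edge throttle).
// `now` is injectable so the behaviour can be checked without real waiting.
async function* throttle(source, intervalMs, now = Date.now) {
  let lastEmit = -Infinity;
  for await (const value of source) {
    const t = now();
    if (t - lastEmit >= intervalMs) {
      lastEmit = t;
      yield value;
    }
    // Otherwise the update is dropped; a newer position will arrive shortly.
  }
}

// Hypothetical usage with a live feed:
// for await (const position of throttle(gpsUpdates(), 500)) {
//   renderOnMap(position);
// }
```

This covers one operator. Merging feeds, debouncing, or retrying would each need another hand-written helper, which is the work a reactive library already packages.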

Scenario 3 → C (worker threads). PDF generation from 50 pages of data is CPU-bound (DOM-to-PDF rendering, font layout, image compression). The event loop is blocked by CPU work, which is why the API times out under concurrent load. Move the PDF generation into a worker thread (or a worker pool) so the event loop stays free and requests can still be accepted while reports are being rendered.
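A minimal sketch of the Scenario 3 fix using Node's built-in worker_threads module. The renderReportInWorker helper is hypothetical, and the busy loop is only a stand-in for real PDF rendering. The worker code is inlined as a string (eval: true) purely to keep the example in one file; a real project would normally point the Worker at a separate file and reuse workers through a pool.

```javascript
const { Worker } = require("node:worker_threads");

// Code that runs inside the worker thread.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  // Stand-in for CPU-heavy rendering: a deliberately slow loop.
  let total = 0;
  for (let i = 1; i <= workerData.pages * 1_000_000; i++) total += i % 7;
  parentPort.postMessage({ pages: workerData.pages, checksum: total });
`;

// Hypothetical helper: runs one "report" off the main thread
// and resolves with whatever the worker posts back.
function renderReportInWorker(pages) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { pages } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", code => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// Usage inside an async request handler (the event loop stays free meanwhile):
// const report = await renderReportInWorker(50);
```

Spawning one worker per request is the simplest form. Under heavy load a fixed-size pool avoids paying thread start-up cost on every report.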

Design a utility function withRetry(fn, options) that:

  • Retries a failing async function up to maxAttempts times
  • Uses exponential backoff between retries: wait baseDelayMs * 2^(attempt - 1) before each retry
  • Includes a simple circuit breaker: once failureThreshold consecutive calls have failed, the circuit "opens" and subsequent calls throw immediately without calling fn, until a resetAfterMs timeout passes

Write the code. Then explain why the circuit breaker and the retry logic solve different problems, and why one without the other is insufficient.

Start with just the retry + backoff. Then add a closure (or a class) that holds circuit state. The circuit needs to know: how many consecutive failures, and when the last failure happened.

    // --- delay helper ---
    const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

    // --- withRetry: exponential backoff ---
    async function withRetry(fn, {
      maxAttempts = 3,
      baseDelayMs = 100,
      onRetry = () => {}   // optional callback for logging
    } = {}) {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return await fn();                        // success: return immediately
        } catch (err) {
          if (attempt === maxAttempts) throw err;   // exhausted: re-throw
          const wait = baseDelayMs * Math.pow(2, attempt - 1);
          onRetry({ attempt, wait, err });
          await delay(wait);
          // after attempt 1 fails: wait baseDelayMs (e.g. 100ms)
          // after attempt 2 fails: wait 2x (200ms)
          // after attempt 3 fails: wait 4x (400ms), only if maxAttempts > 3
        }
      }
    }

    // --- Circuit breaker factory ---
    function createCircuitBreaker({
      failureThreshold = 3,    // open after this many consecutive failures
      resetAfterMs = 10_000    // allow a trial call after this timeout
    } = {}) {
      let consecutiveFailures = 0;
      let openedAt = null;     // timestamp when the circuit opened, or null if closed

      return {
        async call(fn) {
          // Open and still inside the reset window: fail fast
          if (openedAt !== null && Date.now() - openedAt < resetAfterMs) {
            throw new Error("Circuit open: fast-failing to protect downstream");
          }
          // Closed, or half-open (window passed): let the call through.
          // The failure count is NOT reset here, so a failed trial call
          // re-opens the circuit immediately.
          // Simplification: concurrent callers may all get through while half-open.
          try {
            const result = await fn();
            consecutiveFailures = 0;   // success closes the circuit
            openedAt = null;
            return result;
          } catch (err) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
              openedAt = Date.now();   // trip (or re-trip) the circuit
            }
            throw err;
          }
        }
      };
    }

    // --- Combined: retry INSIDE the circuit breaker ---
    // The breaker wraps the whole retry attempt-set (not each individual try),
    // so it counts one failure per exhausted set of retries.
    const breaker = createCircuitBreaker({ failureThreshold: 3, resetAfterMs: 15_000 });

    async function callPaymentService(payload) {
      return breaker.call(() =>
        withRetry(async () => {
          // Retrying a POST is only safe if the endpoint is idempotent.
          const res = await fetch("/api/payment", {
            method: "POST",
            body: JSON.stringify(payload)
          });
          // fetch rejects only on network errors, so turn HTTP 5xx into a retryable failure
          if (res.status >= 500) throw new Error(`Payment service returned ${res.status}`);
          return res;
        }, { maxAttempts: 3, baseDelayMs: 200 })
      );
    }

Why you need both: Retry handles transient failures, such as a brief network blip or a momentary DB overload spike. It's optimistic: "the service is probably fine, try again in a moment." The circuit breaker handles sustained outages, such as the payment service being down for 10 minutes. Retrying against a dead service wastes your caller's time (each caller waits for 3 retry timeouts before failing), burns your own connection pool, and hammers the already-failing downstream service, making its recovery harder. The circuit breaker makes the system fail fast and cheap while the downstream service recovers, then automatically resumes once the reset window passes.

Four exercises covering the most common async mistakes: sequential awaits that should be parallel (fix with Promise.all or batched concurrency), event-loop ordering surprises (microtasks always drain before macrotasks), pattern selection by bottleneck type (async/await for single values, reactive for streams, workers for CPU), and a production-ready retry-with-circuit-breaker utility that shows why retry and circuit breaking solve different failure modes and belong together.

Cheat Sheet: Async Patterns at a Glance

Eight quick-reference cards covering every core pattern and rule from this page. Pin this tab when you're reviewing before an interview.

Each card is a one-sentence rule you should be able to recite and explain. If you can say the "Why" aloud for each one, you're ready.

Callbacks: Pass a function to be called when the result is ready. Simple, but it nests badly: "callback hell" hits when you chain 3+ async steps, because each callback wraps the next one, pushing code rightward off the screen.

Promises: A Promise is an object that represents a value you don't have yet; you attach .then() / .catch() to it. It fixes callback hell by making async chains flat instead of nested, and by consolidating error handling into one .catch() at the end.

async/await: Syntax sugar over Promises. Write async code that reads like synchronous code without blocking the thread. Under the hood the compiler or runtime turns your function into a state machine; each await is a suspension point where the function pauses and the thread is freed for other work (in JavaScript, control returns to the event loop).

Sequential vs parallel: Sequential await (one after another) takes time A + B + C. Parallel with Promise.all([A, B, C]) takes max(A, B, C). Use parallel when the calls are independent. Use sequential only when call B needs the result of call A.

Error handling: Wrap await calls in try/catch, or chain .catch() on the Promise. Never let a rejection go unhandled. Promise.allSettled waits for all results (fulfilled or rejected); Promise.all fails fast on the first rejection.

Floating promises: A floating promise is one you created but didn't await and didn't .catch(). It's dangerous because its errors are easily lost and cleanup code can run before the operation finishes. Always either await it, or explicitly .catch() it if you truly want fire-and-forget.

Event loop: One thread. One task at a time. While a task waits for I/O, the thread handles other tasks. Microtasks (Promise callbacks) drain entirely before the next macrotask (setTimeout callback). Never do heavy CPU work on the event loop, because it blocks every other request.

Backpressure: When a producer emits data faster than a consumer can process it, the buffer fills and memory blows up. Backpressure is the mechanism that lets the consumer signal "slow down." Node.js streams have it built in (piping pauses the source when the destination's buffer is full). RxJS Observables are push-based, so there you shed or batch load with operators such as throttleTime or bufferTime. Plain Promises and async/await have none, so you must implement it manually (e.g., bounded concurrency with a semaphore).

Eight one-card rules: callbacks nest badly → Promises flatten chains → async/await reads like sync code → sequential vs parallel changes total time → always handle rejections → floating promises lose errors → the event loop's microtask queue drains before macrotasks → backpressure prevents memory blow-up from fast producers.
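The sequential-versus-parallel card is easy to verify with a stopwatch. The sketch below uses simulated calls; the delay and timeIt helpers and the 100/80/60ms durations are made up for illustration, and measured times will vary slightly from run to run.

```javascript
// Simulated async call: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

// Runs an async function and reports how long it took.
async function timeIt(label, run) {
  const start = performance.now();
  const value = await run();
  const elapsedMs = Math.round(performance.now() - start);
  console.log(`${label}: ${elapsedMs}ms`);
  return { value, elapsedMs };
}

// Three independent calls awaited one after another: about 100 + 80 + 60 = 240ms.
const sequential = () => timeIt("sequential", async () => {
  const a = await delay(100, "A");
  const b = await delay(80, "B");
  const c = await delay(60, "C");
  return [a, b, c];
});

// The same three calls started together: about max(100, 80, 60) = 100ms.
const parallel = () => timeIt("parallel", () =>
  Promise.all([delay(100, "A"), delay(80, "B"), delay(60, "C")])
);

// Usage: sequential().then(parallel);
```

Both versions return the same values in the same order; only the elapsed time differs.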

Glossary: Key Terms in Plain English

Every technical term from this page defined in the way you'd explain it to a smart friend who doesn't write code for a living.

These aren't dictionary definitions. They're the mental models that make the concepts click. Read the plain-English version first; the technical precision follows.

Asynchronous
Starting a slow job and moving on to other work while it runs, instead of standing there waiting. Your code says "start the database query, and when you have an answer call me back." The opposite of synchronous (standing in line until it's your turn).
Synchronous
Doing one thing, waiting for it to finish completely, then starting the next thing. Simple to reason about but wasteful whenever steps involve waiting: the thread just idles while the disk, network, or database thinks.
Callback
A function you hand to another function and say "run this when you're done." The original async pattern in JavaScript. Works fine for one step; becomes a nightmare of nested indentation ("callback hell") when you chain several async steps together.
Promise
An object that acts as a placeholder for a value that doesn't exist yet. It will eventually settle into one of two states: fulfilled (the value arrived) or rejected (an error happened). You attach handlers with .then() and .catch(). Promises chain flat instead of nesting, which fixes the core readability problem with callbacks.
Future
The same concept as a Promise under a different name: Java, Scala, C++, and Rust all have a type called Future. It represents a value that will be available at some point in the future.
async / await
A syntax that lets you write code that works asynchronously but reads as if it were synchronous. await tells the runtime "pause this function here, let other work run, resume when this Promise settles." No blocking: the thread is released while the function waits.
Event Loop
The heartbeat of a JavaScript runtime. It's a loop that constantly checks: "Is there a callback ready to run?" It runs one callback at a time, but since I/O callbacks only fire when the I/O is done, the loop can juggle thousands of in-flight requests on a single thread without any of them blocking the others.
Microtask
A small job that the event loop runs immediately after the current task finishes, before it processes the next macrotask (like a setTimeout). Promise .then() callbacks are microtasks. They always run before the next timer fires, even if the timer was registered first.
Macrotask
A regular event-loop task: a setTimeout/setInterval callback, an I/O completion callback, or a UI event. The event loop processes one macrotask, then drains all queued microtasks, then processes the next macrotask. This ordering is why Promise.resolve().then(fn) runs before setTimeout(fn, 0).
Non-blocking I/O
A style of I/O where starting an operation (reading a file, making a network request) does not freeze the calling thread. The I/O proceeds in the background and the program is notified via a callback or event when the data is ready. Node.js is built around non-blocking I/O through its libuv library.
Backpressure
The ability of a slow consumer to tell a fast producer "hold on, I'm not ready for more data." Without backpressure, a producer that sends data faster than the consumer can handle will fill up buffers until memory runs out. Reactive streams build backpressure in; plain Promises don't, so you have to implement it yourself with a semaphore or a bounded queue.
Observable / Reactive Stream
A sequence of values that arrive over time, treated like a collection you can map, filter, merge, and throttle. Think of it as a Promise that can emit many values instead of just one. RxJS Observables and Node.js Readable streams are the most common implementations. Essential when the source is a live data feed (WebSockets, sensors, user events) rather than a single request/response.
Floating Promise
A Promise you created but didn't attach any error handler to and didn't await. It's "floating" because nothing is waiting on its outcome. If it rejects, the error is silently discarded or, in Node.js 15+, it crashes the process with an unhandled rejection. Always either await a Promise or explicitly .catch() it.
Concurrency
Multiple tasks are in-flight at the same time, but they may not be running at the exact same instant. A Node.js event loop is highly concurrent (it can have thousands of requests in-flight) but runs only one callback at any given moment on one thread. Concurrency is about structure; parallelism is about simultaneous physical execution.
Bounded Concurrency
Capping how many async operations can be in-flight at the same time. Instead of firing all 500 requests simultaneously (which overwhelms the target server or exhausts your connection pool), you allow at most N in-flight at once. Typically implemented with a semaphore: acquire a slot before starting an operation, release it when done.
Fifteen terms defined in plain English: async/sync (the core idea), callback/Promise/Future/async-await (the evolution of single-value async), event loop/microtask/macrotask (the scheduling model), non-blocking I/O (the OS mechanism), backpressure/Observable (streaming extensions), and floating promise/concurrency/bounded concurrency (the common gotchas).

Mini-Project: Parallel Image Thumbnail Pipeline

Build a real pipeline that fetches images, generates thumbnails, and uploads results, growing from a naive sequential version to a production-ready bounded-concurrency design with backpressure.

The best way to cement async patterns is to build something that breaks if you get them wrong. This project does exactly that: start with a sequential version that works correctly but fails under load, then evolve it through three stages until it's ready for production traffic. Each stage introduces one new concept and shows concretely why it's needed.

What You're Building

An image thumbnail pipeline: given a list of image URLs, fetch each image, generate a 200×200 thumbnail (simulated here as a resize operation), and upload the result to a storage service. You need to handle 100 to 10,000 images, and you need it to be fast, safe under load, and resilient to partial failures.

[Diagram: pipeline evolution, where each stage fixes the previous stage's failure mode. Stage 1, Sequential (a for loop awaiting fetch, resize, upload): correct but slow; 1,000 images × 500ms = 8+ min. Stage 2, All Parallel (Promise.all over urls.map(process)): fast but explodes at scale; 10k simultaneous requests lead to OOM, rate limiting, or socket exhaustion. Stage 3, Bounded (a semaphore with limit 8): fast and safe for large batches; at most 8 in flight. Stage 4, Streaming (for await over a URL source): fast, safe, and memory-bounded; reads the next URL only when ready to process it. Each stage is a response to a specific failure mode: understand the failure first, then the fix makes sense.]

Stage 1: Sequential (Correct but Slow)

Start here. The goal is a working pipeline before you optimize. Each image goes through three steps (fetch, resize, upload) and you don't move to the next image until the previous one finishes. Simple, easy to debug, completely safe. Also about 8 minutes for 1,000 images at 500ms per image.

    // Stage 1: Sequential (correct, debuggable, slow)
    // Each image blocks the next: total time = N × (fetchMs + resizeMs + uploadMs)
    async function processThumbnails(imageUrls) {
      const results = [];
      for (const url of imageUrls) {
        // fetch → resize → upload, one at a time
        const imageData = await fetchImage(url);                                            // ~200ms network
        const thumbnail = await generateThumbnail(imageData, { width: 200, height: 200 }); // ~100ms CPU (simulated)
        const storageUrl = await uploadThumbnail(thumbnail, url);                           // ~200ms network
        results.push({ original: url, thumbnail: storageUrl });
        console.log(`Done: ${url}`);
      }
      return results;
    }

    // Helpers (simulated)
    const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

    async function fetchImage(url) {
      await delay(200);              // simulate network fetch
      return Buffer.alloc(50_000);   // fake 50KB image
    }

    async function generateThumbnail(data, size) {
      await delay(100);              // simulate CPU-bound resize (in reality: use a worker thread)
      return Buffer.alloc(5_000);    // fake 5KB thumbnail
    }

    async function uploadThumbnail(data, originalUrl) {
      await delay(200);              // simulate upload
      return `https://cdn.example.com/thumbs/${encodeURIComponent(originalUrl)}`;
    }

    // Stage 2: All parallel (fast for small lists, dangerous for large ones)
    // WARNING: firing 10,000 simultaneous fetch() calls can:
    //   - exhaust file descriptors or ephemeral ports (the per-process
    //     open-file limit is often well below 10,000 by default)
    //   - hit rate limits on the image source server (HTTP 429)
    //   - fill your heap with all 10,000 in-flight image buffers at once;
    //     peak memory = N × average_image_size
    async function processThumbnails(imageUrls) {
      // All promises are created immediately, so all fetches fire at once
      return Promise.all(imageUrls.map(async (url) => {
        const imageData = await fetchImage(url);
        const thumbnail = await generateThumbnail(imageData, { width: 200, height: 200 });
        const storageUrl = await uploadThumbnail(thumbnail, url);
        return { original: url, thumbnail: storageUrl };
      }));
    }
    // WHY this fails at scale:
    // imageUrls.map() starts ALL the async functions before any of them resolves.
    // With 10,000 URLs, you have 10,000 fetch() calls initiated simultaneously.
    // Each in-flight request holds a socket, a buffer, and heap memory.
    // Your machine or the target server will refuse connections long before they all complete.

    // Stage 3: Bounded concurrency (fast AND safe for large batches)
    // Uses a semaphore to cap the number of in-flight operations.
    // "At most 8 images being processed simultaneously, no more."
    class Semaphore {
      constructor(limit) {
        this._limit = limit;
        this._active = 0;
        this._queue = [];   // waiting callers
      }

      run(fn) {
        return new Promise((resolve, reject) => {
          const tryRun = () => {
            if (this._active < this._limit) {
              this._active++;
              Promise.resolve()
                .then(() => fn())           // run the actual work
                .then(resolve, reject)      // forward result or error
                .finally(() => {
                  this._active--;
                  if (this._queue.length > 0) {
                    this._queue.shift()();  // wake next waiter
                  }
                });
            } else {
              this._queue.push(tryRun);     // park until a slot opens
            }
          };
          tryRun();
        });
      }
    }

    async function processThumbnails(imageUrls, { concurrency = 8 } = {}) {
      const sem = new Semaphore(concurrency);
      const results = await Promise.all(
        imageUrls.map(url =>
          sem.run(async () => {
            const imageData = await fetchImage(url);
            const thumbnail = await generateThumbnail(imageData, { width: 200, height: 200 });
            const storageUrl = await uploadThumbnail(thumbnail, url);
            return { original: url, thumbnail: storageUrl };
          })
        )
      );
      return results;
    }
    // WHY 8?
    // Tune concurrency empirically: start at 8, measure throughput + error rate.
    // If the image server starts returning 429s, lower it.
    // If your CPU and network are both underutilized, raise it.
    // There is no universal magic number; profile for your target server.
    // Stage 4: Async generator + bounded concurrency
    // Best for very large lists (millions of URLs from a database cursor or S3 listing)
    // because the URL list itself is never fully loaded into memory.
    // Backpressure: the loop only pulls the next URL when a processing slot is free.
    async function* urlSource(urls) {
      for (const url of urls) {
        yield url;   // in production: this could be a DB cursor or S3 paginator
      }
    }

    async function processOne(url) {
      const imageData = await fetchImage(url);
      const thumbnail = await generateThumbnail(imageData, { width: 200, height: 200 });
      const storageUrl = await uploadThumbnail(thumbnail, url);
      return { original: url, thumbnail: storageUrl };
    }

    async function processThumbnails(imageUrlsIterable, { concurrency = 8 } = {}) {
      const results = [];
      const inFlight = new Set();

      for await (const url of urlSource(imageUrlsIterable)) {
        const task = processOne(url)
          .then(result => { results.push(result); })
          .finally(() => inFlight.delete(task));
        inFlight.add(task);

        if (inFlight.size >= concurrency) {
          // Park here until at least one task settles. This await IS the
          // backpressure: the generator is not asked for another URL until
          // a slot opens. A rejection propagates (fail-fast, like Promise.all).
          await Promise.race(inFlight);
        }
      }

      await Promise.all(inFlight);   // wait for the last in-flight tasks
      return results;                 // completion order, not input order
    }
    // WHY the explicit await matters:
    // Calling sem.run(...) without awaiting it returns immediately, so a loop that
    // only does pending.push(sem.run(...)) still pulls every URL and queues one
    // closure and one Promise per URL before any work finishes.
    // Stage 3 has the same cost: imageUrls.map() creates ALL wrappers upfront.
    // Awaiting a free slot inside the loop keeps the number of pulled-but-unfinished
    // URLs at the concurrency limit.
    // Note: `results` still grows with N. For truly unbounded inputs, write each
    // result out as it completes instead of collecting them.

What You Learn From Each Stage

Stage 1 teaches you the pattern clearly: fetch, resize, upload. The logic is right; only the performance is wrong. Stage 2 shows you why "just use Promise.all" is incomplete advice: it's correct for 10 items and dangerous for 10,000. Stage 3 introduces the semaphore, which is the standard solution to the "bounded concurrency" problem you'll encounter in every real-world pipeline. Stage 4 completes the picture with backpressure: not just limiting how many items process simultaneously, but limiting how many items are even loaded into memory at once, which is essential for pipelines that read from a database cursor or a paginated API.
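The project brief also asks for resilience to partial failures, which a plain Promise.all does not give you because one bad image rejects the whole batch. One way to combine bounded concurrency with per-item outcomes is sketched below. The mapSettledWithLimit helper is hypothetical; it borrows the result shape of Promise.allSettled and uses a fixed number of "lanes" instead of the Semaphore class.

```javascript
// Runs `worker(item)` for every item with at most `limit` calls in flight.
// Never rejects: each entry is { status: "fulfilled", value } or
// { status: "rejected", reason }, in the same order as `items`.
async function mapSettledWithLimit(items, limit, worker) {
  const outcomes = new Array(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane() {
    while (next < items.length) {
      const index = next++;   // safe: JavaScript runs one callback at a time
      try {
        outcomes[index] = { status: "fulfilled", value: await worker(items[index]) };
      } catch (reason) {
        outcomes[index] = { status: "rejected", reason };
      }
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return outcomes;
}

// Hypothetical usage with the pipeline's per-image function:
// const outcomes = await mapSettledWithLimit(imageUrls, 8, processOneImage);
// const failed = outcomes.filter(o => o.status === "rejected");
```

A failed image then costs one entry in the report rather than the whole run, and the failed URLs can be retried separately.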

Where to take this next: In production, the resize step (Stage 3/4's generateThumbnail) should run in a Worker Thread, not on the event loop. CPU-bound work on the main Node.js thread blocks all other requests for the duration of the resize. Add a worker pool (e.g., the workerpool npm package or Node's built-in worker_threads with a custom pool) and the pipeline scales to both I/O-bound and CPU-bound bottlenecks.

Four-stage pipeline evolution: sequential (correct, slow) → Promise.all (fast, OOMs at scale) → bounded concurrency with a semaphore (fast and safe) → async generator with backpressure (memory-bounded for unlimited-size inputs). Each stage responds to a concrete failure mode, and understanding the failure first is what makes the fix obvious rather than mysterious.

Migration Path: Callback-Heavy Node.js to async/await

A four-step guide for modernizing a production codebase, with an honest risk assessment at each step so you know what you're getting into before you start.

Many real Node.js codebases were written before Promises were widespread. They use callback-style APIs throughout: fs.readFile(path, callback), db.query(sql, callback), custom event emitters. Migrating to async/await is worth doing: the code becomes dramatically easier to read and error handling becomes consistent. But doing it all at once in a big-bang rewrite is how you break production. The four steps below are designed to be done incrementally, with each step fully tested before the next begins.

Before you start: You need good test coverage on the existing callback code. If you don't have tests, write them first, against the current behavior, not the migrated behavior. These tests are your safety net. Without them, you'll migrate code and have no way to tell if you broke something subtle.

Step 1: Promisify the Leaf Functions (Risk: Low)

What you do: Find every function at the bottom of your call tree that takes a Node-style callback ((err, result) => void) and wrap it with util.promisify() or an explicit Promise wrapper. These are your "leaf" functions: file reads, DB queries, HTTP requests. Don't restructure any of the calling logic yet; just make the leaves return Promises, and keep a callback-style version available for any caller you haven't migrated.

    // Old style: Node error-first callback
    function findUser(id, callback) {
      pool.query(
        'SELECT * FROM users WHERE id = $1',
        [id],
        (err, result) => {
          if (err) return callback(err);
          callback(null, result.rows[0]);
        }
      );
    }

    // Step 1: wrap in a Promise. Callers of this version use .then() or await.
    // The internal pool.query callback hasn't changed at all.
    function findUser(id) {
      return new Promise((resolve, reject) => {
        pool.query(
          'SELECT * FROM users WHERE id = $1',
          [id],
          (err, result) => {
            if (err) return reject(err);
            resolve(result.rows[0]);
          }
        );
      });
    }

    // Or for Node built-ins, use util.promisify:
    // const { promisify } = require('util');
    // const readFile = promisify(fs.readFile);

Risk: Low. The internal implementation doesn't change; you're only adding a Promise wrapper around the existing callback. The only risk is subtle: if the old function could call its callback multiple times (some poorly-written async utilities do this), wrapping it in a Promise silently discards the second call. Check your callback functions for multiple invocations before wrapping.
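The util.promisify route and the multiple-invocation caveat can both be seen in a few lines. This is a sketch with made-up functions: legacyFindUser and callsBackTwice are hypothetical stand-ins for real leaf functions, and the Async suffix is just one possible naming convention for keeping both versions side by side.

```javascript
const { promisify } = require("node:util");

// Hypothetical legacy leaf function using the error-first callback convention.
function legacyFindUser(id, callback) {
  setTimeout(() => {
    if (id <= 0) return callback(new Error("invalid id"));
    callback(null, { id, name: `user-${id}` });
  }, 5);
}

// The promisified version lives alongside the original,
// so existing callback callers keep working during the migration.
const findUserAsync = promisify(legacyFindUser);

// A misbehaving function that invokes its callback twice.
function callsBackTwice(callback) {
  callback(null, "first");
  callback(null, "second");   // ignored once wrapped: a Promise settles only once
}
const callsBackTwiceAsync = promisify(callsBackTwice);

// Usage:
// const user = await findUserAsync(7);          // { id: 7, name: 'user-7' }
// const value = await callsBackTwiceAsync();    // 'first'
```

In the callback version, a caller of callsBackTwice would have run its continuation twice. After wrapping, it runs once, which is usually an improvement but is still a behavior change worth noticing.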

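Verification: Run your existing tests against the promisified versions, changing only how each test receives the result (await or .then() instead of a callback). The assertions themselves should pass unchanged because the same values are returned, now via Promise resolution instead of callback invocation.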

Step 2: Convert Middleware and Route Handlers (Risk: Medium)

What you do: Now that your leaf functions return Promises, convert the Express/Koa/Fastify middleware and route handlers that call them to use async/await. These are the functions that receive a request and compose several database or service calls.

    // Nested callbacks: hard to follow, error handling is easy to miss
    router.get('/users/:id', (req, res, next) => {
      findUser(req.params.id, (err, user) => {
        if (err) return next(err);
        findOrders(user.id, (err, orders) => {
          if (err) return next(err);
          res.json({ user, orders });
        });
      });
    });

    // async/await: reads top-to-bottom, one try/catch handles all errors
    router.get('/users/:id', async (req, res, next) => {
      try {
        const user = await findUser(req.params.id);
        const orders = await findOrders(user.id);
        res.json({ user, orders });
      } catch (err) {
        next(err);   // Express error handler takes it from here
      }
    });

    // IMPORTANT: Express 4 does not automatically catch async errors.
    // You must either wrap in try/catch (above) or use a wrapper utility:
    // router.get('/users/:id', asyncHandler(async (req, res) => { ... }));

Risk: Medium. The #1 mistake here is forgetting the try/catch in Express route handlers. If an async function throws and you don't catch it, Express 4 never sees the error and the request hangs indefinitely. Either wrap every handler in try/catch, or install the express-async-errors package (which patches Express to automatically forward unhandled async rejections to next(err)). Test each route under error conditions explicitly after migrating.
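The asyncHandler wrapper mentioned in the comment above is not part of Express itself. A minimal version, written here as an assumption about what such a utility typically looks like rather than as any particular package's implementation, is only a few lines:

```javascript
// Wraps an async route handler so that any rejection (or synchronous throw)
// is forwarded to next(err) instead of being lost.
const asyncHandler = handler => (req, res, next) =>
  Promise.resolve()
    .then(() => handler(req, res, next))
    .catch(next);

// Hypothetical usage with an Express 4 router:
// router.get('/users/:id', asyncHandler(async (req, res) => {
//   const user = await findUser(req.params.id);
//   const orders = await findOrders(user.id);
//   res.json({ user, orders });
// }));
```

Starting the chain with Promise.resolve().then(...) means a handler that throws before its first await is caught too. The trade-off against per-route try/catch is that every route gets the same error path, so route-specific recovery still needs its own try/catch inside the handler.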

Order of migration: Start with low-traffic, non-critical routes (admin endpoints, internal health checks). Let them run in production for a week before migrating high-traffic customer-facing routes. One route at a time, not all at once.

Step 3: Migrate Service-Layer and Business Logic Functions (Risk: Medium)

What you do: Convert the functions between your route handlers and your leaf-level DB/HTTP functions: the service layer, domain logic, and utility functions that orchestrate multiple operations. By this point your leaf functions already return Promises (Step 1), so converting the middle layer is mostly mechanical.

// Before: callback style
function processCheckout(cartId, userId, callback) {
  getCart(cartId, (err, cart) => {
    if (err) return callback(err);
    chargeCard(userId, cart.total, (err, chargeId) => {
      if (err) return callback(err);
      createOrder(userId, cart, chargeId, (err, order) => {
        if (err) {
          // Should we refund the charge? Callback style makes this easy to forget
          return callback(err);
        }
        callback(null, order);
      });
    });
  });
}

// After: async/await
async function processCheckout(cartId, userId) {
  const cart = await getCart(cartId);
  const chargeId = await chargeCard(userId, cart.total);
  try {
    const order = await createOrder(userId, cart, chargeId);
    return order;
  } catch (err) {
    // Compensating transaction: refund if order creation fails.
    // This is easy to add because the code is linear and readable.
    await refundCharge(chargeId).catch(refundErr =>
      logger.error('Refund failed after order error', { chargeId, refundErr })
    );
    throw err; // re-throw so the caller knows checkout failed
  }
}

Risk: Medium. Two subtle issues to watch for. First, if you convert a function that was used both with callbacks AND as a Promise (mixed usage), you'll break the callback callers. Search for all call sites before converting. Second, watch for places where the callback-style code was intentionally fire-and-forget (called without checking the callback result). When you convert these, don't accidentally add await: that would change the semantics from "start and forget" to "start and wait."
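The fire-and-forget caveat can be sketched as follows. Everything here is hypothetical: `sendAnalytics` stands in for any non-critical side effect and is rigged to fail after a short delay, and the `events` array stands in for a logger. The point is that the call is deliberately not awaited, but still gets a `.catch` so a failure is logged rather than surfacing as an unhandled rejection.

```javascript
const events = [];

// Hypothetical non-critical side effect that fails after 10 ms.
function sendAnalytics(event) {
  return new Promise((resolve, reject) => {
    setTimeout(() => reject(new Error('analytics down')), 10);
  });
}

async function completeOrder(order) {
  // Fire-and-forget: no await, so the caller is not delayed by analytics.
  // The .catch keeps a failure from becoming an unhandled rejection.
  sendAnalytics({ type: 'order_completed', id: order.id }).catch((err) => {
    events.push(`analytics failed: ${err.message}`);
  });

  events.push('order returned');
  return order; // returns before sendAnalytics settles
}
```

Adding `await` in front of `sendAnalytics` would make `completeOrder` wait the full delay, which is exactly the semantic change the risk note warns about.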

Step 4: Remove Callback Compatibility Shims and Tighten Error Handling (Risk: Low)

What you do: Once every call site has been migrated to async/await, remove the old callback-style shims and compatibility wrappers. Update your error handling to use a consistent global error handler. Turn on the ESLint rules that prevent regressions.
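This page does not show what a callback compatibility shim looks like, so the following is one possible shape, offered as an assumption rather than the migration's actual code. The hypothetical `withCallbackShim` helper lets a Promise-returning function keep serving callers that still pass an error-first callback; removing the shim in this step means exporting the plain async function again.

```javascript
// Hypothetical shim: if the last argument is a function, treat it as an
// error-first callback; otherwise return the Promise unchanged.
function withCallbackShim(asyncFn) {
  return function shimmed(...args) {
    const last = args[args.length - 1];
    if (typeof last !== 'function') {
      return asyncFn(...args); // Promise-style caller
    }
    const callback = args.pop();
    // Two-argument .then so an error thrown by the callback itself
    // is not fed back into the same callback as a second invocation.
    asyncFn(...args).then(
      (value) => callback(null, value),
      (err) => callback(err)
    );
    return undefined;
  };
}

// Illustrative async function standing in for a migrated leaf function.
const findUserShimmed = withCallbackShim(async (id) => {
  if (id < 0) throw new Error('bad id');
  return { id };
});
```

Both `findUserShimmed(1).then(...)` and `findUserShimmed(1, (err, user) => ...)` work while the shim is in place, which is why every call site has to be checked before it is deleted.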

{
  "rules": {
    // Catches floating promises, the most common async mistake post-migration
    "@typescript-eslint/no-floating-promises": "error",
    // Prevents adding unnecessary async to functions that don't await anything
    "@typescript-eslint/require-await": "warn",
    // Disallows a redundant `return await` outside try/catch
    // (deprecated in recent ESLint releases; @typescript-eslint/return-await
    // is the maintained alternative)
    "no-return-await": "error"
  }
}

Global unhandled rejection handler (Node.js): Add this once at application startup. After migration, there should be zero unhandled rejections in production; this catches any that slip through and logs them with enough context to find and fix them.

process.on('unhandledRejection', (reason, promise) => {
  logger.error('Unhandled Promise rejection', {
    reason,
    // Don't crash here immediately: log it, alert, then decide.
    // After migration, any unhandledRejection is a bug in your code.
  });
  // In production: increment a counter metric, alert the on-call engineer.
  // During migration: treat each occurrence as a bug to fix before the next deploy.
});

Risk: Low. Removing the callback shims is safe if Step 3 is complete and all call sites are migrated. The risk is missing a call site, so grep for the old function names before deleting any shim. Use git grep 'findUser(' -- '*.js' to find every usage across the codebase. If you find any remaining callback-style callers, migrate them before removing the shim.

When you're done: Your codebase should have zero callback-style functions in application code, consistent try/catch error handling throughout, ESLint rules that prevent regressions, and a global unhandled rejection handler that surfaces any remaining gaps. The payoff is a codebase where a new engineer can read a request handler top-to-bottom and understand it, with no more following a chain of nested callbacks through six files.

Four-step migration: (1) Promisify leaf functions: additive, low risk, run existing tests to verify. (2) Convert route handlers to async/await: medium risk, watch for missing try/catch in Express 4, migrate low-traffic routes first. (3) Convert service layer: medium risk, watch for mixed-usage functions and accidental await on fire-and-forget calls. (4) Remove callback shims and add ESLint guards: low risk if Step 3 is complete, grep every call site before deleting.

Further Reading: Sources Worth Your Time

Carefully selected references. Each one teaches something this page doesn't have room to cover in depth.

These references go deeper on specific sub-topics. Listed in order of approachability: start from the top if you're still building your mental model, from the bottom if you want to dig into implementation details.

Source | Author / Org | Why It's Worth Reading
"What the heck is the event loop anyway?" (JSConf EU 2014 talk, free on YouTube) | Philip Roberts | The single clearest visual explanation of how the JavaScript event loop, call stack, callback queue, and Web APIs interact. Uses an animated visualizer (loupe) to show exactly what happens when setTimeout and Promises fire. If you've ever been confused by the event loop, watch this first: ~27 minutes, zero math, completely concrete.
"Tasks, microtasks, queues and schedules" (blog post, free at jakearchibald.com) | Jake Archibald | The definitive written reference for microtask vs macrotask ordering, with interactive step-by-step animations. Goes further than Philip Roberts' talk into the precise spec-defined order of execution. Essential reading for understanding why Promise.then() fires before setTimeout(fn, 0) and what that means for your code.
MDN Web Docs: "async function" and "Using Promises" | Mozilla Developer Network | The most accurate and up-to-date reference for async/await and Promise syntax, including the full list of Promise combinators (Promise.all, Promise.allSettled, Promise.race, Promise.any) with correct descriptions of their semantics. Free at developer.mozilla.org. Use this when you need to check the exact behavior of an edge case.
"JavaScript: The Definitive Guide", Chapter 13: Asynchronous JavaScript | David Flanagan (O'Reilly, 7th edition, 2020) | The most thorough written treatment of async patterns in JavaScript. Covers callbacks, Promises, async/await, and async generators with the depth and precision of a reference book but readable prose. Chapter 13 is self-contained; you don't need to read the rest of the book first. Best choice if you want one definitive source you can return to repeatedly.
Node.js Documentation: "The Node.js Event Loop" | Node.js Foundation | The official documentation explaining the six phases of the Node.js event loop (timers, pending callbacks, idle/prepare, poll, check, close callbacks) and the difference between process.nextTick() and setImmediate(). Essential reading before you optimize Node.js performance or debug mysterious ordering issues in production. Free at nodejs.org/en/docs/guides/event-loop-timers-and-nexttick.
RxJS Documentation: "Observable" | RxJS Core Team (rxjs.dev) | The best starting point for reactive streams in JavaScript. Explains the Observer pattern, how Observables differ from Promises, and introduces the core operators (map, filter, switchMap, debounceTime). Work through the "Getting Started" guide before reading individual operator docs. Free at rxjs.dev.
Suggested reading order: Philip Roberts' talk (~27 min, builds the visual mental model) → Jake Archibald's blog post (30 min, fills in the precise details) → MDN Promises reference (10 min, bookmark for ongoing use) → Flanagan Chapter 13 (2–3 hours, the complete picture). Add the Node.js event loop guide and RxJS docs when you need them specifically.

Six references: Philip Roberts' JSConf talk (visual event loop explanation), Jake Archibald's blog (microtask vs macrotask precision), MDN async/await + Promise docs (canonical reference), Flanagan Chapter 13 (complete written treatment), Node.js event loop guide (production Node specifics), and RxJS docs (reactive streams starting point). Together they take you from first intuition to production-level understanding.
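The ordering that the Jake Archibald entry refers to, Promise.then() firing before setTimeout(fn, 0), can be observed with a few lines. This is a minimal sketch; the `order` array and the `done` Promise are illustrative names used only to record and expose the sequence.

```javascript
const order = [];

// Macrotask: queued on the timers phase, runs after the current
// script and all pending microtasks have finished.
const done = new Promise((resolve) => {
  setTimeout(() => {
    order.push('timeout');
    resolve(order);
  }, 0);
});

// Microtask: runs as soon as the current synchronous code completes.
Promise.resolve().then(() => order.push('microtask'));

// Synchronous code always runs first.
order.push('sync');

done.then((result) => console.log(result.join(' -> ')));
// prints "sync -> microtask -> timeout"
```

The timer was registered before the Promise callback, yet it runs last, which is the behavior the blog post explains in spec-level detail.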

Related Topics: What to Study Next

Six natural next steps in the HLD learning path. Each one connects directly to something you just learned about async patterns.

Async patterns don't exist in isolation. Each of these topics extends a specific idea from this page, and they're ordered from "most directly related" to "broadens the picture."

You've covered the full async patterns mental model. You understand why asynchronous code exists (threads are expensive; I/O is slow), the four-generation evolution from callbacks to reactive streams, when to use each pattern, how the event loop's microtask and macrotask queues are ordered, the math behind concurrency ceilings, how to migrate a callback-heavy codebase safely, and how to build a production-ready bounded-concurrency pipeline. The next natural step is Pub/Sub: taking these same ideas from in-process async code out to the distributed systems layer, where services communicate asynchronously across network boundaries.

Six related topics: Pub/Sub (async patterns at the distributed systems layer), Message Queues (durable cross-service async communication), Apache Kafka (log-based high-throughput async streams), Webhooks (the simpler async notification pattern queues replace), Real-Time Systems (reactive streams paired with browser delivery), and Caching Strategies (async cache invalidation). The logical next stop is Pub/Sub.