TL;DR – The Mental Model
The normal web works on a simple deal: your browser asks a question, the server answers, and the conversation ends. You click a link, you get a page. You submit a form, you get a confirmation. That one-ask-one-answer cycle is called request-response (the standard HTTP model: the client sends a request, the server sends back a response, and the connection closes, like sending a letter and waiting for a reply), and it powers the overwhelming majority of the web, from every Google search to every page load.
But some things can't wait for you to ask. When Alice sends a chat message, Bob needs to see it now, not the next time he refreshes. When a stock price changes, the trader's screen must update in milliseconds, not seconds. When your Uber driver turns a corner, the map should move instantly, not after you tap refresh. These are real-time problems, and solving them requires tricking HTTP into doing something it was never designed for, or replacing it entirely.
The Scenario – Building a Chat App
You're building a chat app. Alice sends a message. Bob needs to see it instantly. But HTTP is request-response: the server can only talk when the client asks a question. How would you get data from the server to Bob without Bob requesting it? Think of at least two different approaches before reading on.
Your team just got a new assignment: build a team chat application. Think a simpler version of Slack. The product requirement is clear: when Alice types a message, Bob should see it immediately. Not in 5 seconds. Not after refreshing the page. Instantly, the way a text message works on your phone. Messages must appear the moment they're sent.
You already know how the regular web works. The browser sends a request ("give me this page"), the server responds ("here you go"), and the connection closes. It's like ordering food at a counter: you ask, they hand it to you, transaction done. But chat is completely different. With chat, the server needs to push new messages to Bob the moment they arrive, even though Bob didn't ask for anything. The server needs to tap Bob on the shoulder and say "hey, Alice just said something."
This is the fundamental tension: HTTP was designed for the client to start every conversation. The server can never reach out to the browser unprompted. It's like a waiter who can only bring food when you order; they can't walk up and say "the chef just made something amazing, here you go." So how do we get around this?
You can only use normal HTTP (client asks, server answers). How would you build a chat app where messages appear instantly? What's the simplest trick you can think of?
Hint: What if the client just kept asking really, really often? That "keep asking" idea is exactly where our journey begins. It's the first and dumbest approach, and understanding why it works (and where it falls apart) naturally leads us to each better solution.
First Attempt – Just Keep Asking (Short Polling)
The simplest solution to "how does Bob get new messages?" is embarrassingly obvious: Bob's browser asks the server "any new messages?" over and over again, on a timer. Every 3 seconds, every 5 seconds, whatever interval you pick. If the server says "nope, nothing new," the browser waits and asks again. If the server says "yes, here's a new message from Alice," the browser shows it on screen.
This technique is called short polling (the client repeatedly sends HTTP requests at regular intervals, say every 1–5 seconds, to check for new data; simple to implement but wasteful, since most requests return empty responses), and it's the most basic form of "real-time." It works the same way a kid in the back seat keeps asking "are we there yet?": annoying, wasteful, but eventually effective.
You can literally see this happening. Open Chrome DevTools on any app that uses short polling, click the Network tab, and watch the XHR requests fire every few seconds β most returning empty JSON arrays. It's enlightening and a little painful to watch.
See the problem? Alice sent her message at the 12-second mark, but Bob didn't find out until the 15-second mark, when his next poll fired. That's a 3-second delay. And 3 out of 4 requests came back completely empty. Those were wasted trips: wasted bandwidth, wasted CPU, wasted database queries that produced nothing.
// Short polling: ask for new messages every 5 seconds
// Open Chrome DevTools > Network tab to watch these fire
function startPolling() {
  let lastMessageId = 0;
  setInterval(async () => {
    // Every 5 seconds, ask the server "anything new?"
    const response = await fetch(`/api/messages?since=${lastMessageId}`);
    const messages = await response.json();
    if (messages.length > 0) {
      // Got new messages! Show them on screen
      messages.forEach(msg => displayMessage(msg));
      lastMessageId = messages[messages.length - 1].id;
    }
    // If empty, do nothing; we'll ask again in 5 seconds
  }, 5000); // 5000ms = 5 seconds
}
startPolling();
The code is dead simple: a setInterval, a fetch, and you're done. Six lines of logic. It works. And for a hackathon prototype with 5 users, it's perfectly fine. But the moment you think about real traffic, the math gets ugly fast.
Where Short Polling Breaks – The Math Gets Brutal
Short polling works fine on your laptop with 3 test users. But the moment you ship to production, three brutal problems emerge, and they all get worse as your user count grows.
See the dilemma? You can have fast updates or cheap infrastructure, but not both. The faster you want messages to appear, the harder you hammer the server. And the vast majority of those requests come back empty-handed. You're burning money on bandwidth and CPU for no reason.
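To make that concrete, here's a back-of-envelope calculation. The user counts, poll intervals, and the 98% empty-response rate are illustrative assumptions, not measurements from any real deployment:

```javascript
// Requests per second generated by short polling, as a function of
// user count and poll interval. All numbers here are hypothetical.
function pollingLoad(users, intervalSeconds) {
  return users / intervalSeconds; // each user fires 1 request per interval
}

// 10,000 users polling every 3 seconds:
console.log(pollingLoad(10_000, 3));        // ~3,333 requests/second
// Tighten the interval to 1 second for "faster" updates:
console.log(pollingLoad(10_000, 1));        // 10,000 requests/second
// If ~98% of those come back empty, the wasted load is:
console.log(pollingLoad(10_000, 1) * 0.98); // 9,800 wasted requests/second
```

The exact empty-response rate depends on how chatty the room is; the point is that the waste scales with users times frequency, and you pay it 24/7 whether or not anyone is talking.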
What if instead of Bob repeatedly asking "got anything?", he asked once and the server didn't answer until there actually was something new? The server just holds the connection open, silently, doing nothing, waiting for data to arrive. How would that change the math?
This single idea, "don't respond until you have something to say," is the foundation of long polling, and it eliminates almost all wasted requests. That insight is the first major breakthrough: it transforms the entire waste equation and has powered real-time features since the early 2000s.
The Breakthrough – Persistent Connections
Short polling wastes 98% of requests (they return "nothing new"). What if instead of the server immediately saying "nope," it just... doesn't respond? It holds your request open, waiting until there IS something new. What problems would this create? Think about timeouts, server memory, and what happens when 50,000 clients are all waiting simultaneously.
The fix for short polling's wastefulness is elegant: instead of the server immediately responding "nothing new" to every request, the server holds the request open and waits. It doesn't send a response at all until there's real data. The client sends one request, then sits there patiently with the connection open. When new data finally arrives (maybe 30 seconds later, maybe 2 milliseconds later), the server responds. The client processes it, immediately opens a new request, and the cycle repeats.
This was called long polling (a variation of polling where the server holds the client's request open until new data arrives or a timeout occurs, typically after 25–30 seconds; every response carries real data, so empty responses essentially disappear), and it was the backbone of real-time web features for nearly a decade. Facebook's original chat (2008), early Google Talk, Slack's fallback mode: all long polling. But the real paradigm shift came with two technologies purpose-built for real-time. Server-Sent Events (SSE) is a browser API (EventSource) that opens a persistent HTTP connection down which the server pushes events as they happen, for minutes, hours, even days; it's one-way only (server to client), with auto-reconnect built in. WebSocket is a protocol (RFC 6455) providing full two-way communication over a single TCP connection; it starts as an HTTP upgrade, then switches to a binary frame protocol with just 2–14 bytes of overhead per message (versus ~800 bytes for HTTP headers). Slack, Discord, and multiplayer games all use WebSockets.
Both SSE and WebSockets represent the same core idea: the server can push data to you without you asking. No more polling. No more "are we there yet?" The connection stays open, and data flows the instant it exists.
How Each Technique Works – The Full Picture
Now that you see the "why" (the journey from "just keep asking" to "keep the connection open"), let's look at each technique in full detail. For each one: what happens on the wire, the real code, a real command you can run right now to see it in action, and when to use it (and when NOT to).
1. Short Polling – The Brute-Force Approach
We covered this in Section 3, so here's just the key summary: the client sends HTTP requests on a setInterval timer, and the server responds immediately, even if there's nothing new. Simple, stateless, and wasteful.
2. Long Polling – The Patient Waiter
Long polling flips the script: the client sends a request, and the server holds it open until there's data to send. No timer on the client. No wasted empty responses. The server decides when to respond: either when real data arrives, or when a timeout (usually 25–30 seconds server-side) expires. On timeout, the server responds with an empty result and the client immediately opens a new long poll; this keeps connections from hanging open forever and plays nicely with load balancers and proxies that kill idle connections.
// Long polling: one request at a time, server holds until data exists
async function longPoll(lastMessageId = 0) {
  try {
    // This fetch might hang for 25+ seconds. That's the whole point:
    // the server won't respond until there's actual new data.
    const response = await fetch(
      `/api/messages?since=${lastMessageId}&wait=true`
    );
    const messages = await response.json();
    if (messages.length > 0) {
      messages.forEach(msg => displayMessage(msg));
      lastMessageId = messages[messages.length - 1].id;
    }
  } catch (error) {
    // Network error? Wait a bit before retrying
    await new Promise(resolve => setTimeout(resolve, 3000));
  }
  // Immediately start the next long poll; no timer needed
  longPoll(lastMessageId);
}
longPoll();
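The client is only half the story. Server-side, the core of long polling is "hold the response until data arrives or a timeout fires." Here's a minimal in-memory sketch of that core in Node.js; the store, function names, and timeout are illustrative, and a real server would also clean up when the client disconnects:

```javascript
// Minimal long-poll core: resolve waiting clients when a message arrives,
// or resolve with an empty result after a timeout. Illustrative sketch.
const messages = [];  // all messages; ids are 1-based
const waiters = [];   // pending { since, resolve } entries

function messagesSince(id) {
  return messages.slice(id); // messages with id greater than `id`
}

// Returns a promise that resolves with new messages, or [] on timeout.
// An HTTP handler would await this and then write the JSON response.
function waitForMessages(since, timeoutMs = 25_000) {
  const ready = messagesSince(since);
  if (ready.length > 0) return Promise.resolve(ready); // data already here
  return new Promise((resolve) => {
    const entry = { since, resolve };
    waiters.push(entry);
    setTimeout(() => { // timeout: answer empty; client re-polls
      const i = waiters.indexOf(entry);
      if (i !== -1) { waiters.splice(i, 1); resolve([]); }
    }, timeoutMs);
  });
}

function publish(text) {
  messages.push({ id: messages.length + 1, text });
  // Wake every held request with whatever is new for it
  while (waiters.length > 0) {
    const { since, resolve } = waiters.pop();
    resolve(messagesSince(since));
  }
}
```

The key design point: a held request costs almost nothing while it waits (one entry in an array, one open socket), which is exactly why long polling's math beats short polling's.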
Long polling was the real-time backbone of the early social web. Facebook used it for their chat from 2008 until they migrated to MQTT around 2016. The CometD framework (open source, built on the Bayeux protocol, implementing long polling and later WebSocket) made it a standard pattern in enterprise Java apps of the 2008–2015 era. The trade-off? It still uses one HTTP request per data delivery: every response triggers a new request with full HTTP headers (~800 bytes of overhead each time).
3. Server-Sent Events (SSE) – The Live Stream
What if instead of closing the connection after each response, the server just kept sending data on the same connection, over and over? That's exactly what SSE does. The client opens a single HTTP connection, and the server streams events down it for as long as it wants β minutes, hours, even days. Think of it like subscribing to a news ticker: you tune in once, and headlines keep appearing without you refreshing.
The best part? You can see this RIGHT NOW. Open a terminal and run:
# This streams EVERY edit happening on EVERY Wikipedia, in real time.
# You'll see edits from every language β English, Japanese, German, all of them.
# Press Ctrl+C to stop.
curl -N https://stream.wikimedia.org/v2/stream/recentchange
That command opens a single HTTP connection to Wikipedia's SSE endpoint. Every time someone edits any page on any Wikipedia anywhere in the world, you see it stream across your terminal. That's SSE in action: one connection, endless data. The -N flag tells curl not to buffer the output, so events appear the instant they arrive.
// SSE: the browser handles reconnection, event parsing, everything.
// The core logic is genuinely just a few lines.
const source = new EventSource('/api/events');

// Listen for named event types
source.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  displayMessage(data);
});
source.addEventListener('price', (event) => {
  updateStockTicker(JSON.parse(event.data));
});
source.addEventListener('score', (event) => {
  updateScoreboard(JSON.parse(event.data));
});

// Connection drops? Browser auto-reconnects. You don't write that code.
// It even sends Last-Event-ID so the server knows where you left off.
source.onerror = () => {
  console.log('Connection lost; browser will auto-reconnect...');
};
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
event: message
data: {"from": "Alice", "text": "Hey!"}
event: price
data: {"AAPL": 185.42, "GOOGL": 175.20}
event: score
data: {"match": "IND vs AUS", "score": "312/4"}
event: message
data: {"from": "Carol", "text": "What's up?"}
Each event is a plain-text block separated by a blank line. The event: line names the type, the data: line carries the payload. Incredibly simple: no binary protocol, no framing, just text over HTTP. NYSE uses this exact pattern for streaming ~5,000 price updates per second to broker terminals.
SSE's killer features are automatic reconnection (the browser handles it; you write zero reconnection code) and Last-Event-ID (on reconnect, the browser tells the server the last event it received, so the server can resume from that point). Perfect for stock tickers, live sports scores, notification feeds, CI/CD build logs: anything where data flows one direction, server to client.
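The wire format above is simple enough to generate by hand. Here's a sketch of a serializer a server might use; the function name is mine, not a standard API, and multi-line payloads get one data: line per line of text, per the SSE format:

```javascript
// Serialize one SSE event. A blank line terminates the event;
// multi-line payloads need one "data:" line per line of text.
function formatSSE({ event, data, id }) {
  let out = '';
  if (id !== undefined) out += `id: ${id}\n`;
  if (event !== undefined) out += `event: ${event}\n`;
  for (const line of String(data).split('\n')) {
    out += `data: ${line}\n`;
  }
  return out + '\n'; // blank line = end of event
}

formatSSE({ event: 'message', data: JSON.stringify({ from: 'Alice', text: 'Hey!' }) });
// → 'event: message\ndata: {"from":"Alice","text":"Hey!"}\n\n'
```

A server would simply write each formatted block to the open text/event-stream response as events occur.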
4. WebSockets – The Full-Duplex Channel
WebSockets are the gold standard for real-time web communication. Unlike SSE (server to client only), a WebSocket gives you a full two-way channel where both the client and server can send messages at any time, independently. It's like having a phone call: both sides talk whenever they want, no turns needed.
Want to see it? Install wscat (a WebSocket testing tool) and run:
# Install wscat (needs Node.js)
npm install -g wscat
# Connect to Postman's public echo WebSocket server
wscat -c wss://ws.postman-echo.com/raw
# Now type anything β the server echoes it back instantly.
# Or open Discord in Chrome and check DevTools > Network > WS
# to see real WebSocket binary frames flowing.
WebSockets start life as a normal HTTP request, then "upgrade" to a different protocol entirely. Once upgraded, HTTP overhead disappears: data travels in small binary frames with just 2–14 bytes of overhead per message (versus ~800 bytes for HTTP headers). Each frame carries a FIN bit (is this the last fragment?), an opcode (text/binary/ping/pong/close), a mask bit (client-to-server frames are masked), and the payload length. This extreme efficiency is why Slack and Discord can handle millions of concurrent connections.
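The 2–14 byte range falls straight out of the frame header layout in RFC 6455: 2 base bytes, an extended length field for larger payloads, and 4 mask bytes on client-to-server frames. A sketch of the arithmetic:

```javascript
// WebSocket frame header size per RFC 6455:
// 2 base bytes (FIN/opcode + mask bit/7-bit length),
// +2 bytes if the payload is 126..65535 bytes, +8 bytes if larger,
// +4 mask bytes on client-to-server (masked) frames.
function frameHeaderBytes(payloadLength, masked) {
  let bytes = 2;
  if (payloadLength > 65535) bytes += 8;
  else if (payloadLength >= 126) bytes += 2;
  if (masked) bytes += 4;
  return bytes;
}

frameHeaderBytes(20, false);     // 2: tiny server-to-client frame
frameHeaderBytes(20, true);      // 6: same payload, client-to-server
frameHeaderBytes(100_000, true); // 14: the worst case in the 2-14 range
```

Compare any of those numbers to ~800 bytes of HTTP headers per request and the efficiency argument makes itself.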
// WebSocket: full two-way communication
let ws;

function connectWebSocket() {
  ws = new WebSocket('wss://chat.example.com/ws');

  ws.onopen = () => {
    console.log('Connected! Both sides can talk now.');
  };

  // Server sends us something: a message, typing indicator, anything
  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'message') displayMessage(data);
    if (data.type === 'typing') showTypingIndicator(data.user);
    if (data.type === 'online') updateUserList(data.user);
  };

  // Handle disconnection: YOU must write reconnection logic
  ws.onclose = () => {
    console.log('Disconnected. Reconnecting in 3s...');
    setTimeout(connectWebSocket, 3000);
  };
}
connectWebSocket();

// WE send something to the server, no HTTP request needed!
function sendMessage(text) {
  ws.send(JSON.stringify({ type: 'message', text }));
  // That's it. No fetch(). No headers. Just raw data on the wire.
  // Overhead: ~6 bytes. HTTP request overhead: ~800 bytes.
}
Notice the key difference from SSE: ws.send() lets the client push data through the same connection at any time. No separate HTTP request. Slack uses this for messages, reactions, typing indicators, and presence, all over one WebSocket per user. Discord pushes text through WebSockets and voice through WebRTC (Web Real-Time Communication, a browser API for peer-to-peer audio/video streaming; unlike WebSocket, which goes through a server, WebRTC sends media directly between users over UDP for low latency).
For each use case, which technique fits best? (1) A dashboard showing live server CPU metrics. (2) A multiplayer word game. (3) A sports score app. (4) A stock trading terminal.
Think about direction (one-way vs two-way), update frequency, and whether the client needs to send data back through the same channel.
Going Deeper – WebSocket Internals & Scaling
WebSockets are the most powerful real-time technique, but they're also the trickiest to run at scale. Let's peel back the layers: the protocol handshake, the frame format, what happens when you have users on different servers, and why connections die silently (and how to prevent it).
Every WebSocket connection starts as a normal HTTP request with special headers asking to "upgrade" the protocol. If the server agrees, it responds with 101 Switching Protocols, and from that moment on, the connection speaks WebSocket instead of HTTP. No more HTTP headers, no more request/response cycles, just raw binary frames.
The Sec-WebSocket-Key in the client request is a random base64 string. The server concatenates it with a magic GUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), hashes it with SHA-1, and returns the base64 result as Sec-WebSocket-Accept. This proves the server actually understands WebSocket (not just echoing HTTP). Open Chrome DevTools, go to a site with WebSockets (Discord, Slack), click the Network tab, filter by "WS", and you'll see these upgrade headers in the first request.
Here's the challenge nobody warns you about: WebSocket connections are stateful. The server remembers information about each connected client (who they are, what rooms they're in, what their last message was), the opposite of stateless HTTP, where each request is independent. That makes horizontal scaling much harder: you can't just add servers, because each user has a persistent TCP connection to one specific server. If Alice is connected to Server 1 and Bob is connected to Server 2, and Alice sends a message to Bob, how does Bob get it? Server 1 has the message, but Bob isn't on Server 1.
The standard solution: use a message broker like Redis to relay messages between servers. A broker is middleware (Redis Pub/Sub, RabbitMQ, Kafka) that sits between servers and routes messages: when Server 1 publishes a message, the broker ensures all other servers receive it so they can forward it to their connected clients. Think of it as a central post office that every server subscribes to. This is what Slack, Discord, and virtually every production chat system does.
# Terminal 1 β Server 2 subscribes to chat channel
redis-cli SUBSCRIBE chat:room1
# Output: Reading messages... (type CTRL-C to quit)
# Terminal 2 β Server 1 publishes Alice's message
redis-cli PUBLISH chat:room1 '{"from":"Alice","text":"Hello!"}'
# Output: (integer) 1, the number of subscribers who received it
# Terminal 1 immediately shows:
# 1) "message"
# 2) "chat:room1"
# 3) "{\"from\":\"Alice\",\"text\":\"Hello!\"}"
This pattern powers most production WebSocket systems. Slack's architecture: HAProxy load balancer, Go WebSocket servers, Redis Pub/Sub for cross-server messaging. Discord uses a similar approach but swapped Go for Rust on their voice gateway, achieving 10x memory improvement.
Each WebSocket connection is a persistent TCP connection, which means it consumes real server resources: a file descriptor (an operating system handle for an open connection or file; Linux defaults to ~1,024 per process, and running out means the server can't accept new connections, so new users get "connection refused"), some memory for buffers, and a slot in the kernel's connection table.
# Check current file descriptor limit per process
ulimit -n
# Default: 1024, way too low for production!
# Raise it for production servers
ulimit -n 1000000
# Check how many TCP connections you have right now
ss -s
# Output: TCP: 47 (estab 23, closed 3, orphaned 0, timewait 1)
# Check system-wide limits
cat /proc/sys/fs/file-max
# Output: 9223372036854775807 (on modern kernels, practically unlimited)
Here's the math for 1 million concurrent WebSocket connections:
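As a rough back-of-envelope (per-connection memory varies wildly by stack: ~50KB is a common application-level figure, while Erlang-style lightweight processes run closer to ~2KB):

```javascript
// Memory needed just to HOLD n connections, ignoring message throughput.
// Per-connection costs here are rough assumptions, not measurements.
function gbForConnections(connections, kbPerConnection) {
  return (connections * kbPerConnection) / (1024 * 1024); // KB -> GB
}

const MILLION = 1_000_000;
gbForConnections(MILLION, 50); // ~47.7 GB at ~50KB per connection
gbForConnections(MILLION, 2);  // ~1.9 GB at ~2KB (Erlang-style processes)
// Plus one file descriptor each, so ulimit -n must exceed 1,000,000.
```

That two-orders-of-magnitude spread is why runtime choice dominates this conversation.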
WhatsApp famously achieved 2 million concurrent connections per server using Erlang (now Elixir), which has extremely lightweight processes (~2KB each). Discord handles 15M+ concurrent voice+text connections across their fleet. The key insight: the OS, language runtime, and per-connection memory all matter when you're at this scale.
WebSocket connections can go stale without either side knowing. A user closes their laptop lid. Their phone walks through a dead Wi-Fi zone. A NAT gateway (the router between your device and the internet, which tracks active connections) silently drops idle connections after a timeout, typically 30–60 seconds for TCP: if your WebSocket sends nothing for a minute, the gateway may forget about it, and the next message fails silently. A corporate proxy kills long-lived connections after 5 minutes. Without heartbeats, the server thinks the user is still connected and keeps wasting resources on a dead socket.
When a client's connection drops and they need to reconnect, the naive approach (reconnect immediately, retry forever) can cause a thundering herd if the server restarts: thousands of clients all try to reconnect at the exact same instant, overwhelming the server. The fix: exponential backoff with jitter, where each client waits a random amount of time before reconnecting, spreading the load over seconds instead of milliseconds.
// Reconnect with exponential backoff + jitter
// Retry 1: wait ~1s. Retry 2: ~2s. Retry 3: ~4s. Retry 4: ~8s. Cap at 30s.
function connectWithBackoff(attempt = 0) {
  const ws = new WebSocket('wss://chat.example.com/ws');
  ws.onopen = () => {
    console.log('Connected!');
    attempt = 0; // Reset backoff on successful connection
  };
  ws.onclose = () => {
    const baseDelay = Math.min(1000 * Math.pow(2, attempt), 30000); // 1s, 2s, 4s, 8s... max 30s
    const jitter = Math.random() * baseDelay * 0.5; // Randomness prevents a thundering herd
    const delay = baseDelay + jitter;
    console.log(`Reconnecting in ${Math.round(delay / 1000)}s (attempt ${attempt + 1})...`);
    setTimeout(() => connectWithBackoff(attempt + 1), delay);
  };
  return ws;
}
connectWithBackoff();
Head-to-Head Comparison & Decision Framework
You're designing a system for live cricket scores. 10 million fans watching the same match. Data flows one direction: server pushes score updates to viewers. Viewers never send data back. Would you use WebSockets or SSE? Think about what happens when a viewer's connection drops β which technique handles reconnection automatically?
Now that you understand all four techniques inside and out, let's put them side by side. This comparison table is the single most useful reference when choosing the right approach for your system.
One more visual to make the decision instant. Walk through these three questions:
You're building an Uber-like ride tracking feature where the passenger watches the driver's car move on a map in real-time. The passenger doesn't need to send data back, just watch. Which technique would you pick and why?
Consider: data flows one direction (server to client), updates are frequent (every 1-2 seconds), and it needs to work reliably across mobile networks.
Real-Time at Scale – How the Giants Do It
Slack has millions of users connected via WebSockets. Each WebSocket uses ~50KB of RAM. If a server has 16GB of RAM available for connections, how many concurrent WebSocket connections can one server hold? Now multiply by 10 million concurrent Slack users at peak. How many servers do you need just for the connection layer?
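Working through that exercise (all figures come from the question itself, not from Slack's real architecture):

```javascript
// 16GB of RAM at ~50KB per WebSocket connection:
const ramKB = 16 * 1024 * 1024; // 16GB expressed in KB
const perConnKB = 50;
const connsPerServer = Math.floor(ramKB / perConnKB);
console.log(connsPerServer); // 335,544 connections per server

// 10 million concurrent users across servers of that size:
const servers = Math.ceil(10_000_000 / connsPerServer);
console.log(servers); // ~30 servers just to hold the sockets
```

Roughly 335K connections per server and about 30 servers for the connection layer alone, before any message processing. This back-of-envelope is exactly the kind of sizing that leads to the gateway/app-layer split described below.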
Theory is one thing. Running real-time systems for millions of users is another. Let's look at four companies that push billions of events per day and how they architect their connection layers.
Slack – 750K Organizations on WebSockets
Every Slack workspace you open holds an active WebSocket connection. Across 750,000+ paying organizations, that's tens of millions of concurrent connections at peak hours. A single server can't hold that many, so Slack built a connection gateway layer.
The key insight: Slack separates holding connections from processing messages. Gateway servers do almost nothing except maintain sockets and forward bytes. That's why they can hold 100K+ connections per server: they're barely doing any CPU work. When Slack goes down, it's almost always the app layer, not the connection layer.
Discord – 15M Concurrent on WebSocket + WebRTC
Discord handles two very different real-time problems. Text chat and presence (who's online, typing indicators) run over WebSockets. Voice and video run over WebRTC, a peer-to-peer protocol that's lower latency than WebSocket because it uses UDP instead of TCP.
At 15 million concurrent voice+text users, Discord hit a wall with their Go gateway servers: garbage collection pauses caused latency spikes. Their fix? They rewrote the hot path in Rust, which has no garbage collector. The result: p99 latency dropped from 50ms to 10ms, and memory usage fell by 10x. Text messages still route through a gateway layer similar to Slack's, but voice packets bypass it entirely; they go peer-to-peer or through lightweight TURN relay servers.
Binance – 100K+ Traders on WebSocket Streams
Crypto markets never sleep, and traders need prices faster than their competitors. Binance pushes real-time price updates for 2,000+ trading pairs via WebSocket streams. Each stream is a dedicated WebSocket channel for one type of data (e.g., BTC/USDT 1-second candlesticks, ETH/USDT order book depth).
A single trader might subscribe to 20+ streams simultaneously. At 100,000+ concurrent traders, that's millions of active subscriptions. Binance solves this with stream multiplexing: you open one WebSocket connection and subscribe to multiple streams over it:
// One connection, multiple streams: no connection-per-pair overhead
const ws = new WebSocket(
  'wss://stream.binance.com:9443/stream?streams=' +
  'btcusdt@trade/ethusdt@trade/bnbusdt@kline_1m'
);
ws.onmessage = (event) => {
  const { stream, data } = JSON.parse(event.data);
  // stream = "btcusdt@trade", data = { price: "67432.10", ... }
  updateTicker(stream, data);
};
This pattern, one connection carrying many logical channels, is how most high-throughput real-time systems work. It avoids the overhead of thousands of separate TCP handshakes and TLS negotiations.
Firebase Realtime Database – SSE Under the Hood
Firebase Realtime Database feels magical β you write ref.on('value', callback) and your UI updates instantly when data changes. Under the hood, it's using Server-Sent Events over a single long-lived HTTP connection. When you "listen" to a path in Firebase, the SDK opens an SSE stream to that path's URL. Every write from any client triggers a push down every listener's stream.
Firebase supports up to 200,000 concurrent connections per database. Beyond that, you shard β split your data across multiple database instances. Google's internal infrastructure fans out events through a Pub/Sub layer, so a single write to /messages/chat-room-42 can reach 10,000 listeners within 100ms, even if those listeners are spread across data centers in Iowa, Belgium, and Tokyo.
Anti-Lessons – Things That Sound Smart but Aren't
Real-time communication is full of "obvious" best practices that are actually wrong in many situations. These three misconceptions trip up engineers at every level.
Defaulting to WebSockets is the most common over-engineering mistake. If your data flows one direction (server to client), as with notifications, live scores, stock tickers, or build logs, then SSE does the job with half the complexity. SSE works over plain HTTP, passes through every proxy and CDN, auto-reconnects, and requires zero library code on the client (just new EventSource(url)).
WebSocket adds bidirectional framing, a custom protocol upgrade, manual reconnection logic, and incompatibility with some corporate proxies. Use it when you genuinely need the client to send data back on the same channel: chat, gaming, collaborative editing. For everything else, SSE is simpler and more reliable.
Polling gets a bad reputation, and for high-frequency updates it deserves it. But for low-frequency data that changes rarely, polling is often the best choice. Weather data updates every 10-15 minutes. Exchange rates update every few seconds at most. Your user's subscription status changes once a month.
For these use cases, a simple setInterval(() => fetch(url), 300000) (every 5 minutes) is vastly simpler than maintaining a persistent connection. No connection state, no reconnection logic, no heartbeats, no load balancer stickiness. The server is stateless. Each request is independent. You can cache it at the CDN layer. You can scale horizontally with zero coordination.
The math: 10,000 users polling every 5 minutes = 33 requests/second. That's nothing for a modern server. Compare that to holding 10,000 WebSocket connections open 24/7, consuming RAM, file descriptors, and requiring sticky sessions.
"Real-time" doesn't mean zero latency β physics doesn't allow that. Light takes 67ms to travel from New York to London through a fiber optic cable. Add server processing, serialization, and network hops, and you're looking at 80-150ms minimum for cross-continent delivery.
The good news: humans perceive anything under ~100ms as "instant." For chat, 50-200ms is perfectly fine. For live video, 1-3 seconds of delay is standard. For stock trading, firms spend millions to shave off microseconds β but that's a different universe from web real-time. When someone says they need "real-time," ask: what's the actual latency budget? The answer determines your architecture.
Common Mistakes – Bugs That Ship to Production
These six mistakes appear in almost every team's first real-time feature. Each one seems minor during development but causes outages, memory leaks, or silent data loss at scale.
WebSocket connections will drop: Wi-Fi switches, mobile network handoffs, server deploys, load balancer timeouts. If your client doesn't reconnect, the user's UI silently goes stale. They see messages from 10 minutes ago and think they're current. This is worse than an error message because the user doesn't know something is wrong.
Fix: Always implement reconnection with exponential backoff + jitter (see Section 7). SSE handles this automatically β one more reason to prefer it when you can.
NAT gateways and corporate proxies silently kill idle TCP connections after 30-60 seconds. If your WebSocket has no data flowing for a minute, the middlebox drops it without telling either side. The server thinks the client is connected; the client thinks it's connected. Next message? Lost into the void.
Fix: Send ping frames every 25-30 seconds. Both the WebSocket protocol and most libraries (Socket.IO, ws) support this natively. For SSE, send a comment line (: keepalive) at the same interval.
Suppose you have a dashboard with 200 metrics. Every second, 3-4 of them change. If you send all 200 metrics every time, you're wasting 98% of your bandwidth. At 10,000 concurrent viewers, that's the difference between 40 Mbps and 2 Gbps of outbound traffic.
Fix: Send only what changed (deltas/diffs). Track the last state sent to each client and compute the diff. Libraries like json-patch (RFC 6902) formalize this. For collaborative editing, look into CRDTs or Operational Transforms, which handle concurrent diffs gracefully.
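For flat key-value state, the delta computation can be as small as this sketch (it ignores deleted keys, which json-patch handles properly):

```python
def delta(last: dict, current: dict) -> dict:
    """Return only the entries that changed since the last send."""
    return {k: v for k, v in current.items() if last.get(k) != v}

last_sent = {"cpu": 40, "mem": 70, "disk": 55}
current = {"cpu": 42, "mem": 70, "disk": 55}
print(delta(last_sent, current))  # → {'cpu': 42}
```

Only the changed metric goes over the wire; the client merges it into its local copy of the state.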
If the server produces events faster than a client can consume them (slow network, overwhelmed browser), the send buffer grows unbounded. Each WebSocket connection has a kernel-level send buffer; if you keep writing to a slow client, you'll eventually exhaust server memory and crash the entire process, taking down all connections.
Fix: Monitor the send buffer size (Node.js: ws.bufferedAmount). If it exceeds a threshold (e.g., 1MB), drop that client's connection or skip messages. Better to miss one update than crash the server for everyone. Discord's Rust gateway uses per-client rate limiting for exactly this reason.
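The drop-the-slow-client policy is easy to simulate without real sockets. An in-memory sketch (the class, names, and drain rates are illustrative; `buffered` stands in for `ws.bufferedAmount`):

```python
MAX_BUFFERED = 1_000_000  # ~1 MB, the threshold suggested above

class Client:
    def __init__(self, name: str, drain_rate: int):
        self.name = name
        self.buffered = 0             # stands in for ws.bufferedAmount
        self.drain_rate = drain_rate  # bytes the client consumes per tick

def broadcast(clients: list, payload_size: int) -> list:
    """Send one payload to every client; drop any whose buffer is over the cap."""
    alive = []
    for c in clients:
        c.buffered = max(0, c.buffered - c.drain_rate) + payload_size
        if c.buffered > MAX_BUFFERED:
            continue  # drop the slow client rather than grow memory unbounded
        alive.append(c)
    return alive

clients = [Client("fast", 100_000), Client("stalled", 0)]
for _ in range(11):
    clients = broadcast(clients, 100_000)
print([c.name for c in clients])  # → ['fast']
```

The fast client's buffer stays flat; the stalled client's grows by the full payload every tick until it crosses the cap and is dropped, leaving everyone else healthy.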
This bears repeating because it's that common. A team builds a notification system with WebSocket, then spends weeks debugging proxy issues, reconnection bugs, and load balancer stickiness: problems that SSE doesn't have. WebSocket requires a protocol upgrade that some proxies block. SSE is just HTTP; every proxy, CDN, and load balancer handles it without special configuration.
Fix: Before reaching for WebSocket, ask: "Does the client need to send data back through this channel?" If no, use SSE. Save WebSocket for genuinely bidirectional use cases.
Your app works great with 10 test connections. Then launch day hits with 50,000 users and the server runs out of file descriptors at 1,024 connections (the default Linux ulimit -n). Or memory fills up because each connection holds 200KB of state. Or the event loop blocks because you're serializing JSON for 50,000 clients on a single thread.
Fix: Load test before launch. Tools like websocat, artillery, or a simple script opening thousands of WebSocket connections will reveal your ceiling fast. Test on the same OS and hardware as production β macOS limits are different from Linux.
Interview Playbook: What They Actually Ask
Real-time communication comes up in almost every system design interview β chat systems, notification services, live dashboards. Here's what's expected at each level.
What they expect you to know
Name the four real-time techniques (short polling, long polling, SSE, WebSocket) and explain the trade-offs in plain English. You don't need to design a full system β just show you understand when to use each one.
Sample question: "Your app needs to show live notifications. How would you implement it?"
Good answer: "Notifications are server-to-client only; the user doesn't send anything back through this channel. So I'd use Server-Sent Events. The browser opens one HTTP connection, the server pushes each notification as an event. SSE auto-reconnects if the connection drops, and it works through proxies without special config. If we needed the user to acknowledge or reply to notifications in the same channel, I'd upgrade to WebSocket, but for one-way pushes, SSE is simpler."
What they expect you to know
Design a complete notification system end-to-end. Choose the right protocol, explain your persistence strategy, handle offline users, and discuss scaling to 100K+ concurrent connections.
Sample question: "Design a notification system for an e-commerce platform (order updates, price alerts, flash sale announcements)."
Good answer structure:
- Protocol choice: SSE, since notifications are server-to-client with no bidirectional need.
- Persistence: Store notifications in a database (PostgreSQL) so offline users see them on login. Mark as read/unread.
- Delivery path: Event producer (order service) publishes to a message queue (RabbitMQ/Kafka). Notification service consumes, checks user preferences (email? push? in-app?), and pushes via SSE to connected users.
- Offline handling: On SSE reconnect, the client sends Last-Event-ID. The server replays missed events from the database.
- Scaling: SSE connections are stateless from the server's perspective (each is just a held-open HTTP response). Scale horizontally behind a load balancer. No sticky sessions needed.
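The Last-Event-ID replay step can be sketched with a plain list standing in for the notifications table (field names here are illustrative):

```python
def replay_missed(events: list, last_event_id: int) -> list:
    """Return every stored event newer than the client's Last-Event-ID."""
    return [e for e in events if e["id"] > last_event_id]

events = [
    {"id": 1, "msg": "order shipped"},
    {"id": 2, "msg": "price dropped"},
    {"id": 3, "msg": "flash sale started"},
]
# Client reconnects with Last-Event-ID: 1 -> it receives events 2 and 3.
print(replay_missed(events, last_event_id=1))
```

This only works if event IDs are monotonically increasing per stream, which is why the database write assigns them.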
What they expect you to design
Architect a full chat system at scale: WebSocket connection management, message routing across servers, presence (online/offline/typing), horizontal scaling, and failure handling.
Sample question: "Design Slack's messaging system for 1M concurrent users."
Architecture outline:
Key points to hit: Connection gateway layer (separate from app servers). Redis Pub/Sub for cross-server message fan-out. Connection registry mapping user IDs to gateway servers. Cassandra or ScyllaDB for message persistence (write-optimized, time-series partitioning). Presence tracking via TTL keys in Redis (user heartbeats every 30s, key expires in 60s = offline).
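The presence scheme can be modeled without Redis. In this sketch an explicit clock replaces Redis's TTL machinery; the class and method names are illustrative:

```python
TTL_SECONDS = 60  # heartbeat every 30s, expire at 60s, as outlined above

class Presence:
    """In-memory stand-in for Redis TTL keys: last heartbeat time per user."""
    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, user: str, now: float):
        self.last_seen[user] = now  # in Redis: SET presence:<user> 1 EX 60

    def is_online(self, user: str, now: float) -> bool:
        return now - self.last_seen.get(user, float("-inf")) < TTL_SECONDS

p = Presence()
p.heartbeat("alice", now=0)
print(p.is_online("alice", now=45))  # → True  (heartbeat within the TTL window)
print(p.is_online("alice", now=75))  # → False (the key would have expired)
```

Missing two consecutive heartbeats marks the user offline automatically; no cleanup job is needed, which is the appeal of the TTL approach.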
Practice Exercises
Reading about real-time is one thing. Building it is where it clicks. These five exercises progress from "copy-paste and run" to "design and benchmark."
Open the Wikipedia recent-changes SSE stream in your browser and display edits in real time. This requires zero server code β Wikipedia provides the stream.
The stream URL is https://stream.wikimedia.org/v2/stream/recentchange. Use new EventSource(url). Each event's data field is JSON with title, user, timestamp, and server_url.
<!-- Save this as a .html file and open in your browser -->
<h2>Live Wikipedia Edits</h2>
<div id="feed" style="font-family:monospace; font-size:13px;"></div>
<script>
const source = new EventSource(
  'https://stream.wikimedia.org/v2/stream/recentchange'
);
source.onmessage = (event) => {
  const d = JSON.parse(event.data);
  const el = document.createElement('div');
  el.textContent = `[${d.wiki}] ${d.user} edited "${d.title}"`;
  // "feed" is the div above, reachable as a global via its id
  feed.prepend(el);
  if (feed.children.length > 50) feed.lastChild.remove();
};
</script>
Build a minimal WebSocket server that echoes back any message it receives. Test it with wscat. This teaches you the server side of the handshake.
Use the websockets library: pip install websockets. The server is about 8 lines. Run it, then connect with wscat -c ws://localhost:8765.
import asyncio
import websockets

async def echo(websocket):
    async for message in websocket:
        print(f"Received: {message}")
        await websocket.send(f"Echo: {message}")

async def main():
    async with websockets.serve(echo, "localhost", 8765):
        print("Echo server running on ws://localhost:8765")
        await asyncio.Future()  # Run forever

asyncio.run(main())
Run with python echo_server.py, then in another terminal: wscat -c ws://localhost:8765. Type anything and see it echo back.
Extend Exercise 1: filter the Wikipedia SSE stream to show only edits to English Wikipedia articles (not talk pages, not bots). Display the edit count per minute in real time.
Filter by d.wiki === "enwiki" and d.namespace === 0 (article namespace). Check d.bot === false to exclude bots. Use a counter that resets every 60 seconds.
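A Python sketch of the same filter and a per-minute counter (field names follow the recentchange schema quoted above; the counter resets whenever the 60-second window rolls over):

```python
def is_human_enwiki_edit(d: dict) -> bool:
    """Keep only non-bot edits to English Wikipedia articles (namespace 0)."""
    return (d.get("wiki") == "enwiki"
            and d.get("namespace") == 0
            and not d.get("bot", False))

class MinuteCounter:
    def __init__(self):
        self.window = None
        self.count = 0

    def add(self, timestamp: int) -> int:
        """Count an event; reset when a new 60-second window begins."""
        window = timestamp // 60
        if window != self.window:
            self.window, self.count = window, 0
        self.count += 1
        return self.count
```

Feed each parsed event through the filter first, then `add(d["timestamp"])` to get the running edits-per-minute figure to display.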
Build a multi-room chat where two terminal windows can send messages to each other via Redis Pub/Sub. This is the exact pattern Slack and Discord use for cross-server message routing.
Run Redis locally (docker run -p 6379:6379 redis). Use redis-py or ioredis. One script subscribes to channel room:general, another publishes. In production, the WebSocket server subscribes on behalf of connected clients.
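Redis isn't required to see the shape of the pattern. Here is an in-memory broker with the same subscribe/publish surface (a sketch; in production each WebSocket server holds one Redis subscription per room and forwards messages to its locally connected clients):

```python
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel: str, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str):
        # Fan out to every subscriber of this channel, like Redis PUBLISH.
        for cb in self.subscribers[channel]:
            cb(message)

received = []
broker = Broker()
broker.subscribe("room:general", received.append)
broker.publish("room:general", "hello from another server")
broker.publish("room:random", "not delivered here")
print(received)  # → ['hello from another server']
```

The key property to notice: the publisher doesn't know who is subscribed, which is what lets chat servers scale out independently of each other.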
Spin up the echo server from Exercise 2 and open as many concurrent WebSocket connections as you can. Find where it breaks. Measure memory per connection.
Write a script that opens N connections in a loop. Check ulimit -n first (raise it if needed). Monitor with htop or ss -s. On a laptop you'll likely hit ~10K connections before something gives. On Linux with tuned limits, 100K+ is achievable.
Cheat Sheet
setInterval(() => {
  fetch('/api/updates')
    .then(r => r.json())
    .then(update);
}, 5000);
Simple but wasteful. Use for: weather, low-freq data. ~98% empty responses.
function poll() {
  fetch('/api/wait')
    .then(r => r.json())
    .then(d => { update(d); poll(); });
}
poll();
Server holds request until data. Fallback when SSE/WS blocked.
const src = new EventSource('/stream');
src.onmessage = (e) => {
  update(JSON.parse(e.data));
};
// Auto-reconnects!
Server→client only. Auto-reconnect, Last-Event-ID. Proxy-friendly.
const ws = new WebSocket('wss://...');
ws.onmessage = (e) => update(JSON.parse(e.data));
ws.send(JSON.stringify(msg));
Full duplex. Use for: chat, gaming, collaboration. Add heartbeats + reconnection.
Low-freq? → Polling
Server→Client? → SSE
Bidirectional? → WebSocket
Audio/Video? → WebRTC
Start simple, upgrade only when needed. SSE > WS for most use cases.
- Reconnection + backoff
- Heartbeats (30s interval)
- Send diffs, not full state
- Monitor bufferedAmount
- Load test connections
- ulimit -n > 100000
Every real-time system needs all six. Skip one and you'll learn why at 3am.
Connected Topics: Where to Go Next
Real-time communication connects to almost every other system design topic. Here's where each thread leads.