TL;DR – The Mental Model
The normal web works on a simple deal: your browser asks a question, the server answers, and the conversation ends. You click a link, you get a page. You submit a form, you get a confirmation. That one-ask-one-answer cycle is called request-response (the standard HTTP model: the client sends a request, the server sends back a response, and the connection closes, like sending a letter and waiting for a reply), and it powers the overwhelming majority of the web, from every Google search to every page load.
But some things can't wait for you to ask. When Alice sends a chat message, Bob needs to see it now, not the next time he refreshes. When a stock price changes, the trader's screen must update in milliseconds, not seconds. When your Uber driver turns a corner, the map should move instantly, not after you tap refresh. These are real-time problems, and solving them requires tricking HTTP into doing something it was never designed for, or replacing it entirely.
The Scenario – Building a Chat App
You're building a chat app. Alice sends a message. Bob needs to see it instantly. But HTTP is request-response: the server can only talk when the client asks a question. How would you get data from the server to Bob without Bob requesting it? Think of at least two different approaches before reading on.
Your team just got a new assignment: build a team chat application. Think a simpler version of Slack. The product requirement is clear: when Alice types a message, Bob should see it immediately. Not in 5 seconds. Not after refreshing the page. Instantly, the way a text message works on your phone. Messages must appear the moment they're sent.
You already know how the regular web works. The browser sends a request ("give me this page"), the server responds ("here you go"), and the connection closes. It's like ordering food at a counter: you ask, they hand it to you, transaction done. But chat is completely different. With chat, the server needs to push new messages to Bob the moment they arrive, even though Bob didn't ask for anything. The server needs to tap Bob on the shoulder and say "hey, Alice just said something."
This is the fundamental tension: HTTP was designed for the client to start every conversation. The server can never reach out to the browser unprompted. It's like a waiter who can only bring food when you order; they can't walk up and say "the chef just made something amazing, here you go." So how do we get around this?
You can only use normal HTTP (client asks, server answers). How would you build a chat app where messages appear instantly? What's the simplest trick you can think of?
Hint: What if the client just kept asking really, really often? That "keep asking" idea is exactly where our journey begins. It's the first and dumbest approach, and understanding why it works (and where it falls apart) naturally leads us to each better solution.
First Attempt – Just Keep Asking (Short Polling)
The simplest solution to "how does Bob get new messages?" is embarrassingly obvious: Bob's browser asks the server "any new messages?" over and over again, on a timer. Every 3 seconds, every 5 seconds, whatever interval you pick. If the server says "nope, nothing new," the browser waits and asks again. If the server says "yes, here's a new message from Alice," the browser shows it on screen.
This technique is called short polling (the client repeatedly sends HTTP requests at regular intervals, say every 1–5 seconds, to check for new data; simple to implement but wasteful, since most requests return empty responses), and it's the most basic form of "real-time." It works the same way a kid in the back seat keeps asking "are we there yet?": annoying, wasteful, but eventually effective.
You can literally see this happening. Open Chrome DevTools on any app that uses short polling, click the Network tab, and watch the XHR requests fire every few seconds β most returning empty JSON arrays. It's enlightening and a little painful to watch.
See the problem? Alice sent her message at the 12-second mark, but Bob didn't find out until the 15-second mark, when his next poll fired. That's a 3-second delay. And 3 out of 4 requests came back completely empty. Those were wasted trips: wasted bandwidth, wasted CPU, wasted database queries that produced nothing.
// Short polling: ask for new messages every 5 seconds
// Open Chrome DevTools > Network tab to watch these fire
function startPolling() {
  let lastMessageId = 0;
  setInterval(async () => {
    // Every 5 seconds, ask the server "anything new?"
    const response = await fetch(`/api/messages?since=${lastMessageId}`);
    const messages = await response.json();
    if (messages.length > 0) {
      // Got new messages! Show them on screen
      messages.forEach(msg => displayMessage(msg));
      lastMessageId = messages[messages.length - 1].id;
    }
    // If empty, do nothing; we'll ask again in 5 seconds
  }, 5000); // 5000ms = 5 seconds
}
startPolling();
The code is dead simple: a setInterval, a fetch, and you're done. Six lines of logic. It works. And for a hackathon prototype with 5 users, it's perfectly fine. But the moment you think about real traffic, the math gets ugly fast.
Where Short Polling Breaks – The Math Gets Brutal
Short polling works fine on your laptop with 3 test users. But the moment you ship to production, three brutal problems emerge, and they all get worse as your user count grows.
See the dilemma? You can have fast updates or cheap infrastructure, but not both. The faster you want messages to appear, the harder you hammer the server. And the vast majority of those requests come back empty-handed. You're burning money on bandwidth and CPU for no reason.
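To make that concrete, here's a back-of-envelope calculation. The user counts, poll intervals, and the 98% empty-response rate are illustrative assumptions, not measurements from any real deployment:

```javascript
// Requests per second generated by short polling, as a function of
// user count and poll interval. All numbers here are hypothetical.
function pollingLoad(users, intervalSeconds) {
  return users / intervalSeconds; // each user fires 1 request per interval
}

// 10,000 users polling every 3 seconds:
console.log(pollingLoad(10_000, 3));        // ~3,333 requests/second
// Tighten the interval to 1 second for "faster" updates:
console.log(pollingLoad(10_000, 1));        // 10,000 requests/second
// If ~98% of those come back empty, the wasted load is:
console.log(pollingLoad(10_000, 1) * 0.98); // 9,800 wasted requests/second
```

The exact empty-response rate depends on how chatty the room is; the point is that the waste scales with users times frequency, and you pay it 24/7 whether or not anyone is talking.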
What if instead of Bob repeatedly asking "got anything?", he asked once and the server didn't answer until there actually was something new? The server just holds the connection open, silently, doing nothing, waiting for data to arrive. How would that change the math?
This single idea, "don't respond until you have something to say," is the foundation of long polling, and it eliminates almost all wasted requests. That insight is the first major breakthrough: it transforms the entire waste equation and has powered real-time features since the early 2000s.
The Breakthrough – Persistent Connections
Short polling wastes 98% of requests (they return "nothing new"). What if instead of the server immediately saying "nope," it just... doesn't respond? It holds your request open, waiting until there IS something new. What problems would this create? Think about timeouts, server memory, and what happens when 50,000 clients are all waiting simultaneously.
The fix for short polling's wastefulness is elegant: instead of the server immediately responding "nothing new" to every request, the server holds the request open and waits. It doesn't send a response at all until there's real data. The client sends one request, then sits there patiently with the connection open. When new data finally arrives (maybe 30 seconds later, maybe 2 milliseconds later), the server responds. The client processes it, immediately opens a new request, and the cycle repeats.
This was called long polling (a variation of polling where the server holds the client's request open until new data arrives or a timeout occurs, typically after 25–30 seconds; every response carries real data, so empty responses essentially disappear), and it was the backbone of real-time web features for nearly a decade. Facebook's original chat (2008), early Google Talk, Slack's fallback mode: all long polling. But the real paradigm shift came with two technologies purpose-built for real-time. Server-Sent Events (SSE) is a browser API (EventSource) that opens a persistent HTTP connection down which the server pushes events as they happen, for minutes, hours, even days; it's one-way only (server to client), with auto-reconnect built in. WebSocket is a protocol (RFC 6455) providing full two-way communication over a single TCP connection; it starts as an HTTP upgrade, then switches to a binary frame protocol with just 2–14 bytes of overhead per message (versus ~800 bytes for HTTP headers). Slack, Discord, and multiplayer games all use WebSockets.
Both SSE and WebSockets represent the same core idea: the server can push data to you without you asking. No more polling. No more "are we there yet?" The connection stays open, and data flows the instant it exists.
How Each Technique Works – The Full Picture
Now that you see the "why" (the journey from "just keep asking" to "keep the connection open"), let's look at each technique in full detail. For each one: what happens on the wire, the real code, a real command you can run right now to see it in action, and when to use it (and when NOT to).
1. Short Polling – The Brute-Force Approach
We covered this in Section 3, so here's just the key summary: the client sends HTTP requests on a setInterval timer, and the server responds immediately, even if there's nothing new. Simple, stateless, and wasteful.
2. Long Polling – The Patient Waiter
Long polling flips the script: the client sends a request, and the server holds it open until there's data to send. No timer on the client. No wasted empty responses. The server decides when to respond: either when real data arrives, or when a timeout (usually 25–30 seconds server-side) expires. On timeout, the server responds with an empty result and the client immediately opens a new long poll; this keeps connections from hanging open forever and plays nicely with load balancers and proxies that kill idle connections.
// Long polling: one request at a time, server holds until data exists
async function longPoll(lastMessageId = 0) {
  try {
    // This fetch might hang for 25+ seconds. That's the whole point:
    // the server won't respond until there's actual new data.
    const response = await fetch(
      `/api/messages?since=${lastMessageId}&wait=true`
    );
    const messages = await response.json();
    if (messages.length > 0) {
      messages.forEach(msg => displayMessage(msg));
      lastMessageId = messages[messages.length - 1].id;
    }
  } catch (error) {
    // Network error? Wait a bit before retrying
    await new Promise(resolve => setTimeout(resolve, 3000));
  }
  // Immediately start the next long poll; no timer needed
  longPoll(lastMessageId);
}
longPoll();
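The client is only half the story. Server-side, the core of long polling is "hold the response until data arrives or a timeout fires." Here's a minimal in-memory sketch of that core in Node.js; the store, function names, and timeout are illustrative, and a real server would also clean up when the client disconnects:

```javascript
// Minimal long-poll core: resolve waiting clients when a message arrives,
// or resolve with an empty result after a timeout. Illustrative sketch.
const messages = [];  // all messages; ids are 1-based
const waiters = [];   // pending { since, resolve } entries

function messagesSince(id) {
  return messages.slice(id); // messages with id greater than `id`
}

// Returns a promise that resolves with new messages, or [] on timeout.
// An HTTP handler would await this and then write the JSON response.
function waitForMessages(since, timeoutMs = 25_000) {
  const ready = messagesSince(since);
  if (ready.length > 0) return Promise.resolve(ready); // data already here
  return new Promise((resolve) => {
    const entry = { since, resolve };
    waiters.push(entry);
    setTimeout(() => { // timeout: answer empty; client re-polls
      const i = waiters.indexOf(entry);
      if (i !== -1) { waiters.splice(i, 1); resolve([]); }
    }, timeoutMs);
  });
}

function publish(text) {
  messages.push({ id: messages.length + 1, text });
  // Wake every held request with whatever is new for it
  while (waiters.length > 0) {
    const { since, resolve } = waiters.pop();
    resolve(messagesSince(since));
  }
}
```

The key design point: a held request costs almost nothing while it waits (one entry in an array, one open socket), which is exactly why long polling's math beats short polling's.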
Long polling was the real-time backbone of the early social web. Facebook used it for their chat from 2008 until they migrated to MQTT around 2016. The CometD framework (open source, built on the Bayeux protocol, implementing long polling and later WebSocket) made it a standard pattern in enterprise Java apps of the 2008–2015 era. The trade-off? It still uses one HTTP request per data delivery: every response triggers a new request with full HTTP headers (~800 bytes of overhead each time).
3. Server-Sent Events (SSE) – The Live Stream
What if instead of closing the connection after each response, the server just kept sending data on the same connection, over and over? That's exactly what SSE does. The client opens a single HTTP connection, and the server streams events down it for as long as it wants β minutes, hours, even days. Think of it like subscribing to a news ticker: you tune in once, and headlines keep appearing without you refreshing.
The best part? You can see this RIGHT NOW. Open a terminal and run:
# This streams EVERY edit happening on EVERY Wikipedia, in real time.
# You'll see edits from every language β English, Japanese, German, all of them.
# Press Ctrl+C to stop.
curl -N https://stream.wikimedia.org/v2/stream/recentchange
That command opens a single HTTP connection to Wikipedia's SSE endpoint. Every time someone edits any page on any Wikipedia anywhere in the world, you see it stream across your terminal. That's SSE in action: one connection, endless data. The -N flag tells curl not to buffer the output, so events appear the instant they arrive.
// SSE: the browser handles reconnection, event parsing, everything.
// The core logic is genuinely just a few lines.
const source = new EventSource('/api/events');

// Listen for named event types
source.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  displayMessage(data);
});
source.addEventListener('price', (event) => {
  updateStockTicker(JSON.parse(event.data));
});
source.addEventListener('score', (event) => {
  updateScoreboard(JSON.parse(event.data));
});

// Connection drops? Browser auto-reconnects. You don't write that code.
// It even sends Last-Event-ID so the server knows where you left off.
source.onerror = () => {
  console.log('Connection lost; browser will auto-reconnect...');
};
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
event: message
data: {"from": "Alice", "text": "Hey!"}
event: price
data: {"AAPL": 185.42, "GOOGL": 175.20}
event: score
data: {"match": "IND vs AUS", "score": "312/4"}
event: message
data: {"from": "Carol", "text": "What's up?"}
Each event is a plain-text block separated by a blank line. The event: line names the type, the data: line carries the payload. Incredibly simple: no binary protocol, no framing, just text over HTTP. NYSE uses this exact pattern for streaming ~5,000 price updates per second to broker terminals.
SSE's killer features are automatic reconnection (the browser handles it; you write zero reconnection code) and Last-Event-ID (on reconnect, the browser tells the server the last event it received, so the server can resume from that point). Perfect for stock tickers, live sports scores, notification feeds, CI/CD build logs: anything where data flows one direction, server to client.
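The wire format above is simple enough to generate by hand. Here's a sketch of a serializer a server might use; the function name is mine, not a standard API, and multi-line payloads get one data: line per line of text, per the SSE format:

```javascript
// Serialize one SSE event. A blank line terminates the event;
// multi-line payloads need one "data:" line per line of text.
function formatSSE({ event, data, id }) {
  let out = '';
  if (id !== undefined) out += `id: ${id}\n`;
  if (event !== undefined) out += `event: ${event}\n`;
  for (const line of String(data).split('\n')) {
    out += `data: ${line}\n`;
  }
  return out + '\n'; // blank line = end of event
}

formatSSE({ event: 'message', data: JSON.stringify({ from: 'Alice', text: 'Hey!' }) });
// → 'event: message\ndata: {"from":"Alice","text":"Hey!"}\n\n'
```

A server would simply write each formatted block to the open text/event-stream response as events occur.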
4. WebSockets – The Full-Duplex Channel
WebSockets are the gold standard for real-time web communication. Unlike SSE (server to client only), a WebSocket gives you a full two-way channel where both the client and server can send messages at any time, independently. It's like having a phone call: both sides talk whenever they want, no turns needed.
Want to see it? Install wscat (a WebSocket testing tool) and run:
# Install wscat (needs Node.js)
npm install -g wscat
# Connect to Postman's public echo WebSocket server
wscat -c wss://ws.postman-echo.com/raw
# Now type anything β the server echoes it back instantly.
# Or open Discord in Chrome and check DevTools > Network > WS
# to see real WebSocket binary frames flowing.
WebSockets start life as a normal HTTP request, then "upgrade" to a different protocol entirely. Once upgraded, HTTP overhead disappears: data travels in small binary frames with just 2–14 bytes of overhead per message (versus ~800 bytes for HTTP headers). Each frame carries a FIN bit (is this the last fragment?), an opcode (text/binary/ping/pong/close), a mask bit (client-to-server frames are masked), and the payload length. This extreme efficiency is why Slack and Discord can handle millions of concurrent connections.
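The 2–14 byte range falls straight out of the frame header layout in RFC 6455: 2 base bytes, an extended length field for larger payloads, and 4 mask bytes on client-to-server frames. A sketch of the arithmetic:

```javascript
// WebSocket frame header size per RFC 6455:
// 2 base bytes (FIN/opcode + mask bit/7-bit length),
// +2 bytes if the payload is 126..65535 bytes, +8 bytes if larger,
// +4 mask bytes on client-to-server (masked) frames.
function frameHeaderBytes(payloadLength, masked) {
  let bytes = 2;
  if (payloadLength > 65535) bytes += 8;
  else if (payloadLength >= 126) bytes += 2;
  if (masked) bytes += 4;
  return bytes;
}

frameHeaderBytes(20, false);     // 2: tiny server-to-client frame
frameHeaderBytes(20, true);      // 6: same payload, client-to-server
frameHeaderBytes(100_000, true); // 14: the worst case in the 2-14 range
```

Compare any of those numbers to ~800 bytes of HTTP headers per request and the efficiency argument makes itself.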
// WebSocket: full two-way communication
let ws;

function connectWebSocket() {
  ws = new WebSocket('wss://chat.example.com/ws');

  ws.onopen = () => {
    console.log('Connected! Both sides can talk now.');
  };

  // Server sends us something: a message, typing indicator, anything
  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'message') displayMessage(data);
    if (data.type === 'typing') showTypingIndicator(data.user);
    if (data.type === 'online') updateUserList(data.user);
  };

  // Handle disconnection: YOU must write reconnection logic
  ws.onclose = () => {
    console.log('Disconnected. Reconnecting in 3s...');
    setTimeout(connectWebSocket, 3000);
  };
}
connectWebSocket();

// WE send something to the server, no HTTP request needed!
function sendMessage(text) {
  ws.send(JSON.stringify({ type: 'message', text }));
  // That's it. No fetch(). No headers. Just raw data on the wire.
  // Overhead: ~6 bytes. HTTP request overhead: ~800 bytes.
}
Notice the key difference from SSE: ws.send() lets the client push data through the same connection at any time. No separate HTTP request. Slack uses this for messages, reactions, typing indicators, and presence, all over one WebSocket per user. Discord pushes text through WebSockets and voice through WebRTC (Web Real-Time Communication, a browser API for peer-to-peer audio/video streaming; unlike WebSocket, which goes through a server, WebRTC sends media directly between users over UDP for low latency).
For each use case, which technique fits best? (1) A dashboard showing live server CPU metrics. (2) A multiplayer word game. (3) A sports score app. (4) A stock trading terminal.
Think about direction (one-way vs two-way), update frequency, and whether the client needs to send data back through the same channel.
Going Deeper – WebSocket Internals & Scaling
WebSockets are the most powerful real-time technique, but they're also the trickiest to run at scale. Let's peel back the layers: the protocol handshake, the frame format, what happens when you have users on different servers, and why connections die silently (and how to prevent it).
Every WebSocket connection starts as a normal HTTP request with special headers asking to "upgrade" the protocol. If the server agrees, it responds with 101 Switching Protocols, and from that moment on, the connection speaks WebSocket instead of HTTP. No more HTTP headers, no more request/response cycles, just raw binary frames.
The Sec-WebSocket-Key in the client request is a random base64 string. The server concatenates it with a magic GUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), hashes it with SHA-1, and returns the base64 result as Sec-WebSocket-Accept. This proves the server actually understands WebSocket (not just echoing HTTP). Open Chrome DevTools, go to a site with WebSockets (Discord, Slack), click the Network tab, filter by "WS", and you'll see these upgrade headers in the first request.
Here's the challenge nobody warns you about: WebSocket connections are stateful. The server remembers information about each connected client (who they are, what rooms they're in, what their last message was), the opposite of stateless HTTP, where each request is independent. That makes horizontal scaling much harder: you can't just add servers, because each user has a persistent TCP connection to one specific server. If Alice is connected to Server 1 and Bob is connected to Server 2, and Alice sends a message to Bob, how does Bob get it? Server 1 has the message, but Bob isn't on Server 1.
The standard solution: use a message broker like Redis to relay messages between servers. A broker is middleware (Redis Pub/Sub, RabbitMQ, Kafka) that sits between servers and routes messages: when Server 1 publishes a message, the broker ensures all other servers receive it so they can forward it to their connected clients. Think of it as a central post office that every server subscribes to. This is what Slack, Discord, and virtually every production chat system does.
# Terminal 1 β Server 2 subscribes to chat channel
redis-cli SUBSCRIBE chat:room1
# Output: Reading messages... (type CTRL-C to quit)
# Terminal 2 β Server 1 publishes Alice's message
redis-cli PUBLISH chat:room1 '{"from":"Alice","text":"Hello!"}'
# Output: (integer) 1, the number of subscribers who received it
# Terminal 1 immediately shows:
# 1) "message"
# 2) "chat:room1"
# 3) "{\"from\":\"Alice\",\"text\":\"Hello!\"}"
This pattern powers most production WebSocket systems. Slack's architecture: HAProxy load balancer, Go WebSocket servers, Redis Pub/Sub for cross-server messaging. Discord uses a similar approach but swapped Go for Rust on their voice gateway, achieving 10x memory improvement.
Each WebSocket connection is a persistent TCP connection, which means it consumes real server resources: a file descriptor (an operating system handle for an open connection or file; Linux defaults to ~1,024 per process, and running out means the server can't accept new connections, so new users get "connection refused"), some memory for buffers, and a slot in the kernel's connection table.
# Check current file descriptor limit per process
ulimit -n
# Default: 1024, way too low for production!
# Raise it for production servers
ulimit -n 1000000
# Check how many TCP connections you have right now
ss -s
# Output: TCP: 47 (estab 23, closed 3, orphaned 0, timewait 1)
# Check system-wide limits
cat /proc/sys/fs/file-max
# Output: 9223372036854775807 (on modern kernels, practically unlimited)
Here's the math for 1 million concurrent WebSocket connections:
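As a rough back-of-envelope (per-connection memory varies wildly by stack: ~50KB is a common application-level figure, while Erlang-style lightweight processes run closer to ~2KB):

```javascript
// Memory needed just to HOLD n connections, ignoring message throughput.
// Per-connection costs here are rough assumptions, not measurements.
function gbForConnections(connections, kbPerConnection) {
  return (connections * kbPerConnection) / (1024 * 1024); // KB -> GB
}

const MILLION = 1_000_000;
gbForConnections(MILLION, 50); // ~47.7 GB at ~50KB per connection
gbForConnections(MILLION, 2);  // ~1.9 GB at ~2KB (Erlang-style processes)
// Plus one file descriptor each, so ulimit -n must exceed 1,000,000.
```

That two-orders-of-magnitude spread is why runtime choice dominates this conversation.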
WhatsApp famously achieved 2 million concurrent connections per server using Erlang (now Elixir), which has extremely lightweight processes (~2KB each). Discord handles 15M+ concurrent voice+text connections across their fleet. The key insight: the OS, language runtime, and per-connection memory all matter when you're at this scale.
WebSocket connections can go stale without either side knowing. A user closes their laptop lid. Their phone walks through a dead Wi-Fi zone. A NAT gateway (the router between your device and the internet, which tracks active connections) silently drops idle connections after a timeout, typically 30–60 seconds for TCP: if your WebSocket sends nothing for a minute, the gateway may forget about it, and the next message fails silently. A corporate proxy kills long-lived connections after 5 minutes. Without heartbeats, the server thinks the user is still connected and keeps wasting resources on a dead socket.
When a client's connection drops and they need to reconnect, the naive approach (reconnect immediately, retry forever) can cause a thundering herd if the server restarts: thousands of clients all try to reconnect at the exact same instant, overwhelming the server. The fix: exponential backoff with jitter, where each client waits a random amount of time before reconnecting, spreading the load over seconds instead of milliseconds.
// Reconnect with exponential backoff + jitter
// Retry 1: wait ~1s. Retry 2: ~2s. Retry 3: ~4s. Retry 4: ~8s. Cap at 30s.
function connectWithBackoff(attempt = 0) {
  const ws = new WebSocket('wss://chat.example.com/ws');
  ws.onopen = () => {
    console.log('Connected!');
    attempt = 0; // Reset backoff on successful connection
  };
  ws.onclose = () => {
    const baseDelay = Math.min(1000 * Math.pow(2, attempt), 30000); // 1s, 2s, 4s, 8s... max 30s
    const jitter = Math.random() * baseDelay * 0.5; // Randomness prevents a thundering herd
    const delay = baseDelay + jitter;
    console.log(`Reconnecting in ${Math.round(delay / 1000)}s (attempt ${attempt + 1})...`);
    setTimeout(() => connectWithBackoff(attempt + 1), delay);
  };
  return ws;
}
connectWithBackoff();
Head-to-Head Comparison & Decision Framework
You're designing a system for live cricket scores. 10 million fans watching the same match. Data flows one direction: server pushes score updates to viewers. Viewers never send data back. Would you use WebSockets or SSE? Think about what happens when a viewer's connection drops β which technique handles reconnection automatically?
Now that you understand all four techniques inside and out, let's put them side by side. This comparison table is the single most useful reference when choosing the right approach for your system.
One more visual to make the decision instant. Walk through these three questions:
You're building an Uber-like ride tracking feature where the passenger watches the driver's car move on a map in real-time. The passenger doesn't need to send data back, just watch. Which technique would you pick and why?
Consider: data flows one direction (server to client), updates are frequent (every 1-2 seconds), and it needs to work reliably across mobile networks.
Real-Time at Scale – How the Giants Do It
Slack has millions of users connected via WebSockets. Each WebSocket uses ~50KB of RAM. If a server has 16GB of RAM available for connections, how many concurrent WebSocket connections can one server hold? Now multiply by 10 million concurrent Slack users at peak. How many servers do you need just for the connection layer?
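Working through that exercise (all figures come from the question itself, not from Slack's real architecture):

```javascript
// 16GB of RAM at ~50KB per WebSocket connection:
const ramKB = 16 * 1024 * 1024; // 16GB expressed in KB
const perConnKB = 50;
const connsPerServer = Math.floor(ramKB / perConnKB);
console.log(connsPerServer); // 335,544 connections per server

// 10 million concurrent users across servers of that size:
const servers = Math.ceil(10_000_000 / connsPerServer);
console.log(servers); // ~30 servers just to hold the sockets
```

Roughly 335K connections per server and about 30 servers for the connection layer alone, before any message processing. This back-of-envelope is exactly the kind of sizing that leads to the gateway/app-layer split described below.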
Theory is one thing. Running real-time systems for millions of users is another. Let's look at four companies that push billions of events per day and how they architect their connection layers.
Slack – 750K Organizations on WebSockets
Every Slack workspace you open holds an active WebSocket connection. Across 750,000+ paying organizations, that's tens of millions of concurrent connections at peak hours. A single server can't hold that many, so Slack built a connection gateway layer.
The key insight: Slack separates holding connections from processing messages. Gateway servers do almost nothing except maintain sockets and forward bytes. That's why they can hold 100K+ connections per server: they're barely doing any CPU work. When Slack goes down, it's almost always the app layer, not the connection layer.
Discord – 15M Concurrent on WebSocket + WebRTC
Discord handles two very different real-time problems. Text chat and presence (who's online, typing indicators) run over WebSockets. Voice and video run over WebRTC, a peer-to-peer protocol that's lower latency than WebSocket because it uses UDP instead of TCP.
At 15 million concurrent voice+text users, Discord hit a wall with their Go gateway servers: garbage collection pauses caused latency spikes. Their fix? They rewrote the hot path in Rust, which has no garbage collector. The result: p99 latency dropped from 50ms to 10ms, and memory usage fell by 10x. Text messages still route through a gateway layer similar to Slack's, but voice packets bypass it entirely; they go peer-to-peer or through lightweight TURN relay servers.
Binance – 100K+ Traders on WebSocket Streams
Crypto markets never sleep, and traders need prices faster than their competitors. Binance pushes real-time price updates for 2,000+ trading pairs via WebSocket streams. Each stream is a dedicated WebSocket channel for one type of data (e.g., BTC/USDT 1-second candlesticks, ETH/USDT order book depth).
A single trader might subscribe to 20+ streams simultaneously. At 100,000+ concurrent traders, that's millions of active subscriptions. Binance solves this with stream multiplexing: you open one WebSocket connection and subscribe to multiple streams over it:
// One connection, multiple streams: no connection-per-pair overhead
const ws = new WebSocket(
  'wss://stream.binance.com:9443/stream?streams=' +
  'btcusdt@trade/ethusdt@trade/bnbusdt@kline_1m'
);
ws.onmessage = (event) => {
  const { stream, data } = JSON.parse(event.data);
  // stream = "btcusdt@trade", data = { price: "67432.10", ... }
  updateTicker(stream, data);
};
This pattern, one connection carrying many logical channels, is how most high-throughput real-time systems work. It avoids the overhead of thousands of separate TCP handshakes and TLS negotiations.
Firebase Realtime Database – SSE Under the Hood
Firebase Realtime Database feels magical β you write ref.on('value', callback) and your UI updates instantly when data changes. Under the hood, it's using Server-Sent Events over a single long-lived HTTP connection. When you "listen" to a path in Firebase, the SDK opens an SSE stream to that path's URL. Every write from any client triggers a push down every listener's stream.
Firebase supports up to 200,000 concurrent connections per database. Beyond that, you shard β split your data across multiple database instances. Google's internal infrastructure fans out events through a Pub/Sub layer, so a single write to /messages/chat-room-42 can reach 10,000 listeners within 100ms, even if those listeners are spread across data centers in Iowa, Belgium, and Tokyo.
Anti-Lessons – Things That Sound Smart but Aren't
Real-time communication is full of "obvious" best practices that are actually wrong in many situations. These three misconceptions trip up engineers at every level.
Defaulting to WebSockets is the most common over-engineering mistake. If your data flows one direction (server to client), as with notifications, live scores, stock tickers, or build logs, then SSE does the job with half the complexity. SSE works over plain HTTP, passes through every proxy and CDN, auto-reconnects, and requires zero library code on the client (just new EventSource(url)).
WebSocket adds bidirectional framing, a custom protocol upgrade, manual reconnection logic, and incompatibility with some corporate proxies. Use it when you genuinely need the client to send data back on the same channel: chat, gaming, collaborative editing. For everything else, SSE is simpler and more reliable.
Polling gets a bad reputation, and for high-frequency updates it deserves it. But for low-frequency data that changes rarely, polling is often the best choice. Weather data updates every 10-15 minutes. Exchange rates update every few seconds at most. Your user's subscription status changes once a month.
For these use cases, a simple setInterval(() => fetch(url), 300000) (every 5 minutes) is vastly simpler than maintaining a persistent connection. No connection state, no reconnection logic, no heartbeats, no load balancer stickiness. The server is stateless. Each request is independent. You can cache it at the CDN layer. You can scale horizontally with zero coordination.
The math: 10,000 users polling every 5 minutes = 33 requests/second. That's nothing for a modern server. Compare that to holding 10,000 WebSocket connections open 24/7, consuming RAM, file descriptors, and requiring sticky sessions.
"Real-time" doesn't mean zero latency β physics doesn't allow that. Light takes 67ms to travel from New York to London through a fiber optic cable. Add server processing, serialization, and network hops, and you're looking at 80-150ms minimum for cross-continent delivery.
The good news: humans perceive anything under ~100ms as "instant." For chat, 50-200ms is perfectly fine. For live video, 1-3 seconds of delay is standard. For stock trading, firms spend millions to shave off microseconds β but that's a different universe from web real-time. When someone says they need "real-time," ask: what's the actual latency budget? The answer determines your architecture.
Common Mistakes – Bugs That Ship to Production
These six mistakes appear in almost every team's first real-time feature. Each one seems minor during development but causes outages, memory leaks, or silent data loss at scale.
WebSocket connections will drop: Wi-Fi switches, mobile network handoffs, server deploys, load balancer timeouts. If your client doesn't reconnect, the user's UI silently goes stale. They see messages from 10 minutes ago and think they're current. This is worse than an error message because the user doesn't know something is wrong.
Fix: Always implement reconnection with exponential backoff + jitter (see Section 7). SSE handles this automatically β one more reason to prefer it when you can.
NAT gateways and corporate proxies silently kill idle TCP connections after 30-60 seconds. If your WebSocket has no data flowing for a minute, the middlebox drops it without telling either side. The server thinks the client is connected; the client thinks it's connected. Next message? Lost into the void.
Fix: Send ping frames every 25-30 seconds. Both the WebSocket protocol and most libraries (Socket.IO, ws) support this natively. For SSE, send a comment line (: keepalive) at the same interval.
Suppose you have a dashboard with 200 metrics. Every second, 3-4 of them change. If you send all 200 metrics every time, you're wasting 98% of your bandwidth. At 10,000 concurrent viewers, that's the difference between 40 Mbps and 2 Gbps of outbound traffic.
Fix: Send only what changed (deltas/diffs). Track the last state sent to each client and compute the diff. Libraries like json-patch (RFC 6902) formalize this. For collaborative editing, look into CRDTs or Operational Transforms, which handle concurrent diffs gracefully.
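For flat key-value state, the delta computation can be as small as this sketch (it ignores deleted keys, which json-patch handles properly):

```python
def delta(last: dict, current: dict) -> dict:
    """Return only the entries that changed since the last send."""
    return {k: v for k, v in current.items() if last.get(k) != v}

last_sent = {"cpu": 40, "mem": 70, "disk": 55}
current = {"cpu": 42, "mem": 70, "disk": 55}
print(delta(last_sent, current))  # → {'cpu': 42}
```

Only the changed metric goes over the wire; the client merges it into its local copy of the state.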
If the server produces events faster than a client can consume them (slow network, overwhelmed browser), the send buffer grows unbounded. Each WebSocket connection has a kernel-level send buffer; if you keep writing to a slow client, you'll eventually exhaust server memory and crash the entire process, taking down all connections.
Fix: Monitor the send buffer size (Node.js: ws.bufferedAmount). If it exceeds a threshold (e.g., 1MB), drop that client's connection or skip messages. Better to miss one update than crash the server for everyone. Discord's Rust gateway uses per-client rate limiting for exactly this reason.
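The drop-the-slow-client policy is easy to simulate without real sockets. An in-memory sketch (the class, names, and drain rates are illustrative; `buffered` stands in for `ws.bufferedAmount`):

```python
MAX_BUFFERED = 1_000_000  # ~1 MB, the threshold suggested above

class Client:
    def __init__(self, name: str, drain_rate: int):
        self.name = name
        self.buffered = 0             # stands in for ws.bufferedAmount
        self.drain_rate = drain_rate  # bytes the client consumes per tick

def broadcast(clients: list, payload_size: int) -> list:
    """Send one payload to every client; drop any whose buffer is over the cap."""
    alive = []
    for c in clients:
        c.buffered = max(0, c.buffered - c.drain_rate) + payload_size
        if c.buffered > MAX_BUFFERED:
            continue  # drop the slow client rather than grow memory unbounded
        alive.append(c)
    return alive

clients = [Client("fast", 100_000), Client("stalled", 0)]
for _ in range(11):
    clients = broadcast(clients, 100_000)
print([c.name for c in clients])  # → ['fast']
```

The fast client's buffer stays flat; the stalled client's grows by the full payload every tick until it crosses the cap and is dropped, leaving everyone else healthy.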
This bears repeating because it's that common. A team builds a notification system with WebSocket, then spends weeks debugging proxy issues, reconnection bugs, and load balancer stickiness: problems that SSE doesn't have. WebSocket requires a protocol upgrade that some proxies block. SSE is just HTTP; every proxy, CDN, and load balancer handles it without special configuration.
Fix: Before reaching for WebSocket, ask: "Does the client need to send data back through this channel?" If no, use SSE. Save WebSocket for genuinely bidirectional use cases.
Your app works great with 10 test connections. Then launch day hits with 50,000 users and the server runs out of file descriptors at 1,024 connections (the default Linux ulimit -n). Or memory fills up because each connection holds 200KB of state. Or the event loop blocks because you're serializing JSON for 50,000 clients on a single thread.
Fix: Load test before launch. Tools like websocat, artillery, or a simple script opening thousands of WebSocket connections will reveal your ceiling fast. Test on the same OS and hardware as production β macOS limits are different from Linux.
Interview Playbook: What They Actually Ask
Real-time communication comes up in almost every system design interview β chat systems, notification services, live dashboards. Here's what's expected at each level.
What they expect you to know
Name the four real-time techniques (short polling, long polling, SSE, WebSocket) and explain the trade-offs in plain English. You don't need to design a full system β just show you understand when to use each one.
Sample question: "Your app needs to show live notifications. How would you implement it?"
Good answer: "Notifications are server-to-client only; the user doesn't send anything back through this channel. So I'd use Server-Sent Events. The browser opens one HTTP connection, the server pushes each notification as an event. SSE auto-reconnects if the connection drops, and it works through proxies without special config. If we needed the user to acknowledge or reply to notifications in the same channel, I'd upgrade to WebSocket, but for one-way pushes, SSE is simpler."
What they expect you to know
Design a complete notification system end-to-end. Choose the right protocol, explain your persistence strategy, handle offline users, and discuss scaling to 100K+ concurrent connections.
Sample question: "Design a notification system for an e-commerce platform (order updates, price alerts, flash sale announcements)."
Good answer structure:
- Protocol choice: SSE, since notifications are server-to-client with no bidirectional need.
- Persistence: Store notifications in a database (PostgreSQL) so offline users see them on login. Mark as read/unread.
- Delivery path: Event producer (order service) publishes to a message queue (RabbitMQ/Kafka). Notification service consumes, checks user preferences (email? push? in-app?), and pushes via SSE to connected users.
- Offline handling: On SSE reconnect, the client sends Last-Event-ID. The server replays missed events from the database.
- Scaling: SSE connections are stateless from the server's perspective (each is just a held-open HTTP response). Scale horizontally behind a load balancer. No sticky sessions needed.
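The Last-Event-ID replay step can be sketched with a plain list standing in for the notifications table (field names here are illustrative):

```python
def replay_missed(events: list, last_event_id: int) -> list:
    """Return every stored event newer than the client's Last-Event-ID."""
    return [e for e in events if e["id"] > last_event_id]

events = [
    {"id": 1, "msg": "order shipped"},
    {"id": 2, "msg": "price dropped"},
    {"id": 3, "msg": "flash sale started"},
]
# Client reconnects with Last-Event-ID: 1 -> it receives events 2 and 3.
print(replay_missed(events, last_event_id=1))
```

This only works if event IDs are monotonically increasing per stream, which is why the database write assigns them.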
What they expect you to design
Architect a full chat system at scale: WebSocket connection management, message routing across servers, presence (online/offline/typing), horizontal scaling, and failure handling.
Sample question: "Design Slack's messaging system for 1M concurrent users."
Architecture outline:
Key points to hit: Connection gateway layer (separate from app servers). Redis Pub/Sub for cross-server message fan-out. Connection registry mapping user IDs to gateway servers. Cassandra or ScyllaDB for message persistence (write-optimized, time-series partitioning). Presence tracking via TTL keys in Redis (user heartbeats every 30s, key expires in 60s = offline).
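The presence scheme can be modeled without Redis. In this sketch an explicit clock replaces Redis's TTL machinery; the class and method names are illustrative:

```python
TTL_SECONDS = 60  # heartbeat every 30s, expire at 60s, as outlined above

class Presence:
    """In-memory stand-in for Redis TTL keys: last heartbeat time per user."""
    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, user: str, now: float):
        self.last_seen[user] = now  # in Redis: SET presence:<user> 1 EX 60

    def is_online(self, user: str, now: float) -> bool:
        return now - self.last_seen.get(user, float("-inf")) < TTL_SECONDS

p = Presence()
p.heartbeat("alice", now=0)
print(p.is_online("alice", now=45))  # → True  (heartbeat within the TTL window)
print(p.is_online("alice", now=75))  # → False (the key would have expired)
```

Missing two consecutive heartbeats marks the user offline automatically; no cleanup job is needed, which is the appeal of the TTL approach.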
Practice Exercises
Reading about real-time is one thing. Building it is where it clicks. These five exercises progress from "copy-paste and run" to "design and benchmark."
Open the Wikipedia recent-changes SSE stream in your browser and display edits in real time. This requires zero server code β Wikipedia provides the stream.
The stream URL is https://stream.wikimedia.org/v2/stream/recentchange. Use new EventSource(url). Each event's data field is JSON with title, user, timestamp, and server_url.
<!-- Save this as a .html file and open in your browser -->
<h2>Live Wikipedia Edits</h2>
<div id="feed" style="font-family:monospace; font-size:13px;"></div>
<script>
const source = new EventSource(
  'https://stream.wikimedia.org/v2/stream/recentchange'
);
source.onmessage = (event) => {
  const d = JSON.parse(event.data);
  const el = document.createElement('div');
  el.textContent = `[${d.wiki}] ${d.user} edited "${d.title}"`;
  // "feed" is the div above, reachable as a global via its id
  feed.prepend(el);
  if (feed.children.length > 50) feed.lastChild.remove();
};
</script>
Build a minimal WebSocket server that echoes back any message it receives. Test it with wscat. This teaches you the server side of the handshake.
Use the websockets library: pip install websockets. The server is about 8 lines. Run it, then connect with wscat -c ws://localhost:8765.
import asyncio
import websockets

async def echo(websocket):
    async for message in websocket:
        print(f"Received: {message}")
        await websocket.send(f"Echo: {message}")

async def main():
    async with websockets.serve(echo, "localhost", 8765):
        print("Echo server running on ws://localhost:8765")
        await asyncio.Future()  # Run forever

asyncio.run(main())
Run with python echo_server.py, then in another terminal: wscat -c ws://localhost:8765. Type anything and see it echo back.
Extend Exercise 1: filter the Wikipedia SSE stream to show only edits to English Wikipedia articles (not talk pages, not bots). Display the edit count per minute in real time.
Filter by d.wiki === "enwiki" and d.namespace === 0 (article namespace). Check d.bot === false to exclude bots. Use a counter that resets every 60 seconds.
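A Python sketch of the same filter and a per-minute counter (field names follow the recentchange schema quoted above; the counter resets whenever the 60-second window rolls over):

```python
def is_human_enwiki_edit(d: dict) -> bool:
    """Keep only non-bot edits to English Wikipedia articles (namespace 0)."""
    return (d.get("wiki") == "enwiki"
            and d.get("namespace") == 0
            and not d.get("bot", False))

class MinuteCounter:
    def __init__(self):
        self.window = None
        self.count = 0

    def add(self, timestamp: int) -> int:
        """Count an event; reset when a new 60-second window begins."""
        window = timestamp // 60
        if window != self.window:
            self.window, self.count = window, 0
        self.count += 1
        return self.count
```

Feed each parsed event through the filter first, then `add(d["timestamp"])` to get the running edits-per-minute figure to display.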
Build a multi-room chat where two terminal windows can send messages to each other via Redis Pub/Sub. This is the exact pattern Slack and Discord use for cross-server message routing.
Run Redis locally (docker run -p 6379:6379 redis). Use redis-py or ioredis. One script subscribes to channel room:general, another publishes. In production, the WebSocket server subscribes on behalf of connected clients.
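Redis isn't required to see the shape of the pattern. Here is an in-memory broker with the same subscribe/publish surface (a sketch; in production each WebSocket server holds one Redis subscription per room and forwards messages to its locally connected clients):

```python
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel: str, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str):
        # Fan out to every subscriber of this channel, like Redis PUBLISH.
        for cb in self.subscribers[channel]:
            cb(message)

received = []
broker = Broker()
broker.subscribe("room:general", received.append)
broker.publish("room:general", "hello from another server")
broker.publish("room:random", "not delivered here")
print(received)  # → ['hello from another server']
```

The key property to notice: the publisher doesn't know who is subscribed, which is what lets chat servers scale out independently of each other.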
Spin up the echo server from Exercise 2 and open as many concurrent WebSocket connections as you can. Find where it breaks. Measure memory per connection.
Write a script that opens N connections in a loop. Check ulimit -n first (raise it if needed). Monitor with htop or ss -s. On a laptop you'll likely hit ~10K connections before something gives. On Linux with tuned limits, 100K+ is achievable.
Cheat Sheet
setInterval(() => {
  fetch('/api/updates')
    .then(r => r.json())
    .then(update);
}, 5000);
Simple but wasteful. Use for: weather, low-freq data. ~98% empty responses.
function poll() {
  fetch('/api/wait')
    .then(r => r.json())
    .then(d => { update(d); poll(); });
}
poll();
Server holds request until data. Fallback when SSE/WS blocked.
const src = new EventSource('/stream');
src.onmessage = (e) => {
  update(JSON.parse(e.data));
};
// Auto-reconnects!
Server→client only. Auto-reconnect, Last-Event-ID. Proxy-friendly.
const ws = new WebSocket('wss://...');
ws.onmessage = (e) => update(JSON.parse(e.data));
ws.send(JSON.stringify(msg));
Full duplex. Use for: chat, gaming, collaboration. Add heartbeats + reconnection.
Low-freq? → Polling
Server→Client? → SSE
Bidirectional? → WebSocket
Audio/Video? → WebRTC
Start simple, upgrade only when needed. SSE > WS for most use cases.
- Reconnection + backoff
- Heartbeats (30s interval)
- Send diffs, not full state
- Monitor bufferedAmount
- Load test connections
- ulimit -n > 100000
Every real-time system needs all six. Skip one and you'll learn why at 3am.
Connected Topics: Where to Go Next
Real-time communication connects to almost every other system design topic. Here's where each thread leads.