TL;DR — The One-Minute Version
Right now, open a terminal and type curl -v https://google.com 2>&1 | grep -i server. You'll see server: gws in the response headers. GWS stands for Google Web Server. But here's the thing — you never connected to GWS directly. Your request first hit a Google Front End (GFE): Google's custom reverse proxy that sits in front of all their services. The GFE terminated your TLS connection, authenticated and checked your request, applied rate limits, and then forwarded it to GWS on a private internal network. Every Google request — Search, Gmail, YouTube — goes through a GFE first. You talked to a proxy. The proxy talked to the real server. You never knew the difference.
Try another one: curl -I https://cloudflare.com. Look at the response header cf-ray: 8a1b2c3d4e5f-BOM. That three-letter code at the end — BOM — is Cloudflare's airport code for Mumbai. It means your request hit a Cloudflare reverse proxy physically located in Mumbai. The actual origin server could be in San Francisco, Frankfurt, or anywhere. The proxy served you from the closest location.
These aren't obscure concepts. Nginx powers roughly a third of all websites (Netcraft 2024) — Apple, Netflix, Airbnb all run it. HAProxy handles load balancing for GitHub, Stack Overflow, and Reddit, supporting 2 million+ concurrent connections. Envoy, created at Lyft in 2016, is the default sidecar proxy in Istio service meshes and handles millions of requests per second in the largest deployments. Every time you load a webpage, stream a video, or call an API, a proxy touched your request.
The Scenario — Your Server Is Drowning
You've built an API for a recipe-sharing app. It's running on a single DigitalOcean droplet: 4 vCPUs, 8GB RAM, Ubuntu 22.04, public IP 203.0.113.42. You deployed with docker compose up -d, pointed your domain at the IP, and installed a Let's Encrypt certificate with certbot. Everything works. Your API responds in 45ms. Life is good.
Then a food blogger with 2 million followers posts your app. Traffic goes from 50 requests per second to 10,000. And here's what happens — not in theory, but on your actual server:
Here's what your terminal would show if you SSH'd into the box at 4 PM:
# Check CPU usage — TLS handshakes eating everything
$ top -bn1 | head -5
%Cpu(s): 94.2 us, 3.1 sy, 0.0 ni, 1.8 id, 0.0 wa
MiB Mem: 7962.4 total, 312.8 free, 7128.6 used, 521.0 buff/cache
# ^ Only 312MB free. OOM killer coming soon.
# Check open connections — each client holds a TCP connection
$ ss -s
TCP: 8247 (estab 7891, closed 112, orphaned 34, timewait 210)
# ^ 8,247 open TCP connections. Each one = memory + file descriptor.
# Check who's hitting you — bots found your IP
$ journalctl -u myapp --since "1 hour ago" | grep -c "bot"
14293
# ^ 14K bot requests in the last hour. Your IP is public, remember.
Your server is doing everything: running your application, encrypting every connection with TLS (each TLS handshake — the initial negotiation when a client connects over HTTPS, two round trips of certificate exchange, key generation, and cipher negotiation — costs CPU; at 10K clients, that's 10K handshakes), serving static images, fighting off 14,000 bot requests, and maintaining 8,000 open connections. That's like asking a restaurant chef to also be the bouncer, the waiter, the cashier, and the dishwasher — during Friday dinner rush.
First Attempt — Direct Client-to-Server
Let's look at what you actually set up. The simplest possible architecture: your domain's DNS A record points directly to your server's IP address. Every client connects straight to that IP. No middleman. You can see this yourself — run dig +short yourapp.com and you'll get back 203.0.113.42. That's your one server. Everyone in the world now knows its address.
This setup works fine when you have 50 users. But it has fundamental problems that a firewall alone can't fix:
- TLS is on the same box as your app. Every new connection triggers a cryptographic handshake that burns CPU. At 3,000 connections per second, TLS alone consumes 40% of your processing power. Your app gets the leftovers.
- No caching. A thousand users requesting the same recipe page? Your server runs the same database query a thousand times. A caching layer in front would serve one response to all of them.
- Your real IP is public. Anyone can run dig yourapp.com, get your IP, and aim a DDoS attack straight at it (Distributed Denial of Service: thousands of compromised machines — a botnet — flood your server with junk traffic simultaneously, so it spends all its resources on fake requests and can't serve real users; in October 2016, the Mirai botnet DDoS'd Dyn DNS and took down Twitter, Netflix, Reddit, and GitHub). A firewall can block known bad IPs, but it can't absorb 50 Gbps of traffic — your server's network link is only 1 Gbps.
- Scaling is impossible. You add a second server — great. But dig yourapp.com still returns one IP. How do clients know about server #2? DNS round-robin is fragile and slow to propagate. You need something smarter.
Some developers say: "My cloud provider gives me a firewall and auto-scaling. Isn't that enough?" Not for this. A firewall operates at Layer 3/4 (network/transport) — it filters traffic based on IP addresses, ports, and protocols. It doesn't understand HTTP content, can't cache responses, can't terminate TLS for you, and can't intelligently route requests to multiple backends. Auto-scaling adds new servers, but without a proxy in front, there's no way to route traffic to them. You need both. The proxy is the traffic cop; the firewall is the locked gate.
Where It Breaks — Four Fatal Flaws
The direct-exposure setup has four fundamental problems. Each one is a ticking time bomb, and they all go off at the same time — when your app starts getting real traffic. Let's put real numbers on each failure mode.
1. TLS Is Crushing Your CPU
Every HTTPS connection starts with a TLS handshake — a multi-step cryptographic negotiation in which client and server exchange certificates, agree on a cipher suite, and derive session keys. The asymmetric cryptography involved (RSA or ECDHE) is computationally expensive: on your 4-core server, each handshake costs about 2ms of CPU time. Sounds tiny, but do the math:
- At 1,000 new connections/sec: 2ms x 1,000 = 2 seconds of CPU time per second. That's two full cores — half your server. Manageable, barely.
- At 5,000 new connections/sec: 2ms x 5,000 = 10 seconds of CPU per second. You only have 4 seconds of CPU per second (4 cores). TLS alone needs 2.5x your entire server.
- At 10,000 new connections/sec: 20 seconds of CPU per second. Physically impossible on 4 cores.
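The arithmetic above is worth making concrete. A minimal back-of-envelope sketch (the 2ms-per-handshake cost is the article's estimate, not a measured constant):

```python
# Back-of-envelope: CPU-seconds of TLS handshake work generated per
# wall-clock second, at various connection rates.
HANDSHAKE_CPU_MS = 2   # article's estimate per full TLS handshake
CORES = 4              # the droplet in the scenario

def tls_cpu_seconds_per_second(new_conns_per_sec: int) -> float:
    """CPU-seconds consumed each second just by handshakes."""
    return new_conns_per_sec * HANDSHAKE_CPU_MS / 1000

for rate in (1_000, 5_000, 10_000):
    cpu = tls_cpu_seconds_per_second(rate)
    print(f"{rate:>6}/s -> {cpu:4.0f}s CPU/s = {cpu / CORES:.1f}x of a {CORES}-core box")
```

At 5,000 connections/sec the function returns 10.0 — ten CPU-seconds of work per second against the four your box can supply, the 2.5x overload the bullet above describes.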
A proxy solves this completely. TLS termination happens at the proxy — and with session resumption enabled (returning clients reuse a previously established session via stored session tickets, cutting the handshake from 2 round trips to 1 and saving 50%+ of the CPU cost; Nginx enables this with ssl_session_cache shared:SSL:10m), returning clients skip the full handshake entirely. The backend gets plain HTTP on a private network — zero crypto work.
2. No Caching = Wasted Work
Your recipe app's homepage is the same for everyone — top 20 recipes, a search bar, some images. Without a cache, every visitor triggers the same database query, the same template rendering, the same JSON serialization. A thousand users requesting /api/recipes/popular in one second means your database runs the exact same query a thousand times.
With a caching proxy, the first request hits your backend (a cache miss: the proxy has no stored copy yet, so it fetches from the backend), the proxy stores the response, and the next 999 are cache hits — served from proxy memory in under 1ms, without touching the backend at all. Your backend processes one request instead of a thousand — a 99.9% reduction in load for cacheable content.
Check it yourself: curl -I https://yourapp.com/api/recipes/popular and look at the Cache-Control header. If it says no-store or there's no header at all, every identical response is being regenerated from scratch. That's massive wasted compute.
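The miss-then-hit behavior is easy to model. A toy sketch of the idea (the function names are illustrative, not a real proxy's API):

```python
backend_queries = 0
cache: dict[str, str] = {}

def fetch_from_backend(path: str) -> str:
    """Stand-in for the expensive DB query + template render."""
    global backend_queries
    backend_queries += 1
    return f"rendered:{path}"

def proxy_get(path: str) -> str:
    """First request is a cache miss; every later identical request is a hit."""
    if path not in cache:
        cache[path] = fetch_from_backend(path)   # miss: go to backend, store
    return cache[path]                           # hit: serve from memory

for _ in range(1000):
    proxy_get("/api/recipes/popular")

print(backend_queries)   # 1 — one backend query for 1,000 identical requests
```

Real proxies add TTLs, cache-key rules, and Cache-Control handling on top, but the 1000-to-1 collapse is exactly this mechanism.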
3. Your IP Is a Target
Run dig +short yourapp.com. That returns 203.0.113.42. Now anyone — including attackers — knows exactly where your server lives. Unlike a site behind Cloudflare (where dig returns Cloudflare's IP, not yours), nothing sits between the attacker and your machine.
A DDoS attack at even 10 Gbps saturates your server's 1 Gbps network link. Your cloud firewall can't help — the traffic fills the pipe before rules apply. You need something with massive network capacity (Cloudflare has 248+ Tbps) to absorb the flood before it reaches you. That something is a reverse proxy network.
Even without DDoS, bots find your IP within hours. Shodan.io continuously scans all 4 billion IPv4 addresses. Run shodan host 203.0.113.42 and you'll see every open port. If SSH (22), HTTPS (443), and PostgreSQL (5432) are all visible, you're exposed on three fronts.
4. Scaling Is Impossible
You add a second server at 203.0.113.43. But dig yourapp.com still returns only .42. You could add a second DNS A record (DNS round-robin), but DNS has no concept of server health. If .42 dies, half your users still get routed there because their DNS resolver cached it for hours. DNS doesn't know if a server is overloaded, down, or on fire.
You need something at a single IP that accepts all traffic and distributes it to healthy backends. Something that checks each server's health every few seconds with GET /health and automatically removes dead servers from rotation. That's exactly what HAProxy does with server web1 10.0.1.1:8080 check inter 3s fall 3 rise 2 — and what Nginx Plus does with its health_check directive (open-source Nginx relies on passive checks via max_fails and fail_timeout).
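The fall 3 rise 2 logic is simple enough to sketch: mark a backend down after 3 consecutive failed probes, back up after 2 consecutive successes. This is an illustration of the state machine, not HAProxy's actual code:

```python
class HealthChecker:
    """Active health check: down after `fall` consecutive failures,
    up again after `rise` consecutive successes."""
    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self.streak = 0   # consecutive probes contradicting current state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self.streak = 0   # state confirmed; reset the counter
        else:
            self.streak += 1
            threshold = self.rise if not self.healthy else self.fall
            if self.streak >= threshold:
                self.healthy = probe_ok   # flip state
                self.streak = 0
        return self.healthy

hc = HealthChecker()
for ok in [False, False, False]:   # 3 failed GET /health probes
    hc.record(ok)
print(hc.healthy)                  # False — pulled out of rotation
for ok in [True, True]:            # 2 good probes
    hc.record(ok)
print(hc.healthy)                  # True — back in rotation
```

The asymmetry (3 to fail, 2 to recover) avoids flapping: one dropped probe doesn't eject a healthy server, and one lucky response doesn't readmit a dying one.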
The Breakthrough — Put a Shield in Front
The fix is one of the most important patterns in all of system design: stop exposing your server directly. Put a machine in front of it — a reverse proxy: a server that sits between clients and your backends, handling all the messy, dangerous, repetitive work. Clients send requests to the proxy's IP address and don't even know your real servers exist; the proxy handles TLS, caching, rate limiting, and load balancing, then forwards clean HTTP requests to your backends on a private network. Your DNS now points to the proxy's IP. Clients talk to the proxy. The proxy talks to your servers. Your servers' real IPs are hidden from the world.
In practice, this is as simple as installing Nginx on a separate machine and adding a few lines of configuration:
upstream backend {
    server 10.0.1.1:8080;   # App server 1 (private IP — unreachable from internet)
    server 10.0.1.2:8080;   # App server 2 (add more anytime)
}

server {
    listen 443 ssl;
    server_name yourapp.com;

    ssl_certificate     /etc/letsencrypt/live/yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;

    location / {
        proxy_pass http://backend;   # Forward to backend over plain HTTP
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;   # Pass real client IP
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
    }
}
That's it. proxy_pass http://backend; — that single directive is the heart of a reverse proxy. The upstream block lists your backend servers by private IPs (10.0.1.x — unreachable from the internet). Nginx handles TLS, accepts client connections, and forwards clean HTTP to your backends. To add a third server, add one line: server 10.0.1.3:8080; and run nginx -s reload. Zero downtime. Clients don't notice a thing.
The before-and-after:
- TLS: backend CPU 94% → 12% (the proxy handles all crypto; backends do zero TLS)
- Caching: 10,000 DB queries/sec → 10/sec (99.9% served from proxy cache)
- Security: real IP public → real IP hidden, bots rate-limited at the proxy
- Scaling: one server with no way to add more → add servers with one config line, zero downtime
- Connections: 10K clients holding 10K backend connections → 10K clients served by 50 pooled connections
How It Works — Four Types of Proxies
Not all proxies are the same. They differ in who they protect, where they sit in the network, and whether anyone knows they're there. Let's break down the four main types — each with real software you can install and real commands you can run.
1. Forward Proxy — Protects the Client
A forward proxy sits in front of clients. It takes a client's request and sends it to the internet on the client's behalf. The server on the other end never sees the real client — it only sees the proxy's IP address. Think of it like sending mail through a P.O. Box: the recipient sees the box number, not your home address.
Where you've used this: Your company's corporate network almost certainly has one. When you access the internet at work, traffic goes through a proxy like Squid (an open-source forward proxy and caching server, first released in 1996 and still deployed by ISPs and corporations worldwide to cache content, filter URLs, and log traffic for compliance) or Zscaler. IT can see which domains you visit (but not HTTPS content), block social media, and scan for malware. Try it: curl -x proxy.company.com:8080 https://google.com — the -x flag tells curl to route through a proxy.
Developers use forward proxies too: Charles Proxy and mitmproxy sit between your browser and the internet, letting you inspect every HTTP request and response. Mobile developers use them constantly to debug API calls from iOS and Android apps.
2. Reverse Proxy — Protects the Server
A reverse proxy sits in front of servers. The client has no idea the proxy exists — it thinks it's talking directly to the real server. This is the proxy type you'll encounter in 90% of system design discussions. When someone says "proxy" without qualification, they almost always mean a reverse proxy.
Real software: Nginx, HAProxy, Envoy, Traefik, Caddy. Every major website runs a reverse proxy. Check for yourself: curl -I https://github.com and look at the response headers — you'll see their load balancing infrastructure, not the application server.
Here's a production Nginx reverse proxy config — the same pattern used by millions of sites:
upstream backend {
    server 10.0.1.1:8080 weight=5;   # Stronger server gets 5x traffic
    server 10.0.1.2:8080 weight=2;   # Medium server
    server 10.0.1.3:8080 weight=1;   # Smallest server
    keepalive 50;                    # Pool: 50 persistent backend connections
}

server {
    listen 443 ssl http2;
    server_name api.yourapp.com;

    # TLS termination — proxy handles ALL crypto
    ssl_certificate     /etc/letsencrypt/live/api.yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourapp.com/privkey.pem;
    ssl_session_cache shared:SSL:10m;   # Returning clients skip full handshake

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
3. Transparent Proxy — Nobody Knows It's There
A transparent proxy intercepts traffic without any client configuration. You don't set a proxy in your browser; the server doesn't know it's there. It sits in the network path — often deployed at the network level by ISPs or corporate IT — and silently processes everything that passes through.
Where you've experienced this: Ever connected to hotel Wi-Fi and been redirected to a login page? That's a transparent proxy intercepting your HTTP request and injecting a redirect. Your ISP probably runs one too — it caches popular content (why re-download a Windows update for every customer?) and may filter content for compliance.
You can detect transparent proxies. Run traceroute google.com and look for unexpected hops with high latency or ISP names. Some proxies add Via: or X-Forwarded-For: headers that reveal their presence. Note: transparent proxies struggle with HTTPS because they can't see the encrypted content — this is why your ISP can see which domains you visit but not the page content.
4. Sidecar Proxy — One Per Service
A sidecar proxy runs alongside your application — in the same Kubernetes pod or the same VM. Instead of one central proxy for everything, every service gets its own tiny proxy. All traffic in and out goes through the sidecar first. This is the foundation of a service mesh: a dedicated infrastructure layer of sidecar proxies attached to every service, handling ALL inter-service communication — mutual TLS, automatic retries, circuit breaking, distributed tracing, and traffic policies — without changing application code. Istio and Linkerd are the two most popular service meshes.
Real software: Envoy (created by Matt Klein at Lyft in 2016, written in C++) is the dominant sidecar proxy. In Kubernetes with Istio — the most popular service mesh, built by Google, IBM, and Lyft — the sidecar is injected automatically, and the control plane (Istiod) pushes routing rules, security policies, and observability config to every sidecar. Run kubectl get pods and you'll see my-service 2/2 Running — the 2/2 means two containers: your app + the Envoy sidecar. Your app talks to localhost; Envoy handles mTLS, retries, circuit breaking, and distributed tracing.
The beauty of the sidecar model: your application code stays clean. Your service talks to localhost; the sidecar intercepts all outbound traffic and handles encryption, retries, circuit breaking, and metrics. When Lyft had 300+ microservices each writing their own retry logic in different languages, it was chaos. Envoy standardized all of it into one consistent proxy layer.
Quick recap: forward proxy = hides clients (curl -x proxy:8080), reverse proxy = hides servers (Nginx, HAProxy), transparent proxy = invisible to both (ISP caching), sidecar proxy = one per service (Envoy in K8s). In system design interviews, "proxy" almost always means reverse proxy.
Going Deeper — What Proxies Actually Do Under the Hood
We said proxies handle "TLS, caching, load balancing, headers." But what does that actually mean? Not in theory — with real configs, real numbers, and real commands you can run. Let's unpack each superpower one by one.
1. TLS Termination — Offloading the Crypto
When a browser connects to your site over HTTPS, there's a TLS handshake: certificate exchange, key negotiation, cipher suite agreement. Each handshake involves asymmetric cryptography (RSA-2048 or ECDHE), which costs roughly 2ms of CPU time. With TLS 1.3 (released in 2018, supported by all modern browsers and servers, enabled in Nginx with ssl_protocols TLSv1.3), a fresh handshake is 1 round trip; with TLS 1.2, it's 2. Either way, it's CPU-intensive.
With TLS termination at the proxy, the math changes completely. The proxy handles all crypto. Backends receive plain HTTP on a private network. Here's the Nginx config:
server {
    listen 443 ssl http2;
    server_name yourapp.com;

    # Certificate (from Let's Encrypt or your CA)
    ssl_certificate     /etc/letsencrypt/live/yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;

    # Performance: cache TLS sessions so returning clients skip the full handshake
    ssl_session_cache shared:SSL:10m;   # 10MB cache ≈ 40,000 sessions
    ssl_session_timeout 1d;             # Sessions valid for 24 hours

    # Modern protocols only
    ssl_protocols TLSv1.2 TLSv1.3;

    location / {
        proxy_pass http://backend;   # Plain HTTP to backend — zero crypto work
    }
}
The math: a TLS handshake = ~2ms CPU per connection. At 10,000 new connections/sec, that's 20 seconds of CPU per second. With ssl_session_cache, returning clients (roughly 60-70% of traffic) skip the full handshake, cutting their cost to ~0.5ms. Now the same 10K connections need only ~10 seconds of CPU — roughly half the work, and all of it on the proxy, not your app server. Your backend's TLS CPU cost? Zero.
2. Connection Pooling — 10,000 Clients, 50 Backend Connections
Every client connection has overhead: a TCP handshake (1 round trip), possibly a TLS handshake (1-2 more), plus memory for the socket buffer and a file descriptor. If 10,000 clients connect directly to your backend, that's 10,000 open connections your server must maintain — each consuming ~10KB of kernel memory for TCP buffers, plus your application's per-connection state.
A proxy solves this with connection pooling: it accepts all client connections on the front end, but reuses a small pool of persistent (keep-alive) connections to talk to your backend — 10,000 client connections might map to just 50 persistent backend connections. In Nginx:
upstream backend {
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    keepalive 50;   # Maintain 50 persistent connections to backends
                    # 10,000 client connections → 50 backend connections
                    # That's a 200x reduction
}
Why this matters critically: PostgreSQL's max_connections defaults to 100. Without a connection-pooling proxy, 100 concurrent users would max out your database. With a pooler like PgBouncer for database connections (a lightweight connection pooler for PostgreSQL used by GitLab, Heroku, and thousands of production deployments — 1,000 app connections can be served by 20 database connections) or Nginx for HTTP connections, 10,000 users map to 50 backend connections. Your database breathes easy.
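The pooling idea in miniature: many logical client requests funnel through a fixed set of reusable backend connections. A single-threaded sketch (PgBouncer's real implementation is far more involved — transaction-level assignment, timeouts, multiple pool modes):

```python
from queue import Queue

POOL_SIZE = 50
connections_opened = 0
pool: Queue = Queue()

def open_backend_connection() -> object:
    global connections_opened
    connections_opened += 1
    return object()   # stand-in for a real TCP/DB connection

for _ in range(POOL_SIZE):            # pre-open the fixed pool
    pool.put(open_backend_connection())

def handle_client_request() -> None:
    conn = pool.get()                 # borrow a pooled connection
    try:
        pass                          # ... forward the request over `conn` ...
    finally:
        pool.put(conn)                # return it for reuse

for _ in range(10_000):               # 10,000 client requests
    handle_client_request()

print(connections_opened)             # 50 — not 10,000
```

Ten thousand requests, fifty connections ever opened: the backend's connection table stays tiny no matter how many clients pile up at the front.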
3. Load Balancing — Distributing the Traffic
When you have multiple backend servers, the proxy decides which one gets each request. This is load balancing: distributing incoming traffic across multiple servers so no single server is overwhelmed. The proxy tracks backend health and distributes requests based on an algorithm; if a server dies, the proxy stops sending it traffic within seconds. Here's every major algorithm, when to use it, and the actual Nginx config for each:
Round Robin — take turns: Server 1, Server 2, Server 3, Server 1... This is Nginx's default — no directive needed. Simple, predictable, and works well when all servers are identical and requests take roughly the same time.
upstream backend {
    server 10.0.1.1:8080;   # Gets requests 1, 4, 7...
    server 10.0.1.2:8080;   # Gets requests 2, 5, 8...
    server 10.0.1.3:8080;   # Gets requests 3, 6, 9...
}
Best for: Stateless APIs with identical servers. Bad for: Servers with unequal power or requests with wildly different processing times.
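Round robin is literally "cycle through the list". A sketch of what the proxy does internally:

```python
from itertools import cycle

# The same backends as the upstream block above.
backends = cycle(["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"])

def pick_backend() -> str:
    """Each request gets the next server in a fixed rotation."""
    return next(backends)

print([pick_backend() for _ in range(6)])
# ['10.0.1.1:8080', '10.0.1.2:8080', '10.0.1.3:8080',
#  '10.0.1.1:8080', '10.0.1.2:8080', '10.0.1.3:8080']
```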
Least Connections — send each new request to whichever server currently has the fewest active connections. This adapts to varying request durations. If Server 1 is handling a slow database query, new requests go to Server 2 instead.
upstream backend {
    least_conn;   # Pick the server with fewest active connections
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}
Best for: APIs with varying response times (some endpoints fast, some slow). Bad for: Simple uniform workloads (round robin is simpler and equivalent).
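Least-connections is a one-liner over the proxy's live connection counts. A sketch (the counts here are made up for illustration):

```python
# In-flight request count per backend, updated as requests start and finish.
active = {"10.0.1.1:8080": 12, "10.0.1.2:8080": 3, "10.0.1.3:8080": 7}

def pick_least_conn() -> str:
    """Route the new request to the backend with the fewest in-flight requests."""
    return min(active, key=active.get)

server = pick_least_conn()
active[server] += 1     # the new request is now in flight there
print(server)           # 10.0.1.2:8080 — it was the least loaded
```

Because the counts update in real time, a backend stuck on slow database queries naturally stops receiving new work.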
Weighted — more powerful servers get more traffic. If Server 1 has 8 CPU cores and Server 3 has 2, you don't want them getting equal traffic. Weight them proportionally.
upstream backend {
    server 10.0.1.1:8080 weight=5;   # 8-core box: 5x traffic
    server 10.0.1.2:8080 weight=2;   # 4-core box: 2x traffic
    server 10.0.1.3:8080 weight=1;   # 2-core box: baseline
    # Out of every 8 requests: 5 → S1, 2 → S2, 1 → S3
}
Best for: Mixed hardware (different instance sizes). Bad for: Identical servers (just use round robin).
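The 5/2/1 split can be sketched as a repeating schedule (Nginx's real implementation uses "smooth" weighted round robin, which interleaves the servers instead of sending five in a row, but the aggregate ratio is the same):

```python
from itertools import cycle
from collections import Counter

# weight=5, 2, 1 -> out of every 8 requests: 5 to S1, 2 to S2, 1 to S3.
schedule = (["10.0.1.1:8080"] * 5
            + ["10.0.1.2:8080"] * 2
            + ["10.0.1.3:8080"] * 1)
picker = cycle(schedule)

counts = Counter(next(picker) for _ in range(8))
print(counts)
# Counter({'10.0.1.1:8080': 5, '10.0.1.2:8080': 2, '10.0.1.3:8080': 1})
```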
IP Hash — hash the client's IP address to pick a server. The same client always goes to the same backend. Useful for sticky sessions (when a user's requests must always hit the same backend because it stores session state in memory, like a shopping cart) — but beware: if a server dies, all its clients get redistributed.
upstream backend {
    ip_hash;   # hash(client IP) → always same server
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}
Best for: Legacy apps with server-side sessions. Bad for: Modern stateless APIs (use round robin or least-conn instead). Also problematic when many clients share one IP (corporate NAT).
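The stickiness mechanism is just a stable hash modulo the server count. A sketch (Nginx's actual ip_hash hashes only the first three octets of an IPv4 address, so a whole /24 lands on one backend; this toy version hashes the full string purely to show the stickiness property):

```python
import hashlib

backends = ["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"]

def pick_by_ip(client_ip: str) -> str:
    """Stable hash of the client IP: same client, same backend, every time."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

print(pick_by_ip("203.0.113.50") == pick_by_ip("203.0.113.50"))   # True: sticky
# The catch: remove one backend and len(backends) changes, so most clients
# hash to a different server — their in-memory sessions are gone.
```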
4. Header Injection — Preserving the Real Client
When a proxy forwards a request to your backend, the backend sees the proxy's IP address, not the real client's. That's a problem — your app needs the real client IP for logging, rate limiting, geolocation, and analytics. Proxies solve this by injecting special HTTP headers:
- X-Forwarded-For — the original client IP. If the request passed through multiple proxies, it's a comma-separated chain: X-Forwarded-For: 203.0.113.50, 198.51.100.10 (client, then first proxy).
- X-Real-IP — the single original client IP (no chain). Simpler than X-Forwarded-For when you have one proxy.
- X-Forwarded-Proto — was the original request HTTP or HTTPS? After TLS termination, your backend receives HTTP. But it needs to know the original protocol to generate correct redirect URLs and set secure cookies.
Here's the Nginx config — these three lines are in every production proxy setup:
location / {
    proxy_pass http://backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Security headers — added uniformly to all responses
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Strict-Transport-Security "max-age=63072000" always;
}
Why at the proxy? Consistency. Instead of every backend service implementing security headers independently (and some teams forgetting), the proxy enforces them uniformly across all responses. One place to configure, zero chance of a team forgetting Strict-Transport-Security.
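On the backend side, recovering the real client IP means taking the leftmost X-Forwarded-For entry — and only trusting the header when the request actually arrived from your own proxy, since clients can forge it. A sketch with one trusted proxy (the "_peer" key stands in for the TCP peer address your framework would give you; it's a hypothetical name, not a real header):

```python
def client_ip(headers: dict, trusted_proxy: str) -> str:
    """Leftmost X-Forwarded-For entry, but only when the request came
    from our own proxy — otherwise fall back to the raw peer address."""
    xff = headers.get("X-Forwarded-For", "")
    if headers.get("_peer") == trusted_proxy and xff:
        return xff.split(",")[0].strip()
    return headers.get("_peer", "")

hdrs = {"_peer": "198.51.100.10",   # the proxy's address, seen on the socket
        "X-Forwarded-For": "203.0.113.50, 198.51.100.10"}
print(client_ip(hdrs, "198.51.100.10"))   # 203.0.113.50 — the real client
```

A request that bypasses the proxy and spoofs the header gets no benefit: the peer address doesn't match, so the forged chain is ignored.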
Variations — Proxies in Disguise
Several things you've heard of are actually proxies wearing different hats. Understanding this helps you see the bigger picture — and nail system design interview questions where the interviewer asks "how is X different from Y?"
API Gateway vs Reverse Proxy
An API gateway is a specialized reverse proxy with extra features bolted on. A plain reverse proxy forwards requests, caches, and load balances. An API gateway adds authentication (OAuth, JWT validation), per-client rate limiting (per API key: free tier = 100 req/min, paid = 10K), request/response transformation (rename fields, merge responses), API versioning, and usage analytics. Kong, AWS API Gateway, and Apigee are popular implementations.
Think of it this way: every API gateway is a reverse proxy, but not every reverse proxy is an API gateway.
Real examples: Kong (open-source, built on Nginx + Lua), AWS API Gateway (managed service, pay per request), Apigee (Google's enterprise offering). When to use which? If your system is a simple web app with a React frontend and one API, Nginx is all you need. If you have multiple APIs consumed by mobile, web, and partner clients — each with different rate limits, auth, and versioning — an API gateway earns its complexity.
Service Mesh — Proxies Everywhere
A service mesh takes the sidecar proxy concept from the previous section and applies it to your entire system. Every microservice gets an Envoy sidecar. A central control plane — the "brain" of the mesh (Istiod in Istio, or Linkerd's control plane) — manages all the sidecars, pushing routing rules, mTLS certificates, retry budgets, and observability config to them via the xDS discovery API. The sidecars themselves are the "data plane": they actually process the traffic.
Real usage: Istio (by Google/IBM/Lyft) is the most popular — used by Google, Apple, Salesforce, eBay. Linkerd (by Buoyant) is simpler and lighter — good for smaller teams. Both use Envoy as the sidecar proxy (Linkerd uses its own lightweight proxy, linkerd2-proxy, written in Rust).
Trade-off: A service mesh adds real complexity. You need Kubernetes, you need to learn the mesh's configuration, and every request now passes through two extra proxy hops (source sidecar → destination sidecar). Latency increases by 1-3ms per hop. It's powerful for 50+ microservices, but overkill for a monolith with 3 services.
CDN as a Proxy — Cloudflare, Akamai, Fastly
A CDN (Content Delivery Network) is a globally distributed network of reverse proxies. Cloudflare has 300+ data centers worldwide; Akamai, Fastly, and AWS CloudFront are the other major players. When a user in Mumbai requests your image, they get it from a Mumbai edge server instead of your origin in Virginia — the CDN caches static content, terminates TLS at the edge, and absorbs DDoS attacks. When you put your site behind Cloudflare, your DNS points to Cloudflare's IP, not yours, and every request hits the nearest edge server first.
Try it: curl -I https://cloudflare.com. Look at the cf-ray header: cf-ray: 8a1b2c3d4e5f-BOM. That BOM is the IATA airport code for Mumbai. Your request was served by a Cloudflare proxy physically in Mumbai, India. The origin server could be in San Francisco — you never connected to it.
Cloudflare's free tier includes DDoS protection, CDN caching, and TLS termination — all as a reverse proxy. You just change your DNS nameservers to Cloudflare's, and they proxy all traffic to your origin server. Your real server IP stays hidden. That's enterprise-grade proxy infrastructure for $0.
At Scale — Real Stories from the Biggest Proxy Operations
Proxies are not academic tools — they're the backbone of the modern internet. Every single request to Netflix, GitHub, or Lyft passes through at least one proxy before it reaches a backend. Let's look at the real numbers behind four companies that run proxies at planet scale.
Cloudflare — The Internet's Reverse Proxy
Cloudflare is, in the simplest terms, the world's largest reverse proxy. Over 20% of all web traffic passes through their network. They operate 300+ data centers in 100+ countries, with a total network capacity exceeding 248 Tbps — enough to absorb the biggest DDoS attacks ever recorded.
Here's what makes Cloudflare's story remarkable: they give it away for free. Their free tier includes DDoS protection, CDN caching, TLS termination, and a web application firewall. You just change your domain's nameservers to Cloudflare's, and all traffic is proxied through them. Your origin server's real IP stays hidden. Millions of small websites get enterprise-grade proxy infrastructure for $0 — the business model works because larger customers pay for advanced features like Workers, rate limiting rules, and priority routing.
Try it on any Cloudflare-protected site: curl -I https://cloudflare.com. Look for the cf-ray header — the last three letters are an IATA airport codeIATA codes are three-letter identifiers for airports worldwide (e.g., BOM = Mumbai, CDG = Paris, NRT = Tokyo). Cloudflare uses these codes to tag every response with the data center that served it. So cf-ray: abc123-BOM means your request was handled by a proxy in Mumbai, India. telling you which Cloudflare data center served your request. cf-ray: 8a1b2c3d4e5f-BOM means Mumbai handled it. Your origin server could be in Virginia — you never connected to it directly.
Netflix Zuul — The Java API Gateway Handling 1B+ Requests/Day
Netflix built Zuul, a Java-based API gateway, because they needed a programmable reverse proxy that could handle their unique requirements: A/B testing at the edge, canary routing, authentication, and request decoration — all before traffic reached backend microservices.
Zuul handles all API traffic entering Netflix — more than 1 billion requests per day. It's the single entry point for every play button, every search query, every profile load from 230+ million subscribers worldwide. The gateway runs as a cluster of JVM instances behind an AWS Elastic Load Balancer.
What makes Zuul special is its filter architecture. Developers write Groovy or Java filters that run at different stages of the request lifecycle: pre-filters (authentication, rate limiting), routing filters (choosing which backend to send to), and post-filters (adding headers, logging). Netflix famously uses this to route 1% of traffic to canary deployments before rolling out to everyone.
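The filter model is easy to sketch. The following is an illustrative Python rendering of the pre/route/post idea only — Zuul's real filters are Java or Groovy classes registered with the gateway, and every name here (Filter, AuthFilter, CanaryRouteFilter, run_filters) is invented for the example:

```python
# Illustrative sketch of a pre/route/post filter pipeline (invented names;
# Zuul's actual filters are Java/Groovy classes, not this Python code).
class Filter:
    filter_type = "pre"   # one of: "pre", "route", "post"
    order = 0

    def run(self, ctx):
        raise NotImplementedError

class AuthFilter(Filter):
    filter_type, order = "pre", 1

    def run(self, ctx):
        if "token" not in ctx:
            ctx["status"] = 401   # mark the request rejected

class CanaryRouteFilter(Filter):
    filter_type, order = "route", 1

    def run(self, ctx):
        # Send ~1% of traffic (bucket 0 of 0-99) to the canary deployment.
        ctx["backend"] = "canary" if ctx.get("bucket", 0) < 1 else "stable"

def run_filters(filters, ctx):
    """Run filters phase by phase, in ascending order within each phase."""
    for phase in ("pre", "route", "post"):
        if ctx.get("status"):   # a pre-filter already produced a response
            break
        for f in sorted((f for f in filters if f.filter_type == phase),
                        key=lambda f: f.order):
            f.run(ctx)
    return ctx
```

The point of the pattern: each cross-cutting concern is an isolated, ordered filter, and the gateway just runs the phases in sequence.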
Netflix open-sourced Zuul, but most companies today use Spring Cloud Gateway (Zuul's spiritual successor) or Envoy instead. Zuul 1 was blocking (one thread per request), which struggled at high concurrency. Zuul 2 moved to non-blocking I/O with Netty, but by then Envoy had already won over the community.
Lyft & Envoy — The Proxy That Changed the Industry
In 2016, Matt Klein at Lyft created Envoy because existing proxies (Nginx, HAProxy) weren't designed for microservice environments. Lyft had hundreds of services, and debugging network issues between them was a nightmare. They needed a proxy that understood the service mesh world: L7 observability, automatic retries, circuit breaking, and distributed tracing — all built in.
Envoy is a C++ proxy designed from the ground up for modern distributed systems. It runs as a sidecarA sidecar proxy runs alongside your application in the same pod or host. Your app talks to localhost, and the sidecar handles everything else: TLS, retries, load balancing, tracing. The app code never deals with network complexity. next to every microservice, handling all network traffic transparently. The application code just talks to localhost — Envoy handles TLS, retries, load balancing, and sending telemetry data to tracing systems like Jaeger or Zipkin.
Today, Envoy is a CNCF graduated project (the same tier as Kubernetes). It's used by Google, Apple, Netflix, Stripe, Airbnb, and thousands more. Istio, the most popular service mesh, uses Envoy as its data plane. When you hear "service mesh," Envoy is almost always what's actually proxying the traffic underneath.
GitHub & HAProxy — Proxying Every Git Push on Earth
GitHub uses HAProxy as their primary load balancer and reverse proxy. Every git clone, git push, pull request view, and API call passes through HAProxy clusters before reaching GitHub's application servers. At peak, that's 2+ million concurrent connections.
GitHub chose HAProxy for its raw performance at L4 (TCP) proxying. Git operations over SSH and HTTPS are long-lived connections that transfer large amounts of data — you need a proxy that excels at connection handling, not just HTTP request routing. HAProxy's event-driven architecture handles this efficiently with minimal memory overhead per connection.
GitHub runs HAProxy with custom health checks that go beyond simple TCP pings. Their health checks verify that backend application servers can actually process requests — not just that the port is open. If a server is up but overloaded (responding slowly), HAProxy's health checks catch it and route traffic away. This is critical when you have millions of developers depending on git push working every time.
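An application-level health check of that kind takes only a few lines of HAProxy configuration. This is a generic sketch, not GitHub's actual config — the backend name, addresses, and thresholds are illustrative:

```haproxy
backend app_servers
    balance leastconn
    option httpchk GET /health              # Ask the app, don't just ping the port
    http-check expect status 200            # Anything else counts as a failure
    default-server inter 2s fall 3 rise 2   # Check every 2s; 3 fails = out, 2 passes = back in
    server app1 10.0.0.11:3000 check
    server app2 10.0.0.12:3000 check
```

With fall 3, a server that answers slowly or with errors three checks in a row is pulled from rotation before most users ever notice.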
The Anti-Lesson — Things That Sound Right but Aren't
Proxies are powerful, so people tend to reach for them everywhere. Here are three pieces of "advice" that sound reasonable but lead to real problems in production. If you hear any of these in an interview, you'll know why they're wrong.
"Just put another proxy layer in front of it."
This sounds like good architecture — more proxies, more control, right? Wrong. Every proxy hop adds latency (1-5ms per hop), complexity (another thing to configure, monitor, and debug), and failure surface (another process that can crash or be misconfigured).
A request that goes through CDN → API gateway → service mesh sidecar → application has three proxy hops. That's 3-15ms of overhead before your code even runs. For a real-time gaming server or high-frequency trading system, that's unacceptable. For a blog? A single Nginx instance is plenty.
"Nginx is better than HAProxy" (or the reverse).
This is a false dichotomy. Nginx and HAProxy are different tools that overlap in some areas. Saying one is better than the other is like saying a Swiss Army knife is better than a chef's knife — it depends on what you're doing.
Nginx is a web server that also does reverse proxying. It can serve static files, run Lua scripts, cache responses, and act as an HTTP/2 gateway. It's the default choice when you need a reverse proxy and a web server on the same box.
HAProxy is a pure proxy — it doesn't serve files or run scripts. What it does, it does exceptionally well: L4 (TCP) and L7 (HTTP) load balancing with detailed health checks, connection draining, and stunningly low latency. It's the default choice when raw proxying performance is the priority (like GitHub's git operations).
"A CDN can replace your reverse proxy."
A CDN is great for static and cached content — images, CSS, JavaScript, HTML pages that don't change per user. But it cannot replace a reverse proxy for dynamic routing, authentication, rate limiting, or request transformation.
When a user hits /api/orders/123, that request needs to be authenticated, routed to the right microservice, and potentially transformed (adding internal headers, stripping sensitive data). A CDN doesn't do any of that — it looks for a cached response and, if there's a miss, passes the request straight to your origin.
In practice, production systems use both: a CDN at the edge for static assets and DDoS protection, and a reverse proxy (Nginx, Envoy, or an API gateway) behind it for dynamic traffic. The CDN handles the 80% of traffic that's cacheable; the reverse proxy handles the 20% that requires logic.
Common Mistakes — What People Get Wrong About Proxies
These are the mistakes that cause real outages and confused debugging sessions. If you've configured Nginx or HAProxy, you've probably hit at least two of these. Learn them here so you don't learn them in a 3 AM incident.
When a reverse proxy forwards a request, the backend server sees the proxy's IP address as the source — not the client's real IP. If your app logs IP addresses for analytics, rate limiting, or fraud detection, every single request looks like it came from the same machine: your proxy.
The fix is the X-Forwarded-For header. Your proxy must add it, and your backend must read it. In Nginx:
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
Without these headers, your rate limiter thinks one client is making all 10,000 requests. You'll either rate-limit everyone or rate-limit nobody.
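On the backend side, reading the header is nearly a one-liner, but the details matter: X-Forwarded-For is a comma-separated chain, and it should only be trusted when the request actually arrived from your own proxy, since clients can forge it on direct connections. A minimal sketch (the client_ip helper name is invented for illustration):

```python
def client_ip(headers, peer_addr):
    """Return the real client IP behind a trusted reverse proxy.

    X-Forwarded-For is a comma-separated chain; with Nginx's
    $proxy_add_x_forwarded_for, the leftmost entry is the original client.
    Only trust this for requests arriving from your own proxy.
    """
    xff = headers.get("X-Forwarded-For")
    if xff:
        return xff.split(",")[0].strip()  # leftmost = original client
    return peer_addr                      # no proxy in the path: use the socket peer
```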
You added a reverse proxy to improve reliability — but if that proxy is a single instance, you just moved your single point of failure instead of eliminating it. When that one Nginx box goes down, every service behind it becomes unreachable.
The fix: run at least two proxy instances behind a DNS round-robin or a cloud load balancer (like AWS NLB). Use keepalivedA Linux daemon that implements VRRP (Virtual Router Redundancy Protocol). Two Nginx servers share a virtual IP address. The primary handles traffic; the secondary monitors the primary's health. If the primary dies, the secondary claims the virtual IP within seconds — clients never know the switch happened. with a virtual IP for on-premises setups, or rely on cloud provider health checks for managed environments. The proxy that protects your backends needs its own protection.
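For the on-premises route, the keepalived side is short. A sketch for the primary proxy, assuming interface eth0 and a virtual IP of 203.0.113.100 (both illustrative):

```
# /etc/keepalived/keepalived.conf on the primary
# (use state BACKUP and a lower priority on the secondary).
vrrp_script check_nginx {
    script "pidof nginx"    # is Nginx still running?
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        203.0.113.100       # the VIP clients actually connect to
    }
    track_script {
        check_nginx
    }
}
```

If Nginx dies or the whole box goes down, the secondary claims the virtual IP within seconds.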
Nginx's default proxy_read_timeout is 60 seconds. If your backend takes 61 seconds to respond (a report generation endpoint, for example), Nginx drops the connection and returns a 504. The backend keeps working — wasting resources on a request nobody's waiting for anymore.
Worse: if all your proxy worker threads are stuck waiting on slow backends, new requests queue up and eventually time out too. One slow endpoint cascades into site-wide failures.
# Match timeouts to your actual endpoint SLAs
proxy_connect_timeout 5s; # How long to wait for backend TCP handshake
proxy_send_timeout 10s; # How long to wait sending request body
proxy_read_timeout 30s; # How long to wait for backend response
# For long-running endpoints, set per-location overrides:
location /api/reports {
    proxy_read_timeout 120s; # Reports can take 2 minutes
    proxy_pass http://report-service;
}
Caching GET responses is great — the same product page can be served from cache thousands of times. But if your proxy configuration accidentally caches POST responses, users submitting forms or creating orders might get back a cached response from someone else's request. That's a data leak and a correctness nightmare.
By default, Nginx doesn't cache POST, but custom cache configurations can accidentally include all methods. Always verify your cache rules exclude non-idempotent methods (POST, PUT, DELETE, PATCH).
If you see proxy_cache_methods GET HEAD POST; in your Nginx config, remove POST immediately. Only cache GET and HEAD responses unless you have a very specific reason and understand the implications.
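A safe cache block makes the method restriction explicit rather than relying on defaults. A sketch, with the cache path, zone name, and location illustrative:

```nginx
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g;

server {
    location /products/ {
        proxy_cache app_cache;
        proxy_cache_methods GET HEAD;   # explicitly: never POST/PUT/DELETE
        proxy_cache_valid 200 10m;      # cache successful responses for 10 minutes
        proxy_pass http://backend;
    }
}
```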
Your proxy is the front door to your entire system, but many teams never monitor it. They monitor backend services, databases, and cache hit rates — but the proxy itself is invisible. When it starts dropping connections or running out of file descriptors, nobody notices until users complain.
At minimum, monitor these metrics on every proxy: active connections (is it near the limit?), request rate (is traffic spiking?), error rate (4xx/5xx responses), latency percentiles (p50, p95, p99), and upstream health (how many backends are healthy?). Nginx exposes these via the stub_status module; HAProxy has a built-in stats page.
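Exposing Nginx's counters takes one extra server block with the stub_status module (the port and path here are illustrative; keep it off the public interface):

```nginx
server {
    listen 127.0.0.1:8081;       # private port, never exposed publicly
    location /nginx_status {
        stub_status;             # active connections, accepts, handled, requests
        allow 127.0.0.1;
        deny all;
    }
}
```

Scrape it with curl http://127.0.0.1:8081/nginx_status, or point a Prometheus exporter at it.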
worker_connections Trap
Nginx's default worker_connections is 512 per worker process. With 4 worker processes, that's 2,048 total connections. Sounds like a lot — until you realize each proxied request uses two connections (one from the client, one to the backend). So you actually support ~1,024 concurrent proxied requests. During a traffic spike, new connections are rejected, with only a terse "worker_connections are not enough" line buried in the error log.
worker_processes auto;       # One per CPU core
events {
    worker_connections 4096; # Per worker — tune based on traffic
    multi_accept on;         # Accept all new connections at once
    use epoll;               # Linux: efficient event notification
}
# Also raise OS-level limits:
# ulimit -n 65535
# sysctl net.core.somaxconn=65535
If you're proxying WebSocket connections (which are long-lived), the problem is even worse — each WebSocket holds a connection open for minutes or hours. You'll hit the limit much faster than with short HTTP requests.
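WebSockets also need explicit configuration in Nginx, because the Upgrade handshake is not forwarded by default. A sketch of a WebSocket-aware location block (the /ws/ path and timeout value are illustrative):

```nginx
location /ws/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;                  # Upgrade requires HTTP/1.1
    proxy_set_header Upgrade $http_upgrade;  # forward the handshake
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;                # don't cut idle sockets at the 60s default
}
```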
Interview Playbook — Proxy Questions by Level
Proxy questions come up in system design interviews more than people expect. Whether you're asked "explain forward vs reverse proxy" or "design a service mesh," the depth you go into signals your level. Here's what each level should demonstrate:
Question: "Explain the difference between a forward proxy and a reverse proxy."
This is the most common proxy interview question at the junior level. Here's how to nail it:
- Forward proxy — sits in front of clients. The client knows it's using a proxy. Example: a corporate proxy that filters employee internet access, or a VPN. The server doesn't know who the real client is.
- Reverse proxy — sits in front of servers. The client doesn't know it exists. Example: Nginx in front of your web app. The client thinks it's talking directly to your application.
- Key difference: Forward proxy hides the client's identity. Reverse proxy hides the server's identity.
Bonus points: Mention that TLS termination happens at the reverse proxy (so backends don't need TLS certificates), and that load balancing is the most common reverse proxy use case.
Question: "Design a reverse proxy setup for a web application with 3 backend servers."
Walk through the architecture step by step:
- DNS points app.example.com to the proxy's IP (or a cloud LB in front of 2 proxy instances for HA)
- TLS termination at the proxy — clients connect via HTTPS, the proxy holds the certificate, backends receive plain HTTP on a private network
- Load balancing — round-robin for stateless APIs, least-connections if backends have varying response times, ip-hash if you need session affinity
- Health checks — the proxy pings each backend every 5-10s, removes unhealthy servers from the pool, re-adds them when they recover
- Headers — X-Forwarded-For, X-Real-IP, X-Forwarded-Proto so backends know the client's real IP and whether the original request was HTTPS
- Caching — cache static assets (images, CSS, JS) at the proxy, set Cache-Control headers, use proxy_cache_path in Nginx
Bonus points: Mention rate limiting per IP (limit_req_zone in Nginx), connection draining during deployments, and monitoring (active connections, error rates, p99 latency).
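The per-IP rate limit mentioned above is only a few lines in Nginx; a sketch with illustrative zone name and limits:

```nginx
# 10 requests/second per client IP, with a burst allowance of 20
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=per_ip burst=20 nodelay;
        proxy_pass http://backend;
    }
}
```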
Question: "How would you implement a service mesh with mTLS and canary deployments?"
This is where you demonstrate deep understanding of modern proxy architecture:
- Service mesh architecture — every microservice gets an Envoy sidecar proxy. A control plane (Istiod in Istio) pushes configuration to all sidecars via the xDS APIxDS stands for "x Discovery Service" — a family of APIs that Envoy uses to receive dynamic configuration. CDS (Cluster), EDS (Endpoint), LDS (Listener), RDS (Route), SDS (Secret). The control plane pushes updates through these APIs so you never need to restart Envoy to change routing rules..
- mTLS (mutual TLS) — both client and server verify each other's identity. The control plane acts as a certificate authority, issuing short-lived certificates to every sidecar. Service A's sidecar presents its cert to Service B's sidecar. No service can communicate without a valid mesh identity. This is zero-trust networking within your cluster.
- Canary deployments — deploy a new version alongside the old one. Configure the mesh to split traffic: 95% to v1, 5% to v2. Monitor error rates and latency on v2. If metrics look good, gradually shift more traffic. If something breaks, instantly route 100% back to v1. The application code doesn't change — the sidecar proxies handle all traffic splitting.
- Traffic splitting in Istio uses VirtualService and DestinationRule resources. You define weights per version, and Istiod pushes the routing rules to every Envoy sidecar.
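As a sketch, a 95/5 canary split for a hypothetical recipes service looks like this (service name, subset names, and labels are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: recipes
spec:
  hosts: ["recipes"]
  http:
  - route:
    - destination: {host: recipes, subset: v1}
      weight: 95                  # stable version
    - destination: {host: recipes, subset: v2}
      weight: 5                   # canary
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: recipes
spec:
  host: recipes
  subsets:
  - name: v1
    labels: {version: v1}
  - name: v2
    labels: {version: v2}
```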
Bonus points: Discuss the latency trade-off (2 extra hops per request: source sidecar → destination sidecar, adding 2-6ms), when a service mesh is overkill (fewer than 10 services), and alternatives like Linkerd (simpler, Rust-based proxy, lower overhead).
Practice Exercises — Hands-On with Proxies
Reading about proxies is one thing. Configuring them is another. These exercises go from "copy this config and run it" to "build one from scratch." You'll learn more from 30 minutes of hands-on Nginx than from 3 hours of reading.
Create a minimal Nginx config that reverse-proxies to a local Node.js or Python server. The goal: hit http://localhost:80 and have Nginx forward the request to your app running on port 3000.
You need a server block listening on port 80, with a location / block that uses proxy_pass. Don't forget to set proxy_set_header Host $host;.
server {
    listen 80;
    server_name localhost;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Test it: curl -v http://localhost. You should see your app's response, but the connection went through Nginx first.
Configure Nginx with two upstream servers and use a bash loop to verify round-robin distribution. Run for i in $(seq 1 10); do curl -s http://localhost/health; done and confirm requests alternate between backends.
Define an upstream block with two server directives (different ports). Have each backend return a different response so you can tell which one answered.
upstream backend {
    server 127.0.0.1:3001; # App instance 1
    server 127.0.0.1:3002; # App instance 2
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
Start two simple servers that return their port number, then run the curl loop. You'll see responses alternating between 3001 and 3002.
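Each test backend can be a few lines of Python. A sketch (the run_backend helper name is invented here) that answers every GET with its own port number — start one per terminal with ports 3001 and 3002:

```python
# Minimal backend for the round-robin exercise: replies with its own port.
# In one terminal call run_backend(3001), in another run_backend(3002).
from http.server import HTTPServer, BaseHTTPRequestHandler

class PortHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"served by port {self.server.server_port}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the terminal quiet during the curl loop

def run_backend(port):
    HTTPServer(("127.0.0.1", port), PortHandler).serve_forever()
```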
Compare CPU usage when TLS is handled by the proxy vs by each backend. Use openssl speed rsa2048 to benchmark RSA operations, then test with wrk or ab to measure throughput difference when 3 backends each do their own TLS vs when the proxy terminates TLS once.
Run openssl speed rsa2048 to see how many RSA operations per second your CPU can handle. Then think: if you have 3 backends each doing TLS, that's 3x the RSA work. With proxy TLS termination, it's 1x.
Create a docker-compose.yml with your app container and an Envoy sidecar container. The sidecar should handle all inbound traffic on port 8080 and proxy to your app on port 3000. Write the Envoy envoy.yaml config from scratch.
Envoy config has three key sections: listeners (what port to listen on), clusters (where to forward traffic), and routes (which listener maps to which cluster). In Docker Compose, put both containers on the same network so the sidecar can reach the app at app:3000.
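If you get stuck, here is a sketch of what such an envoy.yaml can look like (Envoy v3 API; the listener, route, and cluster names are illustrative, and app:3000 assumes the Compose service is named app):

```yaml
static_resources:
  listeners:
  - name: inbound                       # accept all traffic on 8080
    address:
      socket_address: {address: 0.0.0.0, port_value: 8080}
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress
          route_config:
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              - match: {prefix: "/"}
                route: {cluster: app_cluster}   # everything goes to the app
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: app_cluster
    type: STRICT_DNS                    # resolve the Compose service name
    load_assignment:
      cluster_name: app_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: {address: app, port_value: 3000}
```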
Write a 20-line Python proxy using the http.server and urllib modules. It should listen on port 8080, accept any HTTP request, forward it to http://httpbin.org, and return the response. This teaches you what a proxy actually does at the network level.
Subclass http.server.BaseHTTPRequestHandler. In do_GET, use urllib.request.urlopen() to fetch from the upstream, then write the response back to the client with self.wfile.write().
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen, Request

UPSTREAM = "http://httpbin.org"
HOP_BY_HOP = {"connection", "keep-alive", "transfer-encoding"}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = UPSTREAM + self.path
        resp = urlopen(Request(url, headers={"Host": "httpbin.org"}))
        body = resp.read()                      # urllib already de-chunks the body
        self.send_response(resp.status)
        for key, val in resp.getheaders():
            if key.lower() not in HOP_BY_HOP:   # hop-by-hop headers must not be forwarded
                self.send_header(key, val)
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()
Run it: python proxy.py. Then curl http://localhost:8080/get — you'll get httpbin's response, routed through your proxy. That's a working reverse proxy in under 20 lines of Python.
Cheat Sheet — Quick Reference Cards
Forward Proxy
  Sits in front of: CLIENTS
  Hides: Client identity
  Example: Corporate proxy, VPN
  Client knows: YES

Reverse Proxy
  Sits in front of: SERVERS
  Hides: Server identity
  Example: Nginx, Cloudflare
  Client knows: NO
upstream backend {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
frontend http_front
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    option httpchk GET /health
    server s1 10.0.0.1:3000 check
    server s2 10.0.0.2:3000 check
    # Health check: if /health returns non-200,
    # the server is removed from the pool automatically.
Round Robin: equal distribution (default in Nginx/HAProxy)
Least Conns: route to the server with the fewest active requests
IP Hash: same client IP always hits the same backend
Weighted: servers get traffic proportional to their weight
Random: simple, surprisingly effective at scale
X-Forwarded-For: client's real IP chain, e.g. "203.0.113.50, 70.41.3.18"
X-Real-IP: original client IP only, e.g. "203.0.113.50"
X-Forwarded-Proto: original protocol, e.g. "https" (even if the backend speaks HTTP)
Host: original domain name, e.g. "app.example.com"
Simple web app → Nginx reverse proxy
High-perf TCP proxying → HAProxy
API management + auth → API Gateway (Kong, AWS)
50+ microservices → Service Mesh (Istio)
Global static content → CDN (Cloudflare)
All of the above? → CDN + Gateway + Mesh (they stack, not replace)
Connected Topics — Where Proxies Lead Next
Proxies don't exist in isolation — they connect to almost every other concept in system design. Once you understand proxies, these related topics become much easier to learn because you already know the "middleman" mental model.