Networking Foundations

Proxies

The invisible middlemen between you and every server. Reverse proxies, forward proxies, sidecar proxies — they handle TLS, load balancing, caching, rate limiting, and DDoS protection. Every production system has one.

Section 1

TL;DR — The One-Minute Version

Mental Model: A proxy is a middleman that sits between two parties and handles communication on their behalf. Like a receptionist at an office — visitors don't walk straight to your desk. They check in at the front desk first. The receptionist decides who gets through, who waits, and who gets turned away.

Right now, open a terminal and type curl -v https://google.com 2>&1 | grep -i server. You'll see server: gws in the response headers. GWS stands for Google Web Server. But here's the thing: you never connected to GWS directly. Your request first hit a Google Front End (GFE), Google's custom reverse proxy that sits in front of all their services. It terminated your TLS connection, checked your request, applied rate limits, and then forwarded it to GWS on a private internal network. Every Google request, whether Search, Gmail, or YouTube, goes through a GFE first. You talked to a proxy. The proxy talked to the real server. You never knew the difference.

[Diagram: what actually happened when you ran curl. Your terminal speaks HTTPS to a Google Front End (a reverse proxy that terminates TLS, applies rate limits, and routes to the backend), which forwards plain HTTP over an internal network to the GWS backend, reachable only on Google's private network. You saw "server: gws" in the header, but you never talked to GWS directly; the proxy handled everything.]

Try another one: curl -I https://cloudflare.com. Look at the response header cf-ray: 8a1b2c3d4e5f-BOM. That three-letter code at the end — BOM — is Cloudflare's airport code for Mumbai. It means your request hit a Cloudflare reverse proxy physically located in Mumbai. The actual origin server could be in San Francisco, Frankfurt, or anywhere. The proxy served you from the closest location.

These aren't obscure concepts. Nginx powers 34% of all websites (Netcraft 2024) — Apple, Netflix, Airbnb all run it. HAProxy handles load balancing for GitHub, Stack Overflow, and Reddit, supporting 2 million+ concurrent connections. Envoy, created by Lyft in 2016, is the default sidecar proxy in Istio service meshes and handles 10M+ requests per second at Google's scale. Every time you load a webpage, stream a video, or call an API, a proxy touched your request.

[Diagram: the big three proxies running the internet. Nginx: 34% of all websites, 10K+ concurrent connections in ~2.5MB of RAM; runs Apple, Netflix, Airbnb, WordPress.com; one line (proxy_pass http://backend;) makes it a reverse proxy. HAProxy: 2M concurrent connections, 100K+ HTTP req/sec per core; runs GitHub, Stack Overflow, Reddit; born as a pure load balancer (server web1 10.0.1.1:8080 check). Envoy: 10M+ RPS at Google scale, the default sidecar in Istio meshes; runs at Google, Uber, Salesforce, Apple; created at Lyft by Matt Klein in 2016, written in C++, cloud-native and API-driven.]
One-line takeaway: A proxy is a middleman between client and server. Forward proxies protect clients; reverse proxies protect servers. Nginx, HAProxy, and Envoy run the internet's proxy layer — you interact with them every single day.
Section 2

The Scenario — Your Server Is Drowning

You've built an API for a recipe-sharing app. It's running on a single DigitalOcean droplet: 4 vCPUs, 8GB RAM, Ubuntu 22.04, public IP 203.0.113.42. You deployed with docker compose up -d, pointed your domain at the IP, and installed a Let's Encrypt certificate with certbot. Everything works. Your API responds in 45ms. Life is good.

Then a food blogger with 2 million followers posts about your app. Traffic goes from 50 requests per second to 10,000. And here's what happens, not in theory, but on your actual server:

The meltdown timeline, in real numbers:
  • Tuesday morning: 50 req/sec, 45ms responses, CPU at 12%.
  • Tuesday 2 PM: 3,000 req/sec, 800ms responses, TLS handshakes eating 40% of the CPU.
  • Tuesday 4 PM: 8,000 req/sec, 4.2-second responses, bot traffic ramping up (your IP has been scanned), 8,000+ open connections.
  • Tuesday 6 PM: 10,000+ req/sec, the OOM killer terminates your process (dmesg: "Out of memory"), and the site is down. Total meltdown.

Your server is doing five jobs at once: app logic, TLS handshakes, serving static files, fighting bots, and logging. It can't handle all of them.

Here's what your terminal would show if you SSH'd into the box at 4 PM:

$ ssh root@203.0.113.42
# Check CPU usage — TLS handshakes eating everything
$ top -bn1 | head -5
%Cpu(s): 94.2 us,  3.1 sy,  0.0 ni,  1.8 id,  0.0 wa
MiB Mem:  7962.4 total,  312.8 free,  7128.6 used,  521.0 buff/cache
# ^ Only 312MB free. OOM killer coming soon.

# Check open connections — each client holds a TCP connection
$ ss -s
TCP:   8247 (estab 7891, closed 112, orphaned 34, timewait 210)
# ^ 8,247 open TCP connections. Each one = memory + file descriptor.

# Check who's hitting you — bots found your IP
$ journalctl -u myapp --since "1 hour ago" | grep -c "bot"
14,293
# ^ 14K bot requests in the last hour. Your IP is public, remember.

Your server is doing everything: running your application, encrypting every connection with TLS (each handshake is a two-round-trip negotiation involving certificate exchange, key generation, and cipher selection, and every one of those cryptographic operations costs CPU), serving static images, fighting off 14,000 bot requests, and maintaining 8,000 open connections. That's like asking a restaurant chef to also be the bouncer, the waiter, the cashier, and the dishwasher during the Friday dinner rush.

The math is brutal: Each TLS handshake takes ~2ms of CPU time. At 10,000 new connections per second, that's 20 seconds of CPU time every second just for handshakes — on a 4-core server that only has 4 seconds of CPU time per second. You'd need 5x your current hardware just for TLS alone. And that's before your app even processes a single request.
Think First: Your server's public IP is now known to every client — and every bot. You can't hide it, you can't throttle individual abusers, and you can't add a second server because clients are hardcoded to one IP. You need someone standing in front.
Section 3

First Attempt — Direct Client-to-Server

Let's look at what you actually set up. The simplest possible architecture: your domain's DNS A record points directly to your server's IP address. Every client connects straight to that IP. No middleman. You can see this yourself — run dig +short yourapp.com and you'll get back 203.0.113.42. That's your one server. Everyone in the world now knows its address.

[Diagram: the naive setup. Real users, bots/scrapers, and DDoS traffic all cross the public internet straight to your single server (203.0.113.42, 4 vCPUs, 8GB RAM) running the app, TLS, static files, and logging. Warning signs: IP exposed to the world, TLS eating 40% of CPU, no caching layer, no way to add servers, bots mixed in with real traffic. dig +short yourapp.com returns 203.0.113.42; everyone in the world knows exactly where your server lives.]

This setup works fine when you have 50 users, but it has fundamental problems that a firewall alone can't fix.

Real-world example: In October 2016, the Mirai botnet launched a DDoS attack against Dyn, a DNS provider. Dyn's servers were directly exposed. The attack peaked at 1.2 Tbps and took down Twitter, Netflix, Reddit, and GitHub for hours. A reverse proxy layer like Cloudflare can absorb attacks of this magnitude because they have 300+ data centers to distribute the load — your single server doesn't stand a chance.

Some developers say: "My cloud provider gives me a firewall and auto-scaling. Isn't that enough?" Not for this. A firewall filters traffic based on IP addresses, ports, and protocols at Layer 3/4 (network/transport): it can block IP ranges and close ports, but it can't understand HTTP content, cache responses, terminate TLS for you, or intelligently route requests across multiple backends. Auto-scaling adds new servers, but without a proxy in front, there's no way to route traffic to them. You need both. The proxy is the traffic cop; the firewall is the locked gate.

Think First: Right now your server is doing five jobs: running your app, handling TLS encryption, serving static files, fighting bots, and logging. Each job steals CPU from the others. The first step to fixing this isn't getting a bigger server — it's separating the jobs.
Section 4

Where It Breaks — Four Fatal Flaws

The direct-exposure setup has four fundamental problems. Each one is a ticking time bomb, and they all go off at the same time — when your app starts getting real traffic. Let's put real numbers on each failure mode.

1. TLS Is Crushing Your CPU

Every HTTPS connection starts with a TLS handshake: a multi-step cryptographic negotiation in which client and server exchange certificates, agree on a cipher suite, and derive session keys. The asymmetric cryptography involved (RSA or ECDHE) is computationally expensive. On your 4-core server, each handshake costs about 2ms of CPU time. Sounds tiny, but do the math:

  • At 1,000 new connections/sec: 2ms x 1,000 = 2 seconds of CPU time per second. Half a core. Manageable.
  • At 5,000 new connections/sec: 2ms x 5,000 = 10 seconds of CPU per second. You only have 4 seconds of CPU per second (4 cores). TLS alone needs 2.5x your entire server.
  • At 10,000 new connections/sec: 20 seconds of CPU per second. Physically impossible on 4 cores.
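
The arithmetic in these bullets can be checked in a few lines of Python (the 2ms-per-handshake cost is this section's estimate, not a universal constant):

```python
# CPU demand from TLS handshakes alone vs. a 4-core server's budget.
HANDSHAKE_MS = 2.0   # this section's estimate for one full handshake
CORES = 4            # 4 cores = 4 CPU-seconds available per wall-clock second

def tls_cpu_demand(new_conns_per_sec: float) -> float:
    """CPU-seconds consumed per second, just on handshakes."""
    return new_conns_per_sec * HANDSHAKE_MS / 1000.0

for rate in (1_000, 5_000, 10_000):
    need = tls_cpu_demand(rate)
    status = "fits" if need <= CORES else f"{need / CORES:.1f}x over capacity"
    print(f"{rate:>6} conn/s -> {need:.0f}s of CPU per second ({status})")
```

Run it and you get exactly the three bullets above: 2s (fits), 10s (2.5x over), 20s (5x over).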

A proxy solves this completely. TLS termination happens at the proxy, and with session resumption enabled (a TLS optimization where returning clients reuse a previously established session instead of repeating the full handshake; Nginx enables it with ssl_session_cache shared:SSL:10m), returning clients skip the expensive negotiation entirely. The backend gets plain HTTP on a private network: zero crypto work.

[Diagram: the TLS CPU math. Your 4 cores supply 4 seconds of CPU per second. At 1K conn/s, handshakes need 2s (fine, half a core); at 5K conn/s they need 10s, which is 2.5x over capacity. Impossible.]

2. No Caching = Wasted Work

Your recipe app's homepage is the same for everyone — top 20 recipes, a search bar, some images. Without a cache, every visitor triggers the same database query, the same template rendering, the same JSON serialization. A thousand users requesting /api/recipes/popular in one second means your database runs the exact same query a thousand times.

With a caching proxy, the first request hits your backend (a cache miss: the proxy has no stored copy, so it fetches from the backend and saves the response), and the next 999 are cache hits served from proxy memory in under 1ms, without touching the backend at all. Your backend processes one request instead of a thousand, a 99.9% reduction in load for cacheable content.
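
Here's the miss-then-hits mechanic as a toy Python sketch (the ProxyCache class and backend function are illustrative stand-ins, not how Nginx's proxy_cache is implemented):

```python
import time

class ProxyCache:
    """Toy response cache: first request is a miss, repeats within the TTL are hits."""
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.store = {}                # path -> (response, expiry time)
        self.hits = self.misses = 0

    def get(self, path, fetch_from_backend):
        entry = self.store.get(path)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]            # served from cache; backend untouched
        self.misses += 1
        response = fetch_from_backend(path)
        self.store[path] = (response, time.monotonic() + self.ttl)
        return response

backend_queries = 0
def backend(path):
    global backend_queries
    backend_queries += 1               # stands in for a real database query
    return f"top recipes for {path}"

cache = ProxyCache()
for _ in range(1000):
    cache.get("/api/recipes/popular", backend)

print(backend_queries, cache.hits)     # 1 backend query, 999 cache hits
```

One request does the work; the other 999 ride for free, exactly the 99.9% reduction described above.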

Check it yourself: curl -I https://yourapp.com/api/recipes/popular and look at the Cache-Control header. If it says no-store or there's no header at all, every identical response is being regenerated from scratch. That's massive wasted compute.

3. Your IP Is a Target

Run dig +short yourapp.com. That returns 203.0.113.42. Now anyone — including attackers — knows exactly where your server lives. Unlike a site behind Cloudflare (where dig returns Cloudflare's IP, not yours), nothing sits between the attacker and your machine.

A DDoS attack at even 10 Gbps saturates your server's 1 Gbps network link. Your cloud firewall can't help — the traffic fills the pipe before rules apply. You need something with massive network capacity (Cloudflare has 248+ Tbps) to absorb the flood before it reaches you. That something is a reverse proxy network.

Even without DDoS, bots find your IP within hours. Shodan.io continuously scans all 4 billion IPv4 addresses. Run shodan host 203.0.113.42 and you'll see every open port. If SSH (22), HTTPS (443), and PostgreSQL (5432) are all visible, you're exposed on three fronts.

4. Scaling Is Impossible

You add a second server at 203.0.113.43. But dig yourapp.com still returns only .42. You could add a second DNS A record (DNS round-robin), but DNS has no concept of server health. If .42 dies, half your users still get routed there because their DNS resolver cached it for hours. DNS doesn't know if a server is overloaded, down, or on fire.

You need something at a single IP that accepts all traffic and distributes it to healthy backends. Something that checks each server's health every few seconds with GET /health and automatically removes dead servers from rotation. That's exactly what HAProxy does with server web1 10.0.1.1:8080 check inter 3s fall 3 rise 2, and what Nginx Plus does with its health_check directive (open-source Nginx relies on passive checks instead: after max_fails failed requests, it stops sending a server traffic for fail_timeout seconds).
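
A rough Python sketch of that fall/rise health-check logic (the Backend and pick helpers are hypothetical; they mirror the semantics of HAProxy's fall 3 rise 2, not its implementation):

```python
class Backend:
    """One upstream server with HAProxy-style fall/rise health thresholds."""
    def __init__(self, addr, fall=3, rise=2):
        self.addr, self.fall, self.rise = addr, fall, rise
        self.healthy, self.fails, self.passes = True, 0, 0

    def record_check(self, ok: bool):
        if ok:
            self.fails, self.passes = 0, self.passes + 1
            if not self.healthy and self.passes >= self.rise:
                self.healthy = True    # `rise` consecutive passes: back in rotation
        else:
            self.passes, self.fails = 0, self.fails + 1
            if self.healthy and self.fails >= self.fall:
                self.healthy = False   # `fall` consecutive fails: out of rotation

def pick(backends, counter):
    """Round-robin over healthy backends only; plain DNS can't do this."""
    alive = [b for b in backends if b.healthy]
    return alive[counter % len(alive)] if alive else None

pool = [Backend("10.0.1.1:8080"), Backend("10.0.1.2:8080")]
for _ in range(3):                     # the first server misses 3 checks in a row
    pool[0].record_check(False)
print([b.addr for b in pool if b.healthy])   # only 10.0.1.2:8080 remains
```

The fall/rise thresholds prevent flapping: one dropped packet doesn't eject a server, and one lucky response doesn't reinstate a sick one.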

[Diagram: the four flaws hitting at once. TLS overhead: 2ms per handshake x 10K = 20s of CPU needed per second on a 4-core (4 CPU-seconds) box, 5x over capacity. No cache: 1,000 identical requests mean 1,000 DB queries; with a cache, 1 miss + 999 hits, so 99.9% of the work was waste. IP exposed: dig reveals the real server address, and a 10 Gbps DDoS kills your 1 Gbps link with no absorption layer. Can't scale: DNS has no health checks and hours-long cache TTLs, so a dead server keeps getting traffic; you need smart routing.]
Think First: All four problems share one root cause: your server is directly exposed to the internet with nothing in front of it. The fix isn't four separate solutions. It's one architectural change that solves all of them at once.
Section 5

The Breakthrough — Put a Shield in Front

The fix is one of the most important patterns in all of system design: stop exposing your server directly. Put a machine in front of it, a reverse proxy, that handles all the messy, dangerous, repetitive work: TLS, caching, rate limiting, and load balancing. Your DNS now points to the proxy's IP. Clients talk to the proxy, never learning your real servers exist. The proxy forwards clean HTTP requests to your servers on a private network. Your servers' real IPs are hidden from the world.

In practice, this is as simple as installing Nginx on a separate machine and adding a few lines of configuration:

/etc/nginx/nginx.conf — The core of a reverse proxy
upstream backend {
    server 10.0.1.1:8080;    # App server 1 (private IP — unreachable from internet)
    server 10.0.1.2:8080;    # App server 2 (add more anytime)
}

server {
    listen 443 ssl;
    server_name yourapp.com;

    ssl_certificate     /etc/letsencrypt/live/yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;

    location / {
        proxy_pass http://backend;                                    # Forward to backend over plain HTTP
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;  # Pass real client IP
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
    }
}

That's it. proxy_pass http://backend; — that single directive is the heart of a reverse proxy. The upstream block lists your backend servers by private IPs (10.0.1.x — unreachable from the internet). Nginx handles TLS, accepts client connections, and forwards clean HTTP to your backends. To add a third server, add one line: server 10.0.1.3:8080; and run nginx -s reload. Zero downtime. Clients don't notice a thing.
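
To see the mechanics end to end, here's a toy reverse proxy built on Python's standard library (purely illustrative; the Backend and ReverseProxy classes are hypothetical, and nothing here replaces Nginx):

```python
import threading, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Backend(BaseHTTPRequestHandler):
    """Stands in for an app server on the private network."""
    def do_GET(self):
        body = b"hello from 10.0.1.1"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args): pass     # silence request logging

class ReverseProxy(BaseHTTPRequestHandler):
    """Accepts the client connection, forwards to the backend, relays the reply."""
    backend_url = None                      # filled in below
    def do_GET(self):
        with urllib.request.urlopen(self.backend_url + self.path) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args): pass

backend = HTTPServer(("127.0.0.1", 0), Backend)   # port 0 = pick any free port
ReverseProxy.backend_url = f"http://127.0.0.1:{backend.server_port}"
proxy = HTTPServer(("127.0.0.1", 0), ReverseProxy)
for srv in (backend, proxy):
    threading.Thread(target=srv.serve_forever, daemon=True).start()

# The client only ever talks to the proxy's address.
resp = urllib.request.urlopen(f"http://127.0.0.1:{proxy.server_port}/")
print(resp.read().decode())                # hello from 10.0.1.1
```

The client sees one address, the proxy's; the backend's port never has to be reachable from outside. That's the whole trick, minus everything that makes Nginx production-grade.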

[Diagram: the fix. Clients (and bots) connect over HTTPS to an Nginx proxy at public IP 198.51.100.10, which terminates TLS, caches (proxy_cache), rate-limits, load-balances, and blocks bots and DDoS traffic, then forwards plain HTTP across the private 10.0.1.x network to app servers 10.0.1.1, 10.0.1.2, and 10.0.1.3. dig +short yourapp.com now returns 198.51.100.10, the proxy's IP; the real servers are hidden.]
Before vs After — real numbers:
  • TLS: Backend CPU 94% → 12% (proxy handles all crypto, backends do zero TLS)
  • Caching: 10,000 DB queries/sec → 10/sec (99.9% served from proxy cache)
  • Security: Real IP public → Real IP hidden, bots rate-limited at proxy
  • Scaling: 1 server, can't add more → Add servers with one config line, zero downtime
  • Connections: before, 10K clients meant 10K backend connections; now, 10K clients share 50 pooled connections
Think First: Notice the HTTPS → HTTP transition in the diagram. Clients connect to the proxy over encrypted HTTPS. The proxy decrypts and forwards plain HTTP to backends on a private network. This is TLS termination. It's safe because the 10.0.1.x network is private — unreachable from the public internet. Your data travels encrypted across the internet and unencrypted only inside your own trusted network.
Section 6

How It Works — Four Types of Proxies

Not all proxies are the same. They differ in who they protect, where they sit in the network, and whether anyone knows they're there. Let's break down the four main types — each with real software you can install and real commands you can run.

1. Forward Proxy — Protects the Client

A forward proxy sits in front of clients. It takes a client's request and sends it to the internet on the client's behalf. The server on the other end never sees the real client — it only sees the proxy's IP address. Think of it like sending mail through a P.O. Box: the recipient sees the box number, not your home address.

Where you've used this: Your company's corporate network almost certainly has one. When you access the internet at work, traffic goes through a proxy like Squid (an open-source forward proxy and caching server, first released in 1996 and still deployed by ISPs and corporations worldwide to cache content, filter URLs, and log traffic for compliance) or Zscaler. IT can see which domains you visit (but not HTTPS content), block social media, and scan for malware. Try it: curl -x proxy.company.com:8080 https://google.com (the -x flag tells curl to route through a proxy).

Developers use forward proxies too: Charles Proxy and mitmproxy sit between your browser and the internet, letting you inspect every HTTP request and response. Mobile developers use them constantly to debug API calls from iOS and Android apps.

[Diagram: a forward proxy on a corporate network. Dev laptops and HR desktops send all traffic through a Squid/Zscaler forward proxy (proxy.company.com:8080) that blocks, logs, and caches; google.com sees only the proxy's IP, never the individual users. Try it: curl -x proxy.company.com:8080 https://google.com]

2. Reverse Proxy — Protects the Server

A reverse proxy sits in front of servers. The client has no idea the proxy exists — it thinks it's talking directly to the real server. This is the proxy type you'll encounter in 90% of system design discussions. When someone says "proxy" without qualification, they almost always mean a reverse proxy.

Real software: Nginx, HAProxy, Envoy, Traefik, Caddy. Every major website runs a reverse proxy. Check for yourself: curl -I https://github.com and look at the response headers — you'll see their load balancing infrastructure, not the application server.

Here's a production Nginx reverse proxy config — the same pattern used by millions of sites:

/etc/nginx/conf.d/app.conf
upstream backend {
    server 10.0.1.1:8080 weight=5;  # Stronger server gets 5x traffic
    server 10.0.1.2:8080 weight=2;  # Medium server
    server 10.0.1.3:8080 weight=1;  # Smallest server
    keepalive 50;                     # Pool: 50 persistent backend connections
}

server {
    listen 443 ssl http2;
    server_name api.yourapp.com;

    # TLS termination — proxy handles ALL crypto
    ssl_certificate     /etc/letsencrypt/live/api.yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.yourapp.com/privkey.pem;
    ssl_session_cache   shared:SSL:10m;  # Returning clients skip full handshake

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # Backend keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # Clear Connection header so pooled connections stay open
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
[Diagram: a reverse proxy as a server-side shield. The client speaks HTTPS to Nginx/HAProxy, which handles TLS, caching, routing, and balancing, then uses a keepalive pool (keepalive 50;) of plain-HTTP connections on the private 10.0.1.x network to reach Server 1 (weight 5), Server 2 (weight 2), and Server 3 (weight 1).]

3. Transparent Proxy — Nobody Knows It's There

A transparent proxy intercepts traffic without any client configuration. You don't set a proxy in your browser; the server doesn't know one exists. Deployed at the network level, often by ISPs or corporate IT, it sits in the network path and silently processes everything that passes through.

Where you've experienced this: Ever connected to hotel Wi-Fi and been redirected to a login page? That's a transparent proxy intercepting your HTTP request and injecting a redirect. Your ISP probably runs one too — it caches popular content (why re-download a Windows update for every customer?) and may filter content for compliance.

You can detect transparent proxies. Run traceroute google.com and look for unexpected hops with high latency or ISP names. Some proxies add Via: or X-Forwarded-For: headers that reveal their presence. Note: transparent proxies struggle with HTTPS because they can't see the encrypted content — this is why your ISP can see which domains you visit but not the page content.

[Diagram: a transparent proxy. Your browser thinks it's going straight to google.com; an ISP transparent proxy in the path caches, filters, and logs traffic you never configured it to touch; google.com thinks it's talking to you. Hotel Wi-Fi captive portals work the same way. Detect it with traceroute google.com and look for unexpected hops.]

4. Sidecar Proxy — One Per Service

A sidecar proxy runs alongside your application, in the same Kubernetes pod or the same VM. Instead of one central proxy for everything, every service gets its own tiny proxy, and all traffic in and out goes through the sidecar first. This is the foundation of a service mesh: a dedicated infrastructure layer of sidecar proxies that handles all inter-service communication (mutual TLS between services, automatic retries, circuit breaking, distributed tracing, and traffic policies) without changing application code. Istio and Linkerd are the two most popular service meshes.

Real software: Envoy (created by Matt Klein at Lyft in 2016, written in C++) is the dominant sidecar proxy. Istio, the most popular service mesh (built by Google, IBM, and Lyft), injects an Envoy sidecar into every Kubernetes pod automatically, while its control plane (Istiod) pushes routing rules, security policies, and observability config to all the sidecars. Run kubectl get pods and you'll see my-service 2/2 Running: the 2/2 means two containers, your app plus the Envoy sidecar. Your app talks to localhost; Envoy handles mTLS, retries, circuit breaking, and distributed tracing.

[Diagram: sidecar proxies in Kubernetes with Istio/Envoy. Each pod (order-svc, payment-svc, inventory) runs its app container (:8080) plus an Envoy sidecar (:15001) that handles mTLS between pods, retries, circuit breaking, load balancing, tracing, and observability. kubectl get pods shows each pod as 2/2 Running.]

The beauty of the sidecar model: your application code stays clean. Your service talks to localhost; the sidecar intercepts all outbound traffic and handles encryption, retries, circuit breaking, and metrics. When Lyft had 300+ microservices each writing their own retry logic in different languages, it was chaos. Envoy standardized all of it into one consistent proxy layer.
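
The kind of logic the sidecar absorbs can be sketched in a few lines (illustrative only; Envoy's retry policy is declared in configuration, and the flaky_payment_rpc function here is a made-up stand-in):

```python
import random

def call_with_retries(rpc, attempts=3, base_delay=0.1):
    """Retry a flaky call with exponential backoff and jitter.
    In a service mesh, this policy lives in the sidecar, not in each app."""
    for attempt in range(attempts):
        try:
            return rpc()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                    # retries exhausted: surface the error
            backoff = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            # a real sidecar would wait `backoff` seconds before the next try

calls = 0
def flaky_payment_rpc():
    """Stand-in for a payment-svc call that fails twice, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("payment-svc unreachable")
    return "charged"

print(call_with_retries(flaky_payment_rpc))   # succeeds on the 3rd attempt
```

Multiply this snippet by 300 services in five languages and you see Lyft's problem; the sidecar writes it once, consistently, for everyone.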

Think First: Quick cheat sheet — forward proxy = hides clients (curl -x proxy:8080), reverse proxy = hides servers (Nginx, HAProxy), transparent proxy = invisible to both (ISP caching), sidecar proxy = one per service (Envoy in K8s). In system design interviews, "proxy" almost always means reverse proxy.
Section 7

Going Deeper — What Proxies Actually Do Under the Hood

We said proxies handle "TLS, caching, load balancing, headers." But what does that actually mean? Not in theory — with real configs, real numbers, and real commands you can run. Let's unpack each superpower one by one.

SSL/TLS Termination — Saving 2-4ms Per Request

When a browser connects to your site over HTTPS, there's a TLS handshake: certificate exchange, key negotiation, cipher suite agreement. Each handshake involves asymmetric cryptography (RSA-2048 or ECDHE), which costs roughly 2ms of CPU time. With TLS 1.3 (released in 2018 and supported by all modern browsers and servers), a fresh handshake takes 1 round trip; with TLS 1.2, it takes 2. Either way, it's CPU-intensive.

With TLS termination at the proxy, the math changes completely. The proxy handles all crypto. Backends receive plain HTTP on a private network. Here's the Nginx config:

TLS termination in Nginx
server {
    listen 443 ssl http2;
    server_name yourapp.com;

    # Certificate (from Let's Encrypt or your CA)
    ssl_certificate     /etc/letsencrypt/live/yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;

    # Performance: cache TLS sessions so returning clients skip full handshake
    ssl_session_cache   shared:SSL:10m;   # 10MB cache ≈ 40,000 sessions
    ssl_session_timeout 1d;                # Sessions valid for 24 hours

    # Modern protocols only
    ssl_protocols TLSv1.2 TLSv1.3;

    location / {
        proxy_pass http://backend;  # Plain HTTP to backend — zero crypto work
    }
}

The math: a TLS handshake costs ~2ms of CPU per connection. At 10,000 new connections/sec, that's 20 seconds of CPU per second. With ssl_session_cache, returning clients (roughly 60-70% of traffic) skip the full handshake, cutting their cost to ~0.5ms. Now the same 10K connections need only ~10 seconds of CPU, and that CPU is spent on the proxy, not your app server. Your backend's TLS CPU cost? Zero.
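
As a sanity check on that estimate (assuming ~2ms for a full handshake, ~0.5ms for a resumed one, and about 65% returning clients, all figures from this section, not measurements):

```python
# Blended handshake cost with session resumption at the proxy.
FULL_MS, RESUMED_MS = 2.0, 0.5        # per-handshake CPU estimates from above
conns_per_sec = 10_000
resumed_share = 0.65                  # assume ~65% returning clients

cpu_seconds = conns_per_sec * (
    resumed_share * RESUMED_MS + (1 - resumed_share) * FULL_MS
) / 1000.0
print(f"{cpu_seconds:.2f}s of CPU per second, down from 20s")  # ~10s, on the proxy
```

Halving the crypto bill is nice; moving all of it off the app servers is the real win.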

[Diagram: TLS termination. The browser speaks encrypted HTTPS to the Nginx proxy, where the certificate lives and decryption happens (ssl_session_cache shared:SSL:10m); the backends receive plain HTTP and do zero crypto work. Savings: 2-4ms per request; at 10K req/s, that's 20-40 seconds of CPU per second moved off your app servers.]

Connection Pooling — 10K Clients, 50 Backend Connections

Every client connection has overhead: a TCP handshake (1 round trip), possibly a TLS handshake (1-2 more), plus memory for the socket buffer and file descriptor. If 10,000 clients connect directly to your backend, that's 10,000 open connections your server must maintain — each consuming ~10KB of kernel memory for TCP buffers, plus your application's per-connection state.

A proxy solves this with connection pooling: it accepts all client connections on the front end but maintains a small set of persistent (keep-alive) connections to talk to your backend, so 10,000 client connections might map to just 50 backend connections. In Nginx:

Connection pooling in Nginx upstream
upstream backend {
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    keepalive 50;  # Maintain 50 persistent connections to backends
                   # 10,000 client connections → 50 backend connections
                   # That's a 200x reduction
}

Why this matters critically: PostgreSQL's max_connections defaults to 100. Without a connection-pooling proxy, 100 concurrent users would max out your database. With a pooler like PgBouncer for database connections (a lightweight PostgreSQL connection pooler used by GitLab, Heroku, and thousands of production deployments) or Nginx for HTTP connections, 10,000 users map to 50 backend connections. Your database breathes easy.
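
A minimal sketch of the many-to-few idea (a toy semaphore-bounded pool, not how Nginx or PgBouncer are implemented; 500 threads stand in for 10K clients to keep it quick):

```python
import threading

class ConnectionPool:
    """Toy bounded pool: many clients share a few persistent backend connections."""
    def __init__(self, size=50):
        self._slots = threading.BoundedSemaphore(size)  # at most `size` borrowers
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0

    def query(self, sql):
        with self._slots:              # blocks until one of the 50 slots frees up
            with self._lock:
                self._in_use += 1
                self.peak_in_use = max(self.peak_in_use, self._in_use)
            # ... the borrowed connection would run `sql` here ...
            with self._lock:
                self._in_use -= 1

pool = ConnectionPool(size=50)
clients = [threading.Thread(target=pool.query, args=("SELECT 1",))
           for _ in range(500)]        # 500 concurrent "clients"
for t in clients: t.start()
for t in clients: t.join()
print("peak backend connections:", pool.peak_in_use)   # never exceeds 50
```

However many clients pile up, the backend never sees more than 50 concurrent borrowers; the excess simply queues at the pool.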

[Diagram: connection pooling, many-to-few. 10,000 client connections terminate at the Nginx proxy, which reuses 50 persistent connections (keepalive 50;) to Servers 1-3: a 200x reduction.]

Load Balancing Algorithms — Round Robin, Least Conn, IP Hash, Weighted

When you have multiple backend servers, the proxy decides which one gets each request. This is load balancing: distributing incoming traffic across servers so no single one is overwhelmed, while tracking backend health so a dead server stops receiving traffic within seconds. Here's every major algorithm, when to use it, and the actual Nginx config for each:

Round Robin — take turns: Server 1, Server 2, Server 3, Server 1... This is Nginx's default — no directive needed. Simple, predictable, and works well when all servers are identical and requests take roughly the same time.

upstream backend {
    server 10.0.1.1:8080;   # Gets request 1, 4, 7...
    server 10.0.1.2:8080;   # Gets request 2, 5, 8...
    server 10.0.1.3:8080;   # Gets request 3, 6, 9...
}

Best for: Stateless APIs with identical servers. Bad for: Servers with unequal power or requests with wildly different processing times.

Least Connections — send each new request to whichever server currently has the fewest active connections. This adapts to varying request durations. If Server 1 is handling a slow database query, new requests go to Server 2 instead.

upstream backend {
    least_conn;              # Pick the server with fewest active connections
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}

Best for: APIs with varying response times (some endpoints fast, some slow). Bad for: Simple uniform workloads (round robin is simpler and equivalent).

Weighted — more powerful servers get more traffic. If Server 1 has 8 CPU cores and Server 3 has 2, you don't want them getting equal traffic. Weight them proportionally.

upstream backend {
    server 10.0.1.1:8080 weight=5;  # 8-core box: 5x traffic
    server 10.0.1.2:8080 weight=2;  # 4-core box: 2x traffic
    server 10.0.1.3:8080 weight=1;  # 2-core box: baseline
    # Out of every 8 requests: 5 → S1, 2 → S2, 1 → S3
}

Best for: Mixed hardware (different instance sizes). Bad for: Identical servers (just use round robin).

IP Hash — hash the client's IP address to pick a server. The same client always goes to the same backend. Useful for sticky sessions (when a user's requests must keep returning to the server that holds their in-memory session state, like a shopping cart), but beware: clients behind a shared corporate NAT all hash to the same server, and if a server dies, all of its clients get redistributed.

upstream backend {
    ip_hash;                 # hash(client IP) → always same server
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}

Best for: Legacy apps with server-side sessions. Bad for: Modern stateless APIs (use round robin or least-conn instead). Also problematic when many clients share one IP (corporate NAT).
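To see why the redistribution hurts, here's a toy sketch of hash-based selection (using MD5 for illustration — Nginx's actual ip_hash key for IPv4 is based on the first three octets of the address):

```python
import hashlib

def pick_server(client_ip, servers):
    """Deterministically map a client IP to a server (toy version)."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

servers = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]

# Same client, same backend — every time:
assert pick_server("203.0.113.50", servers) == pick_server("203.0.113.50", servers)

# But remove one server and many clients land somewhere new,
# losing whatever session state their old backend held in memory:
clients = [f"203.0.113.{i}" for i in range(100)]
moved = sum(pick_server(ip, servers) != pick_server(ip, servers[:2]) for ip in clients)
print(f"{moved} of 100 clients changed backend after one server died")
```

This remapping-on-failure problem is why consistent hashing exists — and why modern stateless designs avoid needing sticky sessions at all.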

When to Use Which Algorithm (diagram recap):
  Round Robin — identical servers, uniform request times. Default. Start here.
  Least Connections — varying response times, some endpoints slow. Smartest general choice.
  Weighted — mixed hardware (8-core + 2-core boxes). Match traffic to capacity.
  IP Hash — sticky sessions needed, legacy server-side state. Avoid if possible.
Header Manipulation — X-Forwarded-For, X-Real-IP, X-Forwarded-Proto

When a proxy forwards a request to your backend, the backend sees the proxy's IP address, not the real client's. That's a problem — your app needs the real client IP for logging, rate limiting, geolocation, and analytics. Proxies solve this by injecting special HTTP headers:

  • X-Forwarded-For — the original client IP. If the request passed through multiple proxies, it's a comma-separated chain: X-Forwarded-For: 203.0.113.50, 198.51.100.10 (client, then first proxy).
  • X-Real-IP — the single original client IP (no chain). Simpler than X-Forwarded-For when you have one proxy.
  • X-Forwarded-Proto — was the original request HTTP or HTTPS? After TLS termination, your backend receives HTTP. But it needs to know the original protocol to generate correct redirect URLs and set secure cookies.

Here's the Nginx config — these three lines are in every production proxy setup:

Header manipulation in Nginx
location / {
    proxy_pass http://backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Security headers — added uniformly to all responses
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Strict-Transport-Security "max-age=63072000" always;
}

Why at the proxy? Consistency. Instead of every backend service implementing security headers independently (and some teams forgetting), the proxy enforces them uniformly across all responses. One place to configure, zero chance of a team forgetting Strict-Transport-Security.

Think First: The four superpowers — TLS termination, connection pooling, load balancing, header manipulation — are not independent features. They work together. The proxy terminates TLS (saving backend CPU), pools connections (reducing backend memory), load balances across backends (enabling horizontal scaling), and injects headers (preserving client information). One piece of infrastructure, four critical capabilities.
Section 8

Variations — Proxies in Disguise

Several things you've heard of are actually proxies wearing different hats. Understanding this helps you see the bigger picture — and nail system design interview questions where the interviewer asks "how is X different from Y?"

API Gateway vs Reverse Proxy

An API gateway (a specialized reverse proxy designed for APIs; Kong, AWS API Gateway, and Apigee are popular implementations) is a reverse proxy with extra features bolted on. A plain reverse proxy forwards requests, caches, and load balances. An API gateway adds authentication (OAuth, JWT validation), rate limiting (per API key: free tier = 100 req/min, paid = 10K), request transformation (rename fields, merge responses), API versioning, and usage analytics.

Think of it this way: every API gateway is a reverse proxy, but not every reverse proxy is an API gateway.

Reverse Proxy vs API Gateway (diagram recap):
  Reverse proxy — TLS termination, load balancing, caching, compression, routing.
  API gateway adds — auth (OAuth, JWT, API keys), per-client rate limiting, request transformation, analytics + developer portal.

Real examples: Kong (open-source, built on Nginx + Lua), AWS API Gateway (managed service, pay per request), Apigee (Google's enterprise offering). When to use which? If your system is a simple web app with a React frontend and one API, Nginx is all you need. If you have multiple APIs consumed by mobile, web, and partner clients — each with different rate limits, auth, and versioning — an API gateway earns its complexity.

Start simple: Begin with a reverse proxy (Nginx). Graduate to an API gateway only when you need per-client rate limiting, API key management, or request transformation. Don't over-engineer.
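The per-client rate limiting a gateway adds is typically a token bucket per API key. Here's a minimal sketch (tier names and limits are illustrative, not any specific gateway's API):

```python
import time

class TokenBucket:
    """Per-API-key quota: allow short bursts, enforce an average rate."""

    def __init__(self, rate_per_sec, burst, now=None):
        self.rate, self.burst = rate_per_sec, burst
        self.tokens = burst
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill tokens for the time elapsed, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per API key, sized by tier:
buckets = {
    "free-key": TokenBucket(rate_per_sec=100 / 60, burst=10),      # 100 req/min
    "paid-key": TokenBucket(rate_per_sec=10_000 / 60, burst=200),  # 10K req/min
}
```

A production gateway keeps these counters in shared storage (e.g. Redis) so every gateway instance sees the same quota, but the core logic is this simple.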

Service Mesh — Proxies Everywhere

A service mesh takes the sidecar proxy concept from Section 6 and applies it to your entire system. Every microservice gets an Envoy sidecar. A central control plane — the "brain" of the mesh (Istiod in Istio, or Linkerd's control plane) — manages all the sidecars, pushing routing rules, security policies, mTLS certificates, and observability config to them via the xDS API. The sidecars are the "data plane": they actually process the traffic.

Service Mesh Architecture — Control Plane + Data Plane (diagram recap):
  Control plane (Istiod) — pushes config, certs, and policies to all sidecars via the xDS API.
  Data plane (Envoy sidecars) — Order: mTLS + retries + tracing. Payment: circuit breaking + LB. Inventory: observability + auth.
  App code handles business logic only. Envoy handles ALL networking.

Real usage: Istio (by Google/IBM/Lyft) is the most popular — used by Google, Apple, Salesforce, eBay. Linkerd (by Buoyant) is simpler and lighter — good for smaller teams. Istio uses Envoy as its sidecar proxy; Linkerd uses its own lightweight proxy, linkerd2-proxy, written in Rust.

Trade-off: A service mesh adds real complexity. You need Kubernetes, you need to learn the mesh's configuration, and every request now passes through two extra proxy hops (source sidecar → destination sidecar). Latency increases by 1-3ms per hop. It's powerful for 50+ microservices, but overkill for a monolith with 3 services.

CDN as a Proxy — Cloudflare, Akamai, Fastly

A CDN (Content Delivery Network) is a globally distributed network of reverse proxies. When a user in Mumbai requests your image, they get it from a Mumbai edge server instead of your origin in Virginia: the CDN caches static content, terminates TLS at the edge, and absorbs DDoS attacks. Cloudflare, Akamai, Fastly, and AWS CloudFront are the major CDNs. Cloudflare alone has 300+ data centers worldwide. When you put your site behind Cloudflare, your DNS points to Cloudflare's IP — not yours. Every request hits the nearest Cloudflare edge server first.

Try it: curl -I https://cloudflare.com. Look at the cf-ray header: cf-ray: 8a1b2c3d4e5f-BOM. That BOM is the IATA airport code for Mumbai. Your request was served by a Cloudflare proxy physically in Mumbai, India. The origin server could be in San Francisco — you never connected to it.

CDN = 300+ Reverse Proxies Worldwide (diagram recap):
  Origin server: San Francisco. Edge locations: Tokyo (NRT, ~5ms to Japan users), London (LHR, ~8ms to UK users), Sydney (SYD, ~3ms to AUS users), Sao Paulo (GRU, ~6ms to BR users), Mumbai (BOM, ~4ms to India users).
  curl -I cloudflare.com → cf-ray: 8a1b2c3d4e5f-BOM (BOM = Mumbai served your request).
  Cloudflare: 300+ locations, 248+ Tbps capacity, proxies 20%+ of all web traffic.

Cloudflare's free tier includes DDoS protection, CDN caching, and TLS termination — all as a reverse proxy. You just change your DNS nameservers to Cloudflare's, and they proxy all traffic to your origin server. Your real server IP stays hidden. That's enterprise-grade proxy infrastructure for $0.

Think First: Three questions to decide which proxy variation you need: (1) "Do I need auth, rate limiting, and API key management?" → API Gateway. (2) "Do I have 50+ microservices needing mTLS and retries?" → Service Mesh. (3) "Do I need global caching and DDoS protection?" → CDN. Most production systems use 2-3 of these together. They're not mutually exclusive.
Section 9

At Scale — Real Stories from the Biggest Proxy Operations

Proxies are not academic tools — they're the backbone of the modern internet. Every single request to Netflix, GitHub, or Lyft passes through at least one proxy before it reaches a backend. Let's look at the real numbers behind four companies that run proxies at planet scale.

Cloudflare — The Internet's Reverse Proxy

Cloudflare is, in the simplest terms, the world's largest reverse proxy. Over 20% of all web traffic passes through their network. They operate 300+ data centers in 100+ countries, with a total network capacity exceeding 248 Tbps — enough to absorb the biggest DDoS attacks ever recorded.

Here's what makes Cloudflare's story remarkable: they give it away for free. Their free tier includes DDoS protection, CDN caching, TLS termination, and a web application firewall. You just change your domain's nameservers to Cloudflare's, and all traffic is proxied through them. Your origin server's real IP stays hidden. Millions of small websites get enterprise-grade proxy infrastructure for $0 — the business model works because larger customers pay for advanced features like Workers, rate limiting rules, and priority routing.

Try it on any Cloudflare-protected site: curl -I https://cloudflare.com. Look for the cf-ray header — the last three letters are an IATA airport code (the three-letter identifiers used for airports worldwide: BOM = Mumbai, CDG = Paris, NRT = Tokyo) telling you which Cloudflare data center served your request. cf-ray: 8a1b2c3d4e5f-BOM means Mumbai handled it. Your origin server could be in Virginia — you never connected to it directly.

Netflix Zuul — The Java API Gateway Handling 1B+ Requests/Day

Netflix built Zuul, a Java-based API gateway, because they needed a programmable reverse proxy that could handle their unique requirements: A/B testing at the edge, canary routing, authentication, and request decoration — all before traffic reached backend microservices.

Zuul handles all API traffic entering Netflix — more than 1 billion requests per day. It's the single entry point for every play button, every search query, every profile load from 230+ million subscribers worldwide. The gateway runs as a cluster of JVM instances behind an AWS Elastic Load Balancer.

What makes Zuul special is its filter architecture. Developers write Groovy or Java filters that run at different stages of the request lifecycle: pre-filters (authentication, rate limiting), routing filters (choosing which backend to send to), and post-filters (adding headers, logging). Netflix famously uses this to route 1% of traffic to canary deployments before rolling out to everyone.

Netflix open-sourced Zuul, but most companies today use Spring Cloud Gateway (Zuul's spiritual successor) or Envoy instead. Zuul 1 was blocking (one thread per request), which struggled at high concurrency. Zuul 2 moved to non-blocking I/O with Netty, but by then Envoy had already won the community.

Lyft & Envoy — The Proxy That Changed the Industry

In 2016, Matt Klein at Lyft created Envoy because existing proxies (Nginx, HAProxy) weren't designed for microservice environments. Lyft had hundreds of services, and debugging network issues between them was a nightmare. They needed a proxy that understood the service mesh world: L7 observability, automatic retries, circuit breaking, and distributed tracing — all built in.

Envoy is a C++ proxy designed from the ground up for modern distributed systems. It runs as a sidecar (a proxy deployed alongside your application in the same pod or host) next to every microservice, handling all network traffic transparently. The application code just talks to localhost — Envoy handles TLS, retries, load balancing, and sending telemetry data to tracing systems like Jaeger or Zipkin. The app code never deals with network complexity.

Today, Envoy is a CNCF graduated project (the same tier as Kubernetes). It's used by Google, Apple, Netflix, Stripe, Airbnb, and thousands more. Istio, the most popular service mesh, uses Envoy as its data plane. When you hear "service mesh," Envoy is almost always what's actually proxying the traffic underneath.

GitHub & HAProxy — Proxying Every Git Push on Earth

GitHub uses HAProxy as their primary load balancer and reverse proxy. Every git clone, git push, pull request view, and API call passes through HAProxy clusters before reaching GitHub's application servers. At peak, that's 2+ million concurrent connections.

GitHub chose HAProxy for its raw performance at L4 (TCP) proxying. Git operations over SSH and HTTPS are long-lived connections that transfer large amounts of data — you need a proxy that excels at connection handling, not just HTTP request routing. HAProxy's event-driven architecture handles this efficiently with minimal memory overhead per connection.

GitHub runs HAProxy with custom health checks that go beyond simple TCP pings. Their health checks verify that backend application servers can actually process requests — not just that the port is open. If a server is up but overloaded (responding slowly), HAProxy's health checks catch it and route traffic away. This is critical when you have millions of developers depending on git push working every time.

Think First: Notice the pattern: Cloudflare chose Nginx-based proxies for HTTP-heavy edge traffic. Netflix built a custom JVM gateway for API-layer logic. Lyft built Envoy for service-to-service communication. GitHub chose HAProxy for raw TCP throughput. The "best proxy" depends entirely on where in the stack it sits and what kind of traffic it handles.
Section 10

The Anti-Lesson — Things That Sound Right but Aren't

Proxies are powerful, so people tend to reach for them everywhere. Here are three pieces of "advice" that sound reasonable but lead to real problems in production. If you hear any of these in an interview, you'll know why they're wrong.

Bad Advice 1: "Just add another proxy layer"

This sounds like good architecture — more proxies, more control, right? Wrong. Every proxy hop adds latency (1-5ms per hop), complexity (another thing to configure, monitor, and debug), and failure surface (another process that can crash or misconfigure).

A request that goes through CDN → API gateway → service mesh sidecar → application has three proxy hops. That's 3-15ms of overhead before your code even runs. For a real-time gaming server or high-frequency trading system, that's unacceptable. For a blog? A single Nginx instance is plenty.

Rule of thumb: Add a proxy when it solves a specific problem (TLS termination, load balancing, rate limiting). Never add one "just in case" or because an architecture diagram looks more impressive with more boxes.

Bad Advice 2: "Nginx is better than HAProxy" (or vice versa)

This is a false dichotomy. Nginx and HAProxy are different tools that overlap in some areas. Saying one is categorically better than the other is like saying a Swiss Army knife is better than a chef's knife — it depends what you're doing.

Nginx is a web server that also does reverse proxying. It can serve static files, run Lua scripts, cache responses, and act as an HTTP/2 gateway. It's the default choice when you need a reverse proxy and a web server on the same box.

HAProxy is a pure proxy — it doesn't serve files or run scripts. What it does, it does exceptionally well: L4 (TCP) and L7 (HTTP) load balancing with detailed health checks, connection draining, and stunningly low latency. It's the default choice when raw proxying performance is the priority (like GitHub's git operations).

Interview answer: "I'd use Nginx when I need a web server + reverse proxy combo, and HAProxy when I need a dedicated high-performance load balancer. Many large systems use both — HAProxy at L4, Nginx at L7."

Bad Advice 3: "A CDN replaces your reverse proxy"

A CDN is great for static and cached content — images, CSS, JavaScript, HTML pages that don't change per user. But it cannot replace a reverse proxy for dynamic routing, authentication, rate limiting, or request transformation.

When a user hits /api/orders/123, that request needs to be authenticated, routed to the right microservice, and potentially transformed (adding internal headers, stripping sensitive data). A CDN doesn't do any of that — it looks for a cached response and, if there's a miss, passes the request straight to your origin.

In practice, production systems use both: a CDN at the edge for static assets and DDoS protection, and a reverse proxy (Nginx, Envoy, or an API gateway) behind it for dynamic traffic. The CDN handles the 80% of traffic that's cacheable; the reverse proxy handles the 20% that requires logic.

Remember: CDN = static content + DDoS shield. Reverse proxy = dynamic routing + auth + rate limiting. They complement each other; they don't replace each other.
Section 11

Common Mistakes — What People Get Wrong About Proxies

These are the mistakes that cause real outages and confused debugging sessions. If you've configured Nginx or HAProxy, you've probably hit at least two of these. Learn them here so you don't learn them in a 3 AM incident.

Mistake 1: Losing the Real Client IP

When a reverse proxy forwards a request, the backend server sees the proxy's IP address as the source — not the client's real IP. If your app logs IP addresses for analytics, rate limiting, or fraud detection, every single request looks like it came from the same machine: your proxy.

The fix is the X-Forwarded-For header. Your proxy must add it, and your backend must read it. In Nginx:

nginx.conf — proxy headers
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP       $remote_addr;
proxy_set_header Host            $host;

Without these headers, your rate limiter thinks one client is making all 10,000 requests. You'll either rate-limit everyone or rate-limit nobody.
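On the backend side, reading X-Forwarded-For safely means walking it right to left and skipping your own proxies — the leftmost value is client-supplied and trivially forged. Here's a sketch (the helper name and IPs are illustrative):

```python
def client_ip(xff_header, remote_addr, trusted_proxies):
    """Return the real client IP, trusting only hops we appended ourselves."""
    hops = [ip.strip() for ip in xff_header.split(",") if ip.strip()]
    hops.append(remote_addr)  # the actual TCP peer, which can't be forged
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip          # first untrusted hop from the right = the client
    return hops[0]

# Normal case: client IP, then the first proxy's entry
assert client_ip("203.0.113.50, 198.51.100.10",
                 remote_addr="10.0.0.5",
                 trusted_proxies={"198.51.100.10", "10.0.0.5"}) == "203.0.113.50"

# Forgery attempt: the client sent its own fake X-Forwarded-For value,
# but our proxy appended the real source IP — and that's what we trust
assert client_ip("1.1.1.1, 203.0.113.50",
                 remote_addr="10.0.0.5",
                 trusted_proxies={"10.0.0.5"}) == "203.0.113.50"
```

Most web frameworks have a "trusted proxies" setting that does exactly this walk for you — the mistake is enabling it without listing your proxies, or trusting the whole header blindly.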

Mistake 2: The Proxy Becomes the Single Point of Failure

You added a reverse proxy to improve reliability — but if that proxy is a single instance, you just moved your single point of failure instead of eliminating it. When that one Nginx box goes down, every service behind it becomes unreachable.

The fix: run at least two proxy instances behind a DNS round-robin or a cloud load balancer (like AWS NLB). Use keepalived (a Linux daemon implementing VRRP, the Virtual Router Redundancy Protocol: two Nginx servers share a virtual IP address; if the primary dies, the secondary claims the virtual IP within seconds and clients never notice the switch) for on-premises setups, or rely on cloud provider health checks for managed environments. The proxy that protects your backends needs its own protection.

Mistake 3: Leaving Timeouts at Their Defaults

Nginx's default proxy_read_timeout is 60 seconds. If your backend takes 61 seconds to respond (a report generation endpoint, for example), Nginx drops the connection and returns a 504. The backend keeps working — wasting resources on a request nobody's waiting for anymore.

Worse: if all your proxy's connection slots are tied up waiting on slow backends, new requests queue up and eventually time out too. One slow endpoint cascades into site-wide failures.

nginx.conf — timeout tuning
# Match timeouts to your actual endpoint SLAs
proxy_connect_timeout  5s;   # How long to wait for backend TCP handshake
proxy_send_timeout     10s;  # How long to wait sending request body
proxy_read_timeout     30s;  # How long to wait for backend response

# For long-running endpoints, set per-location overrides:
location /api/reports {
    proxy_read_timeout 120s;  # Reports can take 2 minutes
    proxy_pass http://report-service;
}

Mistake 4: Caching Responses That Should Never Be Cached

Caching GET responses is great — the same product page can be served from cache thousands of times. But if your proxy configuration accidentally caches POST responses, users submitting forms or creating orders might get back a cached response from someone else's request. That's a data leak and a correctness nightmare.

By default, Nginx doesn't cache POST, but custom cache configurations can accidentally include all methods. Always verify your cache rules exclude non-idempotent methods (POST, PUT, DELETE, PATCH).

Watch out: If you see proxy_cache_methods GET HEAD POST; in your Nginx config, remove POST immediately. Only cache GET and HEAD responses unless you have a very specific reason and understand the implications.
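For reference, a minimal cache setup that stays safe by being explicit (the zone name, sizes, and TTLs here are illustrative):

```nginx
# In the http {} block: where cached responses live and how much to keep
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g;

server {
    location / {
        proxy_cache app_cache;
        proxy_cache_methods GET HEAD;   # explicit: POST/PUT/DELETE never cached
        proxy_cache_valid 200 301 10m;  # only cache successful responses
        proxy_pass http://backend;
    }
}
```

Stating proxy_cache_methods explicitly, even though GET HEAD is the default, makes the intent reviewable — a future config change can't silently widen it unnoticed.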

Mistake 5: Not Monitoring the Proxy Itself

Your proxy is the front door to your entire system, but many teams never monitor it. They monitor backend services, databases, and cache hit rates — but the proxy itself is invisible. When it starts dropping connections or running out of file descriptors, nobody notices until users complain.

At minimum, monitor these metrics on every proxy: active connections (is it near the limit?), request rate (is traffic spiking?), error rate (4xx/5xx responses), latency percentiles (p50, p95, p99), and upstream health (how many backends are healthy?). Nginx exposes these via the stub_status module; HAProxy has a built-in stats page.
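A minimal stub_status setup looks like this (the location path is arbitrary; the module ships with most distro packages):

```nginx
location /nginx_status {
    stub_status;        # active connections, accepted/handled, total requests
    allow 127.0.0.1;    # metrics are for your monitoring agent, not the public
    deny all;
}
```

Point your metrics agent at it and alert when active connections approach worker_connections or when the accepted/handled counters diverge (a gap means connections were dropped).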

Mistake 6: Running With Default Connection Limits

Nginx's default worker_connections is 512 per worker process. With 4 worker processes, that's 2,048 total connections. Sounds like a lot — until you realize each proxied request uses two connections (one from the client, one to the backend). So you actually support ~1,024 concurrent proxied requests. During a traffic spike, new connections are refused.

nginx.conf — connection tuning
worker_processes auto;          # One per CPU core
events {
    worker_connections 4096;    # Per worker — tune based on traffic
    multi_accept on;            # Accept all new connections at once
    use epoll;                  # Linux: efficient event notification
}

# Also raise OS-level limits:
# ulimit -n 65535
# sysctl net.core.somaxconn=65535

If you're proxying WebSocket connections (which are long-lived), the problem is even worse — each WebSocket holds a connection open for minutes or hours. You'll hit the limit much faster than with short HTTP requests.

Section 12

Interview Playbook — Proxy Questions by Level

Proxy questions come up in system design interviews more than people expect. Whether you're asked "explain forward vs reverse proxy" or "design a service mesh," the depth you go into signals your level. Here's what each level should demonstrate:

Proxy Interview Depth by Level (diagram recap):
  Junior — forward vs reverse, TLS termination basics, load balancing concept.
  Mid-level — design a reverse proxy setup, Nginx config + health checks, caching + rate limiting.
  Senior — service mesh + mTLS, canary deploys + traffic splitting, Envoy xDS + control plane.

Question: "Explain the difference between a forward proxy and a reverse proxy."

This is the most common proxy interview question at the junior level. Here's how to nail it:

  • Forward proxy — sits in front of clients. The client knows it's using a proxy. Example: a corporate proxy that filters employee internet access, or a VPN. The server doesn't know who the real client is.
  • Reverse proxy — sits in front of servers. The client doesn't know it exists. Example: Nginx in front of your web app. The client thinks it's talking directly to your application.
  • Key difference: Forward proxy hides the client's identity. Reverse proxy hides the server's identity.

Bonus points: Mention that TLS termination happens at the reverse proxy (so backends don't need TLS certificates), and that load balancing is the most common reverse proxy use case.

Question: "Design a reverse proxy setup for a web application with 3 backend servers."

Walk through the architecture step by step:

  1. DNS points app.example.com to the proxy's IP (or a cloud LB in front of 2 proxy instances for HA)
  2. TLS termination at the proxy — clients connect via HTTPS, the proxy holds the certificate, backends receive plain HTTP on a private network
  3. Load balancing — round-robin for stateless APIs, least-connections if backends have varying response times, ip-hash if you need session affinity
  4. Health checks — the proxy pings each backend every 5-10s, removes unhealthy servers from the pool, re-adds them when they recover
  5. Headers — X-Forwarded-For, X-Real-IP, X-Forwarded-Proto so backends know the client's real IP and whether the original request was HTTPS
  6. Caching — cache static assets (images, CSS, JS) at the proxy, set Cache-Control headers, use proxy_cache_path in Nginx

Bonus points: Mention rate limiting per IP (limit_req_zone in Nginx), connection draining during deployments, and monitoring (active connections, error rates, p99 latency).
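The per-IP rate limiting mentioned above is two directives in Nginx (the zone name and rates here are illustrative):

```nginx
# In the http {} block: track clients by IP, 10 req/s average, 10 MB of state
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=per_ip burst=20 nodelay;  # absorb bursts of 20, reject excess
        limit_req_status 429;                    # default is 503; 429 is clearer
        proxy_pass http://backend;
    }
}
```

Note the key is $binary_remote_addr — which is exactly why Mistake 1 above matters: behind another proxy layer, every request shares one remote address and the limit applies to everyone at once.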

Question: "How would you implement a service mesh with mTLS and canary deployments?"

This is where you demonstrate deep understanding of modern proxy architecture:

  • Service mesh architecture — every microservice gets an Envoy sidecar proxy. A control plane (Istiod in Istio) pushes configuration to all sidecars via the xDS API ("x Discovery Service": a family of APIs — CDS for clusters, EDS for endpoints, LDS for listeners, RDS for routes, SDS for secrets — through which Envoy receives dynamic configuration, so routing rules change without ever restarting Envoy).
  • mTLS (mutual TLS) — both client and server verify each other's identity. The control plane acts as a certificate authority, issuing short-lived certificates to every sidecar. Service A's sidecar presents its cert to Service B's sidecar. No service can communicate without a valid mesh identity. This is zero-trust networking within your cluster.
  • Canary deployments — deploy a new version alongside the old one. Configure the mesh to split traffic: 95% to v1, 5% to v2. Monitor error rates and latency on v2. If metrics look good, gradually shift more traffic. If something breaks, instantly route 100% back to v1. The application code doesn't change — the sidecar proxies handle all traffic splitting.
  • Traffic splitting in Istio uses VirtualService and DestinationRule resources. You define weights per version, and Istiod pushes the routing rules to every Envoy sidecar.
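As a sketch, a 95/5 canary split in Istio looks roughly like this (the service and subset names are illustrative; the matching DestinationRule that defines subsets v1/v2 by pod labels is omitted):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments
spec:
  hosts:
  - payments
  http:
  - route:
    - destination:
        host: payments
        subset: v1
      weight: 95
    - destination:
        host: payments
        subset: v2
      weight: 5
```

Shifting the canary forward is just editing the weights and re-applying — Istiod pushes the new routing to every sidecar, with no application deploy involved.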

Bonus points: Discuss the latency trade-off (2 extra hops per request: source sidecar → destination sidecar, adding 2-6ms), when a service mesh is overkill (fewer than 10 services), and alternatives like Linkerd (simpler, Rust-based proxy, lower overhead).

Section 13

Practice Exercises — Hands-On with Proxies

Reading about proxies is one thing. Configuring them is another. These exercises go from "copy this config and run it" to "build one from scratch." You'll learn more from 30 minutes of hands-on Nginx than from 3 hours of reading.

Exercise 1: Set Up an Nginx Reverse Proxy (Easy)

Create a minimal Nginx config that reverse-proxies to a local Node.js or Python server. The goal: hit http://localhost:80 and have Nginx forward the request to your app running on port 3000.

You need a server block listening on port 80, with a location / block that uses proxy_pass. Don't forget to set proxy_set_header Host $host;.

nginx.conf
server {
    listen 80;
    server_name localhost;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Test it: curl -v http://localhost. You should see your app's response, but the connection went through Nginx first.

Exercise 2: Test Load Balancing with a curl Loop (Easy)

Configure Nginx with two upstream servers and use a bash loop to verify round-robin distribution. Run for i in $(seq 1 10); do curl -s http://localhost/health; done and confirm requests alternate between backends.

Define an upstream block with two server directives (different ports). Have each backend return a different response so you can tell which one answered.

nginx.conf
upstream backend {
    server 127.0.0.1:3001;  # App instance 1
    server 127.0.0.1:3002;  # App instance 2
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}

Start two simple servers that return their port number, then run the curl loop. You'll see responses alternating between 3001 and 3002.
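If you need the two backends, here's a minimal Python sketch that starts both in one process and checks that each answers with its own port (the ports match the upstream block in this exercise; any free ports work):

```python
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

def make_handler(port):
    """Each backend identifies itself by its port so you can see who answered."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"hello from backend {port}\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the demo output quiet
            pass
    return Handler

def start_backend(port):
    server = HTTPServer(("127.0.0.1", port), make_handler(port))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Start the two instances the Nginx upstream block points at
for port in (3001, 3002):
    start_backend(port)

# Sanity check without Nginx: each backend reports its own port
print(urllib.request.urlopen("http://127.0.0.1:3001/health").read().decode().strip())
print(urllib.request.urlopen("http://127.0.0.1:3002/health").read().decode().strip())
```

With both backends up, reload Nginx and run the curl loop — the alternating port numbers in the output are round-robin made visible.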

Exercise 3: Measure TLS Termination Savings (Medium)

Compare CPU usage when TLS is handled by the proxy vs by each backend. Use openssl speed rsa2048 to benchmark RSA operations, then test with wrk or ab to measure throughput difference when 3 backends each do their own TLS vs when the proxy terminates TLS once.

Run openssl speed rsa2048 to see how many RSA operations per second your CPU can handle. Then think: if you have 3 backends each doing TLS, that's 3x the RSA work. With proxy TLS termination, it's 1x.

Exercise 4: Configure an Envoy Sidecar in Docker (Hard)

Create a docker-compose.yml with your app container and an Envoy sidecar container. The sidecar should handle all inbound traffic on port 8080 and proxy to your app on port 3000. Write the Envoy envoy.yaml config from scratch.

Envoy config has three key sections: listeners (what port to listen on), clusters (where to forward traffic), and routes (which listener maps to which cluster). In Docker Compose, put both containers on the same network so the sidecar can reach the app at app:3000.
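One possible shape of that envoy.yaml, as a sketch against Envoy's v3 config API (the listener, cluster, and service names are illustrative; check the Envoy docs for the exact schema of your version):

```yaml
static_resources:
  listeners:
  - address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress
          route_config:
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: app }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: app
    type: STRICT_DNS            # resolve the Compose service name "app"
    load_assignment:
      cluster_name: app
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: app, port_value: 3000 }
```

You can see all three sections the exercise mentions: the listener (port 8080), the route (prefix "/" → cluster "app"), and the cluster (the app container at app:3000).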

Exercise 5: Build a Simple Proxy in Python (Medium)

Write a 20-line Python proxy using the http.server and urllib modules. It should listen on port 8080, accept any HTTP request, forward it to http://httpbin.org, and return the response. This teaches you what a proxy actually does at the network level.

Subclass http.server.BaseHTTPRequestHandler. In do_GET, use urllib.request.urlopen() to fetch from the upstream, then write the response back to the client with self.wfile.write().

proxy.py
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen, Request
from urllib.error import HTTPError

UPSTREAM = "http://httpbin.org"
# Hop-by-hop headers describe one connection, not the payload — never forward them
HOP_BY_HOP = {"connection", "keep-alive", "transfer-encoding", "te", "trailers",
              "upgrade", "proxy-authenticate", "proxy-authorization"}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = UPSTREAM + self.path
        try:
            resp = urlopen(Request(url, headers={"Host": "httpbin.org"}))
            status, headers, body = resp.status, resp.getheaders(), resp.read()
        except HTTPError as err:  # urlopen raises on 4xx/5xx — proxy those too
            status, headers, body = err.code, err.headers.items(), err.read()
        self.send_response(status)
        for key, val in headers:
            if key.lower() not in HOP_BY_HOP:
                self.send_header(key, val)
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8080), ProxyHandler).serve_forever()

Run it: python proxy.py. Then curl http://localhost:8080/get — you'll get httpbin's response, routed through your proxy. That's a working reverse proxy in about 20 lines.

Section 14

Cheat Sheet — Quick Reference Cards

Forward vs Reverse
Forward Proxy
  Sits in front of: CLIENTS
  Hides:            Client identity
  Example:          Corporate proxy, VPN
  Client knows:     YES

Reverse Proxy
  Sits in front of: SERVERS
  Hides:            Server identity
  Example:          Nginx, Cloudflare
  Client knows:     NO
Nginx Essential Config
upstream backend {
  server 10.0.0.1:3000;
  server 10.0.0.2:3000;
}
server {
  listen 443 ssl;
  ssl_certificate     /etc/ssl/cert.pem;
  ssl_certificate_key /etc/ssl/key.pem;
  location / {
    proxy_pass http://backend;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP
      $remote_addr;
  }
}
HAProxy Essential Config
frontend http_front
  bind *:80
  default_backend servers

backend servers
  balance roundrobin
  option httpchk GET /health
  server s1 10.0.0.1:3000 check
  server s2 10.0.0.2:3000 check

# Health check: if /health
# returns non-200, server is
# removed from pool automatically.
Load Balancing Algorithms
Round Robin     Equal distribution
               Default in Nginx/HAProxy

Least Conns     Route to server with
               fewest active requests

IP Hash         Same client IP always
               hits same backend

Weighted        Servers get traffic
               proportional to weight

Random          Simple, surprisingly
               effective at scale
Key Headers
X-Forwarded-For
  Client's real IP chain
  "203.0.113.50, 70.41.3.18"

X-Real-IP
  Original client IP only
  "203.0.113.50"

X-Forwarded-Proto
  Original protocol
  "https" (even if backend=HTTP)

Host
  Original domain name
  "app.example.com"
When to Use What
Simple web app
  → Nginx reverse proxy

High-perf TCP proxying
  → HAProxy

API management + auth
  → API Gateway (Kong, AWS)

50+ microservices
  → Service Mesh (Istio)

Global static content
  → CDN (Cloudflare)

All of the above?
  → CDN + Gateway + Mesh
    (they stack, not replace)
Section 15

Connected Topics — Where Proxies Lead Next

Proxies don't exist in isolation — they connect to almost every other concept in system design. Once you understand proxies, these related topics become much easier to learn because you already know the "middleman" mental model.