API & Communication Patterns

Webhooks

Don't call us, we'll call you. The event-driven pattern that powers every payment confirmation, CI/CD pipeline, and real-time integration on the web.

Section 1

TL;DR — Don't Call Us, We'll Call You

Mental Model: Polling is calling the pizza shop every 30 seconds asking "Is my pizza ready yet?" Webhooks is giving them your phone number and saying "Call me when it's done." Same result, 99% less wasted effort.

Most APIs work like a question-and-answer conversation: your code asks a server "did anything happen?", the server says "nope", and you ask again a few seconds later. This is called polling (repeatedly sending requests at regular intervals to check for new data, like refreshing a webpage over and over hoping something changed). It works, but it's horribly wasteful. You're burning CPU, bandwidth, and API rate limits just to hear "nothing new" over and over again.

A webhook flips this entirely. Instead of your code constantly asking "anything new?", you give the other service a URL and say "POST to this address whenever something interesting happens." The service saves your URL, and when an event fires (a payment succeeds, a commit is pushed, an order ships) it sends an HTTP POST request to your endpoint with a JSON payload containing all the details. (JSON, JavaScript Object Notation, is a lightweight text format for structured data; the "payload" is the data carried inside a request, like the contents of a package.) No wasted requests. No polling delay. Near-instant notification.

[Figure: Polling vs Webhooks, the core idea. Polling: your app asks Stripe "Anything new?" every 5 seconds, 17,280 times a day per user, or 172.8M requests/day for 10K users, almost all answered "nothing changed". Webhooks: your app just listens and Stripe POSTs event data to /webhook when something happens, about 100K requests/day for 10K users, one per actual event. That's 1,728x fewer requests for the same result.]
Try It Right Now — 30 Seconds

Open webhook.site in a new tab. You'll get a unique URL instantly. Now go to any GitHub repo you own: Settings → Webhooks → Add webhook. Paste your webhook.site URL. Push a commit. Within 1 second, webhook.site shows the exact JSON payload GitHub sent you. That's a webhook. You just received one.

One-line takeaway: A webhook is a user-defined HTTP callback (a URL you hand to another service, saying "call this when something happens"; the web equivalent of giving someone your phone number). You register a URL, and the provider POSTs to it when an event occurs. Zero wasted requests, near-instant delivery.
Section 2

The Scenario — "Is My Payment Done Yet?"

Think First

You order food on a delivery app. After paying, the app doesn't just freeze — it shows you "Preparing your order..." and eventually "Out for delivery!" How does the app know when the restaurant started cooking? When the driver picked it up? Someone — or something — is telling the app each time a status changes. That's the same problem we're about to solve with webhooks.

You're building an e-commerce checkout. A customer clicks "Pay Now" and your server sends the payment details to Stripe, one of the most popular payment processing APIs. Companies like Amazon, Google, and Shopify use it; it handles credit cards, fraud detection, and bank transfers so you don't have to. Here's the thing most beginners don't realize: Stripe does not always respond synchronously with "payment succeeded." The payment goes through a pipeline: fraud checks with Stripe Radar, bank authorization via the card network (Visa/Mastercard), and possibly a 3D Secure challenge (an extra verification step, like a bank's SMS code or biometric check, that reduces fraud for online card payments; it's why you sometimes get redirected to your bank's page during checkout). This takes anywhere from 1 to 30 seconds. For international cards or high-value transactions, even longer.

Your checkout page is sitting there, waiting. The customer is staring at a spinner. Your server got back a response from Stripe, but the PaymentIntent in it is still in a non-final status such as processing or requires_action. That means "Got it, working on it", not a final answer. You need to know: did the payment actually succeed or fail?

[Figure: What Happens After "Pay Now", Stripe's internal pipeline. Your server sends POST /v1/payment_intents → Stripe API accepts the request → Stripe Radar fraud check (~200ms) → card network bank authorization (1-30 sec) → 3D Secure if required (+5-15 sec). Meanwhile your server is waiting. How does it find out the result? Option A: keep asking (polling). Option B: get called (webhook). Every major payment provider (Stripe, PayPal, Square) works this way: async by design.]

Two approaches exist. The naive one — polling — is what most beginners try first. The smart one — webhooks — is what every production system actually uses. Let's build the naive version first, watch it fall apart, and then discover why webhooks were invented.

Section 3

First Attempt — "Is It Done Yet?" on Repeat

Think First

You're waiting for a friend to finish baking your birthday cake. The simplest approach: call them every 5 minutes and ask "Is it ready yet?" They'll say "no" six times before finally saying "yes." It works — but you wasted six phone calls, annoyed your friend, and tied up both your phones. That's polling.

The most obvious solution: after sending the payment request, start a loop. Every few seconds, call Stripe's PaymentIntent API (GET /v1/payment_intents/:id, which returns the current status of a payment: "requires_payment_method", "processing", "succeeded", or "canceled") and ask "Is this payment done yet?" When the status finally changes to succeeded, or to a failure state such as canceled or requires_payment_method, update your database and show the result to the customer.

Here's what that looks like — real Stripe API code you could run today:

payment_poller.py
# The naive approach: keep asking Stripe until we get a final answer
import time
import stripe

stripe.api_key = "sk_test_..."  # Your Stripe secret key

def wait_for_payment(payment_intent_id):
    """Poll Stripe every 5 seconds. Wasteful but simple."""
    attempts = 0
    while True:
        # Each call counts against Stripe's rate limit (100 req/sec in live mode)
        pi = stripe.PaymentIntent.retrieve(payment_intent_id)
        attempts += 1

        if pi.status == "succeeded":
            print(f"Payment done after {attempts} API calls")
            return {"status": "paid", "amount": pi.amount}

        elif pi.status in ("canceled", "requires_payment_method"):
            return {"status": "failed", "reason": pi.last_payment_error}

        # Still processing — wait and ask again
        time.sleep(5)  # 5 seconds × forever = a LOT of wasted requests

# For one customer, this might make 6 calls over 30 seconds. Fine.
# For 10,000 concurrent checkouts? That's 2,000 requests/second to Stripe.
# Stripe's rate limit: 100 req/sec. You'd be blocked in under a minute.

This code is dead simple: ask, wait, ask again. For a single checkout, it's fine — 6 calls over 30 seconds, no big deal. But now multiply it. What happens when 10,000 customers are checking out at the same time? Each one polls every 5 seconds. That's 2,000 requests per second hitting Stripe's API, and 99% of them get the same useless answer: "still processing."

Let's do the full math, because the numbers are what make this click:

The Polling Math — One Day at Scale

Polling every 5 seconds for 10,000 users:
10,000 users × (86,400 seconds/day ÷ 5 seconds) = 172,800,000 requests/day
That's 172.8 million requests. And the vast majority return "nothing changed." Meanwhile, the average user only triggers about 10 actual events per day. With webhooks, that's 10,000 × 10 = 100,000 requests/day. Same information, 1,728× fewer requests.
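The arithmetic above is easy to check for yourself. Here is a minimal sketch; the function names are made up for illustration, and the 10-events-per-user figure is the same assumption the text uses.

```python
SECONDS_PER_DAY = 86_400

def polling_requests_per_day(users: int, interval_seconds: int) -> int:
    """Requests per day if every user polls once per interval, around the clock."""
    return users * (SECONDS_PER_DAY // interval_seconds)

def webhook_requests_per_day(users: int, events_per_user: int) -> int:
    """Requests per day if each real event triggers exactly one delivery."""
    return users * events_per_user

polling = polling_requests_per_day(users=10_000, interval_seconds=5)
webhooks = webhook_requests_per_day(users=10_000, events_per_user=10)

print(f"Polling:  {polling:,} requests/day")          # 172,800,000
print(f"Webhooks: {webhooks:,} requests/day")         # 100,000
print(f"Ratio:    {polling // webhooks:,}x")          # 1,728x
print(f"Wasted polls: {1 - webhooks / polling:.2%}")  # 99.94%
```

Try changing interval_seconds to 1 or events_per_user to 100 to see how quickly the gap moves.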

[Figure: Polling Timeline, 30 seconds for one customer. The polls at 0s, 5s, 10s, 15s, 20s, and 25s each return "processing"; the poll at 30s returns "succeeded". That is 6 wasted API calls for a single checkout.]

The code works. The math doesn't. At any meaningful scale, polling is a denial-of-service attack on yourself — and on the provider you're polling. Let's see exactly where it breaks.

Section 4

Where It Breaks — Three Walls You'll Hit

Think First

Imagine a hospital where 10,000 patients each press the nurse call button every 5 seconds — not because they need anything, but just to ask "Am I discharged yet?" The nurses can't help actual emergencies because they're buried under pointless "no, not yet" responses. That's what polling does to an API server at scale.

Polling works fine for one user. But three specific problems turn it into a production disaster. Each one comes with real numbers from real APIs.

You'll Burn Through Rate Limits in Minutes

Every major API enforces rate limits (a maximum number of API requests allowed in a time window; exceeding it returns HTTP 429 "Too Many Requests"), and polling burns through them fast. Here are the numbers from real providers:

  • Stripe: 100 read requests/second in live mode. With 10,000 users polling every 5 seconds, you need 2,000 req/sec. That's 20x over the limit, and Stripe starts returning 429 Too Many Requests.
  • GitHub: 5,000 requests/hour for authenticated users. Even 1,000 users polling every 5 seconds generate 200 req/sec, which burns through the hourly quota in 25 seconds (5,000 ÷ 200). After that, you can't even fetch commit statuses for your CI/CD.
  • Shopify: 2 requests/second per app. Just ten polling loops at 5-second intervals consume the entire limit, leaving nothing for actual store operations.

And here's the worst part: the rate limit you burn on empty polls is the same rate limit you need for real operations. Wasting it on "anything new? no." means you can't create charges, fetch user data, or do any actual work.

Polling has a built-in delay: on average, you discover an event half your polling interval after it happened. Poll every 5 seconds? Average 2.5-second delay before you know. Poll every 1 second to reduce delay? Now you've 5x'd your request volume.

[Figure: The Unwinnable Tradeoff (10K users). Poll every 5 sec: average delay 2.5 sec, 172.8M requests/day. Poll every 1 sec: average delay 0.5 sec, 864M requests/day. Webhook: average delay under 1 sec, 100K requests/day. Webhooks win on both.]

Webhooks break this tradeoff entirely. Delivery latency is often under 1 second and typically no more than a few seconds (GitHub delivers within 10 seconds; Stripe is usually under 5), and the request count is proportional to actual events, not clock ticks.
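The "half your polling interval" rule is easy to confirm with a quick simulation. This is a sketch that assumes events land at a uniformly random moment between two polls; the function name is made up for illustration.

```python
import random

def simulated_average_delay(interval: float, trials: int = 100_000, seed: int = 0) -> float:
    """Average wait between an event happening and the next poll noticing it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        event_time = rng.uniform(0, interval)  # Event lands somewhere between two polls
        total += interval - event_time         # It is only seen at the next poll
    return total / trials

for interval in (5, 1):
    print(f"Poll every {interval}s: average delay ~{simulated_average_delay(interval):.2f}s")
```

The averages come out close to 2.5 and 0.5 seconds, so the only way to cut the delay is to poll more often, which is exactly the tradeoff webhooks avoid.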

With polling, your server load is proportional to time × users, regardless of whether anything is happening. At 3 AM when nobody's shopping, your polling loops are still firing 2,000 requests/second — paying for compute and bandwidth to hear "nothing changed" 172 million times a day.

With webhooks, load is proportional to actual events. Zero events at 3 AM? Zero requests. Black Friday surge with 50,000 orders per hour? 50,000 webhook deliveries — exactly the traffic you'd expect, and exactly what you need to handle. The infrastructure costs track the business, not the clock.

[Figure: Daily Request Volume for 10,000 Users. Polling (5s interval): 172,800,000 requests/day, 99.94% wasted. Webhooks: 100,000 requests/day, 1 request per actual event. 1,728x fewer requests, same information delivered.]

Polling is a perfectly fine prototype hack. But every production API you'll work with — Stripe, GitHub, Twilio, Shopify, PayPal, SendGrid, Slack — offers webhooks for exactly these reasons. The industry voted with their APIs: push beats pull.

Section 5

The Breakthrough — Reverse the Direction

Think First

You're expecting a package. Instead of walking to the front door every 5 minutes to check, you install a doorbell. The delivery driver presses it when they arrive. You do zero work until the bell rings. Your webhook endpoint is that doorbell — it just sits there, waiting for someone to ring it.

The breakthrough is embarrassingly simple: reverse who calls whom. Instead of your code pulling status from the provider, let the provider push events to your code. You expose a URL (any URL that accepts HTTP POST requests) and tell the provider "send event data here whenever something happens." The provider saves that URL, and from that moment on, every time a relevant event occurs, it fires an HTTP POST to your endpoint with a JSON payload describing exactly what happened.

This pattern is called a callback: a function (or URL) you hand to someone else, saying "call this when you're done." It's the same concept as in programming, where you pass a function to be invoked later. You give someone your phone number and say "call me when it's ready." In web terms, you give them a URL and say "POST to this when the event fires." The industry calls it a webhook, combining "web" (HTTP) with "hook" (a place where you attach your own code to someone else's system).

How Stripe Webhooks Actually Work

Let's make this concrete with Stripe — the payment provider you'll encounter in most e-commerce jobs. Here's the real setup process, step by step:

Step 1: Register your endpoint. In the Stripe Dashboard, go to Developers → Webhooks → Add endpoint. Enter your server's URL (e.g., https://yourapp.com/webhooks/stripe) and select which events you care about, like payment_intent.succeeded and payment_intent.payment_failed. Stripe gives you a webhook signing secret starting with whsec_: a unique string that Stripe uses to cryptographically sign every webhook it sends you, so you can verify the webhook is really from Stripe and not an attacker.

Step 2: Wait. Your server does nothing. No polling loops. No timers. It just listens on that endpoint like a doorbell waiting to be rung.

Step 3: Event fires. A customer's payment goes through. Stripe sends an HTTP POST to your registered URL. Here's the shape of the request that hits your server (the nested payment_intent object is trimmed to its most relevant fields):

Real Stripe Webhook Payload
POST /webhooks/stripe HTTP/1.1
Host: yourapp.com
Content-Type: application/json
Stripe-Signature: t=1695123456,v1=5257a869e7ecebeda32af...

{
  "id": "evt_1Nh7Ac2eZvKYlo2C",
  "object": "event",
  "type": "payment_intent.succeeded",
  "api_version": "2023-10-16",
  "created": 1695123456,
  "data": {
    "object": {
      "id": "pi_3Nh7Ab2eZvKYlo2C",
      "object": "payment_intent",
      "amount": 2000,
      "currency": "usd",
      "status": "succeeded",
      "customer": "cus_OjN1abc123",
      "payment_method": "pm_1Nh7Ab2eZvKY...",
      "metadata": {
        "order_id": "order_98765"
      }
    }
  },
  "livemode": true,
  "pending_webhooks": 1,
  "request": {
    "id": "req_abc123",
    "idempotency_key": null
  }
}

Notice the Stripe-Signature header. That's how you verify the webhook is legitimately from Stripe and not an attacker spoofing requests. Stripe signs every webhook using HMAC-SHA256 (a cryptographic algorithm that combines a message with a secret key to produce a unique "fingerprint"; if the message or key changes even slightly, the fingerprint is completely different, so you can verify authenticity) with your per-endpoint whsec_ secret. Here's the verification code:

webhook_handler.py
import stripe
from flask import Flask, request, jsonify

app = Flask(__name__)
endpoint_secret = "whsec_..."  # From Stripe Dashboard → Webhooks

@app.route("/webhooks/stripe", methods=["POST"])
def handle_stripe_webhook():
    payload = request.get_data()        # Raw bytes — DON'T parse JSON first
    sig_header = request.headers.get("Stripe-Signature")

    # Step 1: Verify the signature (CRITICAL: never skip this)
    try:
        event = stripe.Webhook.construct_event(
            payload, sig_header, endpoint_secret
        )
    except ValueError:
        return "Invalid payload", 400    # Body was not valid JSON
    except stripe.error.SignatureVerificationError:
        return "Invalid signature", 400  # Reject forged webhooks

    # Step 2: Handle the event
    if event["type"] == "payment_intent.succeeded":
        payment = event["data"]["object"]
        order_id = payment["metadata"]["order_id"]
        amount = payment["amount"] / 100  # Stripe uses cents
        print(f"Order {order_id} paid: ${amount:.2f}")
        # → Update your database, send confirmation email, etc.

    elif event["type"] == "payment_intent.payment_failed":
        payment = event["data"]["object"]
        # last_payment_error can be present but null, so fall back to {} first
        error = (payment.get("last_payment_error") or {}).get("message", "Unknown")
        print(f"Payment failed: {error}")
        # → Notify customer, offer retry

    # Step 3: Return 200 quickly — Stripe times out after 20 seconds
    return jsonify({"received": True}), 200

That's the real code. No polling loop. No timer. Your server sits idle until Stripe rings the doorbell, then handles the event and replies with 200 OK. If you don't return a 2xx status within 20 seconds, Stripe assumes delivery failed and retries with exponential backoff, where each retry waits longer than the previous one (e.g., 1 minute, then 5 minutes, then 30 minutes) to avoid hammering a server that might be temporarily down. In test mode that means three retries over a few hours; in live mode Stripe keeps retrying for up to three days.

[Figure: Webhook Lifecycle: Register Once, Receive Forever. 1. Register: in Stripe Dashboard → Webhooks, Stripe saves your URL (https://you.com/webhooks/stripe), a whsec_ signing secret, and your event filters. 2. Event fires: payment_intent.succeeded. 3. Stripe POSTs to you: JSON payload plus a Stripe-Signature header, HMAC-SHA256 signed with the whsec_ secret. 4. Verify and process: check the HMAC, update the DB, email the customer. 5. Return HTTP 200 OK within 20 sec. If there is no 2xx response, Stripe retries with exponential backoff.]

How Other Providers Do It

Stripe isn't unique — every major API uses the same pattern with its own security twist. Here's how four real webhook providers compare:

Stripe

Signature: Stripe-Signature header with HMAC-SHA256, using a per-endpoint whsec_ secret.

Retries: Exponential backoff. Three retries over a few hours in test mode; up to three days of retries in live mode. Retries reuse the same event ID (like "evt_1Nh7Ac..."), so your code can check "did I already process this?" and avoid duplicate actions.

Debugging: Dashboard shows delivery attempts with full request/response bodies. CLI tool: stripe listen --forward-to localhost:3000/webhooks for local dev.

Recovery: stripe events list --limit 10 to fetch missed events manually.

GitHub

Signature: X-Hub-Signature-256 header with HMAC-SHA256.

Delivery speed: Typically within 10 seconds of the event.

Debugging: Settings → Webhooks → Recent Deliveries shows request/response bodies. One-click "Redeliver" button for failed deliveries.

Events: 200+ event types, including push, pull_request, issues, and deployment.

Twilio

Signature: X-Twilio-Signature header, plus optional HTTP Basic Auth. Uses a different algorithm (HMAC-SHA1 with your auth token).

Use case: StatusCallback URLs for SMS delivery receipts, call status updates, and recording-ready notifications.

Retries: Configurable per-event. Failed callbacks can trigger fallback URLs.

Shopify

Signature: X-Shopify-Hmac-Sha256 header with HMAC-SHA256 using your app's API secret.

Enforcement: Mandatory verification. Shopify apps that don't verify webhook signatures can be suspended from the app store.

Events: orders/create, products/update, app/uninstalled, etc.
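The signature headers above differ mostly in how the digest is encoded. As a rough sketch using only the standard library: GitHub's X-Hub-Signature-256 carries the hex digest prefixed with "sha256=", while Shopify's X-Shopify-Hmac-Sha256 carries the same kind of digest base64-encoded. The function names here are made up for illustration; in a real app, prefer the provider's official SDK where one exists.

```python
import base64
import hashlib
import hmac

def verify_github(body: bytes, header: str, secret: str) -> bool:
    """GitHub sends 'sha256=<hex digest>' in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    return hmac.compare_digest(expected.encode(), header.encode())

def verify_shopify(body: bytes, header: str, secret: str) -> bool:
    """Shopify sends the base64-encoded digest in X-Shopify-Hmac-Sha256."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    return hmac.compare_digest(expected, header.encode())
```

Both functions must be given the raw request body bytes. Re-serializing parsed JSON usually changes whitespace or key order, and the digest will no longer match.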

Try It Locally in 60 Seconds

You don't need a deployed server to receive webhooks. Use ngrok to expose your localhost: run ngrok http 3000 and you'll get a public URL like https://abc123.ngrok.io. Paste that into Stripe/GitHub webhook settings. Even easier: Stripe's CLI has a built-in tunnel — stripe listen --forward-to localhost:3000/webhooks — which auto-registers a test endpoint and forwards events to your local Flask/Express app.

Section 6

How It Works — The Five Building Blocks

A webhook system isn't just "send a POST." In production, five pieces work together to make webhooks reliable. Let's walk through each one.

Event → HTTP POST

When an event happens on the provider's side (a payment succeeds, a pull request is merged, an SMS is received), the provider immediately sends an HTTP POST request to the URL you registered. The body of that POST contains a JSON payload describing what happened.

Incoming Webhook POST
POST /webhooks/stripe HTTP/1.1
Host: your-app.com
Content-Type: application/json
Stripe-Signature: t=1614...,v1=abc123...

{
  "id": "evt_1Nh7Ac2eZvKYlo2C",
  "type": "payment_intent.succeeded",
  "created": 1695123456,
  "data": {
    "object": {
      "id": "pi_3Nh7Ab2eZvKYlo2C",
      "amount": 2000,
      "currency": "usd",
      "status": "succeeded",
      "customer": "cus_OjN1abc123"
    }
  }
}

Notice the structure: there's an event type (what happened), a timestamp (when it happened), and a data object (the full details). Your endpoint receives this and acts on it — update the order status, send a confirmation email, trigger shipping.

Well-designed webhook payloads follow a consistent pattern. Every payload should tell you three things: what happened (event type), when it happened (timestamp), and the details (the changed resource). Some providers send the full resource in the payload; others send only an ID and expect you to fetch the full details. This is the difference between a "fat" and a "thin" payload: a fat payload includes all the data you need right in the webhook body, while a thin payload sends only an event type and resource ID, and you then call the API to get the details. Fat is faster to process; thin is more secure (less data in transit).

[Figure: Fat vs Thin Payloads. Fat payload: all data included in the POST; process immediately with no extra API call, but payloads are larger and data could be stale; used by Stripe and Shopify; best for speed-critical workflows. Thin payload: just event type plus resource ID; small, secure, always fresh data, but requires an extra API call to get details; used by GitHub (some events); best for security-sensitive data.]
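A handler can cope with both styles in a few lines. The sketch below assumes the event shape used in the example payload above (data → object), treats an object with nothing but "id" and "object" keys as thin, and takes a hypothetical fetch_resource callable standing in for whatever API call your provider offers.

```python
from typing import Callable

def resolve_resource(event: dict, fetch_resource: Callable[[str], dict]) -> dict:
    """Return the full resource for an event, whichever payload style was used."""
    obj = event["data"]["object"]
    # Thin payload: only identifying fields are present, so ask the provider.
    if set(obj) <= {"id", "object"}:
        return fetch_resource(obj["id"])
    # Fat payload: everything we need is already in the POST body.
    return obj

# Usage with a stand-in fetcher (a real one would call the provider's API):
fake_api = {"pi_123": {"id": "pi_123", "amount": 2000, "status": "succeeded"}}
thin_event = {"type": "payment_intent.succeeded", "data": {"object": {"id": "pi_123"}}}
print(resolve_resource(thin_event, fake_api.__getitem__)["amount"])  # 2000
```

Passing the fetcher in as a parameter keeps the handler easy to test without network access.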

Here's a scary thought: your webhook URL is just a public endpoint. Anyone who knows it could send a fake POST pretending to be Stripe, telling your app "this payment succeeded" when it didn't. You'd ship products to someone who never paid. This is why signature verification is absolutely critical: it is a way to prove a message really came from who it claims to be from. The sender signs the message with a secret key, and you verify it using the same key. If the signature matches, the message is authentic.

The solution is called HMAC (Hash-based Message Authentication Code): the sender hashes the message body together with a shared secret key, and the receiver does the same hash. If the two match, the message is authentic and hasn't been tampered with. Here's how it works in plain English:

[Figure: HMAC Signature Verification. Provider: payload + secret key → HMAC-SHA256 → signature, sent in a header alongside the POST. Your server: payload + same secret key → HMAC-SHA256 → your hash, compared with the received signature. Match: authentic. No match: reject. The secret key is shared between you and the provider during setup; an attacker doesn't know the key, so they can't forge a valid signature.]
verify_signature.py
import hmac
import hashlib

def verify_webhook(payload_body, signature_header, secret):
    """Verify the webhook really came from the provider."""
    # Compute what the signature SHOULD be
    expected = hmac.new(
        key=secret.encode('utf-8'),
        msg=payload_body,
        digestmod=hashlib.sha256
    ).hexdigest()

    # Compare securely (prevents timing attacks)
    if not hmac.compare_digest(expected, signature_header):
        raise ValueError("Invalid signature — possible forgery!")

    return True  # Signature matches, safe to process
verifySignature.js
const crypto = require('crypto');

function verifyWebhook(payloadBody, signatureHeader, secret) {
  // Compute what the signature SHOULD be
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payloadBody)
    .digest('hex');

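  // Compare securely (constant-time comparison).
  // timingSafeEqual throws if the lengths differ, so check that first.
  const expectedBuf = Buffer.from(expected);
  const receivedBuf = Buffer.from(signatureHeader || '');
  const isValid =
    expectedBuf.length === receivedBuf.length &&
    crypto.timingSafeEqual(expectedBuf, receivedBuf);

  if (!isValid) throw new Error('Invalid signature!');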
  return true; // Safe to process
}

Networks are unreliable. Your server might be down for maintenance, or it might return a 500 error because of a temporary database outage. If the provider sends a webhook and doesn't get a 200 OK back, it needs to try again. But it can't just retry immediately a thousand times — that would overwhelm your server the moment it comes back up.

The solution is exponential backoff: wait a little, then wait longer, then even longer. In the textbook version the wait doubles after each failure (1 minute, then 2, then 4, then 8), which prevents hammering a server that's already struggling. Real providers often use steeper schedules, typically something like 1 minute → 5 minutes → 30 minutes → 2 hours → 12 hours. After a certain number of attempts (usually 5-10), the provider gives up and marks the delivery as failed.

[Figure: Exponential Backoff, growing wait times. Attempts at 0 min, 1 min, 5 min, and 30 min fail; the attempt at 2 hrs succeeds. Each wait gets longer, which prevents overwhelming a recovering server. After max retries, the event goes to the dead letter queue.]
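To make the schedule concrete, here is a minimal sketch of the doubling variant. The base delay, factor, and 12-hour cap are illustrative defaults, not any particular provider's settings.

```python
def backoff_delays(base_seconds: int = 60, factor: int = 2,
                   max_retries: int = 5, cap_seconds: int = 12 * 3600) -> list[int]:
    """Wait before each retry: base, base*factor, base*factor^2, ... up to a cap."""
    return [min(base_seconds * factor ** attempt, cap_seconds)
            for attempt in range(max_retries)]

print(backoff_delays())            # [60, 120, 240, 480, 960]
print(backoff_delays(factor=5))    # [60, 300, 1500, 7500, 37500]
```

A larger factor gives a steeper schedule closer to the 1 minute → 5 minutes → 30 minutes shape described above. Production senders usually also add a little random jitter so that many failed deliveries don't all retry at the same instant.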
Always Return 200 Quickly

Your webhook endpoint should return HTTP 200 as fast as possible, ideally within 5 seconds. If you need to do heavy processing (sending emails, updating multiple databases), accept the webhook, put it on a message queue, return 200, and process it asynchronously: the handler quickly drops the event into the queue, and a background worker picks it up and does the slow work without making the provider wait. Otherwise the provider might time out and retry, causing duplicate processing.

What happens when all retry attempts are exhausted and the webhook still can't be delivered? The event goes to a dead letter queue (DLQ): a holding area for messages that couldn't be delivered after all retry attempts, named after the "dead letter office" in postal services, where undeliverable mail ends up. Engineers review these to fix the root cause and replay failed events. Think of it like the "return to sender" pile at the post office: letters that couldn't be delivered after multiple attempts.

A dead letter queue is critical because it means no events are silently lost. You can inspect the queue, fix whatever was wrong (maybe your server was misconfigured, or the endpoint URL changed), and replay the failed events. Most providers like Stripe give you a dashboard where you can see failed webhook deliveries and manually retry them.

[Figure: Dead Letter Queue flow. Event payment.succeeded → retries (5×), all failed → dead letter queue, stored safely for review → fix and replay with a manual retry. Events are never silently lost; they wait in the DLQ until you fix the problem.]
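Seen from the sender's side, retries and the dead letter queue fit together roughly like this. It is a sketch, not how any specific provider implements it: send is a hypothetical callable that POSTs the event and returns the HTTP status code, and the DLQ is a plain list where a real system would use durable storage.

```python
import time
from typing import Callable

def deliver_with_retries(event: dict, send: Callable[[dict], int],
                         delays: list, dead_letters: list,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try once, then once more after each delay; park the event in the DLQ if all fail."""
    for delay in [0, *delays]:
        sleep(delay)
        try:
            if 200 <= send(event) < 300:
                return True        # Delivered: receiver acknowledged with a 2xx
        except OSError:
            pass                   # Network error counts as a failed attempt
    dead_letters.append(event)     # Out of retries: keep it for later inspection
    return False

def replay(dead_letters: list, send: Callable[[dict], int]) -> None:
    """Re-send parked events; keep only the ones that still fail."""
    dead_letters[:] = [e for e in dead_letters if not 200 <= send(e) < 300]
```

The sleep parameter is injectable so the retry logic can be exercised in tests without actually waiting minutes between attempts.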
Section 7

Going Deeper — Production Concerns

Think First

Stripe sends your server a webhook. Your code processes the payment, updates the database, and is about to return 200 OK, but the network drops. Stripe never sees the 200, so it retries and sends the same webhook again. Your code processes the event again. The order just got fulfilled twice and the customer got two confirmation emails. How would you prevent this?

The five building blocks get you started, but real production webhook systems need to handle four additional challenges. These are the things that separate a weekend project from a system that handles millions of events without breaking.

Idempotency — Handling Duplicate Deliveries

Here's a tricky situation: the provider sends a webhook, your server processes it and updates the database, but right before it returns 200 OK, the network connection drops. The provider never got the 200, so it assumes delivery failed and retries. Now your server processes the same event again, maybe shipping an order twice or sending two confirmation emails.

The fix is making your webhook handler idempotent: processing the same event twice should produce the same result as processing it once. It's like pressing an elevator button, where pressing it twice doesn't call two elevators. Most providers include a unique event ID in each webhook that you can use as an idempotency key. Store this ID in your database, and before processing any webhook, check: "Have I seen this ID before?" If yes, skip it and return 200.

idempotent_handler.py
def handle_webhook(event):
    event_id = event["id"]  # e.g., "evt_1Nh7Ac2eZvKYlo2C"

    # Check if we've already processed this event
    if db.processed_events.exists(event_id):
        return 200  # Already handled — skip, but return 200!

    # Process the event (update order, send email, etc.)
    process_payment_event(event)

    # Record that we've processed this event
    db.processed_events.insert(event_id)

    return 200
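The handler above uses a placeholder db object. One way to make the idea runnable is with SQLite, shown here as a sketch with a made-up handle_once helper. It also tightens one detail: instead of "check, then insert", it inserts first and lets the primary key reject duplicates, so two copies of the same event arriving at the same moment can't both slip past the check.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # Illustration only; use a real database in production
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

def handle_once(event: dict, process) -> bool:
    """Run process(event) at most once per event ID. Returns True if it ran."""
    try:
        with db:  # One transaction: the claim and the work commit or roll back together
            db.execute("INSERT INTO processed_events (event_id) VALUES (?)",
                       (event["id"],))
            process(event)
    except sqlite3.IntegrityError:
        return False  # Duplicate delivery: already handled, just acknowledge it
    return True

event = {"id": "evt_1Nh7Ac2eZvKYlo2C"}
print(handle_once(event, print))  # Processes the event, then prints True
print(handle_once(event, print))  # Prints False: duplicate skipped
```

If process raises, the transaction rolls back and the ID is not recorded, so the provider's next retry gets a fresh attempt. Side effects outside the database, such as emails, are not undone by the rollback.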

Imagine a customer creates an order, then updates the shipping address, then cancels it. Three events, in that order. But webhooks might arrive as: cancel → create → update. Why? Because each webhook is an independent HTTP request. Network conditions, retry timing, and provider-side queuing can all scramble the order.

Your handler needs to handle out-of-order delivery gracefully. Common strategies:

  • Timestamps: Each event has a timestamp. Before processing, check if you already have a newer version of this resource. If so, ignore the older event.
  • Version numbers: Some providers include a sequence number. Only process events with a higher sequence than what you've stored.
  • Fetch latest state: Instead of trusting the payload, use it as a signal and always fetch the current state from the provider's API. This guarantees you have the freshest data regardless of delivery order.
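The timestamp strategy from the list above can be sketched in a few lines. This assumes events shaped like the earlier examples (a top-level created timestamp and the resource under data → object); the in-memory dicts stand in for your database.

```python
latest_seen: dict = {}  # resource ID -> timestamp of the newest event applied
state: dict = {}        # resource ID -> latest known version of the resource

def apply_if_newer(event: dict) -> bool:
    """Apply an event unless a newer one for the same resource was already applied."""
    resource = event["data"]["object"]
    resource_id, created = resource["id"], event["created"]
    if created < latest_seen.get(resource_id, -1):
        return False  # Stale: a newer event already updated this resource
    latest_seen[resource_id] = created
    state[resource_id] = resource
    return True

def make_event(created: int, status: str) -> dict:
    return {"created": created, "data": {"object": {"id": "order_1", "status": status}}}

# Deliveries arrive scrambled: cancel, then create, then update
for ev in (make_event(3, "canceled"), make_event(1, "created"), make_event(2, "updated")):
    apply_if_newer(ev)
print(state["order_1"]["status"])  # canceled
```

Timestamps with one-second resolution can tie. This sketch applies ties in arrival order, which is one reason the "fetch latest state" strategy is the safer choice for critical data.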

HMAC signatures verify the sender, but there are more security layers to add:

  • HTTPS only: Never expose a webhook endpoint over plain HTTP. The payload and signature would be visible to anyone on the network.
  • IP allowlisting: Some providers publish the IP addresses they send webhooks from. You can configure your firewall to reject requests from any other IP. Stripe, for example, publishes their webhook IPs.
  • Timestamp validation: Check that the event timestamp is recent (within the last 5 minutes). This prevents replay attacks, where an attacker captures a legitimate webhook and re-sends it hours or days later. If your handler only checks the signature (which is still valid), it would process the stale event.
  • Secret rotation: Periodically generate new signing secrets. Good providers let you have two active secrets during rotation so you don't miss events during the switch.
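Timestamp validation only helps if the timestamp itself is covered by the signature, otherwise an attacker could simply edit it. Stripe's scheme does this by signing the string "<timestamp>.<raw body>" and sending both values in the Stripe-Signature header (t=...,v1=...). Below is a simplified sketch of that check; it handles a single v1 value, whereas the official library also copes with several during secret rotation, so use stripe.Webhook.construct_event in real code.

```python
import hashlib
import hmac
import time
from typing import Optional

def verify_with_timestamp(payload: bytes, header: str, secret: str,
                          tolerance: int = 300, now: Optional[float] = None) -> bool:
    """Check a 't=<unix>,v1=<hex>' header: signature must match AND be recent."""
    parts = dict(item.split("=", 1) for item in header.split(",") if "=" in item)
    try:
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # Malformed header
    signed = f"{timestamp}.".encode() + payload  # Timestamp is part of what gets signed
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return False  # Forged or tampered
    current = time.time() if now is None else now
    return abs(current - timestamp) <= tolerance  # Reject replays of old deliveries
```

The default tolerance of 300 seconds matches the 5-minute window suggested above; the now parameter exists only so the check can be tested with fixed times.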

Most providers give your endpoint 5-30 seconds to respond with a 200. If your handler takes longer (maybe it's writing to a slow database or calling another API), the provider considers it a failure and retries. This creates a nasty loop: slow processing → timeout → retry → more slow processing → more timeouts.

The golden rule: accept fast, process later. Your webhook endpoint should do exactly three things: (1) verify the signature, (2) drop the event onto an internal queue, (3) return 200 OK. A background worker then picks up the event from the queue and does the real processing — updating databases, sending emails, calling other APIs. This decoupling means your endpoint always responds in milliseconds, regardless of how complex the processing is.

[Diagram: Webhook POST arrives → ① Verify signature (~1 ms) → ② Enqueue (~5 ms) → ③ Return 200 (~1 ms). A background worker processes the event later. Total response time: < 10 ms, so the provider is happy.]
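Here is a minimal sketch of "accept fast, process later" using only the standard library. It assumes an in-process queue and a single worker thread, and it takes the signature check result as a parameter to stay short. An in-process queue loses events if the process crashes, so a production setup would more likely use a durable queue.

```python
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()
processed: list[str] = []

def handle_webhook(event: dict, signature_ok: bool) -> int:
    """Do the three fast steps and return the HTTP status code."""
    if not signature_ok:          # (1) signature check (result passed in for brevity)
        return 401
    events.put(event)             # (2) enqueue
    return 200                    # (3) acknowledge immediately

def worker() -> None:
    """Background worker: the slow work happens here, off the request path."""
    while True:
        event = events.get()
        try:
            processed.append(event["id"])   # stand-in for DB updates, emails, API calls
        finally:
            events.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The handler never waits for the worker, so its response time stays flat no matter how slow the real processing becomes.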
Section 8

Variations — Webhooks vs Polling vs SSE vs WebSockets

Webhooks aren't the only way to push data. Let's compare the four main patterns for getting real-time updates, so you know when to pick each one.

Think First

Think of four ways to know when dinner's ready: (1) walk to the kitchen every 5 minutes and check (polling), (2) give the chef your phone number so they text you (webhook), (3) leave a baby monitor in the kitchen so you can hear them say "it's ready" (SSE), (4) get on a walkie-talkie where you and the chef can both talk anytime (WebSocket).

| Feature | Polling | Webhooks | SSE | WebSockets |
|---|---|---|---|---|
| Direction | Client → Server | Server → Client (HTTP POST) | Server → Client (stream) | Bidirectional |
| Connection | New per request | New per event | Long-lived, one-way | Long-lived, two-way |
| Latency | Half of poll interval | Near-instant | Near-instant | Near-instant |
| Complexity | Low | Medium | Medium | High |
| Best for | Prototyping, low-frequency | Server-to-server events | Live feeds (stock prices) | Chat, gaming, collaboration |
| Requires public URL? | No | Yes (your endpoint must be reachable) | No | No |
| Built-in retry? | You manage it | Provider retries automatically | Browser auto-reconnects | You manage it |
Quick Decision Guide: Is it a server-to-server event? Yes → use webhooks. No (browser client) → do you need two-way communication? Yes → use WebSockets. No → use SSE.
Key distinction: Webhooks are server-to-server — your backend receives events from another backend. SSE and WebSockets are server-to-browser — they push data to a user's browser in real time. If you're integrating with Stripe, GitHub, or Twilio, webhooks are the right choice. If you're building a live chat UI, look at WebSockets.
Section 9

At Scale — How Stripe, GitHub & Twilio Do It

Think First

An attacker discovers your webhook URL (https://yourapp.com/webhooks/stripe). They start sending fake POST requests with crafted JSON payloads that look like real Stripe events — {"type": "charge.succeeded", "amount": 99900}. Your server happily processes them and ships $999 worth of products. How would you verify that a webhook actually came from Stripe and not from an attacker?

Webhooks aren't just a learning exercise — they're the backbone of almost every integration you use. Let's look at how three major platforms implement them, so you can see real-world patterns in action.

Stripe — Payment Events

Stripe sends webhooks for over 200 event types — payment_intent.succeeded, customer.subscription.deleted, invoice.payment_failed, and more. Their webhook system is one of the most mature in the industry:

  • Signature: HMAC-SHA256 with the Stripe-Signature header. Includes a timestamp to prevent replay attacks.
  • Retries: Up to 3 days, with exponential backoff. Starts at ~1 hour between retries.
  • Dashboard: You can see every delivery attempt, inspect payloads, and manually retry failed events.
  • Testing: The stripe listen CLI command forwards live events to your local machine during development — no need to deploy to test webhooks.

Stripe processes billions of webhook deliveries per month. Their advice to developers: always verify signatures, always handle events idempotently, and always return 200 within 20 seconds.
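Stripe's official libraries perform the signature check for you, but seeing the scheme spelled out helps. The sketch below follows the documented format of the Stripe-Signature header: a `t=` timestamp plus one or more `v1=` signatures, computed over the timestamp, a dot, and the raw body. The function name and the 300-second tolerance are illustrative choices, not Stripe's API.

```python
import hashlib
import hmac
import time
from typing import Optional

def verify_stripe_signature(raw_body: bytes, header: str, secret: str,
                            tolerance: int = 300, now: Optional[float] = None) -> bool:
    # Header looks like: "t=1492774577,v1=<hex>,v1=<hex>"
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False              # missing or malformed timestamp
    if not candidates:
        return False

    # Reject old events to limit replay attacks.
    now = time.time() if now is None else now
    if abs(now - sent_at) > tolerance:
        return False

    # Signed payload is "<timestamp>.<raw body>".
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

Pass the raw request bytes, not a re-serialized JSON object: any change in whitespace or key order changes the signature.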

GitHub webhooks trigger on repository activity: pushes, pull requests, issues, releases, workflow runs. This is what powers CI/CD pipelines: when you push code, GitHub sends a webhook to a CI service such as Jenkins or CircleCI telling it to run your build (GitHub Actions reacts to the same events internally).

  • Signature: HMAC-SHA256 in the X-Hub-Signature-256 header.
  • Retries: GitHub does not automatically redeliver failed deliveries. You redeliver them yourself, from the webhook's delivery log or through the REST API.
  • Events: You choose which events to subscribe to (push, pull_request, issues, etc.) — you don't get everything.
  • Ping event: When you first set up a webhook, GitHub sends a ping event to verify your endpoint is reachable.
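GitHub's header format differs slightly from Stripe's: the X-Hub-Signature-256 value is the string `sha256=` followed by the hex HMAC-SHA256 of the raw body. A minimal sketch, with an illustrative function name:

```python
import hashlib
import hmac

def verify_github_signature(raw_body: bytes, header: str, secret: str) -> bool:
    """Check an X-Hub-Signature-256 header of the form 'sha256=<hex digest>'."""
    if not header or not header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison of the full header value
    return hmac.compare_digest(expected.encode(), header.encode())
```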

Twilio uses webhooks to notify your app about incoming SMS messages, call status changes, and delivery receipts. When someone texts your Twilio number, Twilio sends a webhook to your URL with the message body, sender, and phone number.

  • Signature: Uses an authentication token and request URL to generate a signature in the X-Twilio-Signature header.
  • Unique twist: Twilio webhooks can return TwiML (Twilio Markup Language, an XML format that tells Twilio what to do in response to a webhook, such as "play this audio" or "forward this call to another number") in the response body to control the call/message flow. Your webhook isn't just receiving data; it's sending instructions back.
  • Fallback URL: You can configure a backup URL. If your primary webhook fails, Twilio automatically tries the fallback before giving up.
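Twilio's signature is built differently from the other two. The sketch below is based on Twilio's documented scheme for form-encoded POST requests: append each parameter name and value, sorted by name, to the full request URL, sign the result with HMAC-SHA1 using the auth token, then Base64-encode it. Treat it as an illustration with hypothetical function names, and prefer the request validator in Twilio's official helper library for real traffic.

```python
import base64
import hashlib
import hmac

def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    # Append each POST parameter name and value, sorted by name, to the full URL.
    data = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), data.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def verify_twilio_signature(auth_token: str, url: str, params: dict, header: str) -> bool:
    """Compare the computed signature with the X-Twilio-Signature header value."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), header.encode())
```

Because the URL is part of the signed string, a proxy that rewrites the scheme or host will make valid requests fail verification.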
[Diagram: Production Webhook Patterns, What They All Share. The major providers (Stripe, GitHub, Twilio) share these 5 things: ① HMAC signature verification, ② retries for failed deliveries (typically with exponential backoff), ③ event type filtering, ④ delivery logs / dashboard, ⑤ idempotency keys in payloads. Learn these patterns once, use them everywhere.]
Section 10

Anti-Lesson — What Goes Wrong Without Webhooks

Let's look at a real-world scenario that goes catastrophically wrong when you rely on polling instead of webhooks.

The Black Friday Incident

An e-commerce startup launches a flash sale. Their payment integration uses polling — every customer's checkout flow polls Stripe every 3 seconds to check payment status. Traffic spikes to 50,000 concurrent checkouts. That's 16,667 poll requests per second hitting Stripe's API. Stripe rate-limits them. Now nobody can check out — not just slow, completely blocked. The sale fails. Revenue lost. Customer trust destroyed.

With webhooks, the same 50,000 checkouts would generate exactly 50,000 webhook deliveries (one per payment result) — spread naturally over minutes, with Stripe handling the delivery timing. No rate limit issues. No wasted requests. The architecture scales effortlessly because the work is proportional to actual events, not to the number of impatient clients checking repeatedly.

[Diagram: 50,000 Checkouts, Polling vs Webhooks. Polling: ~16,700 req/sec → rate limited → checkout broken for everyone ($2M revenue lost, CTO fired, investors angry). Webhooks: ~50,000 total deliveries → spread over minutes → everything works fine (record sales day, team celebrates, investors happy).]
The lesson: Polling multiplies traffic by the number of clients checking. Webhooks keep traffic proportional to the number of actual events. Under spiky load, that difference is the difference between success and catastrophic failure.
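The arithmetic behind the scenario is worth running once. The 5-minute delivery window for webhooks is an assumption made for illustration; the scenario only says deliveries are "spread naturally over minutes."

```python
def polling_requests_per_second(clients: int, interval_s: float) -> float:
    """Every client sends one request per poll interval."""
    return clients / interval_s

def webhook_requests_per_second(events: int, window_s: float) -> float:
    """One delivery per event, spread over the delivery window."""
    return events / window_s

polling = polling_requests_per_second(50_000, 3)
webhooks = webhook_requests_per_second(50_000, 5 * 60)  # assumed 5-minute window

print(f"Polling:  {polling:,.0f} req/s")   # Polling:  16,667 req/s
print(f"Webhooks: {webhooks:,.0f} req/s")  # Webhooks: 167 req/s
```

Under these assumptions the polling design produces roughly 100 times the request rate, and it keeps producing it for as long as the checkouts stay open.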
Section 11

Common Mistakes — Webhook Pitfalls

Even experienced developers make these mistakes when building webhook consumers. Each one can cause data loss, security holes, or mysterious bugs in production.

The mistake: Accepting every POST to your webhook URL without checking the HMAC signature.

Why it's bad: Anyone who discovers your webhook URL can send fake events. An attacker could fake a "payment succeeded" event and get free products.

The fix: Always compute the HMAC using the shared secret and compare it to the signature header before processing any event. Reject requests with missing or invalid signatures immediately with HTTP 401.

The mistake: Your webhook handler calls external APIs, sends emails, writes to multiple databases — all before returning the HTTP response.

Why it's bad: The provider times out waiting for your 200. It retries. Now you're processing the same event multiple times, sending duplicate emails, and creating duplicate database records.

The fix: Verify the signature, write the event to a queue, return 200. Do all the heavy processing in a background worker. This pattern is sometimes called "fire and forget": the webhook handler "fires" the event into a queue and "forgets" about it immediately, returning 200, while the background worker picks it up and processes it at its own pace.

The mistake: Assuming each webhook will be delivered exactly once and processing every incoming event without checking for duplicates.

Why it's bad: Webhooks are delivered at least once, not exactly once. Network glitches, provider retries, and your own intermittent failures mean you will receive duplicates. Charging a customer twice because of a duplicate webhook is a very bad day.

The fix: Store the event ID after processing. Before processing any event, check if that ID already exists. If it does, return 200 immediately and do nothing.
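One way to make the duplicate check safe under concurrent deliveries is to let the database enforce uniqueness. The sketch below uses SQLite with a primary key on the event ID; the table and function names are hypothetical. It records the ID before processing so that two simultaneous deliveries cannot both pass the check. If your processing can fail, do the insert and the processing in one transaction so a failure rolls the claim back.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # in-memory for illustration only
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

def claim_event(event_id: str) -> bool:
    """Return True if this is the first time the event ID has been seen."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
    return cursor.rowcount == 1   # 0 means the row already existed

def handle(event: dict) -> str:
    if not claim_event(event["id"]):
        return "duplicate"        # still answer 200 to the provider, but do nothing
    # ... real processing goes here ...
    return "processed"
```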

The mistake: Registering an http:// webhook URL instead of https://.

Why it's bad: Without TLS encryption, the entire webhook payload — including customer data, payment amounts, and the HMAC signature — travels in plain text. Any network observer can read it, and an attacker could perform a man-in-the-middle attack.

The fix: Always use HTTPS. Many providers require it; Stripe, for example, only accepts HTTPS URLs for live-mode endpoints. If you're testing locally, use a tool like ngrok that provides a temporary HTTPS tunnel.

The mistake: Writing handler logic that assumes order.created always arrives before order.updated, which always arrives before order.cancelled.

Why it's bad: Network conditions and retry timing can scramble delivery order. If order.cancelled arrives before order.created, your handler might try to cancel a non-existent order, then create it — leaving a ghost order in your database.

The fix: Check timestamps or version numbers. If the incoming event is older than what you've already processed for that resource, discard it. Or use the "fetch latest" strategy — treat the webhook as a notification to fetch the current state from the API.

Section 12

Interview Playbook — Webhook Questions

Think First

Webhooks come up in system design interviews whenever the design involves third-party integrations, async events, or notification systems. The interviewer wants to see that you understand push vs pull tradeoffs, reliability concerns, and security.

"How would you handle payment notifications in your e-commerce design?"

Strong answer: "I'd register a webhook URL with the payment provider. When a payment succeeds or fails, the provider POSTs the event to my endpoint. My handler verifies the HMAC signature, checks idempotency (have I seen this event ID before?), drops the event onto an internal queue, and returns 200 OK immediately. A background worker picks it up and updates the order status, sends confirmation emails, and triggers fulfillment. This gives me near-instant notification with zero polling overhead. For reliability, I'd also run a daily reconciliation job that compares my records with the provider's — that catches any events that fell through the cracks."

[Diagram: Webhook POST from Stripe → verify HMAC + dedup check → enqueue (SQS/RabbitMQ) → 200 OK in < 10 ms → worker: update DB, email, fulfill. A daily reconciliation job acts as a safety net for missed events.]

Strong answer: "Webhooks are inherently a push model: if the provider can't push, you won't receive events during the outage. That's why production systems should never rely 100% on webhooks alone. I'd combine webhooks with a periodic reconciliation job, a scheduled process that pulls records from the provider's API and compares them with my database to catch discrepancies or missed events. Say, every hour, my system fetches recent transactions from the provider's API and compares them with what I've received via webhooks. Any missing events get processed. This 'belt and suspenders' approach means neither a webhook failure nor a provider outage causes permanent data loss."
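The reconciliation idea from this answer can be sketched as a small function. It assumes you can fetch recent event IDs from the provider and that you track which IDs you have already handled; `fetch_recent_ids` and `process` are hypothetical callables standing in for the provider's API client and your own event handler.

```python
from typing import Callable, Iterable

def reconcile(fetch_recent_ids: Callable[[], Iterable[str]],
              already_received: set[str],
              process: Callable[[str], None]) -> list[str]:
    """Process every provider event that never arrived via webhook."""
    missed = []
    for event_id in fetch_recent_ids():
        if event_id in already_received:
            continue                     # the webhook delivered this one
        process(event_id)                # same code path a webhook would trigger
        already_received.add(event_id)
        missed.append(event_id)
    return missed
```

Running the missed events through the same idempotent handler means a late webhook for one of them is harmless.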

Strong answer: "Three strategies: (1) Use the provider's CLI tool (like stripe listen --forward-to localhost:4242/webhook) which tunnels real events to my local machine. (2) Use ngrok to create a temporary public URL that points to my local server. (3) For automated tests, I mock the webhook request by constructing the payload and computing the HMAC signature with my test secret. I always test the signature verification path — both valid and invalid signatures — to make sure my security logic is correct."

Strong answer: "Webhooks are one implementation of event-driven architecture that works across organizational boundaries: company A notifies company B via HTTP. Event-driven architecture is the broader pattern: when something happens, publish an event, and interested parties react. Inside a single system, you'd typically use a message broker like Kafka or RabbitMQ rather than HTTP webhooks. Webhooks are the HTTP-based, cross-boundary version of event-driven communication. They're simpler to set up (just a URL) but less reliable and less performant than internal message queues."

Section 13

Exercises — Build Your Own Webhook System

Theory is great, but you only truly understand webhooks once you've built a handler and dealt with the edge cases yourself. Try these exercises in order — each one builds on the previous.

Exercise 1: Basic Webhook Receiver Easy

Build a simple HTTP endpoint that receives a POST request, parses the JSON body, and logs the event type and key data to the console. Test it using curl to send a sample Stripe-like payload.

Use any framework — Flask (Python), Express (Node), or Sinatra (Ruby). The endpoint should accept POST, parse request.body as JSON, and extract event.type and event.data.

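app.py
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def webhook():
    # silent=True returns None instead of raising on a malformed body
    event = request.get_json(silent=True)
    if not isinstance(event, dict) or 'type' not in event:
        return jsonify({"error": "invalid payload"}), 400
    print(f"Event type: {event['type']}")
    print(f"Event data: {event.get('data')}")
    # Acknowledge receipt so the provider does not retry
    return jsonify({"received": True}), 200

if __name__ == '__main__':
    app.run(port=4242)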
Exercise 2: Add HMAC Verification Medium

Extend Exercise 1: add HMAC-SHA256 signature verification. Use a shared secret of "whsec_test123". Reject any request whose signature doesn't match. Test with both valid and forged signatures.

Compute hmac.new(secret, body, hashlib.sha256).hexdigest(), passing the secret and the raw request body as bytes, and compare it to the X-Signature header. Use hmac.compare_digest() for constant-time comparison (prevents timing attacks).
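If you get stuck, here is one possible shape for the verification logic, kept free of Flask so it can be tested on its own. The function names are illustrative; the secret and the hex-encoded X-Signature header come from the exercise.

```python
import hashlib
import hmac

SECRET = b"whsec_test123"  # shared secret from the exercise

def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def is_valid(body: bytes, signature_header: str, secret: bytes = SECRET) -> bool:
    """Constant-time check of the X-Signature header against the body."""
    if not signature_header:
        return False
    return hmac.compare_digest(sign(body, secret).encode(), signature_header.encode())
```

In the Flask handler from Exercise 1, you would call `is_valid(request.get_data(), request.headers.get("X-Signature", ""))` before parsing the JSON, and return 401 when it is False.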

Exercise 3: Idempotent Processing Medium

Extend Exercise 2: add idempotency. Store processed event IDs in a set (or database). If you receive an event you've already processed, log "duplicate detected" and return 200 without re-processing. Test by sending the same event ID twice.

Use a Python set() or an in-memory dict. Before processing, check if event["id"] in processed_ids. After processing, add it. For production, use a database table with a unique constraint on event ID.

Exercise 4: Build a Webhook Sender with Retries Hard

Now build the other side: a webhook sender. Write a program that sends a POST to a URL, checks the response code, and retries with exponential backoff (1s, 2s, 4s, 8s, 16s) if it gets anything other than 200. After 5 failures, log "delivery failed — moving to DLQ."

Use a for loop with range(5). On each iteration, time.sleep(2 ** attempt) before retrying. Compute and attach the HMAC signature in the request header. Test by pointing it at a server that randomly returns 500.
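One possible sketch of the sender follows. The `send` and `sleep` parameters are injectable so the retry logic can be tested without a network or real delays; those names, the X-Signature header, and the 10-second timeout are illustrative assumptions. It follows the hint literally, so it also sleeps after the final failed attempt.

```python
import hashlib
import hmac
import json
import time
import urllib.error
import urllib.request
from typing import Callable

def post_json(url: str, body: bytes, signature: str) -> int:
    """Send one POST and return the HTTP status code (0 on network failure)."""
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "X-Signature": signature},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code               # 4xx/5xx responses arrive as exceptions
    except OSError:
        return 0                      # DNS failure, refused connection, timeout

def deliver(url: str, event: dict, secret: bytes,
            send: Callable[[str, bytes, str], int] = post_json,
            sleep: Callable[[float], None] = time.sleep,
            max_attempts: int = 5) -> bool:
    """Try to deliver an event, backing off 1s, 2s, 4s, 8s, 16s between failures."""
    body = json.dumps(event).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    for attempt in range(max_attempts):
        if send(url, body, signature) == 200:
            return True
        sleep(2 ** attempt)           # exponential backoff
    print("delivery failed: moving to DLQ")
    return False
```

To try it against a flaky server, call `deliver("http://localhost:4242/webhook", {"id": "evt_1", "type": "test"}, b"whsec_test123")`.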

Section 14

Quick Reference — Cheat Cards

Bookmark this section. These six cards cover everything you need to remember about webhooks — during development, in interviews, or when debugging a production issue at 3 AM.

Webhook in One Sentence
An HTTP POST sent by a provider
to YOUR URL when an event occurs.
Direction: Provider → You
Trigger: Event-driven (not timer)
Protocol: HTTPS + JSON body
The 5-Step Handler
1. Receive POST
2. Verify HMAC signature
3. Check idempotency (seen ID?)
4. Enqueue for async processing
5. Return 200 OK (< 5 seconds)
HMAC Verification
expected = HMAC-SHA256(
  key   = shared_secret,
  msg   = raw_request_body
)
if not constant_time_equals(expected, header_signature):
  return 401 Unauthorized
Retry Strategy
Provider retries if no 200 OK:
  1st retry: ~1 min
  2nd retry: ~5 min
  3rd retry: ~30 min
  4th retry: ~2 hrs
  5th retry: ~12 hrs
After max: → Dead Letter Queue
Security Checklist
✓ HTTPS only (never HTTP)
✓ HMAC signature verification
✓ Timestamp check (< 5 min old)
✓ IP allowlisting (if available)
✓ Secret rotation (quarterly)
✓ Rate limit your own endpoint
Webhooks vs Polling
Polling:   You ask repeatedly
Webhooks:  They tell you once
Polling:   Wastes 99% of requests
Webhooks:  1 request per event
Polling:   Avg half-interval delay
Webhooks:  Near-instant delivery
Section 15

Connected Topics — Where to Go Next

Webhooks connect to a wide range of system design concepts. Here's how this topic fits into the bigger picture and what to study next.

[Diagram: Webhooks (event-driven HTTP callbacks) connect to: REST API Design ("Webhooks extend REST"), Message Queues ("Internal event backbone"), Real-Time Comms ("SSE, WebSockets, long poll"), Idempotency ("Critical for duplicate safety"), Event-Driven Arch ("Webhooks are the HTTP flavor"), API Security ("HMAC, TLS, replay prevention").]