TL;DR: The One-Minute Version
Most backend performance problems aren't caused by slow servers, bad code, or too much traffic. They're caused by one bad SQL query. A single missing index can make a query read every single row in a 10-million-row table instead of jumping straight to the 47 rows it needs. That's the difference between 89.5 milliseconds and 0.128 milliseconds: roughly 700 times faster, from changing one line.
This page is the most hands-on page in the entire System Guide. Every section has real SQL you can copy, paste, and run. Real EXPLAIN ANALYZE output. Real slow query logs. Real before/after benchmarks. By the end, you'll know how to take any slow query, diagnose exactly where the time is going, and fix it, often in under 5 minutes.
Serious engineering teams require EXPLAIN ANALYZE output in pull requests that add new queries. At GitHub, a single slow query once took down github.com for 2 hours in 2012: a sequential scan on the repositories table during a schema migration. (A sequential scan is when the database reads every single row in a table from start to finish, checking each one against your WHERE clause, like reading every page of a phone book to find one name.) At Shopify, their internal "query cop" tool automatically rejects any migration that adds a query without a matching index. SQL tuning isn't an optional skill; it's table stakes.
The Scenario: 50ms to 3 Seconds Overnight
You're on call for an e-commerce platform. It's Tuesday morning. Your P95 latency (the 95th-percentile response time: 95% of requests are faster than this number, so a P95 of 200ms means 95 out of 100 requests complete in under 200ms and the remaining 5% are slower) has been a comfortable 50ms for months. You grab your coffee, open Grafana, and see this:
The deploy log shows a new feature went out Monday night: an "Order History" page that lets users see all their past orders. The feature works perfectly: it shows the right data, the UI looks great, the tests all pass. But behind that innocent-looking page is this query:
SELECT o.id, o.total, o.created_at, o.status,
oi.product_name, oi.quantity, oi.price
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = $1
ORDER BY o.created_at DESC
LIMIT 20;
Looks completely reasonable, right? A simple join, a WHERE clause, a LIMIT. Every junior developer would write exactly this. And it runs fine in development with 500 rows. But production has 10 million orders and 45 million order items. Without an index on orders.user_id, the database has to read every single row in the orders table to find the ones belonging to user 42. That's a sequential scan on 10 million rows. (Also called a "full table scan": the database reads the table from the first page to the last, checking every row against your WHERE condition. On a 10-million-row table that means reading roughly 63,000 8 KB disk pages, and even with caching it's orders of magnitude slower than an index scan.) Every. Single. Request.
You have a table with 10 million rows. Your query needs 47 of them. Without an index, the database reads all 10 million. What percentage of the work is wasted? And if you added an index, roughly how many disk pages would the database need to read instead of roughly 63,000?

For the first part: 47 / 10,000,000 = ? For the second: think about how a B-tree index works. It's a tree with ~3 levels. You traverse 3-4 tree nodes, then read the matching data pages. 47 rows across ~8 KB pages...

The answer: 99.9995% of the work is wasted. The database reads 10 million rows to return 47. And with a B-tree index, it would read about 3 index pages (to traverse the tree) plus ~47 data pages: roughly 50 page reads instead of ~63,000. That's what we'll fix in this page.
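You can sanity-check that arithmetic in a few lines of Python. This is a back-of-envelope sketch for the 1-million-row copy of the table used in the EXPLAIN ANALYZE examples later on this page; the 158 rows-per-page figure is an assumption derived from that output (1M rows stored in 6,334 pages), and the 3-level B-tree comes from the hint above.

```python
# Back-of-envelope page math for a 1-million-row copy of the orders table
# (the size used in the EXPLAIN ANALYZE examples on this page).
TOTAL_ROWS = 1_000_000
MATCHING_ROWS = 47
ROWS_PER_PAGE = 158   # assumption: ~8 KB pages, from "1M rows -> 6,334 pages"

seq_scan_pages = -(-TOTAL_ROWS // ROWS_PER_PAGE)   # ceiling division: read everything
wasted_fraction = 1 - MATCHING_ROWS / TOTAL_ROWS   # rows read but thrown away

BTREE_LEVELS = 3                                   # root -> internal -> leaf
index_scan_pages = BTREE_LEVELS + MATCHING_ROWS    # worst case: one heap page per row

print(f"seq scan:   ~{seq_scan_pages:,} page reads")   # ~6,330
print(f"index scan: ~{index_scan_pages} page reads")   # 50
print(f"wasted rows: {wasted_fraction:.4%}")           # 99.9953%
```

The same math scales linearly with table size: ten times the rows means ten times the sequential-scan pages, while the index-scan cost barely moves.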
What Made This Hard to Catch
This query passed every code review, every test, every staging environment check. Why? Because the staging database had 500 orders. At 500 rows, a sequential scan takes 0.05ms β perfectly fast. The problem only appears when the table has millions of rows. This is the fundamental trap of SQL performance: queries that are correct and fast on small data can be catastrophically slow on production data.
Don't take this on faith: start a local PostgreSQL (docker run -e POSTGRES_PASSWORD=test -p 5432:5432 postgres:16), create a table with generate_series(1, 1000000), and run the same query with and without an index. You'll see the exact same spike. We'll do exactly this in Section 5.
The First Attempt: Throw More Servers at It
Your API is slow. Your instinct, and the instinct of every team under pressure, is to scale horizontally. Add more app servers. Bump the instance count from 4 to 8. Double the capacity, halve the problem, right?
So you spin up four more API servers behind your load balancer. Each server now handles half the requests it used to. Your CPU utilization on the app servers (the percentage of processing power in use; if your app servers show 30% CPU, the bottleneck isn't the app server, it's somewhere else, usually the database) drops from 80% to 40%. You check the dashboard. Response times are... still 3 seconds.
Why didn't it work? Because the app servers weren't the bottleneck. They were sitting at 30% CPU, happily waiting for the database to respond. The database was the bottleneck. Every one of those 200 requests per second triggers the same sequential scan on the same 10-million-row table. Adding more app servers just means more connections all hammering the same slow database.
This is one of the most common mistakes in backend engineering: scaling the wrong layer. The app servers are fine. The network is fine. The load balancer is fine. The problem is a single SQL query doing a full table scan, and no amount of horizontal scaling on the app tier will fix it.
If adding more app servers doesn't help (because the DB is the bottleneck), what about adding read replicas? You set up 3 read replicas and route all SELECT queries to them. Does this fix the problem? Think about what each replica has to do when it receives the same query.
Each replica has the same data, same table, same missing index. What does each replica do when it gets SELECT * FROM orders WHERE user_id = 42?

Read replicas don't fix it either, not really. Each replica has the exact same table without an index. Each one still does a sequential scan on 10 million rows. You've spread the load across 3 databases instead of 1, so each handles ~67 requests per second instead of 200. That buys you some breathing room, but each request still takes 89ms of pure scan time. You've spent an extra $1,050/month on three new database instances to treat a symptom instead of fixing the cause.
The Real Numbers
Let's look at exactly what you're paying for the "throw hardware at it" approach versus fixing the actual query:
| Approach | Monthly Cost | P95 Latency | Time to Implement |
|---|---|---|---|
| Do nothing (1 DB) | $350/month | 3,100ms | 0 minutes |
| Add 3 read replicas | $1,400/month | 890ms | 2 hours setup |
| Scale app tier 4→8 | $600/month (extra) | 3,100ms (unchanged) | 20 minutes |
| CREATE INDEX | $0 extra | 0.128ms | 30 seconds |
One line of SQL, CREATE INDEX idx_orders_user_id ON orders (user_id);, does more than $1,600/month of additional hardware. This is the entire argument for SQL tuning in one table: fix the query, not the infrastructure.
The lesson: before you scale anything, run EXPLAIN ANALYZE on your slowest queries. Nine times out of ten, the fix is a missing index, not a missing server. The next section shows you exactly what was happening under the hood and why it broke.
Where It Breaks: The Four Horsemen of Slow Queries
The "add servers" approach failed because the problem was never the servers. It was the database. Specifically, it was one (or more) of four problems that cause 95% of all database performance issues. Let's meet all four, because you'll encounter every single one of them in your career.
Your "order history" page loads 20 orders. For each order, the code runs a separate query to fetch order items: SELECT * FROM order_items WHERE order_id = ?. That's 1 query for orders + 20 queries for items = 21 queries per page load. If 200 users hit this page per second, how many queries per second hit the database? And what would happen if you replaced those 21 queries with a single JOIN?
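If you want to check your answer, the arithmetic is only a few lines (plain multiplication, using the numbers from the quiz):

```python
# N+1 math for the order-history page.
queries_per_page = 1 + 20        # 1 for the orders list + 20 for the items
page_loads_per_sec = 200

n_plus_1_qps = queries_per_page * page_loads_per_sec
join_qps = 1 * page_loads_per_sec   # a single JOIN fetches orders + items together

print(n_plus_1_qps, "queries/sec with N+1")    # 4200
print(join_qps, "queries/sec with one JOIN")   # 200
```

A 21x reduction in query volume, before you've touched a single index.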
Horseman #1: Sequential Scan on a Big Table
We've already met this one. When you write WHERE user_id = 42 and there's no index on user_id, the database has no choice but to read the entire table. It starts at page 1 and reads every page until the end. On a 10-million-row table, that's thousands of disk pages.
Here's what that looks like in real EXPLAIN ANALYZE output, exactly as PostgreSQL prints it (this particular run is against a 1-million-row copy of the table, the same scale you'll build in the hands-on exercise):
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;
-- Output:
-- Seq Scan on orders (cost=0.00..18334.00 rows=50 width=96)
-- (actual time=0.031..89.442 rows=47 loops=1)
-- Filter: (user_id = 42)
-- Rows Removed by Filter: 999953
-- Buffers: shared hit=6334
-- Planning Time: 0.089 ms
-- Execution Time: 89.521 ms
Let's read that output like a doctor reads a blood test. Every line tells you something:
| Line | What it means | Is it bad? |
|---|---|---|
| `Seq Scan on orders` | Full table scan: reading every row, start to finish | Yes. This is the problem. |
| `rows=50` (estimated) vs `rows=47` (actual) | Planner guessed ~50 rows, found 47. The estimate is good. | No, estimates are close. |
| `Rows Removed by Filter: 999953` | Read 1,000,000 rows, kept 47, threw away 999,953 | Yes. 99.995% waste. |
| `Buffers: shared hit=6334` | Read 6,334 pages from the buffer pool, PostgreSQL's in-memory cache of disk pages ("shared hit" means the page was already in RAM; "shared read" means it had to go to disk, which is much slower) | Yes. Way too many pages. |
| `Execution Time: 89.521 ms` | Wall clock time for the query | Yes. 89ms per request = trouble. |
The smoking gun is Rows Removed by Filter: 999953. The database read a million rows and threw away 999,953 of them. That's like reading an entire phone book to find one person's number.
Horseman #2: The N+1 Query Problem
This one is sneaky because it doesn't show up as one slow query. It shows up as a thousand fast queries where you only needed two.
Here's the classic example. You want to show a list of users and how many orders each one has. The "obvious" code does this:
# BAD: N+1 (fires 1001 queries for 1000 users)
users = User.all # Query 1: SELECT * FROM users
users.each do |user|
puts user.orders.count # Query 2..1001: SELECT COUNT(*)
end # FROM orders WHERE user_id = ?
# Total: 1 + 1000 = 1001 queries.
# At 1ms per query = 1001ms = over 1 SECOND just for DB calls.
# FIXED: eager loading (2 queries total)
users = User.includes(:orders).all # Query 1: SELECT * FROM users
# Query 2: SELECT * FROM orders
# WHERE user_id IN (1,2,3...)
users.each do |user|
puts user.orders.size # No query (already loaded)
end
# Total: 2 queries. ~3ms.
# BAD: N+1
users = User.objects.all() # 1 query
for user in users:
print(user.order_set.count()) # N queries
# FIXED: prefetch_related
users = User.objects.prefetch_related('order_set').all() # 2 queries
for user in users:
print(len(user.order_set.all())) # No query (already loaded)
-- BAD: N+1 (what the ORM generates)
SELECT * FROM users; -- query 1
SELECT COUNT(*) FROM orders WHERE user_id = 1; -- query 2
SELECT COUNT(*) FROM orders WHERE user_id = 2; -- query 3
SELECT COUNT(*) FROM orders WHERE user_id = 3; -- query 4
-- ... 997 more queries ...
-- FIXED: One query with JOIN
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
The N+1 problem is the most common performance bug in web applications using ORMs (Object-Relational Mappers: libraries like ActiveRecord (Rails), Django ORM, SQLAlchemy (Python), or Entity Framework (.NET) that let you write code instead of SQL; convenient, but they can generate inefficient queries if you're not careful). It's called "N+1" because you fire 1 query to get N records, then 1 more query for EACH record. For 1,000 users, that's 1,001 database round trips instead of 2.
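You can watch N+1 happen with nothing but Python's built-in sqlite3 module. This is a toy reproduction (table names mirror the examples above; the row counts are small for readability) that uses the connection's trace callback to count every SQL statement actually sent to the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO orders (user_id) VALUES (?)",
                 [(i,) for i in range(1, 101) for _ in range(3)])

sent = []
conn.set_trace_callback(sent.append)   # record every statement the DB receives

# BAD: N+1 -- one COUNT query per user
for (uid,) in conn.execute("SELECT id FROM users").fetchall():
    conn.execute("SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)).fetchone()
n_plus_1_queries = len(sent)
print("N+1 version: ", n_plus_1_queries, "queries")   # 101 queries

sent.clear()
# FIXED: one JOIN with GROUP BY
conn.execute("""
    SELECT u.id, u.name, COUNT(o.id)
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.name
""").fetchall()
join_queries = len(sent)
print("JOIN version:", join_queries, "query")         # 1 query
```

Same data, same results, 101 round trips versus 1. This is exactly what Bullet and nplusone (covered in Tool 4) detect for you automatically.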
Horseman #3: Missing or Wrong Indexes
Sometimes you have an index, but the query can't use it. This is more frustrating than having no index at all, because you think you've solved the problem.
The three most common ways your index becomes useless:
-- You created this index:
CREATE INDEX idx_orders_created ON orders (created_at);
-- ❌ BROKEN: Function call on indexed column
-- The index stores raw dates, not YEAR(dates). Can't use the index.
-- (YEAR() is MySQL syntax; in PostgreSQL the same trap is
--  EXTRACT(YEAR FROM created_at).)
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- ✅ FIXED: Rewrite to use the column directly
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- ❌ BROKEN: Implicit type cast
-- user_id is an integer column, but you pass a string.
-- Some databases cast the COLUMN to match, disabling the index.
SELECT * FROM orders WHERE user_id = '42';

-- ✅ FIXED: Use the correct type
SELECT * FROM orders WHERE user_id = 42;
-- ❌ BROKEN: Wrong column order in composite index
-- Index is (status, created_at) but you query by created_at only.
-- A composite index is like a phone book sorted by (last_name, first_name).
-- You can't look up by first_name alone; the book isn't sorted that way.
CREATE INDEX idx_status_date ON orders (status, created_at);
SELECT * FROM orders WHERE created_at > '2024-01-01'; -- Can't use the index!

-- ✅ FIXED: Create an index with the column you actually filter on FIRST
CREATE INDEX idx_date ON orders (created_at);
-- Or: make the query include the leading column
SELECT * FROM orders WHERE status = 'shipped' AND created_at > '2024-01-01';
Horseman #4: Connection Exhaustion
Every database has a maximum number of simultaneous connections. PostgreSQL defaults to max_connections = 100. That sounds like a lot until you realize that slow queries hold connections open longer. If each query takes 89ms instead of 0.1ms, connections are occupied 890x longer. Your pool drains instantly.
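Little's law makes the drain quantitative: average connections in use = arrival rate × time each query holds a connection. A quick sketch with the scenario's numbers (200 requests per second, PostgreSQL's default connection limit):

```python
# Little's law: connections busy on average = queries/sec * seconds per query
RATE = 200               # requests per second, from the scenario
MAX_CONNECTIONS = 100    # PostgreSQL's default max_connections

busy_fast = RATE * 0.0001   # 0.1 ms queries -> 0.02 connections busy
busy_slow = RATE * 0.089    # 89 ms queries  -> ~18 connections busy
busy_incident = RATE * 3.1  # 3.1 s requests -> 620 connections needed

print(f"0.1ms queries: {busy_fast:.2f} connections busy on average")
print(f"89ms queries:  {busy_slow:.1f} connections busy on average")
print(f"3.1s requests: {busy_incident:.0f} connections needed "
      f"({'pool exhausted' if busy_incident > MAX_CONNECTIONS else 'fine'})")
```

With the index in place, the same traffic needs a fraction of one connection. Without it, once latency climbs past ~0.5s per request, you blow through the default limit and new requests start failing with "too many connections".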
Here's the real config you'd see in a PostgreSQL setup with PgBouncer, a lightweight connection pooler for PostgreSQL. It sits between your app and the database, maintaining a small pool of real DB connections and multiplexing thousands of app connections through them; it's essential for any production PostgreSQL deployment:
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production
[pgbouncer]
listen_port = 6432
max_client_conn = 1000 # App servers can open up to 1000 connections
default_pool_size = 25 # But only 25 real DB connections are open
reserve_pool_size = 5 # 5 extra for emergencies
pool_mode = transaction # Return connection to pool after each transaction
# The math: 1000 app connections funnel into 25 DB connections.
# Each DB connection handles ~40 short queries per second.
# 25 × 40 = 1000 queries/sec capacity.
# Without PgBouncer: 1000 app connections = 1000 DB connections = PostgreSQL OOM.
Without a connection pooler, if you have 8 app servers each opening 125 connections, that's 1,000 connections to PostgreSQL. Each connection consumes about 10 MB of memory on the database server. That's 10 GB just for connection overhead β before your database stores a single byte of data. PgBouncer funnels those 1,000 app connections through just 25 real database connections, cutting memory usage by 97%.
The Breakthrough: EXPLAIN ANALYZE Changes Everything
All four horsemen have one thing in common: you can't fix what you can't see. You can stare at your SQL all day and it looks fine. The query is correct. It returns the right data. The problem is invisible in the code; it's in the execution plan the database chose.
That's what EXPLAIN ANALYZE gives you. It's like an X-ray for your query. Instead of just running the query and giving you results, the database also tells you exactly how it found those results β which tables it scanned, which indexes it used (or didn't), how many rows it read, and how long each step took.
You have the slow query from Section 2. Before reading further, what would you add to the database to make it fast? And how would you verify that your fix actually worked β that the database is using the fix you created?
You need two things: something to CREATE, and something to ask the database to EXPLAIN.

The fix is two commands. One to create the index, and one to verify the database uses it:
-- Step 1: Create the index (takes ~5 seconds on 10M rows)
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);
-- CONCURRENTLY = don't lock the table while building the index.
-- Without it, your table is locked for writes during index creation.
-- Step 2: Verify the fix
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;
-- Output (AFTER index):
-- Index Scan using idx_orders_user_id on orders
-- (cost=0.42..52.18 rows=50 width=96)
-- (actual time=0.028..0.089 rows=47 loops=1)
-- Index Cond: (user_id = 42)
-- Buffers: shared hit=50
-- Planning Time: 0.113 ms
-- Execution Time: 0.128 ms
Read those two outputs side by side. Same query. Same data. Same server. The only difference is one CREATE INDEX statement. The planner went from "I guess I'll read everything" to "I know exactly where that data is." From 89.5ms to 0.128ms. From 6,334 page reads to 50.
Try It Yourself, Right Now
This is the most important hands-on exercise on this page. Copy these commands, run them, and see the difference with your own eyes:
# 1. Start PostgreSQL (Docker)
docker run --name pg-tuning -e POSTGRES_PASSWORD=test \
-p 5432:5432 -d postgres:16
# 2. Connect
docker exec -it pg-tuning psql -U postgres
# 3. Create table with 1M rows (takes ~3 seconds)
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
total NUMERIC(10,2),
status TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
INSERT INTO orders (user_id, total, status)
SELECT
(random() * 20000)::int, -- 20K distinct users
(random() * 500 + 5)::numeric(10,2),
(ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int]
FROM generate_series(1, 1000000);
-- 4. Run BEFORE (no index); note the Seq Scan and time
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;
-- 5. Create the index
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- 6. Run AFTER (with index); note the Index Scan and time
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;
-- 7. Compare the two outputs. You'll see:
--    Seq Scan -> Index Scan
--    ~80ms -> ~0.1ms
--    ~6000 buffers -> ~50 buffers
Finding Your Slow Queries in Production
You can't run EXPLAIN ANALYZE on queries you don't know about. In production, you need to find the slow ones first. Here are the real tools:
-- Method 1: Slow query log
-- In postgresql.conf:
-- log_min_duration_statement = 100
-- This logs every query that takes more than 100ms.
-- Check your PostgreSQL log file for the output.
-- Method 2: pg_stat_statements (MUCH better)
-- Enable it first (requires shared_preload_libraries = 'pg_stat_statements'
-- in postgresql.conf and a server restart), then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Then find your 10 slowest queries RIGHT NOW:
SELECT
query,
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
round(total_exec_time::numeric, 2) AS total_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- This shows you EXACTLY which queries are slow,
-- how often they run, and how much total time they consume.
-- The "total_ms" column is the most important:
-- a query averaging 5ms that runs 1M times
-- is worse than a query averaging 500ms that runs once.
-- Enable slow query log (run as admin):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1; -- log queries > 100ms
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- Check it's enabled:
SHOW VARIABLES LIKE 'slow_query%';
-- Find slow queries:
-- The log file is at: /var/lib/mysql/hostname-slow.log
-- Or use mysqldumpslow to summarize:
-- mysqldumpslow -s t /var/lib/mysql/hostname-slow.log
-- MySQL equivalent of pg_stat_statements:
-- Enable Performance Schema (usually on by default):
SELECT
DIGEST_TEXT AS query,
COUNT_STAR AS calls,
ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
SUM_ROWS_EXAMINED AS rows_scanned,
SUM_ROWS_SENT AS rows_returned
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
Step 1: find your slowest queries (slow query log or pg_stat_statements). Step 2: understand WHY they're slow (EXPLAIN ANALYZE). Step 3: fix the root cause (usually: add an index). Step 4: verify the fix (EXPLAIN ANALYZE again). This four-step loop is the entire discipline of SQL tuning. Everything else on this page is details.
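The prioritization rule buried in the pg_stat_statements comments above ("total time beats average time") is worth making concrete. A sketch with hypothetical workload numbers:

```python
# Rank queries by total DB time consumed, not per-call latency.
# (The numbers here are hypothetical, for illustration.)
workload = [
    {"query": "fast-but-hot COUNT(*)", "calls": 1_000_000, "mean_ms": 5},
    {"query": "slow nightly report",   "calls": 1,         "mean_ms": 500},
]
for q in workload:
    q["total_ms"] = q["calls"] * q["mean_ms"]

worst = max(workload, key=lambda q: q["total_ms"])
print(worst["query"], "->", worst["total_ms"] / 1000, "seconds of total DB time")
# The 5ms query run a million times costs 5,000 seconds of DB time;
# the 500ms query run once costs 0.5 seconds. Tune the first one.
```

This is why the pg_stat_statements query sorts by total execution time: the scariest-looking query in your log is rarely the one costing you the most.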
How It Works: The Complete Toolkit
You now know the four problems and the breakthrough tool (EXPLAIN ANALYZE). This section is the reference manual: six essential topics, each with real commands you can run. Think of these as the six tools in your SQL tuning toolbox.
Tool 1: Reading EXPLAIN ANALYZE Like a Pro
EXPLAIN ANALYZE is the single most important command in database performance. It runs your query AND tells you exactly how the database executed it. The key is knowing which parts to look at.
Here's a more complex query with a real plan, a join across two tables:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, oi.product_name, oi.quantity
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 42
ORDER BY o.created_at DESC
LIMIT 20;
-- Limit (cost=298.15..298.20 rows=20 width=52)
-- (actual time=0.192..0.198 rows=20 loops=1)
-- -> Sort (cost=298.15..298.28 rows=50 width=52)
-- (actual time=0.190..0.193 rows=20 loops=1)
-- Sort Key: o.created_at DESC
-- Sort Method: top-N heapsort Memory: 27kB
-- -> Nested Loop (cost=0.85..297.05 rows=50 width=52)
-- (actual time=0.038..0.158 rows=47 loops=1)
-- -> Index Scan using idx_orders_user_id on orders o
-- (cost=0.42..52.18 rows=50 width=24)
-- (actual time=0.025..0.055 rows=47 loops=1)
-- Index Cond: (user_id = 42)
-- Buffers: shared hit=50
-- -> Index Scan using idx_order_items_order_id on order_items oi
-- (cost=0.43..4.87 rows=1 width=36)
-- (actual time=0.001..0.002 rows=1 loops=47)
-- Index Cond: (order_id = o.id)
-- Buffers: shared hit=94
-- Planning Time: 0.324 ms
-- Execution Time: 0.241 ms
Read this plan from the inside out, bottom to top. The innermost nodes execute first.
The key numbers to check in every EXPLAIN ANALYZE output:
- Scan type: `Seq Scan` on a large table = problem. `Index Scan` = good.
- Rows Removed by Filter: if this is large, you're reading rows you don't need.
- Buffers: shared hit vs shared read: "hit" means RAM cache, "read" means disk. Disk reads are 100x slower.
- loops: if a node says `loops=47`, multiply its time by 47 to get the real cost.
- actual time vs estimated cost: if these differ wildly, your statistics are stale. (PostgreSQL maintains statistics about your data: how many distinct values each column has, the distribution of values, and so on. These stats feed the query planner; if they're outdated, e.g. after a big bulk insert, the planner makes bad decisions.) Refresh them with `ANALYZE tablename;`.
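The `loops` rule is the one people miss most often. Applying it to the inner index scan in the join plan above:

```python
# EXPLAIN ANALYZE reports per-iteration times; multiply by loops.
per_loop_ms = 0.002   # "actual time=0.001..0.002" on the inner index scan
loops = 47            # it ran once per matching order

total_ms = per_loop_ms * loops
print(f"real cost of the inner scan: {total_ms:.3f} ms")
# 47 iterations * 0.002 ms = 0.094 ms. Tiny here, but a node showing
# 2 ms with loops=10000 would be 20 seconds hiding in plain sight.
```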
Tool 2: Scan Types (How the Database Reads Data)
When the query planner decides how to find your data, it picks one of several scan strategies. Each has different performance characteristics. Think of these like different ways to find a word in a book:
| Scan Type | Analogy | When Used | Speed |
|---|---|---|---|
| Seq Scan | Reading every page of a book | No usable index, or table is tiny | Slowest on large tables |
| Index Scan | Using the book's index, then flipping to the page | Index exists, few matching rows | Fast for selective queries |
| Bitmap Index Scan | Marking all matching pages first, then reading them in order | Index exists, moderate number of matching rows (5-20%) | Middle ground |
| Index Only Scan | Finding your answer in the index itself without opening the book | All requested columns are in the index | Fastest: skips the table entirely |

A Bitmap Index Scan is a two-step process: first scan the index to build a bitmap of which pages contain matching rows, then read those pages in sequential order. It beats a plain Index Scan when you're fetching many rows (say 5-20% of the table) because it avoids random I/O. An Index Only Scan is the best possible scan: the index itself contains ALL the columns the query needs, so the database never touches the table at all. It's only possible with "covering" indexes that include every selected column.
The planner picks the scan type based on math. If your query matches 0.01% of the table, Index Scan wins. If it matches 60% of the table, Seq Scan is actually faster (because sequential disk reads are much faster than random reads). The planner knows the tipping point β usually around 5-15% of the table.
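You can build intuition for that tipping point with a deliberately crude cost model using PostgreSQL's default cost constants (seq_page_cost = 1.0, random_page_cost = 4.0). This is not the planner's real formula: the real planner also models caching, physical row ordering, and bitmap scans, which push the crossover much higher than this sketch suggests. But it shows the mechanism, one random page read per matching row versus one sequential read per table page:

```python
# Crude model: index scan pays one RANDOM page read per matching row;
# seq scan pays one SEQUENTIAL read per table page.
# (PostgreSQL defaults: seq_page_cost = 1.0, random_page_cost = 4.0.)
SEQ_PAGE_COST, RANDOM_PAGE_COST = 1.0, 4.0
TABLE_PAGES, TABLE_ROWS = 6_334, 1_000_000   # the 1M-row orders table

def seq_cost():
    return TABLE_PAGES * SEQ_PAGE_COST

def index_cost(selectivity):
    return TABLE_ROWS * selectivity * RANDOM_PAGE_COST

for pct in (0.005, 0.1, 1.0, 25.0):
    winner = "index" if index_cost(pct / 100) < seq_cost() else "seq"
    print(f"{pct:>6}% of rows -> {winner} scan wins")
```

In this simplified model the crossover lands well below 1%; caching and bitmap scans are what push the real planner's tipping point up toward the 5-15% range quoted above.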
-- Selective query (few rows) -> Index Scan
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Index Scan using idx_orders_user_id on orders
-- (the planner knows user_id=42 matches ~50 rows out of 1M)
-- Non-selective query (many rows) -> Seq Scan
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
-- Seq Scan on orders
--   Filter: (status = 'shipped')
-- (the planner knows 'shipped' matches ~250K rows = 25% of table.
-- Seq Scan is FASTER here because sequential I/O beats random I/O)
-- Index Only Scan: create a covering index first
CREATE INDEX idx_orders_covering ON orders (user_id) INCLUDE (total, created_at);
EXPLAIN SELECT total, created_at FROM orders WHERE user_id = 42;
-- Index Only Scan using idx_orders_covering on orders
-- (everything the query needs is IN the index: no table access!)
Tool 3: Index Strategies (Single, Composite, Partial, Covering)
Knowing that indexes make queries fast is step one. Knowing which type of index to create is step two. There are four types you'll use in practice, and choosing the right one makes the difference between a good index and a great one.
Single-column index: the most common. One index on one column. Perfect when your WHERE clause filters on one thing.
-- Simple: index on the column you filter by
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- Now this query uses the index:
SELECT * FROM orders WHERE user_id = 42;
-- But this query CANNOT use it:
SELECT * FROM orders WHERE status = 'shipped';
-- (different column; it needs its own index)
Composite index: an index on multiple columns. Column order matters. Think of it like a phone book sorted by (last_name, first_name). You can search by last_name alone, or by (last_name AND first_name), but NOT by first_name alone.
-- Index on (user_id, status) β order matters!
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- ✅ Can use the index (leading column):
SELECT * FROM orders WHERE user_id = 42;

-- ✅ Can use the index (both columns):
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';

-- ❌ CANNOT use the index (skips the leading column):
SELECT * FROM orders WHERE status = 'shipped';

-- Rule: the "leftmost prefix" rule.
-- Index on (A, B, C) supports:
--   WHERE A = ?                      ✅
--   WHERE A = ? AND B = ?            ✅
--   WHERE A = ? AND B = ? AND C = ?  ✅
--   WHERE B = ?                      ❌ (skips A)
--   WHERE C = ?                      ❌ (skips A and B)
Partial index: indexes only the rows that match a condition. Smaller, faster, cheaper. Perfect when you only ever query a subset of the data.
-- You only ever query "active" orders. Why index the other 90%?
CREATE INDEX idx_active_orders ON orders (user_id)
WHERE status != 'cancelled' AND status != 'archived';
-- Index size: covers ~10% of the table = 90% smaller!
-- Works for:
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
-- Doesn't work for:
SELECT * FROM orders WHERE user_id = 42 AND status = 'cancelled';
-- (cancelled rows aren't in the index)
-- Real-world example: soft-deleted rows
CREATE INDEX idx_non_deleted_users ON users (email)
WHERE deleted_at IS NULL;
-- Most queries filter WHERE deleted_at IS NULL anyway.
-- This index is tiny and blazing fast.
Covering index (PostgreSQL: INCLUDE, MySQL: "covering"): the index contains all columns the query needs, so the database never reads the actual table. This enables an Index Only Scan, the fastest possible scan type.
-- Your dashboard shows: user's recent orders with totals
SELECT total, created_at FROM orders
WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;
-- Regular index -> Index Scan (looks up the index, then reads the table):
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- Covering index -> Index Only Scan (never touches the table):
CREATE INDEX idx_orders_user_covering
ON orders (user_id, created_at DESC) INCLUDE (total);
-- The INCLUDE columns are stored IN the index but not used for sorting.
-- The database finds user_id=42 in the index, grabs total + created_at
-- from the index itself, and returns. Zero table access.
-- Verify with EXPLAIN:
-- "Index Only Scan using idx_orders_user_covering"
-- (If you see "Index Scan" instead, run VACUUM on the table first.
-- PostgreSQL needs visibility info to be sure the index-only path is safe.)
Tool 4: N+1 Detection and Fix
You already saw the N+1 problem in Section 4. But how do you find N+1 queries in a real application? You can't just read the code, because ORMs hide the actual SQL. Here are the tools that catch them:
# Gemfile
gem 'bullet', group: :development
# config/environments/development.rb
config.after_initialize do
Bullet.enable = true
Bullet.alert = true # Browser popup when N+1 detected
Bullet.console = true # Console warning
Bullet.rails_logger = true
end
# Now when you load a page with N+1 queries,
# Bullet pops up a warning:
#
# "N+1 Query detected:
# User => [:orders]
# Add to your query: .includes(:orders)"
# pip install nplusone
# settings.py
INSTALLED_APPS += ['nplusone.ext.django']
MIDDLEWARE += ['nplusone.ext.django.NPlusOneMiddleware']
NPLUSONE_RAISE = True # Raise exception on N+1 (dev only!)
# Output when N+1 detected:
# nplusone.core.exceptions.NPlusOneError:
# Potential N+1 query detected on `User.order_set`
-- N+1 queries show up as a query with VERY high "calls" count
-- relative to other queries.
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY calls DESC LIMIT 5;
-- If you see:
-- "SELECT COUNT(*) FROM orders WHERE user_id = $1"
-- calls: 847,293
-- mean_exec_time: 0.4ms
--
-- That's 847K calls doing the same parameterized query.
-- Classic N+1. Total time: 847K × 0.4ms = 339 seconds of DB time.
-- Fix: one JOIN query, ~2ms total.
Tool 5: Connection Pooling with PgBouncer
Every database connection costs memory, about 10 MB per connection in PostgreSQL. If you have 8 app servers each opening 100 connections, that's 800 connections consuming 8 GB of RAM just for connection overhead. PgBouncer, a lightweight connection pooler that sits between your app servers and PostgreSQL, solves this by multiplexing many app connections through a small pool of real database connections: your app connects to PgBouncer (which accepts thousands of connections cheaply), and PgBouncer maintains a small pool of real connections to PostgreSQL. Think of it as a receptionist who takes calls from 1,000 people but only has 25 phone lines to the back office.
You can benchmark PgBouncer's impact with pgbench, PostgreSQL's built-in benchmarking tool:
# Initialize pgbench tables
pgbench -i -s 50 mydb
# Benchmark WITHOUT PgBouncer (direct to PostgreSQL)
pgbench -c 100 -j 4 -T 60 -h localhost -p 5432 mydb
# Result: tps = 1,847 (including connections establishing)
# Many "connection refused" errors at high concurrency
# Benchmark WITH PgBouncer (connect through pooler)
pgbench -c 100 -j 4 -T 60 -h localhost -p 6432 mydb
# Result: tps = 4,213 (including connections establishing)
# Zero connection errors. 2.3x throughput.
Tool 6: Prepared Statements
Every time you send a SQL query, the database does two things: parse + plan the query, then execute it. For simple queries, planning takes 0.1-0.5ms. If you run the same query 10,000 times per second (with different parameter values), that's 1-5 seconds wasted on planning alone.
Prepared statements (a query that's parsed and planned once, then executed many times with different parameter values; the database caches the execution plan, saving the parse+plan overhead on every subsequent call) solve this by parsing and planning the query once, then re-executing the cached plan with different parameters:
-- Without prepared statement: parse + plan EVERY time
SELECT * FROM orders WHERE user_id = 42; -- plan: 0.1ms + exec: 0.1ms
SELECT * FROM orders WHERE user_id = 99; -- plan: 0.1ms + exec: 0.1ms
SELECT * FROM orders WHERE user_id = 7; -- plan: 0.1ms + exec: 0.1ms
-- 10,000 calls = 10,000 × 0.1ms planning = 1 second wasted
-- With prepared statement: plan ONCE, execute many
PREPARE get_orders (int) AS
SELECT * FROM orders WHERE user_id = $1;
EXECUTE get_orders(42); -- plan: 0ms (cached) + exec: 0.1ms
EXECUTE get_orders(99); -- plan: 0ms (cached) + exec: 0.1ms
EXECUTE get_orders(7); -- plan: 0ms (cached) + exec: 0.1ms
-- 10,000 calls = 0ms planning + 10,000 × 0.1ms = 1 second total
-- Saved: 1 second of pure planning overhead.
-- Most ORMs do this automatically. Check yours:
-- Rails: config.active_record.prepared_statements = true (default)
-- Django: depends on the driver (psycopg2 binds parameters client-side and does
--   not use server-side prepared statements; psycopg3 can auto-prepare hot queries)
-- SQLAlchemy: also driver-dependent (e.g., psycopg3 and asyncpg support prepared plans)
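To see why this matters at volume, here's the same arithmetic as the comments above wrapped in a tiny Python model (the 100 µs figures are the same illustrative numbers used above, not measurements):

```python
def total_db_time_us(calls, plan_us=100, exec_us=100, prepared=False):
    # Prepared statements pay the parse+plan cost once, not on every call.
    planning = plan_us if prepared else plan_us * calls
    return planning + exec_us * calls

# 10,000 calls without prepared statements: 2 seconds total, half of it planning.
assert total_db_time_us(10_000) == 2_000_000
# With a prepared statement, planning is paid exactly once.
assert total_db_time_us(10_000, prepared=True) == 1_000_100
```

The savings scale linearly with call volume, which is why prepared statements matter most for hot, simple queries.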
One caveat: when PgBouncer runs in transaction mode (the recommended mode), session-level prepared statements break, because each transaction might land on a different backend connection and the prepared plan is per-connection. Fix: use PgBouncer's protocol-level prepared statement support, enabled via the max_prepared_statements setting (available in PgBouncer 1.21+), or switch to session mode for apps that rely heavily on prepared statements.
Going Deeper – Under the Hood
Section 6 gave you the tools. This section explains the engine behind them: the internals that determine why the planner picks one plan over another, why your table gets slower over time even without schema changes, and how to handle tables that outgrow a single machine.
You have a table with 10 million rows and an index on status. 95% of rows have status = 'completed' and 5% have status = 'pending'. You run WHERE status = 'completed'. Will the database use the index or do a sequential scan? What about WHERE status = 'pending'?
The query planner doesn't guess. It calculates a cost for every possible execution strategy and picks the cheapest one. The "cost" is a unitless number; think of it as "estimated effort." Lower cost = faster plan.
The planner's cost formula depends on a few key settings you can actually see and tune:
-- These are the knobs the planner uses to estimate costs:
SHOW seq_page_cost; -- 1.0 (cost of reading one sequential page)
SHOW random_page_cost; -- 4.0 (cost of reading one random page)
SHOW cpu_tuple_cost; -- 0.01 (cost of processing one row)
SHOW cpu_index_tuple_cost; -- 0.005 (cost of processing one index entry)
-- The ratio matters: random_page_cost / seq_page_cost = 4.0
-- This tells the planner: "random I/O is 4x more expensive than sequential I/O."
-- On SSDs, this gap is smaller. Many production setups change it:
SET random_page_cost = 1.1; -- For SSD-backed databases
-- This makes the planner MORE willing to use Index Scan
-- (because random I/O on SSD is almost as fast as sequential).
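To make the question at the top of this section concrete, here's a toy Python version of the planner's choice. This is a deliberately simplified sketch of the cost arithmetic (the real planner also models caching, correlation, and bitmap scans), using the default knob values shown above and assuming roughly 100 rows per 8 KB page:

```python
def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # Read every page sequentially, then spend CPU on every row.
    return pages * seq_page_cost + rows * cpu_tuple_cost

def index_scan_cost(matched_rows, index_depth=3, random_page_cost=4.0,
                    cpu_index_tuple_cost=0.005, cpu_tuple_cost=0.01):
    # Descend the B-tree, then (worst case) one random heap fetch per match.
    page_io = (index_depth + matched_rows) * random_page_cost
    cpu = matched_rows * (cpu_index_tuple_cost + cpu_tuple_cost)
    return page_io + cpu

# 10M-row table at ~100 rows per 8 KB page = ~100,000 pages.
full_scan = seq_scan_cost(pages=100_000, rows=10_000_000)  # 200,000.0
few_rows  = index_scan_cost(matched_rows=47)               # ~200.7
most_rows = index_scan_cost(matched_rows=9_500_000)        # ~38 million

assert few_rows < full_scan < most_rows
```

Fetching 47 rows is roughly a thousand times cheaper through the index, while fetching 9.5 million matching rows makes the sequential scan the cheaper plan, which is why the planner can legitimately pick a different strategy for each WHERE value.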
The planner also uses statistics about your data to estimate how many rows each operation will return. These stats live in pg_stats and are updated by ANALYZE:
-- See what the planner knows about your orders table:
SELECT
attname AS column,
n_distinct, -- How many distinct values (-1 = all unique)
most_common_vals, -- Most frequent values
most_common_freqs, -- How frequent each is
correlation -- How physically ordered the data is (1.0 = perfectly sorted)
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'user_id';
-- If n_distinct is wrong, the planner makes bad row estimates,
-- which leads to bad plan choices (e.g., Seq Scan when Index Scan is better).
-- Force a statistics refresh:
ANALYZE orders;
-- For columns with unusual distributions, increase the sample size:
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 1000;
-- Default is 100. Higher = more accurate stats, slower ANALYZE.
Debugging tip: SET enable_seqscan = off; temporarily disables Seq Scan, forcing the planner to use an index. Only use this for debugging, never in production code.
PostgreSQL uses a system called MVCC (Multi-Version Concurrency Control) for concurrency. Instead of locking rows during updates, PostgreSQL creates a new version of the row and marks the old one as "dead," which means readers never block writers. When you UPDATE a row, PostgreSQL doesn't modify it in place; it creates a brand new copy and marks the old one as dead. When you DELETE a row, it's just marked dead, not actually removed.
These dead rows, called bloat, pile up over time. A table with 1 million live rows might actually have 3 million rows on disk if you've been doing lots of updates and deletes. Your sequential scans read all 3 million. Your index scans have to check whether rows are "visible" to the current transaction.
-- Check how many dead rows your table has:
SELECT
relname AS table,
n_live_tup AS live_rows,
n_dead_tup AS dead_rows,
round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 1) AS dead_pct,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- If dead_pct > 20%, you have a bloat problem.
-- Manual VACUUM (reclaims dead rows for reuse):
VACUUM orders;
-- VACUUM ANALYZE (also updates planner statistics):
VACUUM ANALYZE orders;
-- Nuclear option: VACUUM FULL (rewrites entire table, locks it):
-- Only use this during maintenance windows!
VACUUM FULL orders;
-- This reclaims disk space. Regular VACUUM just marks dead rows as reusable
-- but doesn't shrink the file on disk.
-- Check autovacuum settings:
SHOW autovacuum_vacuum_threshold; -- default: 50
SHOW autovacuum_vacuum_scale_factor; -- default: 0.2
-- Autovacuum runs when: dead rows > threshold + scale_factor × live_rows
-- For a 1M row table: 50 + 0.2 × 1M = 200,050 dead rows before autovacuum.
-- For high-write tables, lower the scale factor:
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);
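The trigger formula above is easy to sanity-check in code; a one-line model of the autovacuum threshold:

```python
def autovacuum_trigger(live_rows, threshold=50, scale_factor=0.2):
    # Dead-row count at which autovacuum kicks in for this table.
    return round(threshold + scale_factor * live_rows)

# PostgreSQL defaults: a 1M-row table waits for 200,050 dead rows.
assert autovacuum_trigger(1_000_000) == 200_050
# After tuning the scale factor down to 0.05, it fires 4x sooner.
assert autovacuum_trigger(1_000_000, scale_factor=0.05) == 50_050
```

For very large tables the default 20% scale factor means millions of dead rows accumulate before cleanup, which is exactly why high-write tables get a per-table override.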
When your table reaches hundreds of millions of rows, even indexed queries slow down: the B-tree (the most common index structure, a balanced tree where each level narrows the search space) gets deeper, VACUUM takes longer, and maintenance operations become painful. B-trees themselves scale beautifully; a 10M-row B-tree has about 3 levels, so any lookup needs at most 3-4 page reads, and even at 1 billion rows it's still only about 4-5 levels. The operational overhead is what hurts. The solution: split the table into smaller chunks called partitions.
-- Create a partitioned table (PostgreSQL 10+):
CREATE TABLE orders (
id BIGSERIAL,
user_id INT NOT NULL,
total NUMERIC(10,2),
status TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions:
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- ... one per month
-- Now when you query:
SELECT * FROM orders WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';
-- PostgreSQL ONLY scans the orders_2024_03 partition.
-- The other 11 months are never touched. This is called "partition pruning."
-- Verify pruning is working:
EXPLAIN SELECT * FROM orders WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';
-- Should show: "Append" with only orders_2024_03 listed.
-- If all partitions appear, pruning failed (check your WHERE clause).
Partitioning is most useful when your queries naturally filter on the partition key (usually date). If your queries filter by user_id, you'd partition by user_id ranges or use hash partitioning instead. The key insight: partition on the column you filter by most often.
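Since monthly partitions are repetitive DDL, teams usually script their creation. A minimal sketch of such a generator (hypothetical helper; names and layout are illustrative):

```python
from datetime import date

def monthly_partition_ddl(table, year, month):
    # Build CREATE TABLE ... PARTITION OF for one calendar month.
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)  # first day of next month
    name = f"{table}_{year}_{month:02d}"
    return (f"CREATE TABLE {name} PARTITION OF {table} "
            f"FOR VALUES FROM ('{start}') TO ('{end}');")

print(monthly_partition_ddl("orders", 2024, 3))
# CREATE TABLE orders_2024_03 PARTITION OF orders
#   FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
```

In production you'd run this from a scheduled job (or use an extension like pg_partman) so next month's partition always exists before rows arrive.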
A CTE (Common Table Expression) is a named temporary result set defined with the WITH keyword, as in WITH cte_name AS (SELECT ...) SELECT * FROM cte_name, and it often reads more cleanly than a subquery. But in PostgreSQL 11 and earlier, CTEs are optimization barriers: the planner can't push filters from the outer query into the CTE. This was fixed in PostgreSQL 12+ with "CTE inlining," but it's still a gotcha if you're on an older version or using a MATERIALIZED CTE.
-- This CTE materializes ALL orders first, THEN filters.
-- On PG11 or with MATERIALIZED keyword:
WITH all_orders AS MATERIALIZED (
SELECT * FROM orders -- reads ALL 10M rows
)
SELECT * FROM all_orders
WHERE user_id = 42; -- filters AFTER reading everything
-- EXPLAIN shows:
-- CTE Scan on all_orders
-- Filter: (user_id = 42)
-- Rows Removed by Filter: 9999953
-- The CTE read the ENTIRE table. The WHERE clause was NOT pushed down.
-- Option 1: Use a subquery
SELECT * FROM (
SELECT * FROM orders
) sub
WHERE user_id = 42;
-- Planner pushes the filter INTO the subquery → Index Scan.
-- Option 2: PG12+ non-materialized CTE (default behavior)
WITH user_orders AS ( -- NOT MATERIALIZED (default in PG12+)
SELECT * FROM orders
)
SELECT * FROM user_orders
WHERE user_id = 42;
-- PG12+ inlines the CTE → same as subquery → Index Scan.
-- Option 3: Just write the simple query
SELECT * FROM orders WHERE user_id = 42;
-- The simplest is almost always the fastest.
Whichever form you use, run EXPLAIN to verify the filter is actually pushed down.
Variations – Different Databases, Different Tools
Everything so far has been PostgreSQL-focused. But the principles (EXPLAIN ANALYZE, indexes, N+1 detection, connection pooling) apply to every relational database. The syntax changes. The thinking doesn't. Here are the three most common variations you'll encounter.
In PostgreSQL, EXPLAIN ANALYZE actually runs the query and shows real execution times. In MySQL, plain EXPLAIN only shows the plan without executing (MySQL 8.0 added its own EXPLAIN ANALYZE). What's the trade-off? When would you not want to actually execute the query to get its plan?
PostgreSQL vs MySQL EXPLAIN
Both databases have EXPLAIN, but the output format is different. Here's the same concept in both:
-- PostgreSQL: tree-structured output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;
-- Key output:
-- Index Scan using idx_orders_user_id on orders
-- Index Cond: (user_id = 42)
-- Buffers: shared hit=50
-- Execution Time: 0.128 ms
-- Pro tip: use FORMAT JSON for programmatic parsing:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;
-- Or paste your plan into: https://explain.dalibo.com/
-- for a beautiful visual breakdown.
-- MySQL: tabular output
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Key columns to read:
-- +----+------+----------+---------+-------+------+----------+-------+
-- | id | type | key      | key_len | ref   | rows | filtered | Extra |
-- +----+------+----------+---------+-------+------+----------+-------+
-- |  1 | ref  | idx_user |       4 | const |   47 |   100.00 | NULL  |
-- +----+------+----------+---------+-------+------+----------+-------+
-- type = "ref" means index lookup (good).
-- type = "ALL" means full table scan (bad).
-- key = which index was used (NULL = no index used).
-- rows = estimated rows to examine.
-- filtered = what % of rows pass the WHERE filter.
-- MySQL 8.0+: EXPLAIN ANALYZE (like PostgreSQL):
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
-- Shows actual execution times, not just estimates.
-- MySQL-specific: EXPLAIN FORMAT=TREE (cleaner than table):
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE user_id = 42;
| Feature | PostgreSQL | MySQL (InnoDB) |
|---|---|---|
| EXPLAIN with actual timing | EXPLAIN ANALYZE | EXPLAIN ANALYZE (8.0+) |
| Buffer/page stats | BUFFERS option | Not directly available |
| Slow query log | log_min_duration_statement | slow_query_log |
| Query stats extension | pg_stat_statements | performance_schema |
| Partial indexes | Yes (WHERE clause) | No (use generated columns workaround) |
| Covering index syntax | INCLUDE (cols) | Automatic if all cols in index |
| Online index creation | CREATE INDEX CONCURRENTLY | ALTER TABLE ... ADD INDEX (online by default in 5.6+) |
| Connection pooler | PgBouncer | ProxySQL / MySQL Router |
ORM-Generated SQL β Rails, Django, SQLAlchemy
Most web applications don't write raw SQL. They use an ORM (Object-Relational Mapper), a library that lets you interact with the database using your programming language's objects instead of raw SQL strings; ActiveRecord (Ruby on Rails), Django ORM (Python), SQLAlchemy (Python), and Entity Framework (.NET) are the most popular ones. The SQL an ORM generates is usually correct but sometimes inefficient. The key skill is knowing how to see the SQL your ORM is generating and how to fix it when it's slow.
# See the SQL Rails generates:
Order.where(user_id: 42).to_sql
# => "SELECT \"orders\".* FROM \"orders\" WHERE \"orders\".\"user_id\" = 42"
# See EXPLAIN output from Rails:
Order.where(user_id: 42).explain
# => EXPLAIN for: SELECT ...
# Index Scan using idx_orders_user_id ...
# Add index via migration:
class AddUserIdIndexToOrders < ActiveRecord::Migration[7.1]
def change
add_index :orders, :user_id, algorithm: :concurrently
end
end
# algorithm: :concurrently → non-blocking (requires disable_ddl_transaction!)
# Fix N+1 with includes:
User.includes(:orders).where(active: true).each do |user|
user.orders.size # No extra query: already loaded
end
# Fix N+1 with joins (when you need to filter):
User.joins(:orders).where(orders: { status: 'pending' }).distinct
# See the SQL Django generates:
qs = Order.objects.filter(user_id=42)
print(qs.query)
# SELECT ... FROM "orders" WHERE "orders"."user_id" = 42
# See all queries executed in a view (debug toolbar):
# pip install django-debug-toolbar
# Shows every SQL query, timing, and duplicates.
# Add index in the model:
class Order(models.Model):
user_id = models.IntegerField(db_index=True) # single column
# Or composite:
class Meta:
indexes = [
models.Index(fields=['user_id', 'status']),
]
# Fix N+1 with select_related (ForeignKey) or prefetch_related (M2M):
users = User.objects.prefetch_related('order_set').filter(active=True)
for user in users:
len(user.order_set.all()) # No extra query
# Use .only() to select specific columns (avoid SELECT *):
Order.objects.filter(user_id=42).only('id', 'total', 'created_at')
# See the SQL SQLAlchemy generates:
from sqlalchemy import create_engine
engine = create_engine("postgresql://...", echo=True) # echo=True logs all SQL
# Or compile a query object:
stmt = select(Order).where(Order.user_id == 42)
print(stmt.compile(compile_kwargs={"literal_binds": True}))
# Add index:
from sqlalchemy import Index
Index('idx_orders_user_id', Order.user_id)
# Fix N+1 with joinedload or subqueryload:
from sqlalchemy.orm import joinedload
session.query(User).options(
joinedload(User.orders) # JOIN in single query
).filter(User.active == True).all()
# Or selectinload (separate SELECT IN query, often better):
from sqlalchemy.orm import selectinload
session.query(User).options(
selectinload(User.orders) # SELECT * FROM orders WHERE user_id IN (...)
).all()
Use an N+1 detector in development (bullet for Rails, nplusone for Django). And always add indexes through migrations; don't rely on the ORM to create them automatically.
Read Replica Query Routing
Once you've optimized your queries, the next scaling step is splitting read and write traffic. Writes go to the primary database. Reads go to one or more read replicas: copies of the primary that receive all changes via replication (usually asynchronous) and handle read queries, offloading the primary. The downside: replicas can lag slightly behind the primary (replication lag), so you might read slightly stale data. This is called read/write splitting.
# config/database.yml (Rails 6+):
production:
primary:
adapter: postgresql
host: primary-db.internal
database: myapp
primary_replica:
adapter: postgresql
host: replica-db.internal
database: myapp
replica: true
# Rails automatically routes:
# - GET requests → replica
# - POST/PUT/DELETE → primary
# - After a write, reads stick to primary for 2 seconds
# (to avoid reading stale data from replica lag)
# Manual control:
ActiveRecord::Base.connected_to(role: :reading) do
Order.where(user_id: 42) # Guaranteed to hit replica
end
ActiveRecord::Base.connected_to(role: :writing) do
Order.create!(user_id: 42, total: 99.99) # Guaranteed to hit primary
end
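If your framework doesn't do this for you, the core routing logic is small. A stripped-down sketch (hypothetical DSNs, no real driver, and it ignores the read-your-own-writes problem that Rails solves with its 2-second pin):

```python
class ReadWriteRouter:
    """Toy read/write splitter: writes to primary, reads round-robin to replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary = primary_dsn
        self.replicas = replica_dsns
        self._next = 0

    def route(self, sql):
        # Anything that isn't a plain SELECT goes to the primary.
        # (A real router must also special-case SELECT ... FOR UPDATE and
        # pin reads to the primary briefly after a write.)
        if not sql.lstrip().upper().startswith("SELECT"):
            return self.primary
        dsn = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return dsn

router = ReadWriteRouter("postgres://primary-db.internal/myapp",
                         ["postgres://replica-db.internal/myapp"])
```

The hard part in practice isn't the routing, it's the consistency edge cases, which is why battle-tested implementations (Rails' connection handling, ProxySQL query rules) are usually a better choice than rolling your own.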
You have a PostgreSQL database with 50M rows in the events table. Your most common query filters by user_id and event_type, and orders by created_at DESC. You only ever query events from the last 90 days. What type of index would you create? Think about: single vs composite, column order, partial index, covering index.
Hint: a partial index restricted to the last 90 days of created_at would help. Would including created_at in the index help with the ORDER BY?
The ideal index for that scenario: a composite partial index. Something like: CREATE INDEX idx_events_user_type_date ON events (user_id, event_type, created_at DESC) WHERE created_at > '2024-01-01'; (One catch: PostgreSQL requires partial-index predicates to be immutable, so you can't write now() - interval '90 days' there; use a fixed cutoff date and recreate the index periodically.) This gives you: the leftmost prefix for filtering (user_id, event_type), the sort order in the index (created_at DESC, avoiding a separate sort step), and a partial index that only covers recent data (roughly 90% smaller than a full index). If you also INCLUDE the columns your query SELECTs, you get an Index Only Scan, the fastest possible path.
At Scale – How Real Companies Tune SQL
Everything we've covered works great on a single database. But what happens when your product has millions of users and billions of rows? Let's look at four companies that deal with massive SQL workloads every single day, and how they keep things fast.
Stack Overflow serves 1.3 billion page views per month with just 2 SQL Server instances. Meanwhile, Shopify uses hundreds of MySQL shards across Vitess. Both are successful. Why would one company need hundreds of databases while another serves a billion views from two? What's fundamentally different about their workloads?
Think about the read/write ratio and whether queries can be cached. Stack Overflow's content changes rarely: the same question page can be cached for minutes. Shopify's checkout flow is unique per merchant, per customer, per cart.
Stack Overflow – 1.3 Billion Page Views on 2 SQL Servers
This one blows people's minds. Stack Overflow serves 1.3 billion page views per month, and they do it with just two SQL Server instances (a primary and a replica). No distributed database. No sharding. No microservices. Just fast SQL.
Their secret? Dapper, a micro-ORM that generates lean, predictable SQL. Unlike heavy ORMs that create 17-table JOINs behind your back, Dapper gives you full control over what SQL runs. Their developers write raw parameterized SQL and use Dapper to map results to C# objects. Every query is hand-tuned, and execution plans (SQL Server's equivalent of EXPLAIN output) are reviewed before any query hits production.
GitHub – MySQL + ProxySQL for 10K+ Queries/sec
GitHub runs on MySQL: yes, the same MySQL you can install on your laptop. But at their scale (100+ million repositories, 10,000+ queries per second), a raw MySQL connection isn't enough. The problem: MySQL has a per-connection cost. If you have 200 app servers each opening 50 connections, that's 10,000 connections hitting your database. Most of those connections are idle most of the time, but MySQL still allocates memory for each one.
Their solution: ProxySQL, a lightweight proxy that sits between app servers and MySQL. It pools connections (so 10,000 app connections become ~200 database connections), routes read queries to replicas, and can even cache frequent queries. Think of it as a traffic cop for SQL: it makes sure the database never gets overwhelmed, even when thousands of app servers are talking to it simultaneously.
Shopify – MySQL + Vitess for Millions of Merchants
Shopify powers millions of online stores. Each merchant has orders, products, customers, and inventory, and it all lives in MySQL. But one MySQL instance can't hold data for millions of merchants. Their answer: Vitess, a sharding middleware originally built at YouTube.
Vitess sits between the application and MySQL. It looks like a single database to your application, but behind the scenes it routes queries to the right shard based on the merchant ID. When you run SELECT * FROM orders WHERE shop_id = 12345, Vitess knows that shop 12345 lives on shard 7 and routes the query there, without your application code knowing anything about sharding. The app just writes normal SQL.
The tuning challenge? Each shard still needs optimized queries. A slow query on one shard affects every merchant on that shard. So Shopify runs automated slow-query detection across all shards, flagging any query that crosses a latency threshold. Their internal tooling is essentially a scaled-up version of pg_stat_statements: tracking the top slow queries per shard and alerting on regressions.
Netflix – Cassandra for Scale, PostgreSQL for Money
Netflix is famous for using Cassandra, a NoSQL database designed for massive write throughput. But here's what most people don't know: Netflix uses PostgreSQL for billing. When you're handling credit card charges for 250+ million subscribers, you need ACID transactions, not eventual consistency. You need to know that a charge happened exactly once, not "probably once, eventually."
And they tune PostgreSQL aggressively. Their billing database has custom index strategies, partitioned tables (by month, with old billing records archived), and hand-optimized queries. They run their own pg_stat_statements dashboards, alert on query plan changes (if the planner suddenly switches from an index scan to a sequential scan, that's a red flag), and test every schema migration against production-scale data before deploying.
Anti-Lessons – "Best Practices" That Backfire
Some SQL tuning advice gets repeated so often that people assume it's true. It's not. These three "best practices" sound reasonable but will actively hurt you in production. Learn to spot them so you don't fall for them on your next project (or in your next interview).
This sounds logical: if indexes make reads faster, why not index everything? The problem: every index has a write cost. When you INSERT a row, the database doesn't just write to the table; it updates every index on that table. Ten indexes means ten extra writes per INSERT.
Let's put real numbers on this. A table with 10 million rows and 10 indexes will have roughly 5 GB+ of index storage, sometimes more than the table data itself. Every INSERT now updates 10 B-tree structures. Every UPDATE that touches an indexed column triggers index maintenance. Your writes slow down by 30-60%, your storage doubles, and VACUUM takes three times as long because it has to clean up dead tuples in every index.
The fix: only index columns that actually appear in the WHERE, JOIN, or ORDER BY clauses of your real queries. Use pg_stat_user_indexes to find unused indexes and drop them. A good rule of thumb: 3-5 well-chosen indexes per table is usually plenty.
During development, SELECT * feels harmless: you get all the columns and pick what you need in your application code. But in production, that "convenient" asterisk can be devastating.
Imagine your users table has 20 columns. Your API only needs id, name, and email: 3 columns. With SELECT *, you're fetching all 20: the bio (a TEXT field averaging 2 KB), the profile_image_url, the settings_json (a 5 KB blob), and 14 other columns you'll never use. Instead of reading ~100 bytes per row, you're reading ~8 KB per row. That's 80x more data transferred from the database to your application.
It gets worse: with SELECT *, you can't use a covering index. If you had an index on (user_id) INCLUDE (name, email), a query for just those three columns would be an Index Only Scan, the fastest possible path, with no table access at all. But SELECT * forces the database to hit the heap (the actual table) because no index covers all 20 columns. You throw away the best optimization PostgreSQL has.
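The 80x figure above is plain arithmetic; a tiny sketch using those hypothetical row sizes:

```python
def transfer_bytes(rows, bytes_per_row):
    # Bytes shipped from the database to the app for one result set.
    return rows * bytes_per_row

# 50 rows: three small columns (~100 B/row) vs all 20 columns (~8 KB/row).
lean = transfer_bytes(50, 100)     # 5,000 bytes
fat = transfer_bytes(50, 8_000)    # 400,000 bytes
assert fat == 80 * lean
```

The multiplier applies to every hop: disk reads, database memory, network transfer, and driver deserialization all pay the same 80x.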
The fix: replace SELECT * with explicit column lists in every query that reaches production. Your ORM probably has a method for this: .select(:id, :name, :email) in Rails, .only('id', 'name', 'email') in Django, .options(load_only(...)) in SQLAlchemy.
ORMs like ActiveRecord, Django ORM, and SQLAlchemy are fantastic for developer productivity. They generate correct SQL. But "correct" and "optimized" are not the same thing. ORMs optimize for developer convenience, not database performance.
Here's what ORMs typically do that hurts performance:
- N+1 queries by default: lazy-loading related records one at a time instead of batching
- SELECT * everywhere: fetching all columns even when the template only displays two
- Unnecessary subqueries: some ORMs generate nested subqueries where a JOIN would be faster
- No index awareness: the ORM has no idea which indexes exist, so it can't generate queries that leverage them
The fix isn't to abandon ORMs; it's to always check the SQL they generate. Every major ORM has a way to log or print the SQL it creates. Make it a habit: write the ORM code, check the SQL, run EXPLAIN, and fix anything that doesn't use an index. The ORM is your co-pilot, not your autopilot.
A classic illustration: a Django page firing a storm of lazy-loaded queries; after adding select_related() and prefetch_related(), it dropped to 4 queries. Same page, same data, same ORM, just one line added. The page went from 2.3 seconds to 180ms.
Common Mistakes – The Six That Hit Every Team
These aren't hypothetical. These are the mistakes that show up in every production postmortem, every performance review, every "why is the database slow" Slack thread. If you can avoid all six, you're already ahead of 90% of backend developers.
This is mistake number one for a reason. Developers write a query, test it on a dev database with 500 rows, see it returns in 2ms, and ship it. On production with 15 million rows, that same query takes 4 seconds because it's doing a sequential scan instead of an index scan.
The fix is dead simple: run EXPLAIN ANALYZE on every new query before it hits production. Not just EXPLAIN (which shows the plan) but EXPLAIN ANALYZE (which actually runs the query and shows real timing). If you see "Seq Scan" on a large table, stop and add an index. This should be as automatic as running tests before merging a PR.
Tools like sqlcheck or custom CI scripts can flag any new SQL migration that doesn't have a corresponding EXPLAIN output reviewed.
Here's a surprise for PostgreSQL users: when you add a foreign key constraint, PostgreSQL does NOT automatically create an index on the foreign key column. MySQL does (InnoDB requires it). But PostgreSQL silently lets you run without one.
So when you have orders.user_id referencing users.id, every JOIN on orders.user_id does a sequential scan unless you manually create an index. Worse: every time you DELETE a user, PostgreSQL has to scan the orders table to check for referencing rows, and without an index that's a full table scan per delete.
Rule: Every foreign key column needs an index. No exceptions. Check yours right now:
-- Find foreign keys without matching indexes
SELECT conrelid::regclass AS table_name,
a.attname AS fk_column,
confrelid::regclass AS referenced_table
FROM pg_constraint c
JOIN pg_attribute a ON a.attnum = ANY(c.conkey)
AND a.attrelid = c.conrelid
WHERE c.contype = 'f'
AND NOT EXISTS (
SELECT 1 FROM pg_index i
WHERE i.indrelid = c.conrelid
AND a.attnum = ANY(i.indkey)
);
We covered N+1 in Section 6, but it deserves a spot here because it's the single most common performance bug in web applications. The pattern: you load a list of items, then loop through them and load related data one-by-one. 1 query for the list + N queries for the details = N+1.
The trap is that ORMs make it invisible. You write user.orders in a loop and it looks like a property access, but it's actually a SQL query. Every iteration of the loop fires another SELECT * FROM orders WHERE user_id = ?. With 100 users on the page, that's 101 queries instead of 2.
Detection: Use your ORM's query logger. In Rails: ActiveRecord::Base.logger = Logger.new(STDOUT). In Django: django-debug-toolbar. In SQLAlchemy: echo=True. If you see the same query repeated with different IDs, that's an N+1.
Fix: Eager-load the relationship. Every ORM has a method for this: includes (Rails), select_related / prefetch_related (Django), joinedload / subqueryload (SQLAlchemy).
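You can watch the pattern with nothing but the standard library. This self-contained SQLite demo counts queries via sqlite3's trace callback (toy schema; your ORM's SQL will differ, but the 101-vs-2 shape is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
queries = []
conn.set_trace_callback(queries.append)  # records every SQL statement executed

conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
""")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO orders (user_id) VALUES (?)", [(i,) for i in range(100)])

# N+1 pattern: 1 query for the list + 100 queries for the details.
queries.clear()
for (uid,) in conn.execute("SELECT id FROM users").fetchall():
    conn.execute("SELECT id FROM orders WHERE user_id = ?", (uid,)).fetchall()
n_plus_one = len(queries)  # 101

# Eager pattern: 1 query for the list + 1 batched IN query.
queries.clear()
ids = [uid for (uid,) in conn.execute("SELECT id FROM users").fetchall()]
marks = ",".join("?" * len(ids))
conn.execute(f"SELECT id, user_id FROM orders WHERE user_id IN ({marks})",
             ids).fetchall()
batched = len(queries)  # 2

print(n_plus_one, batched)
```

Eager loading in an ORM is doing exactly this rewrite for you: collecting the parent IDs and issuing one batched query instead of one per row.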
Your database has a maximum number of connections it can handle at once. PostgreSQL defaults to max_connections = 100. If your application has 10 workers, each with a pool of 10 connections, that's already 100: the entire limit. Add a background job processor and a cron job, and you're over the limit. New connections get rejected. Your application crashes.
The real danger: during a traffic spike, every request opens a connection and holds it while a slow query runs. Connections pile up, the pool fills, new requests queue, timeouts cascade, and suddenly your entire application is down, not because the database is slow, but because it ran out of connections.
Fix: Use an external connection pooler. PgBouncer for PostgreSQL, ProxySQL for MySQL. These sit between your app and the database, multiplexing hundreds of app connections into a small pool of real database connections. PgBouncer in transaction mode can serve 1,000+ app connections through just 20 real database connections.
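The connection math from the paragraph above, as a sketch you can adapt to your own deployment (hypothetical helper):

```python
def connections_needed(app_servers, pool_per_server, extras=0):
    # Every process that opens its own pool counts against max_connections.
    return app_servers * pool_per_server + extras

# 10 workers × pool of 10 already hits PostgreSQL's default max_connections = 100.
assert connections_needed(10, 10) == 100
# One background job processor with a pool of 5 tips you over the limit.
assert connections_needed(10, 10, extras=5) == 105
```

Doing this audit before a deploy (instead of during an outage) is the cheap version of the fix; a pooler like PgBouncer is the durable one.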
PostgreSQL uses a system called MVCC (Multi-Version Concurrency Control): when you UPDATE or DELETE a row, it doesn't actually remove the old version. It marks it as "dead" and creates a new version. This is great for concurrent reads, but it means dead rows accumulate over time, making the table (and its indexes) larger than they need to be.
VACUUM cleans up those dead rows. ANALYZE updates the statistics the query planner uses to pick execution plans. If you don't run them, two things happen: (1) your table bloats (a 2 GB table becomes 8 GB of mostly dead rows), and (2) the query planner makes bad decisions because its statistics are stale.
PostgreSQL has autovacuum built in, and it's on by default. But on high-write tables (anything with thousands of UPDATEs per minute), autovacuum can fall behind. Check its activity:
-- Check when VACUUM and ANALYZE last ran
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum,
last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
Tables bloat, and so do indexes. After millions of UPDATEs and DELETEs, a B-tree index can have large sections of empty or nearly-empty pages. A 500 MB index might hold only 200 MB of actual data; the rest is wasted space that the database still has to read through during scans.
Check your index sizes and compare them to what you'd expect:
-- Check index sizes for a specific table
SELECT pg_size_pretty(pg_indexes_size('orders')) AS total_index_size;
-- See individual indexes and their sizes
SELECT indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;
If an index is much larger than expected, rebuild it with REINDEX CONCURRENTLY (PostgreSQL 12+). This rebuilds the index without locking the table. For older versions, use CREATE INDEX CONCURRENTLY with a new name, then drop the old one.
Interview Playbook – "This Query Is Slow. How Do You Debug It?"
This is the most common database question in system design and backend interviews. It sounds simple, but the depth of your answer reveals your experience level instantly. Here's what interviewers expect at each level, with sample answers you can adapt.
What they expect you to know:
- How to find the slow query (slow query log, application logs)
- What EXPLAIN does and how to read the basics (Seq Scan = bad, Index Scan = good)
- How to add an index and verify it worked
- What an N+1 query is and how to fix it with eager loading
Sample answer: "First, I'd identify the exact SQL query that's slow, either from the application logs or the slow query log. Then I'd run EXPLAIN ANALYZE on it to see the execution plan. I'm looking for sequential scans on large tables; that usually means a missing index. If the WHERE clause filters by a column like user_id, I'd add a B-tree index on that column and run EXPLAIN again to confirm it switched to an Index Scan. I'd also check if it's an N+1 problem; if the same query runs hundreds of times with different IDs, I'd fix it with eager loading in the ORM."
Mid level. What they expect you to know (everything from Junior, plus):
- Query plan analysis: reading nested loops vs hash joins, understanding actual vs estimated rows, spotting the "rows removed by filter" anti-pattern
- Index selection: composite indexes vs single-column, the leftmost prefix rule, partial indexes for time-scoped data, covering indexes with INCLUDE
- pg_stat_statements: how to find the top 10 slowest queries by total time, not just single execution time
- Connection pooling: why you need PgBouncer/ProxySQL and when pool exhaustion causes slowdowns
Sample answer (add to Junior): "Beyond the basic index, I'd look at the full query plan. If I see a Nested Loop with a high row count, maybe a Hash Join would be better; sometimes setting SET enable_nestloop = off in a test session reveals a faster plan. For composite queries, I'd consider a multi-column index with the right column order, most selective column first. I'd also check pg_stat_statements to see if this query is a one-off or if it's the top consumer of total database time. Sometimes the slowest single query isn't the biggest problem; it's the medium-slow query that runs 10,000 times per minute."
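The enable_nestloop experiment mentioned in the answer can be run safely in a throwaway transaction; the query here is illustrative:

```sql
BEGIN;
-- SET LOCAL scopes the change to this transaction only
SET LOCAL enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending';
ROLLBACK;
-- If the alternative plan is faster, fix the root cause (stale
-- statistics, cost parameters) rather than disabling nested loops
-- globally.
```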
Senior level. What they expect you to know (everything from Mid, plus):
- Query plan analysis at depth: understanding parallel query execution, bitmap heap scans vs index scans, cost model parameters (random_page_cost, effective_cache_size), and when the planner makes wrong choices
- Partitioning strategy: range partitioning for time-series data, hash partitioning for even distribution, partition pruning in query plans
- Read replica routing: directing read queries to replicas, handling replication lag, write-after-read consistency
- Caching layer: when to put a Redis/Memcached layer in front of the database, cache invalidation strategies, read-through vs write-through
- Schema-level optimizations: denormalization trade-offs, materialized views for complex aggregations, pre-computed rollup tables
Sample answer (add to Mid): "At senior scale, I'd take a broader view. First: is this query even hitting the right database? Read queries should go to replicas. If the query is an aggregation (dashboard, report), maybe it shouldn't hit the OLTP database at all; a materialized view refreshed every 5 minutes might be the right answer. For tables over 100M rows, I'd evaluate partitioning: range partition by month for time-series data so the planner only scans the relevant partitions. I'd also look at the caching layer: can we cache the result in Redis with a 60-second TTL and avoid the database entirely for repeated queries? And for the long term, I'd set up pg_stat_statements dashboards with alerting on p95 query latency regression, so we catch slow queries before users do."
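A minimal sketch of the range-partitioning idea, assuming PostgreSQL 10+ (the orders_p table and its columns are illustrative, named to avoid clashing with an existing orders table):

```sql
-- Parent table: rows are routed to partitions by created_at
CREATE TABLE orders_p (
    id         bigint      NOT NULL,
    user_id    bigint      NOT NULL,
    total      numeric     NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month
CREATE TABLE orders_p_2024_01 PARTITION OF orders_p
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_p_2024_02 PARTITION OF orders_p
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- A created_at filter lets the planner prune; EXPLAIN should show
-- only orders_p_2024_02 being scanned for this query.
EXPLAIN SELECT * FROM orders_p
WHERE created_at >= '2024-02-01' AND created_at < '2024-02-08';
```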
Exercises: Hands-On Practice
Reading about SQL tuning is useful, but running real queries is what makes it stick. These five exercises go from "run one command" to "design an indexing strategy for a real schema." Try each one before opening the hints.
Spin up a PostgreSQL instance (Docker: docker run -e POSTGRES_PASSWORD=test -p 5432:5432 postgres:16). Create a table with 1 million rows using generate_series. Run a SELECT with a WHERE clause, then run EXPLAIN ANALYZE on it. Identify: Is it a Seq Scan or Index Scan? What's the execution time? How many rows were scanned vs returned?
Create the table: CREATE TABLE test_orders AS SELECT g AS id, (random()*10000)::int AS user_id, now() - (random()*365)::int * interval '1 day' AS created_at FROM generate_series(1, 1000000) g;
Query: EXPLAIN ANALYZE SELECT * FROM test_orders WHERE user_id = 42;
Look for: "Seq Scan" (no index yet), the planner's estimated rows vs "actual rows", and "Execution Time" at the bottom.
Without an index, you'll see Seq Scan on test_orders with ~1,000,000 rows scanned. Add the index: CREATE INDEX idx_test_user_id ON test_orders(user_id); Run EXPLAIN ANALYZE again; you should see Index Scan using idx_test_user_id with ~100 rows scanned and execution time dropping from ~150ms to ~0.5ms.
In your web application (or a sample app), enable query logging. Load a page that displays a list with related data (e.g., users with their orders). Count the queries. If you see the same SELECT repeated with different IDs, that's an N+1. Fix it using eager loading and compare the query count before and after.
Rails: ActiveRecord::Base.logger = Logger.new(STDOUT) then look for repeated queries. Fix with .includes(:orders).
Django: Install django-debug-toolbar and check the SQL panel. Fix with select_related() for FK relationships or prefetch_related() for reverse/M2M.
SQLAlchemy: Set echo=True on the engine. Fix with joinedload() or subqueryload().
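All three ORM fixes do the same thing at the SQL level: they collapse one-query-per-row into a single set-based query. A sketch with illustrative users and orders tables:

```sql
-- N+1 pattern: 1 query for the list, then 1 query per row
SELECT id, name FROM users WHERE active = true;  -- say this returns 200 rows
SELECT * FROM orders WHERE user_id = 1;          -- then 200 of these,
SELECT * FROM orders WHERE user_id = 2;          -- one per user...

-- Eager-loaded equivalent: one round trip for everything
SELECT u.id, u.name, o.id AS order_id, o.total, o.created_at
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.active = true;
```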
Enable the pg_stat_statements extension in your PostgreSQL instance. Run a mix of queries (some fast, some slow). Then query pg_stat_statements to find the top 5 queries by total execution time. Which query consumes the most total time? Is it slow per-call or just called a lot?
Enable: add shared_preload_libraries = 'pg_stat_statements' to postgresql.conf and restart. Then: CREATE EXTENSION pg_stat_statements;
Query: SELECT query, calls, total_exec_time, mean_exec_time, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;
A query with high total_exec_time but low mean_exec_time runs fast but is called thousands of times; optimizing it has a big cumulative impact.
Use pgbench (ships with PostgreSQL) to benchmark a query before and after adding an index. Initialize a test database with pgbench -i -s 100 (10 million rows). Write a custom SQL file with a parameterized SELECT, run it with pgbench -f query.sql -c 10 -T 30, record the TPS. Add an index, run again, compare.
Custom file (query.sql): \set aid random(1, 100000 * :scale) on the first line, then SELECT * FROM pgbench_accounts WHERE aid = :aid; The \set line makes pgbench pick a random aid for each transaction (:scale is detected automatically from the initialized database).
Before: ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey; (careful: restore afterwards with ALTER TABLE pgbench_accounts ADD PRIMARY KEY (aid);)
Run: pgbench -f query.sql -c 10 -T 30 -r mydb. The -r flag shows per-statement latency.
You should see TPS increase by 10-100x after restoring the primary key index.
Given this schema: an e-commerce app with users, orders, order_items, and products tables. The top queries are: (1) find orders by user, sorted by date; (2) find order items for an order; (3) search products by name; (4) aggregate total revenue per day. Design the index strategy. Consider: composite indexes, partial indexes, covering indexes, and when a full-text search index (GIN) beats a B-tree for the product search.
Query 1: CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC); a composite index matches the filter + sort. Add INCLUDE (total, status) if your SELECT only needs those columns (a covering index, enabling an Index Only Scan).
Query 2: CREATE INDEX idx_order_items_order ON order_items (order_id); a standard FK index. Consider INCLUDE for frequently selected columns.
Query 3: For LIKE '%keyword%', a B-tree can't help (leading wildcard). Use: CREATE INDEX idx_products_name_gin ON products USING gin (name gin_trgm_ops); this requires the pg_trgm extension and handles substring matches efficiently.
Query 4: Don't index for this β create a materialized view: CREATE MATERIALIZED VIEW daily_revenue AS SELECT date_trunc('day', created_at) AS day, SUM(total) AS revenue FROM orders GROUP BY 1; Refresh it on a schedule. The dashboard query hits the materialized view, not the 50M-row orders table.
Cheat Sheet: Bookmark This Section
Six cards with everything you need when you're debugging a slow query, tuning a database, or prepping for an interview. All real commands, copy-paste ready.
-- Basic plan (estimated)
EXPLAIN SELECT ...;
-- Actual timing + buffers
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- JSON for tools
EXPLAIN (ANALYZE, FORMAT JSON) SELECT ...;
-- Visual: explain.dalibo.com
-- Key: look for Seq Scan on large tables
-- Key: actual rows >> estimated = stale stats
Index Scan → best for selective queries
Index Only Scan → even better (no table hit)
Bitmap Index Scan → good for medium selectivity
Seq Scan (small) → fine on <10K rows
Seq Scan (large) → RED FLAG: add an index
Nested Loop → good for small inner table
Hash Join → good for large equi-joins
Merge Join → good when both sides sorted
WHERE col = ? → B-tree on col
WHERE a = ? AND b = ? → composite (a, b)
ORDER BY col DESC → include col in index
LIKE 'abc%' → B-tree works
LIKE '%abc%' → GIN + pg_trgm
Partial: WHERE active = true
Covering: INCLUDE (col1, col2)
Leftmost prefix rule: idx(a, b, c) supports:
  WHERE a = ? — yes
  WHERE a = ? AND b = ? — yes
  WHERE b = ? — no (can't skip a)
-- Top 5 by total time
SELECT query, calls,
       total_exec_time::int AS total_ms,
       mean_exec_time::int AS avg_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 5;
-- Top 5 by call count
...ORDER BY calls DESC LIMIT 5;
-- Reset stats
SELECT pg_stat_statements_reset();
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
# transaction mode = best for web apps
# session mode = needed for prepared stmts
# statement mode = most aggressive pooling
# postgresql.conf
log_min_duration_statement = 200   # log any query slower than 200ms

-- Find currently running slow queries:
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > '1 second'::interval
ORDER BY duration DESC;

-- Kill a stuck query:
SELECT pg_cancel_backend(pid);
Connected Topics: Where SQL Tuning Leads Next
SQL tuning doesn't live in isolation. It connects to almost every part of backend engineering, from how you design schemas to how you scale entire systems. Here's how the skills you've learned here map to the bigger picture.