Databases

SQL Tuning & Query Optimization

The single most impactful skill in backend engineering — turning an 89ms query into a 0.1ms query by changing one line. Every section has SQL you can run yourself.

Section 1

TL;DR — The One-Minute Version

Mental Model: A SQL query is a request. The database's query planner (the part of the database engine that decides HOW to execute your query — which indexes to use, which tables to scan first, which join algorithm to pick) is a GPS — it picks the fastest route to your data. Your job as a developer isn't to write the SQL differently. It's to give the GPS better roads (indexes), cleaner maps (statistics), and shorter distances (schema design).

Most backend performance problems aren't caused by slow servers, bad code, or too much traffic. They're caused by one bad SQL query. A single missing index can make a query read every single row in a million-row table instead of jumping straight to the 47 rows it needs. That's the difference between 89 milliseconds and 0.1 milliseconds. That's 699 times faster — from changing one line.

Diagram: your query (SELECT * FROM orders WHERE user_id = 42) is sent to the query planner — the GPS asking "what's the fastest route to that data?" No index → Seq Scan (dirt road): reads all 1,000,000 rows, 89ms. With index → Index Scan (highway): jumps to the 47 matching rows, 0.128ms. 699x faster. Same query, one CREATE INDEX — from reading 6,334 disk pages down to 50. SQL tuning = giving the query planner better roads to drive on: you don't rewrite the query, you improve the infrastructure around it.

This page is the most hands-on page in the entire System Guide. Every section has real SQL you can copy, paste, and run. Real EXPLAIN ANALYZE output. Real slow query logs. Real before/after benchmarks. By the end, you'll know how to take any slow query, diagnose exactly where the time is going, and fix it — often in under 5 minutes.

One-line takeaway: You don't need to rewrite your SQL. You need to understand what the query planner is doing, give it the right indexes, and stop sending queries you don't need to send (N+1 problem). That covers 90% of all database performance issues.
What top companies actually expect: At Stripe, engineers are expected to include EXPLAIN ANALYZE output in pull requests that add new queries. At GitHub, a single slow query once took down github.com for 2 hours in 2012 — a sequential scan (reading every single row in a table from start to finish, checking each one against your WHERE clause — like reading every page of a phone book to find one name) on the repositories table during a schema migration. At Shopify, their internal "query cop" tool automatically rejects any migration that adds a query without a matching index. SQL tuning isn't an optional skill — it's table stakes.
Section 2

The Scenario — 50ms to 3 Seconds Overnight

You're on call for an e-commerce platform. It's Tuesday morning. Your P95 latency (the 95th percentile response time — 95% of requests are faster than this number; if your P95 is 200ms, then 95 out of 100 requests complete in under 200ms, and the remaining 5% are slower) has been a comfortable 50ms for months. You grab your coffee, open Grafana, and see this:

Chart: P95 latency, Monday through Saturday. Flat at 50ms until the Monday-night deploy, then a spike past 3,000ms. P95: 50ms → 3,100ms — 62x slower. Users timing out.

The deploy log shows a new feature went out Monday night: an "Order History" page that lets users see all their past orders. The feature works perfectly β€” it shows the right data, the UI looks great, the tests all pass. But behind that innocent-looking page is this query:

order-history.sql — the guilty query
SELECT o.id, o.total, o.created_at, o.status,
       oi.product_name, oi.quantity, oi.price
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = $1
ORDER BY o.created_at DESC
LIMIT 20;

Looks completely reasonable, right? A simple join, a WHERE clause, a LIMIT. Every junior developer would write exactly this. And it runs fine in development with 500 rows. But production has 10 million orders and 45 million order items. Without an index on orders.user_id, the database has to read every single row in the orders table to find the ones belonging to user 42. That's a sequential scan (also called a "full table scan" — the database reads the table from the first page to the last page, checking every row against your WHERE condition; on a 10-million-row table, that means reading well over a hundred thousand disk pages, and even with caching it's orders of magnitude slower than an index scan) on 10 million rows. Every. Single. Request.

The math is brutal: Your orders table is ~1.2 GB on disk, spread across ~150,000 pages (PostgreSQL stores data in 8 KB chunks called "pages", also called "blocks" — 1.2 GB of data needs about 150,000 of them, and a sequential scan reads them one by one). Each request triggers a sequential scan that touches all ~150,000 pages. At 200 requests per second, that's 30,000,000 page reads per second — far more than your shared buffer pool (a region of RAM where PostgreSQL caches frequently-accessed disk pages; the default is 128 MB, about 16,384 pages) can hold: the table is roughly ten times larger than the cache. The buffer pool thrashes, disk I/O spikes, and every query slows down — not just this one. One bad query is poisoning the entire database.
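You can check these numbers on your own tables. A minimal sketch, assuming the table is named orders as in this page (note that pg_class.relpages is only as fresh as the last ANALYZE or VACUUM):

psql — how big is the table, really?
-- Size on disk, human-readable
SELECT pg_size_pretty(pg_relation_size('orders'));

-- Page count and row estimate as the planner sees them
SELECT relpages, reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'orders';

Divide pg_relation_size by 8192 (the 8 KB page size) and you should land close to relpages.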
Think First

You have a table with 1 million rows. Your query needs 47 of them. Without an index, the database reads all 1 million. What percentage of the work is wasted? And if you added an index, roughly how many disk pages would the database need to read instead of 6,334?

For the first part: 47 / 1,000,000 = ? For the second: think about how a B-tree index works. It's a tree with ~3 levels. You traverse 3-4 tree nodes, then read the data pages holding the 47 matching rows...

The answer: 99.995% of the work is wasted. The database reads a million rows to return 47. And with a B-tree index, it would read about 3 index pages (to traverse the tree) plus ~47 data pages — roughly 50 page reads instead of 6,334. That's what we'll fix in this page.

What Made This Hard to Catch

This query passed every code review, every test, every staging environment check. Why? Because the staging database had 500 orders. At 500 rows, a sequential scan takes 0.05ms β€” perfectly fast. The problem only appears when the table has millions of rows. This is the fundamental trap of SQL performance: queries that are correct and fast on small data can be catastrophically slow on production data.

Diagram: STAGING (500 rows) — Seq Scan on orders takes 0.05ms. "Looks great! Ship it." Tests pass, code review approved. Deploy. PRODUCTION (10M rows) — the same Seq Scan takes 89ms per query. "P95 at 3 seconds. Page down." Same exact query, same exact code. The query didn't change; the data grew. SQL performance is not about the query being "right" — it's about the query being right AT SCALE.
How to reproduce this yourself right now: Spin up a PostgreSQL container (docker run -e POSTGRES_PASSWORD=test -p 5432:5432 postgres:16), create a table with generate_series(1, 1000000), and run the same query with and without an index. You'll see the exact same spike. We'll do exactly this in Section 5.
Section 3

The First Attempt — Throw More Servers at It

Your API is slow. Your instinct — and the instinct of every team under pressure — is to scale horizontally. Add more app servers. Bump the instance count from 4 to 8. Double the capacity, halve the problem, right?

So you spin up four more API servers behind your load balancer. Each server now handles half the requests it used to. Your CPU utilization (the percentage of processing power being used — if your app servers show 30% CPU, the bottleneck isn't the app server; it's somewhere else, usually the database) on the app servers drops from 80% to 40%. You check the dashboard. Response times are... still 3 seconds.

Diagram: adding more app servers — what actually happens. 200 req/sec flow through the load balancer to the app servers (including the new ones), but the same 200 req/sec still hit the one PostgreSQL instance: CPU 97%, I/O wait 340ms — the bottleneck. What you fixed: app server CPU dropped 80% → 40%; each server handles fewer requests. What you didn't fix: the database still runs the same slow query. Response time: still 3 seconds.

Why didn't it work? Because the app servers weren't the bottleneck. They were sitting at 30% CPU, happily waiting for the database to respond. The database was the bottleneck. Every one of those 200 requests per second triggers the same sequential scan on the same 10-million-row table. Adding more app servers just means more connections all hammering the same slow database.

This is one of the most common mistakes in backend engineering: scaling the wrong layer. The app servers are fine. The network is fine. The load balancer is fine. The problem is a single SQL query doing a full table scan, and no amount of horizontal scaling on the app tier will fix it.

Think First

If adding more app servers doesn't help (because the DB is the bottleneck), what about adding read replicas? You set up 3 read replicas and route all SELECT queries to them. Does this fix the problem? Think about what each replica has to do when it receives the same query.

Each replica has the same data, same table, same missing index. What does each replica do when it gets SELECT * FROM orders WHERE user_id = 42?

Read replicas don't fix it either — not really. Each replica has the exact same table without an index. Each one still does a sequential scan on 10 million rows. You've spread the load across 3 databases instead of 1, so each handles ~67 requests per second instead of 200. That buys you some breathing room, but each request still takes 89ms of pure scan time. You've spent an extra $1,000+ per month on new database instances to treat a symptom instead of fixing the cause.

The Real Numbers

Let's look at exactly what you're paying for the "throw hardware at it" approach versus fixing the actual query:

Approach | Monthly Cost | P95 Latency | Time to Implement
Do nothing (1 DB) | $350/month | 3,100ms | 0 minutes
Add 3 read replicas | $1,400/month | 890ms | 2 hours setup
Scale app tier 4→8 | $600/month (extra) | 3,100ms (unchanged) | 20 minutes
CREATE INDEX | $0 extra | 0.128ms | 30 seconds

One line of SQL — CREATE INDEX idx_orders_user_id ON orders (user_id); — does more than $2,000/month of additional hardware. This is the entire argument for SQL tuning in one table: fix the query, not the infrastructure.

The lesson: Before you scale anything — before you add servers, replicas, caches, or CDNs — run EXPLAIN ANALYZE on your slowest queries. Nine times out of ten, the fix is a missing index, not a missing server. The next section shows you exactly what was happening under the hood and why it broke.
Section 4

Where It Breaks — The Four Horsemen of Slow Queries

The "add servers" approach failed because the problem was never the servers. It was the database. Specifically, it was one (or more) of four problems that cause 95% of all database performance issues. Let's meet all four — because you'll encounter every single one of them in your career.

Think First

Your "order history" page loads 20 orders. For each order, the code runs a separate query to fetch order items: SELECT * FROM order_items WHERE order_id = ?. That's 1 query for orders + 20 queries for items = 21 queries per page load. If 200 users hit this page per second, how many queries per second hit the database? And what would happen if you replaced those 21 queries with a single JOIN?

200 users × 21 queries = 4,200 queries/sec. With one JOIN: 200 queries/sec. That's a 21x reduction. This is the N+1 problem.
Diagram: the Four Horsemen of Slow Queries. SEQ SCAN — reading every row in the table because there's no index (89ms → 0.1ms with an index). N+1 QUERIES — 1 query to get users, then 1 query per user to get their orders (1,001 queries → 2). WRONG INDEX — an index exists but the query can't use it: wrong column order, function call (invisible until EXPLAIN). CONN EXHAUSTION — too many open connections; the DB refuses new ones and the app hangs (PgBouncer: 1,000 → 25). Combined damage at scale: a missing index causes slow queries → slow queries hold connections longer → the connection pool fills up → new requests queue → timeout cascade → the entire app goes down. These four problems feed each other. Fix any one and the others often resolve too.

Horseman #1: Sequential Scan on a Big Table

We've already met this one. When you write WHERE user_id = 42 and there's no index on user_id, the database has no choice but to read the entire table. It starts at page 1 and reads every page until the end. On a 10-million-row table, that's thousands of disk pages.

Here's what that looks like in real EXPLAIN ANALYZE output — the exact output PostgreSQL gives you (this run is against a 1-million-row table, like the one we'll build in Section 5):

psql — EXPLAIN ANALYZE without index
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;

-- Output:
-- Seq Scan on orders  (cost=0.00..18334.00 rows=50 width=96)
--                     (actual time=0.031..89.442 rows=47 loops=1)
--   Filter: (user_id = 42)
--   Rows Removed by Filter: 999953
--   Buffers: shared hit=6334
-- Planning Time: 0.089 ms
-- Execution Time: 89.521 ms

Let's read that output like a doctor reads a blood test. Every line tells you something:

Line | What it means | Is it bad?
Seq Scan on orders | Full table scan — reading every row, start to finish | Yes. This is the problem.
rows=50 (estimated) vs rows=47 (actual) | Planner guessed ~50 rows, found 47 | No, the estimates are close.
Rows Removed by Filter: 999953 | Read 1,000,000 rows, kept 47, threw away 999,953 | Yes. 99.995% waste.
Buffers: shared hit=6334 | Read 6,334 pages from the buffer pool, PostgreSQL's in-memory cache of disk pages ("shared hit" means the page was already in RAM; "shared read" means it had to go to disk — much slower) | Yes. Way too many pages.
Execution Time: 89.521 ms | Wall-clock time for the query | Yes. 89ms per request = trouble.

The smoking gun is Rows Removed by Filter: 999953. The database read a million rows and threw away 999,953 of them. That's like reading an entire phone book to find one person's number.

Horseman #2: The N+1 Query Problem

This one is sneaky because it doesn't show up as one slow query. It shows up as a thousand fast queries where you only needed two.

Here's the classic example. You want to show a list of users and how many orders each one has. The "obvious" code does this:

app/controllers/users_controller.rb — N+1 problem
# BAD: N+1 — fires 1001 queries for 1000 users
users = User.all                    # Query 1: SELECT * FROM users
users.each do |user|
  puts user.orders.count            # Query 2..1001: SELECT COUNT(*)
end                                 #   FROM orders WHERE user_id = ?
# Total: 1 + 1000 = 1001 queries.
# At 1ms per query = 1001ms = over 1 SECOND just for DB calls.

# FIXED: eager loading — 2 queries total
users = User.includes(:orders).all  # Query 1: SELECT * FROM users
                                    # Query 2: SELECT * FROM orders
                                    #          WHERE user_id IN (1,2,3...)
users.each do |user|
  puts user.orders.size             # No query — already loaded
end
# Total: 2 queries. ~3ms.
views.py — N+1 in Django
# BAD: N+1
users = User.objects.all()           # 1 query
for user in users:
    print(user.order_set.count())    # N queries

# FIXED: prefetch_related
users = User.objects.prefetch_related('order_set').all()  # 2 queries
for user in users:
    print(len(user.order_set.all())) # No query — already loaded
The raw SQL happening behind the scenes
-- BAD: N+1 (what the ORM generates)
SELECT * FROM users;                           -- query 1
SELECT COUNT(*) FROM orders WHERE user_id = 1; -- query 2
SELECT COUNT(*) FROM orders WHERE user_id = 2; -- query 3
SELECT COUNT(*) FROM orders WHERE user_id = 3; -- query 4
-- ... 997 more queries ...

-- FIXED: One query with JOIN
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

The N+1 problem is the most common performance bug in web applications using ORMs (Object-Relational Mappers — libraries like ActiveRecord in Rails, the Django ORM, SQLAlchemy in Python, or Entity Framework in .NET that let you write code instead of SQL; convenient, but they can generate inefficient queries if you're not careful). It's called "N+1" because you fire 1 query to get N records, then 1 more query for EACH record. For 1,000 users, that's 1,001 database round trips instead of 2.

Horseman #3: Missing or Wrong Indexes

Sometimes you have an index, but the query can't use it. This is more frustrating than having no index at all, because you think you've solved the problem.

The three most common ways your index becomes useless:

Three ways to accidentally disable your index
-- You created this index:
CREATE INDEX idx_orders_created ON orders (created_at);

-- ❌ BROKEN: Function call on indexed column
-- The index stores raw dates, not YEAR(dates). Can't use the index.
-- (YEAR() is MySQL syntax; the same trap in PostgreSQL is
--  EXTRACT(YEAR FROM created_at) = 2024.)
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- ✅ FIXED: Rewrite to use the column directly
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- ❌ BROKEN: Implicit type cast
-- user_id is an integer column, but you pass a string.
-- Some databases cast the COLUMN to match, disabling the index.
SELECT * FROM orders WHERE user_id = '42';

-- ✅ FIXED: Use the correct type
SELECT * FROM orders WHERE user_id = 42;

-- ❌ BROKEN: Wrong column order in composite index
-- Index is (status, created_at) but you query by created_at only.
-- A composite index is like a phone book sorted by (last_name, first_name).
-- You can't look up by first_name alone — the book isn't sorted that way.
CREATE INDEX idx_status_date ON orders (status, created_at);
SELECT * FROM orders WHERE created_at > '2024-01-01'; -- Can't use index!

-- ✅ FIXED: Create index with the column you actually filter on FIRST
CREATE INDEX idx_date ON orders (created_at);
-- Or: query includes the leading column
SELECT * FROM orders WHERE status = 'shipped' AND created_at > '2024-01-01';

Horseman #4: Connection Exhaustion

Every database has a maximum number of simultaneous connections. PostgreSQL defaults to max_connections = 100. That sounds like a lot until you realize that slow queries hold connections open longer. If each query takes 89ms instead of 0.1ms, connections are occupied 890x longer. Your pool drains instantly.

Here's the real config you'd see in a PostgreSQL setup with PgBouncer (a lightweight connection pooler for PostgreSQL — it sits between your app and the database, maintaining a small pool of real DB connections and multiplexing thousands of app connections through them; essential for any production PostgreSQL deployment):

pgbouncer.ini — real production config
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_port = 6432
max_client_conn = 1000     # App servers can open up to 1000 connections
default_pool_size = 25     # But only 25 real DB connections are open
reserve_pool_size = 5      # 5 extra for emergencies
pool_mode = transaction    # Return connection to pool after each transaction

# The math: 1000 app connections funnel into 25 DB connections.
# Each DB connection handles ~40 short queries per second.
# 25 Γ— 40 = 1000 queries/sec capacity.
# Without PgBouncer: 1000 app connections = 1000 DB connections = PostgreSQL OOM.

Without a connection pooler, if you have 8 app servers each opening 125 connections, that's 1,000 connections to PostgreSQL. Each connection consumes about 10 MB of memory on the database server. That's 10 GB just for connection overhead β€” before your database stores a single byte of data. PgBouncer funnels those 1,000 app connections through just 25 real database connections, cutting memory usage by 97%.
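To see how close you are to exhaustion on a live server, compare open backends against the limit. A minimal sketch — pg_stat_activity and current_setting() are standard PostgreSQL; the 10 MB per-connection figure above is a rule of thumb, not an exact cost:

psql — how many connections are in use right now?
-- Current connections, grouped by state (active, idle, idle in transaction...)
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Total in use vs. the hard limit
SELECT count(*) AS in_use,
       current_setting('max_connections') AS max_allowed
FROM pg_stat_activity;

If in_use regularly hovers near max_allowed, you need a pooler — or your queries are holding connections far too long.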

The cascade effect: These four problems amplify each other. A missing index (Horseman #1) makes queries slow. Slow queries hold connections longer (Horseman #4). The connection pool fills up. New requests wait in a queue. Timeouts cascade. The entire application goes down — not because of one bug, but because of a chain reaction started by a single missing index.
Section 5

The Breakthrough — EXPLAIN ANALYZE Changes Everything

All four horsemen have one thing in common: you can't fix what you can't see. You can stare at your SQL all day and it looks fine. The query is correct. It returns the right data. The problem is invisible in the code — it's in the execution plan the database chose.

That's what EXPLAIN ANALYZE gives you. It's like an X-ray for your query. Instead of just running the query and giving you results, the database also tells you exactly how it found those results — which tables it scanned, which indexes it used (or didn't), how many rows it read, and how long each step took.

Think First

You have the slow query from Section 2. Before reading further, what would you add to the database to make it fast? And how would you verify that your fix actually worked β€” that the database is using the fix you created?

You need two things: something to CREATE, and something to ask the database to EXPLAIN.

The fix is two commands. One to create the index, and one to verify the database uses it:

psql — the fix (30 seconds)
-- Step 1: Create the index (takes ~5 seconds on 10M rows)
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);
-- CONCURRENTLY = don't lock the table while building the index.
-- Without it, your table is locked for writes during index creation.

-- Step 2: Verify the fix
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;

-- Output (AFTER index):
-- Index Scan using idx_orders_user_id on orders
--   (cost=0.42..52.18 rows=50 width=96)
--   (actual time=0.028..0.089 rows=47 loops=1)
--   Index Cond: (user_id = 42)
--   Buffers: shared hit=50
-- Planning Time: 0.113 ms
-- Execution Time: 0.128 ms
Diagram: before vs after one CREATE INDEX. BEFORE (no index): Seq Scan (full table), 6,334 pages read, 1,000,000 rows scanned, 47 returned, 999,953 wasted (99.995%), execution 89.521ms. AFTER (with index): Index Scan (direct lookup), 50 pages read, 47 rows scanned (only matches), 47 returned, 0 wasted (0%), execution 0.128ms. 699× faster. 6,334 pages → 50 pages. Zero code change.

Read those two outputs side by side. Same query. Same data. Same server. The only difference is one CREATE INDEX statement. The planner went from "I guess I'll read everything" to "I know exactly where that data is." From 89.5ms to 0.128ms. From 6,334 page reads to 50.

Try It Yourself — Right Now

This is the most important hands-on exercise on this page. Copy these commands, run them, and see the difference with your own eyes:

bash + psql — reproduce the 699x improvement
# 1. Start PostgreSQL (Docker)
docker run --name pg-tuning -e POSTGRES_PASSWORD=test \
  -p 5432:5432 -d postgres:16

# 2. Connect
docker exec -it pg-tuning psql -U postgres

# 3. Create table with 1M rows (takes ~3 seconds)
CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INT NOT NULL,
  total      NUMERIC(10,2),
  status     TEXT,
  created_at TIMESTAMPTZ DEFAULT now()
);

INSERT INTO orders (user_id, total, status)
SELECT
  (random() * 20000)::int,          -- 20K distinct users
  (random() * 500 + 5)::numeric(10,2),
  (ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int]
FROM generate_series(1, 1000000);

-- 4. Run BEFORE (no index) — note the Seq Scan and time
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;

-- 5. Create the index
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- 6. Run AFTER (with index) — note the Index Scan and time
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;

-- 7. Compare the two outputs. You'll see:
--    Seq Scan → Index Scan
--    ~80ms   → ~0.1ms
--    ~6000 buffers → ~50 buffers

Finding Your Slow Queries in Production

You can't run EXPLAIN ANALYZE on queries you don't know about. In production, you need to find the slow ones first. Here are the real tools:

postgresql.conf + pg_stat_statements
-- Method 1: Slow query log
-- In postgresql.conf:
-- log_min_duration_statement = 100
-- This logs every query that takes more than 100ms.
-- Check your PostgreSQL log file for the output.

-- Method 2: pg_stat_statements (MUCH better)
-- Enable it first:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Then find your 10 slowest queries RIGHT NOW:
SELECT
  query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round(total_exec_time::numeric, 2) AS total_ms,
  rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- This shows you EXACTLY which queries are slow,
-- how often they run, and how much total time they consume.
-- The "total_ms" column is the most important —
-- a query averaging 5ms that runs 1M times
-- is worse than a query averaging 500ms that runs once.
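One habit worth adopting: reset the counters before a measurement window, so the rankings reflect current traffic rather than everything since the last restart. A sketch — pg_stat_statements_reset() ships with the same extension:

psql — measure a clean window
-- Zero out all collected statistics
SELECT pg_stat_statements_reset();

-- ...let production traffic run for 10-15 minutes...

-- Then re-run the top-10 query against pg_stat_statements:
-- the results now cover only that window.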
MySQL slow query log
-- Enable slow query log (run as admin):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;  -- log queries > 100ms
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Check it's enabled:
SHOW VARIABLES LIKE 'slow_query%';

-- Find slow queries:
-- The log file is at: /var/lib/mysql/hostname-slow.log
-- Or use mysqldumpslow to summarize:
-- mysqldumpslow -s t /var/lib/mysql/hostname-slow.log

-- MySQL equivalent of pg_stat_statements:
-- Enable Performance Schema (usually on by default):
SELECT
  DIGEST_TEXT AS query,
  COUNT_STAR AS calls,
  ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
  SUM_ROWS_EXAMINED AS rows_scanned,
  SUM_ROWS_SENT AS rows_returned
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;
The workflow: Step 1 — Find slow queries (pg_stat_statements). Step 2 — Understand WHY they're slow (EXPLAIN ANALYZE). Step 3 — Fix the root cause (usually: add an index). Step 4 — Verify the fix (EXPLAIN ANALYZE again). This four-step loop is the entire discipline of SQL tuning. Everything else on this page is details.
Section 6

How It Works — The Complete Toolkit

You now know the four problems and the breakthrough tool (EXPLAIN ANALYZE). This section is the reference manual — six essential topics, each with real commands you can run. Think of these as the six tools in your SQL tuning toolbox.

Tool 1: Reading EXPLAIN ANALYZE Like a Pro

EXPLAIN ANALYZE is the single most important command in database performance. It runs your query AND tells you exactly how the database executed it. The key is knowing which parts to look at.

Here's a more complex query with a real plan — a join across two tables:

psql — EXPLAIN ANALYZE on a JOIN query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, oi.product_name, oi.quantity
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 42
ORDER BY o.created_at DESC
LIMIT 20;

-- Limit  (cost=298.15..298.20 rows=20 width=52)
--        (actual time=0.192..0.198 rows=20 loops=1)
--   -> Sort  (cost=298.15..298.28 rows=50 width=52)
--            (actual time=0.190..0.193 rows=20 loops=1)
--         Sort Key: o.created_at DESC
--         Sort Method: top-N heapsort  Memory: 27kB
--         -> Nested Loop  (cost=0.85..297.05 rows=50 width=52)
--                         (actual time=0.038..0.158 rows=47 loops=1)
--               -> Index Scan using idx_orders_user_id on orders o
--                  (cost=0.42..52.18 rows=50 width=24)
--                  (actual time=0.025..0.055 rows=47 loops=1)
--                     Index Cond: (user_id = 42)
--                     Buffers: shared hit=50
--               -> Index Scan using idx_order_items_order_id on order_items oi
--                  (cost=0.43..4.87 rows=1 width=36)
--                  (actual time=0.001..0.002 rows=1 loops=47)
--                     Index Cond: (order_id = o.id)
--                     Buffers: shared hit=94
-- Planning Time: 0.324 ms
-- Execution Time: 0.241 ms

Read this plan from the inside out, bottom to top. The innermost nodes execute first:

Diagram: read EXPLAIN plans inside-out. (1) Index Scan on orders — find the 47 rows where user_id=42; reads 50 pages; 0.055ms. (2) Index Scan on order_items, run 47 times — for each of the 47 orders, find its items; 94 pages; 0.002ms × 47 loops. (3) Nested Loop join — combine orders + order_items into 47 result rows; 0.158ms total. (4) Sort by created_at DESC — in-memory heapsort (27 KB), only needs the top 20; LIMIT 20 → return to client. Total: 0.241ms, 144 buffer hits, zero disk reads. Both indexes working perfectly.

The key numbers to check in every EXPLAIN ANALYZE output:

  • Scan type — Seq Scan on a large table = problem. Index Scan = good.
  • Rows Removed by Filter — If this is large, you're reading rows you don't need.
  • Buffers: shared hit vs shared read — "hit" means RAM cache. "read" means disk. Disk reads are 100x slower.
  • loops — If a node says loops=47, multiply its time by 47 to get the real cost.
  • actual time vs estimated cost — If these differ wildly, your statistics are stale. (PostgreSQL maintains statistics about your data — how many distinct values each column has, the distribution of values, and so on — and these feed the query planner; if they're outdated, e.g. after a big bulk insert, the planner makes bad decisions.) Run ANALYZE tablename;
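When you suspect stale statistics, it takes seconds to check and fix. A sketch using the running orders example — pg_stat_user_tables is a standard PostgreSQL statistics view:

psql — are the planner's statistics fresh?
-- When were stats last refreshed for this table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Refresh them manually (fast — it samples rows, it doesn't scan everything)
ANALYZE orders;

After ANALYZE, re-run EXPLAIN: the estimated row counts should track the actual counts much more closely.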

Tool 2: Scan Types — How the Database Reads Data

When the query planner decides how to find your data, it picks one of several scan strategies. Each has different performance characteristics. Think of these like different ways to find a word in a book:

Scan Type | Analogy | When Used | Speed
Seq Scan | Reading every page of a book | No usable index, or table is tiny | Slowest on large tables
Index Scan | Using the book's index, then flipping to the page | Index exists, few matching rows | Fast for selective queries
Bitmap Index Scan (a two-step process: first scan the index to build a bitmap of which pages contain matching rows, then read those pages in sequential order — faster than an Index Scan when fetching many rows, say 5-20% of the table, because it avoids random I/O) | Marking all matching pages first, then reading them in order | Index exists, moderate number of matching rows (5-20%) | Middle ground
Index Only Scan (the best possible scan: the index itself contains ALL the columns the query needs, so the database never touches the table at all — only possible with "covering" indexes that include all selected columns) | Finding your answer in the index itself without opening the book | All requested columns are in the index | Fastest — skips table entirely

The planner picks the scan type based on math. If your query matches 0.01% of the table, Index Scan wins. If it matches 60% of the table, Seq Scan is actually faster (because sequential disk reads are much faster than random reads). The planner knows the tipping point — usually around 5-15% of the table.

psql β€” see the planner choose different scans
-- Selective query (few rows) β†’ Index Scan
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Index Scan using idx_orders_user_id on orders
-- (the planner knows user_id=42 matches ~50 rows out of 1M)

-- Non-selective query (many rows) β†’ Seq Scan
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
-- Seq Scan on orders  Filter: (status = 'shipped')
-- (the planner knows 'shipped' matches ~250K rows = 25% of table.
--  Seq Scan is FASTER here because sequential I/O beats random I/O)

-- Index Only Scan β€” create a covering index:
CREATE INDEX idx_orders_covering ON orders (user_id) INCLUDE (total, created_at);
EXPLAIN SELECT total, created_at FROM orders WHERE user_id = 42;
-- Index Only Scan using idx_orders_covering on orders
-- (everything the query needs is IN the index — no table access!)

Tool 3: Index Strategies — Single, Composite, Partial, Covering

Knowing that indexes make queries fast is step one. Knowing which type of index to create is step two. There are four types you'll use in practice, and choosing the right one makes the difference between a good index and a great one.

Single-column index — the most common. One index on one column. Perfect when your WHERE clause filters on one thing.

Single column index
-- Simple: index on the column you filter by
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Now this query uses the index:
SELECT * FROM orders WHERE user_id = 42;

-- But this query CANNOT use it:
SELECT * FROM orders WHERE status = 'shipped';
-- (different column β€” needs its own index)

Composite index — index on multiple columns. Column order matters. Think of it like a phone book sorted by (last_name, first_name). You can search by last_name alone, or by (last_name AND first_name), but NOT by first_name alone.

Composite index
-- Index on (user_id, status) — order matters!
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- ✅ Can use index (leading column):
SELECT * FROM orders WHERE user_id = 42;

-- ✅ Can use index (both columns):
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';

-- ❌ CANNOT use index (skips leading column):
SELECT * FROM orders WHERE status = 'shipped';

-- Rule: the "leftmost prefix" rule.
-- Index on (A, B, C) supports:
--   WHERE A = ?          ✅
--   WHERE A = ? AND B = ? ✅
--   WHERE A = ? AND B = ? AND C = ? ✅
--   WHERE B = ?          ❌ (skips A)
--   WHERE C = ?          ❌ (skips A and B)

Partial index — indexes only the rows that match a condition. Smaller, faster, cheaper. Perfect when you only ever query a subset of the data.

Partial index
-- You only ever query "active" orders. Why index the other 90%?
CREATE INDEX idx_active_orders ON orders (user_id)
WHERE status != 'cancelled' AND status != 'archived';

-- Index size: covers ~10% of the table = 90% smaller!
-- Works for:
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

-- Doesn't work for:
SELECT * FROM orders WHERE user_id = 42 AND status = 'cancelled';
-- (cancelled rows aren't in the index)

-- Real-world example: soft-deleted rows
CREATE INDEX idx_non_deleted_users ON users (email)
WHERE deleted_at IS NULL;
-- Most queries filter WHERE deleted_at IS NULL anyway.
-- This index is tiny and blazing fast.

Covering index (PostgreSQL: INCLUDE, MySQL: "covering") — the index contains all columns the query needs, so the database never reads the actual table. This enables Index Only Scan — the fastest possible scan type.

Covering index
-- Your dashboard shows: user's recent orders with totals
SELECT total, created_at FROM orders
WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

-- Regular index → Index Scan (looks up index, then reads table):
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Covering index → Index Only Scan (never touches table):
CREATE INDEX idx_orders_user_covering
ON orders (user_id, created_at DESC) INCLUDE (total);

-- The INCLUDE columns are stored IN the index but not used for sorting.
-- The database finds user_id=42 in the index, grabs total + created_at
-- from the index itself, and returns. Zero table access.

-- Verify with EXPLAIN:
-- "Index Only Scan using idx_orders_user_covering"
-- (If you see "Index Scan" instead, run VACUUM on the table first.
--  PostgreSQL needs visibility info to be sure the index-only path is safe.)

Tool 4: N+1 Detection and Fix

You already saw the N+1 problem in Section 4. But how do you find N+1 queries in a real application? You can't just read the code — ORMs hide the actual SQL. Here are the tools that catch them:

Gemfile + config
# Gemfile
gem 'bullet', group: :development

# config/environments/development.rb
config.after_initialize do
  Bullet.enable = true
  Bullet.alert  = true   # Browser popup when N+1 detected
  Bullet.console = true   # Console warning
  Bullet.rails_logger = true
end

# Now when you load a page with N+1 queries,
# Bullet pops up a warning:
#
# "N+1 Query detected:
#  User => [:orders]
#  Add to your query: .includes(:orders)"
pip install + settings.py
# pip install nplusone

# settings.py
INSTALLED_APPS += ['nplusone.ext.django']
MIDDLEWARE += ['nplusone.ext.django.NPlusOneMiddleware']
NPLUSONE_RAISE = True  # Raise exception on N+1 (dev only!)

# Output when N+1 detected:
# nplusone.core.exceptions.NPlusOneError:
#   Potential N+1 query detected on `User.order_set`
psql — spot N+1 from pg_stat_statements
-- N+1 queries show up as a query with VERY high "calls" count
-- relative to other queries.

SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY calls DESC LIMIT 5;

-- If you see:
--   "SELECT COUNT(*) FROM orders WHERE user_id = $1"
--   calls: 847,293
--   mean_exec_time: 0.4ms
--
-- That's 847K calls doing the same parameterized query.
-- Classic N+1. Total time: 847K × 0.4ms = 339 seconds of DB time.
-- Fix: one JOIN query, ~2ms total.
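The heuristic in those comments, an unusually high call count on one parameterized query, is easy to automate. A rough Python sketch (a made-up helper, not a real tool; the threshold ratio is arbitrary):

```python
# Heuristic N+1 detector over pg_stat_statements-style rows.
# Hypothetical helper for illustration: flags queries whose call count
# dwarfs the workload's typical query, the classic N+1 signature.

def flag_n_plus_one(stats, call_ratio=100):
    """stats: list of dicts with 'query', 'calls', 'mean_exec_time' (ms).
    Flags queries called `call_ratio`x more often than the median query."""
    calls = sorted(row["calls"] for row in stats)
    median = calls[len(calls) // 2]
    suspects = []
    for row in stats:
        if row["calls"] > median * call_ratio:
            total_s = row["calls"] * row["mean_exec_time"] / 1000  # total DB seconds
            suspects.append((row["query"], row["calls"], round(total_s, 1)))
    return suspects

stats = [
    {"query": "SELECT * FROM users WHERE id = $1", "calls": 500, "mean_exec_time": 0.2},
    {"query": "UPDATE orders SET status = $1 WHERE id = $2", "calls": 1_200, "mean_exec_time": 2.0},
    {"query": "SELECT COUNT(*) FROM orders WHERE user_id = $1",
     "calls": 847_293, "mean_exec_time": 0.4},
]
print(flag_n_plus_one(stats))  # flags the COUNT query: ~847K calls, ~339 s of DB time
```

In practice you would feed this from the pg_stat_statements query shown above; anything it flags is a candidate to collapse into a single JOIN.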

Tool 5: Connection Pooling with PgBouncer

Every database connection costs memory — about 10 MB per connection in PostgreSQL. If you have 8 app servers each opening 125 connections, that's 1,000 connections consuming roughly 10 GB of RAM just for connection overhead. PgBouncerA lightweight connection pooler that sits between your app servers and PostgreSQL. Your app connects to PgBouncer (which accepts thousands of connections cheaply), and PgBouncer maintains a small pool of real connections to PostgreSQL. Think of it as a receptionist who takes calls from 1000 people but only has 25 phone lines to the back office. solves this by multiplexing many app connections through a small pool of real database connections.

8 App Servers 125 conn each 1,000 connections Memory cost: ~50 KB each (cheap) = 50 MB total 1000 conns PgBouncer Connection Pooler 1000 in → 25 out RAM: ~2 MB total pool_mode = transaction 25 conns PostgreSQL max_connections = 100 Only 25 actually used Memory per connection: ~10 MB each 25 × 10 MB = 250 MB (vs 10 GB without PgBouncer)
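The multiplexing idea can be sketched in a few lines of Python: many concurrent callers share a small, fixed set of connection slots, and a caller simply waits when all slots are busy. A toy model only; real poolers like PgBouncer also handle transaction boundaries, timeouts, and the PostgreSQL wire protocol:

```python
# Toy connection pool illustrating PgBouncer-style multiplexing:
# 100 concurrent "app connections" served by 25 "real" connections.
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

class Pool:
    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")       # stand-ins for real DB connections
        self._lock = threading.Lock()
        self._in_use = 0
        self.max_in_use = 0                   # high-water mark, for demonstration

    def run(self, sql):
        conn = self._free.get()               # blocks until a real connection is free
        with self._lock:
            self._in_use += 1
            self.max_in_use = max(self.max_in_use, self._in_use)
        try:
            return f"{conn}: {sql}"           # pretend to execute the query
        finally:
            with self._lock:
                self._in_use -= 1
            self._free.put(conn)              # return the connection for reuse

pool = Pool(size=25)
with ThreadPoolExecutor(max_workers=100) as ex:   # 100 concurrent callers
    results = list(ex.map(pool.run, ["SELECT 1"] * 1000))
print(len(results), pool.max_in_use)  # 1000 queries served, never more than 25 conns in use
```

The key property: clients block briefly instead of the database paying 10 MB per idle connection.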

You can benchmark PgBouncer's impact with pgbench — PostgreSQL's built-in benchmarking tool:

bash — benchmark with pgbench
# Initialize pgbench tables
pgbench -i -s 50 mydb

# Benchmark WITHOUT PgBouncer (direct to PostgreSQL)
pgbench -c 100 -j 4 -T 60 -h localhost -p 5432 mydb
# Result: tps = 1,847 (including connections establishing)
# Many "connection refused" errors at high concurrency

# Benchmark WITH PgBouncer (connect through pooler)
pgbench -c 100 -j 4 -T 60 -h localhost -p 6432 mydb
# Result: tps = 4,213 (including connections establishing)
# Zero connection errors. 2.3x throughput.

Tool 6: Prepared Statements

Every time you send a SQL query, the database does two things: parse + plan the query, then execute it. For simple queries, planning takes 0.1-0.5ms. If your app fleet runs the same query 10,000 times per second (with different parameter values), that's 1-5 seconds of database CPU time burned on planning alone, every second.

Prepared statementsA query that's parsed and planned once, then executed many times with different parameter values. The database caches the execution plan, saving the parse+plan overhead on every subsequent call. Most ORMs use them automatically. solve this by parsing and planning the query once, then re-executing the cached plan with different parameters:

psql — prepared statement example
-- Without prepared statement: parse + plan EVERY time
SELECT * FROM orders WHERE user_id = 42;   -- plan: 0.1ms + exec: 0.1ms
SELECT * FROM orders WHERE user_id = 99;   -- plan: 0.1ms + exec: 0.1ms
SELECT * FROM orders WHERE user_id = 7;    -- plan: 0.1ms + exec: 0.1ms
-- 10,000 calls = 10,000 × 0.1ms planning = 1 second wasted

-- With prepared statement: plan ONCE, execute many
PREPARE get_orders (int) AS
  SELECT * FROM orders WHERE user_id = $1;

EXECUTE get_orders(42);  -- plan: 0ms (cached) + exec: 0.1ms
EXECUTE get_orders(99);  -- plan: 0ms (cached) + exec: 0.1ms
EXECUTE get_orders(7);   -- plan: 0ms (cached) + exec: 0.1ms
-- 10,000 calls = 0ms planning + 10,000 × 0.1ms = 1 second total
-- Saved: 1 second of pure planning overhead.

-- Most ORMs and drivers can do this automatically. Check yours:
-- Rails: config.active_record.prepared_statements = true (default)
-- Django: depends on the driver. psycopg2 sends parameterized queries
--   but does NOT prepare them server-side; psycopg 3 auto-prepares
--   queries it sees repeatedly (prepare_threshold).
-- SQLAlchemy: caches compiled SQL client-side; server-side preparation
--   depends on the underlying DBAPI driver.
Gotcha with PgBouncer: If you use PgBouncer in transaction mode (recommended), prepared statements traditionally break — each transaction might run on a different backend connection, and a prepared plan lives on one connection. Fix: set PgBouncer's max_prepared_statements (available in PgBouncer 1.21+, which tracks prepared statements and replays them on whichever backend connection it hands out), or switch to session mode for apps that rely heavily on prepared statements.
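A minimal PgBouncer configuration sketch for that setup (hostnames and pool sizes are placeholders; the option names come from PgBouncer's configuration reference):

```ini
; pgbouncer.ini sketch -- placeholder hosts and sizes, not production values
[databases]
mydb = host=primary-db.internal port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction           ; recommended mode, but breaks PREPARE by default...
max_prepared_statements = 200     ; ...unless PgBouncer 1.21+ tracks them (0 = disabled)
max_client_conn = 1000            ; cheap client-side connections
default_pool_size = 25            ; real server connections per database/user pair
```

Point the app at port 6432 instead of 5432 and nothing else in the application needs to change.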
Section 7

Going Deeper — Under the Hood

Section 6 gave you the tools. This section explains the engine behind them — the internals that determine why the planner picks one plan over another, why your table gets slower over time even without schema changes, and how to handle tables that outgrow a single machine.

Think First

You have a table with 10 million rows and an index on status. 95% of rows have status = 'completed' and 5% have status = 'pending'. You run WHERE status = 'completed'. Will the database use the index or do a sequential scan? What about WHERE status = 'pending'?

For status = 'completed', reading 9.5 million index entries + 9.5 million random data page fetches is slower than just scanning the whole table sequentially, so the planner picks a Seq Scan. For status = 'pending' (5% = 500K rows), the index pays off, and the planner will typically pick a Bitmap Index Scan. The planner is smart enough to know the difference.

The query planner doesn't guess. It calculates a cost for every possible execution strategy and picks the cheapest one. The "cost" is a unitless number — think of it as "estimated effort." Lower cost = faster plan.

The planner's cost formula depends on a few key settings you can actually see and tune:

psql — see the planner's cost parameters
-- These are the knobs the planner uses to estimate costs:
SHOW seq_page_cost;          -- 1.0 (cost of reading one sequential page)
SHOW random_page_cost;       -- 4.0 (cost of reading one random page)
SHOW cpu_tuple_cost;         -- 0.01 (cost of processing one row)
SHOW cpu_index_tuple_cost;   -- 0.005 (cost of processing one index entry)

-- The ratio matters: random_page_cost / seq_page_cost = 4.0
-- This tells the planner: "random I/O is 4x more expensive than sequential I/O."
-- On SSDs, this gap is smaller. Many production setups change it:
SET random_page_cost = 1.1;  -- For SSD-backed databases
-- This makes the planner MORE willing to use Index Scan
-- (because random I/O on SSD is almost as fast as sequential).
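You can get a feel for how these knobs drive the Seq Scan vs Index Scan decision with a toy cost model. This is not PostgreSQL's actual formula (the real one also weighs CPU costs, index depth, correlation, and bitmap strategies, which push the crossover point higher), but the shape of the trade-off is the same:

```python
# Toy sketch of the planner's I/O cost comparison. Table sizes are taken
# from the running 1M-row orders example; everything else is simplified.
SEQ_PAGE_COST, RANDOM_PAGE_COST = 1.0, 4.0   # mirror the knobs shown above
TABLE_ROWS, TABLE_PAGES = 1_000_000, 6_334

def seq_scan_cost():
    return TABLE_PAGES * SEQ_PAGE_COST        # read every page once, sequentially

def index_scan_cost(matching_rows):
    # Expected number of distinct heap pages touched when matching rows are
    # scattered uniformly across the table, each fetched with random I/O:
    pages = TABLE_PAGES * (1 - (1 - 1 / TABLE_PAGES) ** matching_rows)
    return pages * RANDOM_PAGE_COST

for pct in (0.005, 0.1, 1, 25):               # % of table matched
    rows = int(TABLE_ROWS * pct / 100)
    idx, seq = index_scan_cost(rows), seq_scan_cost()
    winner = "Index Scan" if idx < seq else "Seq Scan"
    print(f"{pct:>6}% of table -> index={idx:8.0f}  seq={seq:5.0f}  winner={winner}")
```

At 0.005% (the ~50-row lookup from earlier) the index wins by a mile; by the time a quarter of the table matches, reading every page sequentially is cheaper. Lowering random_page_cost to 1.1 shifts the crossover toward the index, which is exactly why SSD-backed setups tune it.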

The planner also uses statistics about your data to estimate how many rows each operation will return. These stats live in pg_stats and are updated by ANALYZE:

psql — inspect planner statistics
-- See what the planner knows about your orders table:
SELECT
  attname AS column,
  n_distinct,        -- How many distinct values (-1 = all unique)
  most_common_vals,  -- Most frequent values
  most_common_freqs, -- How frequent each is
  correlation        -- How physically ordered the data is (1.0 = perfectly sorted)
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'user_id';

-- If n_distinct is wrong, the planner makes bad row estimates,
-- which leads to bad plan choices (e.g., Seq Scan when Index Scan is better).

-- Force a statistics refresh:
ANALYZE orders;

-- For columns with unusual distributions, increase the sample size:
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 1000;
-- Default is 100. Higher = more accurate stats, slower ANALYZE.
How the Planner Picks an Execution Plan YOUR QUERY SELECT ... WHERE ... STATISTICS pg_stats, n_distinct COST SETTINGS random_page_cost etc QUERY PLANNER Generates candidate plans Estimates cost of each Picks the cheapest CHOSEN PLAN Plan A: Index Scan → cost=52 Plan B: Seq Scan → cost=18334 Plan C: Bitmap → cost=487 Winner: Index Scan (lowest cost)
When the planner gets it wrong: If you know the planner is picking a bad plan, you can hint at a better one. PostgreSQL doesn't have query hints like Oracle, but you can disable strategies to force a choice: SET enable_seqscan = off; temporarily disables Seq Scan, forcing the planner to use an index. Only use this for debugging — never in production code.

PostgreSQL uses a system called MVCCMulti-Version Concurrency Control. Instead of locking rows during updates, PostgreSQL creates a NEW version of the row and marks the old one as "dead." This means readers never block writers. The downside: dead rows pile up and need to be cleaned. for concurrency. When you UPDATE a row, PostgreSQL doesn't modify it in place — it creates a brand new copy and marks the old one as "dead." When you DELETE a row, it's just marked dead, not actually removed.

These dead rows — called bloat — pile up over time. A table with 1 million live rows might actually have 3 million rows on disk if you've been doing lots of updates and deletes. Your sequential scans read all 3 million. Your index scans have to check if rows are "visible" to the current transaction.

psql — check bloat and run VACUUM
-- Check how many dead rows your table has:
SELECT
  relname AS table,
  n_live_tup AS live_rows,
  n_dead_tup AS dead_rows,
  round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 1) AS dead_pct,
  last_vacuum,
  last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- If dead_pct > 20%, you have a bloat problem.

-- Manual VACUUM (reclaims dead rows for reuse):
VACUUM orders;

-- VACUUM ANALYZE (also updates planner statistics):
VACUUM ANALYZE orders;

-- Nuclear option — VACUUM FULL (rewrites entire table, locks it):
-- Only use this during maintenance windows!
VACUUM FULL orders;
-- This reclaims disk space. Regular VACUUM just marks dead rows as reusable
-- but doesn't shrink the file on disk.

-- Check autovacuum settings:
SHOW autovacuum_vacuum_threshold;       -- default: 50
SHOW autovacuum_vacuum_scale_factor;    -- default: 0.2
-- Autovacuum runs when: dead rows > threshold + scale_factor × live_rows
-- For a 1M row table: 50 + 0.2 × 1M = 200,050 dead rows before autovacuum.
-- For high-write tables, lower the scale factor:
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);
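The trigger formula is worth internalizing, because it explains why big tables can seem to "never get vacuumed": the dead-row budget scales with table size. A quick sketch of the arithmetic, using the defaults shown above:

```python
# Autovacuum trigger point: dead_rows > threshold + scale_factor * live_rows
# (PostgreSQL defaults: threshold=50, scale_factor=0.2)

def autovacuum_trigger(live_rows, threshold=50, scale_factor=0.2):
    """Dead-row count at which autovacuum kicks in for this table."""
    return threshold + scale_factor * live_rows

for rows in (10_000, 1_000_000, 100_000_000):
    default = autovacuum_trigger(rows)
    tuned = autovacuum_trigger(rows, scale_factor=0.05)  # per-table override
    print(f"{rows:>11,} live rows: default fires at {default:>12,.0f} dead rows, "
          f"tuned (0.05) at {tuned:>12,.0f}")
```

At 100M rows the default lets 20 million dead rows accumulate before autovacuum even starts, which is exactly why high-write tables get the lower per-table scale factor.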

When your table reaches hundreds of millions of rows, even indexed queries slow down because the B-treeThe most common index structure. A balanced tree where each level narrows down the search space. For 10M rows, a B-tree has ~3 levels, so any lookup needs at most 3-4 page reads. For 1 billion rows, it's still only ~4-5 levels. B-trees scale beautifully. gets deeper, VACUUM takes longer, and maintenance operations become painful. The solution: split the table into smaller chunks called partitions.

psql — range partitioning by date
-- Create a partitioned table (PostgreSQL 10+):
CREATE TABLE orders (
  id          BIGSERIAL,
  user_id     INT NOT NULL,
  total       NUMERIC(10,2),
  status      TEXT,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

-- Create monthly partitions:
CREATE TABLE orders_2024_01 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- ... one per month

-- Now when you query:
SELECT * FROM orders WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';
-- PostgreSQL ONLY scans the orders_2024_03 partition.
-- The other 11 months are never touched. This is called "partition pruning."

-- Verify pruning is working:
EXPLAIN SELECT * FROM orders WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';
-- Should show: "Append" with only orders_2024_03 listed.
-- If all partitions appear, pruning failed (check your WHERE clause).

Partitioning is most useful when your queries naturally filter on the partition key (usually date). If your queries filter by user_id, you'd partition by user_id ranges or use hash partitioning instead. The key insight: partition on the column you filter by most often.
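One operational note: partitions must exist before rows arrive, so most teams script their creation rather than typing it monthly. A small sketch that emits the monthly DDL from the example above (table and column names assumed from that example):

```python
# Emit one CREATE TABLE ... PARTITION OF statement per month for a year.
# Sketch only: production setups usually use pg_partman or a cron job.
from datetime import date

def monthly_partitions(table, year):
    stmts = []
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        stmts.append(
            f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table}\n"
            f"  FOR VALUES FROM ('{start}') TO ('{end}');"
        )
    return stmts

for stmt in monthly_partitions("orders", 2024)[:2]:
    print(stmt)   # matches the two hand-written partitions shown above
```

Forgetting to pre-create next month's partition makes inserts fail at midnight on the 1st, so this is worth automating early.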

A CTECommon Table Expression — a named temporary result set defined with the WITH keyword. Looks like: WITH cte_name AS (SELECT ...) SELECT * FROM cte_name. Often used for readability. (Common Table Expression) looks cleaner than a subquery. But in PostgreSQL 11 and earlier, CTEs are optimization barriers — the planner can't push filters from the outer query into the CTE. This was fixed in PostgreSQL 12+ with "CTE inlining," but it's still a gotcha if you're on an older version or using a MATERIALIZED CTE.

Materialized CTE — optimization barrier
-- This CTE materializes ALL orders first, THEN filters.
-- On PG11 or with MATERIALIZED keyword:
WITH all_orders AS MATERIALIZED (
  SELECT * FROM orders            -- reads ALL 10M rows
)
SELECT * FROM all_orders
WHERE user_id = 42;               -- filters AFTER reading everything

-- EXPLAIN shows:
-- CTE Scan on all_orders
--   Filter: (user_id = 42)
--   Rows Removed by Filter: 999953
-- The CTE read the ENTIRE table. The WHERE clause was NOT pushed down.
Subquery or PG12+ inlined CTE — optimized
-- Option 1: Use a subquery
SELECT * FROM (
  SELECT * FROM orders
) sub
WHERE user_id = 42;
-- Planner pushes the filter INTO the subquery → Index Scan.

-- Option 2: PG12+ non-materialized CTE (default behavior)
WITH user_orders AS (           -- NOT MATERIALIZED (default in PG12+)
  SELECT * FROM orders
)
SELECT * FROM user_orders
WHERE user_id = 42;
-- PG12+ inlines the CTE → same as subquery → Index Scan.

-- Option 3: Just write the simple query
SELECT * FROM orders WHERE user_id = 42;
-- The simplest is almost always the fastest.
Rule of thumb: Use CTEs for readability when they reference the same result multiple times. For single-use CTEs, a subquery is safer. If you're on PostgreSQL 12+, non-materialized CTEs perform identically to subqueries. Always check with EXPLAIN to verify.
Section 8

Variations — Different Databases, Different Tools

Everything so far has been PostgreSQL-focused. But the principles — EXPLAIN ANALYZE, indexes, N+1 detection, connection pooling — apply to every relational database. The syntax changes. The thinking doesn't. Here are the three most common variations you'll encounter.

Think First

In PostgreSQL, EXPLAIN ANALYZE actually runs the query and shows real execution times. In MySQL (before 8.0), EXPLAIN only shows the estimated plan without executing. What's the trade-off? When would you not want to actually execute the query to get its plan?

Think about a DELETE or UPDATE query. Running it "for real" to see the plan would actually modify data. Also, a query that takes 30 minutes — do you want to wait 30 minutes just to see the plan?

PostgreSQL vs MySQL EXPLAIN

Both databases have EXPLAIN, but the output format is different. Here's the same concept in both:

PostgreSQL EXPLAIN
-- PostgreSQL: tree-structured output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 42;

-- Key output:
-- Index Scan using idx_orders_user_id on orders
--   Index Cond: (user_id = 42)
--   Buffers: shared hit=50
--   Execution Time: 0.128 ms

-- Pro tip: use FORMAT JSON for programmatic parsing:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;

-- Or paste your plan into: https://explain.dalibo.com/
-- for a beautiful visual breakdown.
MySQL EXPLAIN
-- MySQL: tabular output
EXPLAIN SELECT * FROM orders WHERE user_id = 42;

-- Key columns to read:
-- +----+------+------+---------+------+------+----------+-------+
-- | id | type | key  | key_len | ref  | rows | filtered | Extra |
-- +----+------+------+---------+------+------+----------+-------+
-- |  1 | ref  | idx_user | 4   | const|   47 |  100.00  | NULL  |
-- +----+------+------+---------+------+------+----------+-------+

-- type = "ref" means index lookup (good).
-- type = "ALL" means full table scan (bad).
-- key = which index was used (NULL = no index used).
-- rows = estimated rows to examine.
-- filtered = what % of rows pass the WHERE filter.

-- MySQL 8.0+: EXPLAIN ANALYZE (like PostgreSQL):
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
-- Shows actual execution times, not just estimates.

-- MySQL-specific: EXPLAIN FORMAT=TREE (cleaner than table):
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE user_id = 42;
Feature | PostgreSQL | MySQL (InnoDB)
EXPLAIN with actual timing | EXPLAIN ANALYZE | EXPLAIN ANALYZE (8.0+)
Buffer/page stats | BUFFERS option | Not directly available
Slow query log | log_min_duration_statement | slow_query_log
Query stats extension | pg_stat_statements | performance_schema
Partial indexes | Yes (WHERE clause) | No (use generated columns workaround)
Covering index syntax | INCLUDE (cols) | Automatic if all cols in index
Online index creation | CREATE INDEX CONCURRENTLY | ALTER TABLE ... ADD INDEX (online by default in 5.6+)
Connection pooler | PgBouncer | ProxySQL / MySQL Router

ORM-Generated SQL — Rails, Django, SQLAlchemy

Most web applications don't write raw SQL. They use an ORMObject-Relational Mapper — a library that lets you interact with the database using your programming language's objects instead of raw SQL strings. ActiveRecord (Ruby on Rails), Django ORM (Python), SQLAlchemy (Python), and Entity Framework (.NET) are the most popular ones. that generates SQL behind the scenes. The SQL it generates is usually correct but sometimes inefficient. The key skill is knowing how to see the SQL your ORM is generating and how to fix it when it's slow.

Rails — see + fix ORM queries
# See the SQL Rails generates:
Order.where(user_id: 42).to_sql
# => "SELECT \"orders\".* FROM \"orders\" WHERE \"orders\".\"user_id\" = 42"

# See EXPLAIN output from Rails:
Order.where(user_id: 42).explain
# => EXPLAIN for: SELECT ...
#    Index Scan using idx_orders_user_id ...

# Add index via migration:
class AddUserIdIndexToOrders < ActiveRecord::Migration[7.1]
  def change
    add_index :orders, :user_id, algorithm: :concurrently
  end
end
# algorithm: :concurrently → non-blocking (requires disable_ddl_transaction!)

# Fix N+1 with includes:
User.includes(:orders).where(active: true).each do |user|
  user.orders.size  # No extra query — already loaded
end

# Fix N+1 with joins (when you need to filter):
User.joins(:orders).where(orders: { status: 'pending' }).distinct
Django — see + fix ORM queries
# See the SQL Django generates:
qs = Order.objects.filter(user_id=42)
print(qs.query)
# SELECT ... FROM "orders" WHERE "orders"."user_id" = 42

# See all queries executed in a view (debug toolbar):
# pip install django-debug-toolbar
# Shows every SQL query, timing, and duplicates.

# Add index in the model:
class Order(models.Model):
    user_id = models.IntegerField(db_index=True)  # single column
    # Or composite:
    class Meta:
        indexes = [
            models.Index(fields=['user_id', 'status']),
        ]

# Fix N+1 with select_related (ForeignKey) or prefetch_related (M2M):
users = User.objects.prefetch_related('order_set').filter(active=True)
for user in users:
    len(user.order_set.all())  # No extra query

# Use .only() to select specific columns (avoid SELECT *):
Order.objects.filter(user_id=42).only('id', 'total', 'created_at')
SQLAlchemy — see + fix ORM queries
# See the SQL SQLAlchemy generates:
from sqlalchemy import create_engine
engine = create_engine("postgresql://...", echo=True)  # echo=True logs all SQL

# Or compile a query object:
stmt = select(Order).where(Order.user_id == 42)
print(stmt.compile(compile_kwargs={"literal_binds": True}))

# Add index:
from sqlalchemy import Index
Index('idx_orders_user_id', Order.user_id)

# Fix N+1 with joinedload or subqueryload:
from sqlalchemy.orm import joinedload
session.query(User).options(
    joinedload(User.orders)  # JOIN in single query
).filter(User.active == True).all()

# Or selectinload (separate SELECT IN query — often better):
from sqlalchemy.orm import selectinload
session.query(User).options(
    selectinload(User.orders)  # SELECT * FROM orders WHERE user_id IN (...)
).all()
Universal rule: No matter which ORM you use, always check the SQL it generates for your critical paths. Enable query logging in development. Use N+1 detection tools (bullet for Rails, nplusone for Django). And always add indexes through migrations — don't rely on the ORM to create them automatically.

Read Replica Query Routing

Once you've optimized your queries, the next scaling step is splitting read and write traffic. Writes go to the primary database. Reads go to one or more read replicasA copy of your primary database that receives all changes via replication (usually asynchronous). It handles read queries, offloading the primary. The downside: replicas can be slightly behind the primary (replication lag), so you might read slightly stale data. This is called read/write splitting.

APP SERVER 200 req/sec WRITES (20%) READS (80%) PRIMARY Handles writes only replication REPLICA 1 ~80 reads/sec REPLICA 2 ~80 reads/sec READ LB round-robin Caveat: replication lag (1-100ms) Reads may return slightly stale data
Rails 6+ — automatic read/write splitting
# config/database.yml (Rails 6+):
production:
  primary:
    adapter: postgresql
    host: primary-db.internal
    database: myapp
  primary_replica:
    adapter: postgresql
    host: replica-db.internal
    database: myapp
    replica: true

# Rails automatically routes:
# - GET requests β†’ replica
# - POST/PUT/DELETE β†’ primary
# - After a write, reads stick to primary for 2 seconds
#   (to avoid reading stale data from replica lag)

# Manual control:
ActiveRecord::Base.connected_to(role: :reading) do
  Order.where(user_id: 42)  # Guaranteed to hit replica
end

ActiveRecord::Base.connected_to(role: :writing) do
  Order.create!(user_id: 42, total: 99.99)  # Guaranteed to hit primary
end
The replication lag trap: User creates an order (writes to primary). Page redirects to "My Orders" (reads from replica). The replica is 50ms behind. User doesn't see their new order. Panic. The fix: after any write, read from the primary for the next 2-5 seconds. Every major framework handles this, but you need to know it's happening.
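That "stick to the primary after a write" fix is simple enough to sketch. A toy per-session router (names invented for illustration; frameworks like Rails implement the same idea internally):

```python
# Sketch of "read your own writes" routing: after a write, route that
# session's reads to the primary for a grace window longer than typical
# replication lag. Times are passed explicitly here to keep it testable.
import time

class Router:
    def __init__(self, stickiness_seconds=2.0):
        self.stickiness = stickiness_seconds
        self.last_write = {}                      # session_id -> timestamp

    def record_write(self, session_id, now=None):
        self.last_write[session_id] = time.monotonic() if now is None else now

    def target_for_read(self, session_id, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_write.get(session_id, float("-inf")) < self.stickiness:
            return "primary"                      # replica may still be stale
        return "replica"

router = Router(stickiness_seconds=2.0)
router.record_write("alice", now=100.0)           # alice just created an order
print(router.target_for_read("alice", now=100.5)) # primary (inside the 2 s window)
print(router.target_for_read("alice", now=103.0)) # replica (lag window has passed)
```

The window should exceed your observed replication lag with margin; 2-5 seconds covers the 1-100ms lag in the diagram comfortably.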
Think First

You have a PostgreSQL database with 50M rows in the events table. Your most common query filters by user_id and event_type, and orders by created_at DESC. You only ever query events from the last 90 days. What type of index would you create? Think about: single vs composite, column order, partial index, covering index.

Think about the "leftmost prefix" rule for composite indexes. Think about whether a partial index on created_at > now() - interval '90 days' would help. Would including created_at in the index help with the ORDER BY?

The ideal index for that scenario: a composite partial index. One catch: a partial index predicate must be immutable, so PostgreSQL rejects WHERE created_at > now() - interval '90 days' (now() is not immutable). Use a fixed cutoff instead and recreate the index periodically: CREATE INDEX idx_events_user_type_date ON events (user_id, event_type, created_at DESC) WHERE created_at > '2024-01-01'; This gives you: the leftmost prefix for filtering (user_id, event_type), the sort order in the index (created_at DESC — avoiding a separate sort step), and a partial index that only covers recent data (far smaller than a full index). If you also INCLUDE the columns your query SELECTs, you get an Index Only Scan — the fastest possible path.

Section 9

At Scale — How Real Companies Tune SQL

Everything we've covered works great on a single database. But what happens when your product has millions of users and billions of rows? Let's look at four companies that deal with massive SQL workloads every single day — and how they keep things fast.

Think First

Stack Overflow serves 1.3 billion page views per month with just 2 SQL Server instances. Meanwhile, Shopify uses hundreds of MySQL shards across Vitess. Both are successful. Why would one company need hundreds of databases while another serves a billion views from two? What's fundamentally different about their workloads?

Think about the read/write ratio and whether queries can be cached. Stack Overflow's content changes rarely β€” the same question page can be cached for minutes. Shopify's checkout flow is unique per merchant, per customer, per cart.
SQL at Massive Scale Stack Overflow 2 SQL Servers 1.3B page views/month Dapper micro-ORM "Fewer servers, faster queries" GitHub MySQL + ProxySQL 10K+ queries/sec Connection pooling layer "Pool connections, route queries" Shopify MySQL + Vitess Query routing + sharding Millions of merchants "Shard per tenant, route at the proxy" Netflix Cassandra + PostgreSQL PG for billing Aggressive PG tuning "Right DB for the right job"

Stack Overflow — 1.3 Billion Page Views on 2 SQL Servers

This one blows people's minds. Stack Overflow serves 1.3 billion page views per month — and they do it with just two SQL Server instances (a primary and a replica). No distributed database. No sharding. No microservices. Just fast SQL.

Their secret? Dapper — a micro-ORM that generates lean, predictable SQL. Unlike heavy ORMs that create 17-table JOINs behind your back, Dapper gives you full control over what SQL runs. Their developers write raw parameterized SQL and use Dapper to map results to C# objects. Every query is hand-tuned, every execution plan is reviewed, and they inspect SQL Server's execution plans (the EXPLAIN equivalent) before any query hits production.

The lesson: You don't need hundreds of servers if your queries are fast. Stack Overflow's entire philosophy is "make the query fast, not the infrastructure big." They've publicly said their biggest performance wins came from fixing N+1 queries and adding missing indexes — the exact things we covered in Sections 5 and 6.

GitHub — MySQL + ProxySQL for 10K+ Queries/sec

GitHub runs on MySQL — yes, the same MySQL you can install on your laptop. But at their scale (100+ million repositories, 10,000+ queries per second), a raw MySQL connection isn't enough. The problem: MySQL has a per-connection cost. If you have 200 app servers each opening 50 connections, that's 10,000 connections hitting your database. Most of those connections are idle most of the time, but MySQL still allocates memory for each one.

Their solution: ProxySQL — a lightweight proxy that sits between app servers and MySQL. It pools connections (so 10,000 app connections become ~200 database connections), routes read queries to replicas, and can even cache frequent queries. Think of it as a traffic cop for SQL — it makes sure the database never gets overwhelmed, even when thousands of app servers are talking to it simultaneously.

Why this matters for you: Connection pooling isn't just a "big company" thing. If you're running a Rails or Django app with 20+ workers and a single PostgreSQL database, you already need a connection pooler (PgBouncer for PostgreSQL, ProxySQL for MySQL). Without one, every worker holds a persistent connection, and you'll run out of database connections at ~100 concurrent users.

Shopify — MySQL + Vitess for Millions of Merchants

Shopify powers millions of online stores. Each merchant has orders, products, customers, and inventory — and it all lives in MySQL. But one MySQL instance can't hold data for millions of merchants. Their answer: Vitess, a sharding middleware originally built by YouTube.

Vitess sits between the application and MySQL. It looks like a single database to your application, but behind the scenes it routes queries to the right shard based on the merchant ID. When you run SELECT * FROM orders WHERE shop_id = 12345, Vitess knows that shop 12345 lives on shard 7, and routes the query there — without your application code knowing anything about sharding. The app just writes normal SQL.

The tuning challenge? Each shard still needs optimized queries. A slow query on one shard affects every merchant on that shard. So Shopify runs automated slow-query detection across all shards, flagging any query that crosses a latency threshold. Their internal tooling is essentially a scaled-up version of pg_stat_statements β€” tracking the top slow queries per shard and alerting on regressions.

Netflix — Cassandra for Scale, PostgreSQL for Money

Netflix is famous for using Cassandra — a NoSQL database designed for massive write throughput. But here's what most people don't know: Netflix uses PostgreSQL for billing. When you're handling credit card charges for 250+ million subscribers, you need ACID transactions, not eventual consistency. You need to know that a charge happened exactly once, not "probably once, eventually."

And they tune PostgreSQL aggressively. Their billing database has custom index strategies, partitioned tables (by month — old billing records get archived), and hand-optimized queries. They run their own pg_stat_statements dashboards, alert on query plan changes (if the planner suddenly switches from an index scan to a sequential scan, that's a red flag), and test every schema migration against production-scale data before deploying.

The pattern: Use the right database for the right job. Cassandra for high-write, eventually-consistent workloads (viewing history, recommendations). PostgreSQL for low-volume, must-be-correct workloads (billing, payments). And tune both. NoSQL doesn't mean "no tuning."
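The month-based partitioning mentioned above can be sketched in plain PostgreSQL. This is an illustrative example only — the table and column names are hypothetical, not Netflix's actual schema — and it assumes PostgreSQL 10+ declarative partitioning:

```sql
-- Hypothetical billing table, partitioned by month on the charge timestamp
CREATE TABLE billing_events (
    account_id   bigint      NOT NULL,
    amount_cents integer     NOT NULL,
    charged_at   timestamptz NOT NULL
) PARTITION BY RANGE (charged_at);

CREATE TABLE billing_events_2024_01 PARTITION OF billing_events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE billing_events_2024_02 PARTITION OF billing_events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries that filter on charged_at touch only the matching partitions
-- (look for partition pruning in EXPLAIN output). Archiving a month is
-- then a cheap metadata operation instead of a huge DELETE:
ALTER TABLE billing_events DETACH PARTITION billing_events_2024_01;
```

The payoff is exactly what the query planner needs: a date-bounded query scans one small partition instead of the whole history.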
Section 10

Anti-Lessons — "Best Practices" That Backfire

Some SQL tuning advice gets repeated so often that people assume it's true. It's not. These three "best practices" sound reasonable but will actively hurt you in production. Learn to spot them so you don't fall for them on your next project (or in your next interview).

Anti-lesson #1: "If indexes make reads faster, why not index everything?" This sounds logical. The problem: every index has a write cost. When you INSERT a row, the database doesn't just write to the table — it updates every index on that table. Ten indexes means ten extra writes per INSERT.

Let's put real numbers on this. A table with 10 million rows and 10 indexes will have roughly 5 GB+ of index storage — sometimes more than the table data itself. Every INSERT now updates 10 B-tree structures. Every UPDATE that touches an indexed column triggers index maintenance. Your writes slow down by 30-60%, your storage doubles, and VACUUM takes three times as long because it has to clean up dead tuples in every index.

[Diagram: one INSERT → writes to every index. Inserting 1 row into orders also updates idx_user_id, idx_status, idx_created_at, idx_total, idx_email, and 5 more. The cost of 10 indexes: table data ~3.2 GB, index storage ~5.1 GB (1.6x the data), writes 30-60% slower per INSERT, VACUUM 3x longer.]
The right approach: Only index columns that appear in WHERE, JOIN, or ORDER BY clauses of your actual queries. Use pg_stat_user_indexes to find unused indexes and drop them. A good rule of thumb: 3-5 well-chosen indexes per table is usually plenty.
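The pg_stat_user_indexes check mentioned above looks like this (a sketch; note that idx_scan counts only since statistics were last reset, so make sure the window is representative):

```sql
-- Find indexes that have never been used since the last stats reset
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan     AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Before dropping anything, confirm the index isn't backing a PRIMARY KEY
-- or UNIQUE constraint -- those must stay even if idx_scan is 0.
```

Anything large with zero scans is pure write overhead and a drop candidate.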

Anti-lesson #2: "SELECT * is fine." During development, SELECT * feels harmless — you get all the columns, you pick what you need in your application code. But in production, that "convenient" asterisk can be devastating.

Imagine your users table has 20 columns. Your API only needs id, name, and email — that's 3 columns. With SELECT *, you're fetching all 20 columns: the bio (a TEXT field averaging 2 KB), the profile_image_url, the settings_json (5 KB blob), and 14 other columns you'll never use. Instead of reading ~100 bytes per row, you're reading ~8 KB per row. That's 80x more data transferred from the database to your application.

It gets worse: with SELECT *, you can't use a covering index. If you had an index on (id) INCLUDE (name, email), a query for just those three columns would be an Index Only Scan — the fastest possible path, no table access at all. But SELECT * forces the database to hit the heap (the actual table) because no index covers all 20 columns. You throw away the best optimization PostgreSQL has.
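As a sketch (assuming PostgreSQL 11+ for the INCLUDE syntax, and the hypothetical 20-column users table from above — names are illustrative):

```sql
-- Covering index: the columns the API needs live in the index itself
CREATE INDEX idx_users_id_covering ON users (id) INCLUDE (name, email);

-- Answerable from the index alone -- expect an Index Only Scan:
EXPLAIN SELECT id, name, email FROM users WHERE id = 42;

-- Forces a heap visit, because no index contains all 20 columns:
EXPLAIN SELECT * FROM users WHERE id = 42;
```

Running both EXPLAINs side by side makes the cost of the asterisk visible in one screen.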

Fix it now: Replace SELECT * with explicit column lists in every query that reaches production. Your ORM probably has a method for this — .select(:id, :name, :email) in Rails, .only('id', 'name', 'email') in Django, .options(load_only(...)) in SQLAlchemy.

Anti-lesson #3: "The ORM writes better SQL than you would." ORMs like ActiveRecord, Django ORM, and SQLAlchemy are fantastic for developer productivity. They generate correct SQL. But "correct" and "optimized" are not the same thing. ORMs optimize for developer convenience, not database performance.

Here's what ORMs typically do that hurts performance:

  • N+1 queries by default — lazy-loading related records one at a time instead of batching
  • SELECT * everywhere — fetching all columns even when the template only displays two
  • Unnecessary subqueries — some ORMs generate nested subqueries where a JOIN would be faster
  • No index awareness — the ORM has no idea which indexes exist, so it can't generate queries that leverage them

The fix isn't to abandon ORMs — it's to always check the SQL they generate. Every major ORM has a way to log or print the SQL it creates. Make it a habit: write the ORM code, check the SQL, run EXPLAIN, and fix anything that doesn't use an index. The ORM is your co-pilot, not your autopilot.

Real example: A Django app loading a user's dashboard was making 147 queries per page load — all generated by the ORM through lazy-loaded relationships. After adding select_related() and prefetch_related(), it dropped to 4 queries. Same page, same data, same ORM — just one line added. The page went from 2.3 seconds to 180ms.
Section 11

Common Mistakes — The Six That Hit Every Team

These aren't hypothetical. These are the mistakes that show up in every production postmortem, every performance review, every "why is the database slow" Slack thread. If you can avoid all six, you're already ahead of 90% of backend developers.

Mistake #1: never running EXPLAIN. This is mistake number one for a reason. Developers write a query, test it on a dev database with 500 rows, see it returns in 2ms, and ship it. On production with 15 million rows, that same query takes 4 seconds because it's doing a sequential scan instead of an index scan.

The fix is dead simple: run EXPLAIN ANALYZE on every new query before it hits production. Not just EXPLAIN (which shows the plan) — EXPLAIN ANALYZE (which actually runs the query and shows real timing). If you see "Seq Scan" on a large table, stop and add an index. This should be as automatic as running tests before merging a PR.

Pro tip: Add an EXPLAIN step to your CI pipeline. Tools like sqlcheck or custom scripts can flag any new SQL migration that doesn't have a corresponding EXPLAIN output reviewed.

Mistake #2: unindexed foreign keys. Here's a surprise for PostgreSQL users: when you add a foreign key constraint, PostgreSQL does NOT automatically create an index on the foreign key column. MySQL does (InnoDB requires it). But PostgreSQL silently lets you run without one.

So when you have orders.user_id referencing users.id, every JOIN on orders.user_id does a sequential scan unless you manually create an index. Worse: every time you DELETE a user, PostgreSQL has to scan the orders table to check for referencing rows — and without an index, that's a full table scan per delete.

Rule: Every foreign key column needs an index. No exceptions. Check yours right now:

find unindexed foreign keys
-- Find foreign keys without matching indexes
SELECT conrelid::regclass AS table_name,
       a.attname AS fk_column,
       confrelid::regclass AS referenced_table
FROM pg_constraint c
JOIN pg_attribute a ON a.attnum = ANY(c.conkey)
  AND a.attrelid = c.conrelid
WHERE c.contype = 'f'
  AND NOT EXISTS (
    SELECT 1 FROM pg_index i
    WHERE i.indrelid = c.conrelid
      AND a.attnum = ANY(i.indkey)
  );
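For each row that query returns, the fix is a plain B-tree index on the foreign key column. A sketch (the index name is illustrative; CONCURRENTLY avoids blocking writes on a live table):

```sql
-- Example fix for an unindexed orders.user_id foreign key
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);
```

After it builds, both JOINs on the column and the FK check on DELETE become index lookups instead of full scans.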

Mistake #3: the N+1 query pattern. We covered N+1 in Section 6, but it deserves a spot here because it's the single most common performance bug in web applications. The pattern: you load a list of items, then loop through them and load related data one-by-one. 1 query for the list + N queries for the details = N+1.

The trap is that ORMs make it invisible. You write user.orders in a loop and it looks like a property access — but it's actually a SQL query. Every iteration of the loop fires another SELECT * FROM orders WHERE user_id = ?. With 100 users on the page, that's 101 queries instead of 2.
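In raw SQL, the before/after looks like this (illustrative queries, shaped the way an ORM would emit them):

```sql
-- Before: one query per user inside the loop (the N in N+1)
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... 98 more, one per user on the page

-- After eager loading: the list query plus ONE batched query
SELECT * FROM orders WHERE user_id IN (1, 2, 3, /* ... */ 100);
```

Same data either way — the difference is 101 round-trips to the database versus 2.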

Detection: Use your ORM's query logger. In Rails: ActiveRecord::Base.logger = Logger.new(STDOUT). In Django: django-debug-toolbar. In SQLAlchemy: echo=True. If you see the same query repeated with different IDs, that's an N+1.

Fix: Eager-load the relationship. Every ORM has a method for this — includes (Rails), select_related / prefetch_related (Django), joinedload / subqueryload (SQLAlchemy).

Mistake #4: ignoring connection limits. Your database has a maximum number of connections it can handle at once. PostgreSQL defaults to max_connections = 100. If your application has 10 workers, each with a pool of 10 connections, that's already 100 — the entire limit. Add a background job processor and a cron job, and you're over the limit. New connections get rejected. Your application crashes.

The real danger: during a traffic spike, every request opens a connection and holds it while the slow query runs. Connections pile up, the pool fills, new requests queue, timeouts cascade, and suddenly your entire application is down — not because the database is slow, but because it ran out of connections.

Fix: Use an external connection pooler. PgBouncer for PostgreSQL, ProxySQL for MySQL. These sit between your app and the database, multiplexing hundreds of app connections into a small pool of real database connections. PgBouncer in transaction mode can serve 1,000+ app connections through just 20 real database connections.

Mistake #5: skipping VACUUM and ANALYZE. PostgreSQL uses a system called MVCC (Multi-Version Concurrency Control) — when you UPDATE or DELETE a row, it doesn't actually remove the old version. It marks it as "dead" and creates a new version. This is great for concurrent reads, but it means dead rows accumulate over time, making the table (and its indexes) larger than they need to be.

VACUUM cleans up those dead rows. ANALYZE updates the statistics the query planner uses to pick execution plans. If you don't run them, two things happen: (1) your table bloats — a 2 GB table becomes 8 GB of mostly dead rows, and (2) the query planner makes bad decisions because its statistics are stale.

PostgreSQL has autovacuum built in, and it's on by default. But on high-write tables (anything with thousands of UPDATEs per minute), autovacuum can fall behind. Check its activity:

check autovacuum status
-- Check when VACUUM and ANALYZE last ran
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
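If a hot table keeps showing up in that query, you can tune autovacuum per table instead of globally. A sketch — the thresholds below are illustrative starting points, not universal values:

```sql
-- Vacuum this table more aggressively than the global defaults
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor  = 0.02,  -- vacuum at ~2% dead rows
    autovacuum_analyze_scale_factor = 0.01   -- re-analyze at ~1% churn
);

-- One-off catch-up; doesn't block normal reads and writes
VACUUM (ANALYZE, VERBOSE) orders;
```

Per-table settings let you keep sane defaults everywhere else while the one high-churn table gets the attention it needs.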

Mistake #6: ignoring index bloat. Tables bloat, and so do indexes. After millions of UPDATEs and DELETEs, a B-tree index can have large sections of empty or nearly-empty pages. A 500 MB index might only have 200 MB of actual data — the rest is wasted space that the database still has to read through during scans.

Check your index sizes and compare them to what you'd expect:

check index bloat
-- Check index sizes for a specific table
SELECT pg_size_pretty(pg_indexes_size('orders')) AS total_index_size;

-- See individual indexes and their sizes
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

If an index is much larger than expected, rebuild it with REINDEX CONCURRENTLY (PostgreSQL 12+). This rebuilds the index without locking the table. For older versions, use CREATE INDEX CONCURRENTLY with a new name, then drop the old one.
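A sketch of both paths (index and table names illustrative):

```sql
-- PostgreSQL 12+: rebuild in place without blocking writes
REINDEX INDEX CONCURRENTLY idx_orders_user_id;

-- Pre-12 workaround: build a fresh copy, then swap names
CREATE INDEX CONCURRENTLY idx_orders_user_id_new ON orders (user_id);
DROP INDEX CONCURRENTLY idx_orders_user_id;
ALTER INDEX idx_orders_user_id_new RENAME TO idx_orders_user_id;
```

Either way the result is a freshly packed B-tree at its minimum size, with the table online the whole time.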

Section 12

Interview Playbook — "This Query Is Slow. How Do You Debug It?"

This is the most common database question in system design and backend interviews. It sounds simple, but the depth of your answer reveals your experience level instantly. Here's what interviewers expect at each level — with sample answers you can adapt.

[Diagram: slow query debugging, the mental framework — 1. Reproduce (find the exact SQL) → 2. EXPLAIN (read the query plan) → 3. Fix (index / rewrite / cache) → 4. Verify (EXPLAIN again + load test) → 5. Monitor (pg_stat_statements). The key insight: don't guess. Measure, diagnose with EXPLAIN, fix with evidence, verify the fix. Interviewers want to see a systematic approach, not "add an index and hope."]

Junior level. What they expect you to know:

  • How to find the slow query (slow query log, application logs)
  • What EXPLAIN does and how to read the basics (Seq Scan = bad, Index Scan = good)
  • How to add an index and verify it worked
  • What an N+1 query is and how to fix it with eager loading

Sample answer: "First, I'd identify the exact SQL query that's slow — either from the application logs or the slow query log. Then I'd run EXPLAIN ANALYZE on it to see the execution plan. I'm looking for sequential scans on large tables — that usually means a missing index. If the WHERE clause filters by a column like user_id, I'd add a B-tree index on that column and run EXPLAIN again to confirm it switched to an Index Scan. I'd also check if it's an N+1 problem — if the same query runs hundreds of times with different IDs, I'd fix it with eager loading in the ORM."

Mid-level. What they expect you to know (everything from Junior, plus):

  • Query plan analysis — reading nested loops vs hash joins, understanding actual vs estimated rows, spotting the "rows removed by filter" anti-pattern
  • Index selection — composite indexes vs single-column, the leftmost prefix rule, partial indexes for time-scoped data, covering indexes with INCLUDE
  • pg_stat_statements — how to find the top 10 slowest queries by total time, not just single execution time
  • Connection pooling — why you need PgBouncer/ProxySQL and when pool exhaustion causes slowdowns

Sample answer (add to Junior): "Beyond the basic index, I'd look at the full query plan. If I see a Nested Loop with a high row count, maybe a Hash Join would be better — sometimes adding SET enable_nestloop = off in a test session reveals a faster plan. For composite queries, I'd consider a multi-column index with the right column order — most selective column first. I'd also check pg_stat_statements to see if this query is a one-off or if it's the top consumer of total database time. Sometimes the slowest single query isn't the biggest problem — it's the medium-slow query that runs 10,000 times per minute."
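That planner experiment is safest inside a transaction, so the setting can't leak into other sessions. A sketch (the SELECT placeholder stands for whatever query you're investigating):

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- applies only to this transaction
EXPLAIN ANALYZE SELECT ...;        -- compare timing vs. the default plan
ROLLBACK;                          -- setting reverts automatically
```

If the forced plan is faster, the fix usually isn't to keep the setting — it's to give the planner better information (fresh ANALYZE, a better index) so it picks that plan on its own.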

Senior level. What they expect you to know (everything from Mid, plus):

  • Query plan analysis at depth — understanding parallel query execution, bitmap heap scans vs index scans, cost model parameters (random_page_cost, effective_cache_size), and when the planner makes wrong choices
  • Partitioning strategy — range partitioning for time-series data, hash partitioning for even distribution, partition pruning in query plans
  • Read replica routing — directing read queries to replicas, handling replication lag, write-after-read consistency
  • Caching layer — when to put a Redis/Memcached layer in front of the database, cache invalidation strategies, read-through vs write-through
  • Schema-level optimizations — denormalization trade-offs, materialized views for complex aggregations, pre-computed rollup tables

Sample answer (add to Mid): "At senior scale, I'd take a broader view. First: is this query even hitting the right database? Read queries should go to replicas. If the query is an aggregation (dashboard, report), maybe it shouldn't hit the OLTP database at all — a materialized view refreshed every 5 minutes might be the right answer. For tables over 100M rows, I'd evaluate partitioning — range partition by month for time-series data so the planner only scans the relevant partitions. I'd also look at the caching layer — can we cache the result in Redis with a 60-second TTL and avoid the database entirely for repeated queries? And for the long term, I'd set up pg_stat_statements dashboards with alerting on p95 query latency regression, so we catch slow queries before users do."

Interview tip: Start with the basics and say "I can go deeper on any of these — query plan analysis, index strategy, caching, or partitioning." Let the interviewer choose. This shows breadth and lets them steer toward what they care about.
Section 13

Exercises — Hands-On Practice

Reading about SQL tuning is useful, but running real queries is what makes it stick. These five exercises go from "run one command" to "design an indexing strategy for a real schema." Try each one before opening the hints.

Exercise 1: Run EXPLAIN ANALYZE and Interpret the Output (Beginner)

Spin up a PostgreSQL instance (Docker: docker run -e POSTGRES_PASSWORD=test -p 5432:5432 postgres:16). Create a table with 1 million rows using generate_series. Run a SELECT with a WHERE clause, then run EXPLAIN ANALYZE on it. Identify: Is it a Seq Scan or Index Scan? What's the execution time? How many rows were scanned vs returned?

Create the table: CREATE TABLE test_orders AS SELECT g AS id, (random()*10000)::int AS user_id, now() - (random()*365)::int * interval '1 day' AS created_at FROM generate_series(1, 1000000) g;

Query: EXPLAIN ANALYZE SELECT * FROM test_orders WHERE user_id = 42;

Look for: "Seq Scan" (no index yet), the actual row counts vs the planner's estimated rows, and "Execution Time" at the bottom.

Without an index, you'll see Seq Scan on test_orders with ~1,000,000 rows scanned. Add the index: CREATE INDEX idx_test_user_id ON test_orders(user_id); Run EXPLAIN ANALYZE again — you should see Index Scan using idx_test_user_id with ~100 rows scanned and execution time dropping from ~150ms to ~0.5ms.

Exercise 2: Find and Fix an N+1 Query (Intermediate)

In your web application (or a sample app), enable query logging. Load a page that displays a list with related data (e.g., users with their orders). Count the queries. If you see the same SELECT repeated with different IDs, that's an N+1. Fix it using eager loading and compare the query count before and after.

Rails: ActiveRecord::Base.logger = Logger.new(STDOUT) then look for repeated queries. Fix with .includes(:orders).

Django: Install django-debug-toolbar and check the SQL panel. Fix with select_related() for FK relationships or prefetch_related() for reverse/M2M.

SQLAlchemy: Set echo=True on the engine. Fix with joinedload() or subqueryload().

Exercise 3: Set Up pg_stat_statements and Find Top 5 Slow Queries (Intermediate)

Enable the pg_stat_statements extension in your PostgreSQL instance. Run a mix of queries (some fast, some slow). Then query pg_stat_statements to find the top 5 queries by total execution time. Which query consumes the most total time? Is it slow per-call or just called a lot?

Enable: add shared_preload_libraries = 'pg_stat_statements' to postgresql.conf and restart. Then: CREATE EXTENSION pg_stat_statements;

Query: SELECT query, calls, total_exec_time, mean_exec_time, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;

A query with high total_exec_time but low mean_exec_time runs fast but is called thousands of times — optimization here has big cumulative impact.

Exercise 4: Benchmark with pgbench, Before and After Index (Intermediate)

Use pgbench (ships with PostgreSQL) to benchmark a query before and after adding an index. Initialize a test database with pgbench -i -s 100 (10 million rows). Write a custom SQL file with a parameterized SELECT, run it with pgbench -f query.sql -c 10 -T 30, record the TPS. Add an index, run again, compare.

Custom file (query.sql): first \set aid random(1, 10000000) on its own line, then SELECT * FROM pgbench_accounts WHERE aid = :aid; The \set line is what makes pgbench pick a new random aid for each transaction.

Before: ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey; (careful — restore afterwards with ALTER TABLE pgbench_accounts ADD PRIMARY KEY (aid);) A plain DROP INDEX won't work here, because the index backs the primary key constraint.

Run: pgbench -f query.sql -c 10 -T 30 -r mydb. The -r flag shows per-statement latency.

You should see TPS increase by 10-100x after restoring the primary key index.

Exercise 5: Design an Index Strategy for a Given Schema (Advanced)

Given this schema: an e-commerce app with users, orders, order_items, and products tables. The top queries are: (1) find orders by user, sorted by date; (2) find order items for an order; (3) search products by name; (4) aggregate total revenue per day. Design the index strategy. Consider: composite indexes, partial indexes, covering indexes, and when a full-text search index (GIN) beats a B-tree for the product search.

Query 1: CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC); — composite index matches the filter + sort. Add INCLUDE (total, status) if your SELECT only needs those columns (covering index → Index Only Scan).

Query 2: CREATE INDEX idx_order_items_order ON order_items (order_id); — standard FK index. Consider INCLUDE for frequently selected columns.

Query 3: For LIKE '%keyword%', B-tree can't help (leading wildcard). Use: CREATE INDEX idx_products_name_gin ON products USING gin (name gin_trgm_ops); — requires the pg_trgm extension. Handles substring matches efficiently.

Query 4: Don't index for this — create a materialized view: CREATE MATERIALIZED VIEW daily_revenue AS SELECT date_trunc('day', created_at) AS day, SUM(total) AS revenue FROM orders GROUP BY 1; Refresh it on a schedule. The dashboard query hits the materialized view, not the 50M-row orders table.

Section 14

Cheat Sheet — Bookmark This Section

Six cards with everything you need when you're debugging a slow query, tuning a database, or prepping for an interview. All real commands — copy-paste ready.

EXPLAIN Cheat Sheet
-- Basic plan (estimated)
EXPLAIN SELECT ...;

-- Actual timing + buffers
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- JSON for tools
EXPLAIN (ANALYZE, FORMAT JSON) SELECT ...;

-- Visual: explain.dalibo.com
-- Key: look for Seq Scan on large tables
-- Key: actual rows ≫ estimated = stale stats
Scan Types — Good vs Bad
Index Scan        → best for selective queries
Index Only Scan   → even better (no table hit)
Bitmap Index Scan → good for medium selectivity
Seq Scan (small)  → fine on <10K rows
Seq Scan (large)  → RED FLAG — add an index

Nested Loop  → good for small inner table
Hash Join    → good for large equi-joins
Merge Join   → good when both sides sorted
Index Rules of Thumb
WHERE col = ?    → B-tree on col
WHERE a = ? AND b = ? → composite (a, b)
ORDER BY col DESC → include col in index
LIKE 'abc%'      → B-tree works
LIKE '%abc%'     → GIN + pg_trgm
Partial:  WHERE active = true
Covering: INCLUDE (col1, col2)

Leftmost prefix rule:
  idx(a, b, c) supports:
  WHERE a = ?  ✓
  WHERE a = ? AND b = ?  ✓
  WHERE b = ?  ✗ (can't skip a)
pg_stat_statements Queries
-- Top 5 by total time
SELECT query, calls,
  total_exec_time::int AS total_ms,
  mean_exec_time::int AS avg_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

-- Top 5 by call count
...ORDER BY calls DESC LIMIT 5;

-- Reset stats
SELECT pg_stat_statements_reset();
PgBouncer Config
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3

# transaction mode = best for web apps
# session mode = needed for prepared stmts
# statement mode = most aggressive pooling
Slow Query Detection
# postgresql.conf
log_min_duration_statement = 200
# Logs any query slower than 200ms

# Find currently running slow queries:
SELECT pid, now() - query_start AS duration,
  query, state
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > '1 second'::interval
ORDER BY duration DESC;

# Cancel a stuck query (use
# pg_terminate_backend(pid) if cancel fails):
SELECT pg_cancel_backend(pid);
Section 15

Connected Topics — Where SQL Tuning Leads Next

SQL tuning doesn't live in isolation. It connects to almost every part of backend engineering — from how you design schemas to how you scale entire systems. Here's how the skills you've learned here map to the bigger picture.

[Diagram: SQL Tuning (EXPLAIN → Index → Pool → Scale) at the center, connecting to Schema Design ("good schema = less tuning"), ACID vs BASE ("when to abandon SQL"), Caching ("skip the query entirely"), Scalability ("replicas, sharding, partitioning"), Isolation Levels ("locking affects query speed"), and Performance ("SQL is often the real bottleneck").]