Elasticsearch

Section 1

TL;DR — Elasticsearch in Plain English

Why inverted indexes make full-text search orders of magnitude faster than SQL LIKE queries
How Elasticsearch distributes data across shards and replicas to scale horizontally
Where Elasticsearch fits in the ELK/EFK stack for log analytics and observability
The critical trade-off: Elasticsearch is a secondary index layer, not a primary store

Elasticsearch's core insight: pre-compute an inverted index (a map from every word to every document containing it) so that search at query time becomes "look up word → get matching doc IDs" instead of "scan every row." Add a distributed cluster on top and the same approach scales out to very large datasets, with typical full-text queries on a well-sized cluster returning in tens of milliseconds.

What it is: Elasticsearch is a Lucene-based distributed search engine you talk to over HTTP/JSON. You index documents (push JSON in), and it builds an inverted index that lets full-text queries, including typo-tolerant, ranked, and aggregated ones, typically return in tens of milliseconds, provided the cluster is sized for the data.

Where it fits: The "E" in ELK (or EFK with Fluentd). Beats/Logstash ingest raw data, Elasticsearch stores and indexes it, Kibana visualises it. Use-cases: e-commerce search, autocomplete, log aggregation, security analytics (SIEM), application performance monitoring (APM), and any problem that involves "find me documents matching this text query, ranked by relevance."

The hard trade-off: Elasticsearch gives you eventual consistency, no ACID transactions, no joins, and no referential integrity. It is designed to be a secondary index alongside a primary database (PostgreSQL, MySQL, MongoDB). Treat it like a very powerful search cache: you write to your primary store first, then sync to Elasticsearch. Delete the Elasticsearch cluster and your data must still be safe in your primary store.

Elasticsearch wraps Apache Lucene in a distributed REST API — giving you inverted-index-powered full-text search, shard-based horizontal scaling, and near-real-time ingestion; use it as a secondary search layer, not a primary store.

Section 2

Why You Need This — The Problem SQL Can't Solve

Imagine you're building the search bar for an e-commerce store with 5 million products. A user types "iphone case bllue" — a typo, a colour, a product type, in no particular order. What happens with each approach?

The SQL attempt

A typical SQL query would look something like this:

search.sql

SELECT * FROM products
WHERE name LIKE '%iphone%'
  AND name LIKE '%case%'
  AND name LIKE '%bllue%';   -- typo: zero results

Three problems compound immediately:

Full table scan: LIKE '%word%' cannot use a B-tree index (because the wildcard is at the start). The database engine reads all 5 million rows. At 200 bytes per row that's roughly 1 GB of I/O on every keystroke.
No typo tolerance: "bllue" ≠ "blue" in SQL. Zero results. User leaves frustrated.
No relevance ranking: without an ORDER BY, SQL returns rows in no guaranteed order, so the most relevant product is not necessarily first. You'd have to add a brittle ORDER BY heuristic.

The Elasticsearch version

With Elasticsearch the same query runs as a fuzzy match against a pre-built inverted index:

search.json

{
  "query": {
    "multi_match": {
      "query": "iphone case bllue",
      "fields": ["name^3", "description", "tags"],
      "fuzziness": "AUTO"
    }
  }
}

What happens under the hood: Elasticsearch tokenizes the query, looks up each token in the inverted index, uses fuzzy matching to map "bllue" → "blue" (edit distance 1), combines the posting lists for the three terms (by default a document needs to match any of the terms, and documents matching more of them score higher), scores each result using BM25, and returns the top 10 ranked results. On a well-sized cluster this typically completes in under 50 ms, even at 5 million documents.
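The "edit distance 1" claim can be checked with a few lines of Python. This is only a sketch of plain Levenshtein distance, the idea that fuzziness is built on; Lucene itself uses automata rather than this table-based approach, and the function name edit_distance is made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + cost,         # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


print(edit_distance("bllue", "blue"))       # one deletion away
```

A query term within the allowed distance of an indexed term is treated as a candidate match, which is why the typo still finds blue products.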

Think First: Your application has 100 GB of nginx access logs spread across 30 days. You need to find all requests that logged ERROR, between 14:00 and 16:00, from IP ranges in the 192.168.x.x subnet. Would you reach for SQL or Elasticsearch — and why? Think about it before reading on. (Hint: think about what "grep at scale" costs, and where the data lives vs. where you query it.)

The answer is almost always Elasticsearch for this pattern. With SQL you'd either need an enormous indexed table (expensive to write to at log-ingestion rates) or you'd be grepping compressed files one by one. Elasticsearch gives you a dedicated time-series index per day, full-text + range filters, and aggregation (e.g. "count errors by hour") — all in a single query.

SQL's LIKE operator can't use indexes on leading wildcards, has no typo tolerance, and returns no relevance ranking — Elasticsearch's inverted index solves all three problems and handles full-text search at millions of documents in under 50 ms.

Section 3

Mental Model — The Inverted Index

Before you dive into clusters and shards, you need to understand the one idea that makes all of Elasticsearch possible: the inverted index. Everything else is engineering built on top of this single concept.

Forward index vs. inverted index

A forward index works the way a normal database does: given a row ID, you can look up its content. An inverted index flips that relationship: given a word, you get back every row that contains it. It's called "inverted" because the direction of lookup is reversed.

Think of it like the index at the back of a textbook. A forward index would be the table of contents ("page 42 covers TCP"). An inverted index is the back-of-book index ("TCP — pages 42, 87, 134"). When you search for "TCP," you go straight to pages 42, 87, 134 — you don't read the whole book.

Why this matters so much

The inverted index is pre-computed at write time. When you index a document, Elasticsearch does the expensive work (tokenize, lowercase, optionally stem, build posting list entries) and stores the result. At query time the work is much smaller: look up the term in the index, get back a list of doc IDs, combine lists for multi-term queries, score, return. That's why search can stay fast, often tens of milliseconds, even as the document count grows into the billions across a cluster.
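The write-time versus query-time split described above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (a regex tokenizer, sets of doc IDs as posting lists, AND semantics); the function names are illustrative and not part of any Elasticsearch API.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplified analysis: lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    # Write-time work: map every token to the set of doc IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all_terms(index, query):
    # Query-time work: one lookup per term, then intersect the posting lists.
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "iPhone 15 Pro Case Blue",
    2: "Samsung Galaxy Case Black",
    3: "Blue iPhone Charger",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "iphone blue"))
```

Note that the search never touches the document texts; it only reads the posting lists for the query terms.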

Four design heuristics that follow from this

Documents are JSON
The unit of storage is a document. Flexible, schema-optional (mappings can be inferred), and the thing that gets indexed into posting lists.

Index = collection of documents
An index is roughly equivalent to a SQL table. In practice you'd have one index per data type (e.g., products, logs-2026-05).

Sharded by default
Every index is split into one or more primary shards (the default has been 1 since Elasticsearch 7.0). Each shard holds a subset of documents. Shards can be spread across nodes for parallel query execution and horizontal scaling.

Replicated by default
Each primary shard has one replica by default, and you can configure more (or none). Replicas live on different nodes from their primary, so if one node dies no data is lost and queries continue against the surviving copy.

An inverted index pre-computes a word-to-document map at write time, so query time comes down to a term-dictionary lookup per query term plus combining the matching posting lists. The cost grows with the number of matching documents rather than with the total size of the index.

Section 4

Core Concepts — The Six Building Blocks

Elasticsearch has its own vocabulary. Six concepts cover 90% of what you need to know to understand how it works and how to talk about it in interviews. Each one maps loosely to something you probably already know from SQL — but with important differences worth understanding.

Document — The Unit of Everything

A document is a single JSON object. It's the unit Elasticsearch stores, indexes, and returns. Think of it as a row in a database — but schema-flexible. Each document belongs to an index and gets a unique _id.

document.json

{
  "_id": "prod-001",
  "_index": "products",
  "_source": {
    "name": "iPhone 15 Pro Case Blue",
    "price": 29.99,
    "category": "accessories",
    "tags": ["iphone", "case", "blue"],
    "created_at": "2026-01-15T10:30:00Z"
  }
}

Why JSON? Because it's self-describing. You can nest objects, arrays, and mixed types without declaring a schema upfront. Elasticsearch will infer each field's type the first time it sees that field, though you should always define mappings explicitly in production.

Index — The Logical Container

An index is a logical group of related documents — the rough equivalent of a SQL table. Every document lives inside an index. Index names are lowercase strings (e.g., products, logs-2026-05-09).

One key difference from SQL tables: you can have index templates that auto-configure new indexes matching a pattern. For log data you'd typically create one index per day (logs-2026-05-09) and use an index template to manage mappings, replicas, and lifecycle policies automatically.

Shard — The Physical Storage Unit

A shard is a self-contained Lucene index. When you create an ES index with 5 shards, you're creating 5 independent inverted indexes that together hold all the documents.

Why shard at all? Because a single machine can only hold so much data and serve so many queries. By splitting documents across 5 shards (each on a different node), you get roughly 5× the storage capacity and each query can be executed on 5 shards in parallel. The coordinating node fans a query out to all shards, collects partial results, merges and ranks them, and returns the final answer. The caller sees one unified result set.

A practical soft rule: aim for shards in the 10–50 GB range. Too small (lots of tiny shards) wastes overhead; too large makes recovery slow after a node failure.
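The fan-out and merge step performed by the coordinating node can be pictured as follows. This is a minimal sketch, assuming each shard has already scored its own hits; the (score, doc_id) tuples and function names are invented for illustration and say nothing about Elasticsearch's internal data structures.

```python
import heapq


def shard_top_k(hits, k):
    # Each shard returns only its own best k (score, doc_id) pairs.
    return heapq.nlargest(k, hits)


def coordinate(shard_results, k):
    # The coordinating node merges the per-shard lists into one global top k.
    merged = [hit for hits in shard_results for hit in hits]
    return heapq.nlargest(k, merged)


shards = [
    [(2.1, "a"), (0.4, "b"), (1.7, "c")],   # hits scored on shard 0
    [(3.0, "d"), (0.9, "e")],               # hits scored on shard 1
    [(1.2, "f")],                           # hits scored on shard 2
]
partial = [shard_top_k(hits, 2) for hits in shards]
print(coordinate(partial, 3))
```

Because each shard ships back only its top k, the coordinating node merges a few small lists rather than every matching document.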

Replica — The Safety Net

A replica is an exact copy of a primary shard, placed on a different node. Replicas serve two purposes: fault tolerance (if the primary node dies, a replica is promoted to primary automatically) and read throughput (search queries can be served by any replica, distributing the load).

The default is 1 replica per shard. For production workloads, most teams use 1–2 replicas. Note: replicas increase storage cost — 1 replica means double the storage, 2 replicas means triple.

Mapping — The Schema Definition

A mapping is Elasticsearch's equivalent of a SQL schema. It defines what type each field is: text (full-text analyzed), keyword (exact match), integer, float, date, geo_point, boolean, etc.

The type matters enormously. A text field is run through an analyzer at index time — it becomes tokens. A keyword field is stored as-is. If you accidentally map a product name as keyword instead of text, searches for partial words will return nothing. If you map a category as text instead of keyword, you can't do exact-match filtering or aggregations.

Elasticsearch can auto-detect types (dynamic mapping), but in production you should always define explicit mappings to avoid surprises.

Analyzer — The Text Processing Pipeline

An analyzer is a pipeline that transforms raw text into tokens before storing them in the inverted index. By default the same analyzer is applied at index time (when you add a doc) and query time (when you search). If the two sides produce different tokens unintentionally, searches won't find results that should match.

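A standard analyzer on "Running Shoes for Kids!" produces: ["running", "shoes", "for", "kids"]. The text is split on word boundaries, lowercased, and stripped of punctuation. A stemming analyzer such as english goes further and also drops stop words: ["run", "shoe", "kid"]. That way searching "run" also matches documents about "running" or "runs." Irregular forms such as "ran" are generally not reduced to "run" by algorithmic stemmers.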
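The two pipelines above can be imitated in Python to make the stages visible. This is a deliberately crude sketch: the regex stands in for the real tokenizer, and TOY_STEMS and STOP_WORDS are tiny hand-written tables used here in place of a real stemmer and stop-word list.

```python
import re

STOP_WORDS = {"for", "the", "a"}                                  # illustrative only
TOY_STEMS = {"running": "run", "shoes": "shoe", "kids": "kid"}    # stand-in for a stemmer


def standard_like(text):
    # Split into word tokens and lowercase them; punctuation is dropped.
    return [token.lower() for token in re.findall(r"\w+", text)]


def stemming_like(text):
    # Same tokens, minus stop words, with each token reduced to its stem.
    return [TOY_STEMS.get(token, token)
            for token in standard_like(text)
            if token not in STOP_WORDS]


print(standard_like("Running Shoes for Kids!"))
print(stemming_like("Running Shoes for Kids!"))
```

Running the same function over the query text and the document text is what makes the two sides produce comparable tokens.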

Elasticsearch's six building blocks (Document, Index, Shard, Replica, Mapping, Analyzer) map roughly to SQL rows, tables, partitions, read replicas, schemas, and text processors, but with important distributed and search-specific differences.

Section 5

Lucene Under the Hood — Segments, Refresh & Merge

Elasticsearch is fundamentally a distributed wrapper around Apache Lucene. Understanding what Lucene does underneath, and why, explains a lot of Elasticsearch behavior that would otherwise seem arbitrary: why new documents aren't immediately searchable, why deletes don't free disk space right away, why disk usage grows over time, and why "merge" operations matter for performance.

The Shard → Segments relationship

Each Elasticsearch shard is a Lucene index. And each Lucene index is actually composed of multiple segments. A segment is a small, immutable inverted index. When you index new documents, Lucene doesn't update existing segments — it creates a new one. This immutability is a key design choice: immutable data structures are trivially thread-safe and can be memory-mapped by the OS for near-instant access.

Three key behaviors that follow from this design

Refresh Interval — Why "Near Real-Time" Not "Real-Time"

By default Elasticsearch refreshes every 1 second, meaning a document you just indexed will appear in search results within about 1 second, not instantly. The reason is performance: writing a new segment on every single write would create thousands of tiny segments. Batching writes into a 1-second window means far fewer, more efficient segments. (A refresh is not the same as a flush: refresh makes buffered documents searchable, while flush is the separate operation that commits segments durably to disk.)

You can reduce the refresh interval for near-real-time requirements, or increase it (e.g., to 30 seconds, or -1 to disable) during bulk imports to speed up indexing. A bulk import with refresh_interval: -1 and a manual refresh at the end can be several times faster than the default; treat the 2–5× range as a rough guide, since the gain depends on document size, mappings, and hardware.

Segments Are Immutable — Why Updates Are "Delete + Re-Index"

Segments cannot be modified once written — immutability is what makes them safe to read concurrently from multiple threads with no locking. So when you update a document, Elasticsearch doesn't edit the old segment. It writes a new version of the document into a new segment and marks the old version with a tombstone.

The tombstoned version is excluded from search results as soon as the change is refreshed, but it still occupies disk space. Only when a background merge rewrites the segment does it get physically deleted. This is why heavy update and delete workloads can temporarily inflate disk usage.

Merge Policy — Keeping Segment Count Manageable

Left unchecked, every 1-second refresh creates a new segment. After a day of active indexing you'd have 86,400 tiny segments. Opening that many files would crush performance. The merge policy runs in the background, continuously combining small segments into larger ones.

Merges consume I/O and CPU. On very write-heavy clusters you'll see merge pressure — the cluster struggles to merge fast enough, leading to thread-pool rejections. Monitoring merge stats (_cat/nodes?v&h=merges.current) is part of ES ops. A 100 GB index in production may have on the order of a few dozen active segments after steady-state merging.
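The three behaviors above (refresh, tombstones, merge) fit into one toy model. This is a sketch for intuition only: ToyShard and its methods are invented names, segments are plain tuples, and nothing here reflects Lucene's actual file formats or merge policy.

```python
class ToyShard:
    """Toy model of refresh, tombstones and merge."""

    def __init__(self):
        self.buffer = []         # indexed but not yet searchable
        self.segments = []       # immutable tuples of (doc_id, body)
        self.tombstones = set()  # (segment_no, doc_id) pairs marked as deleted

    def index(self, doc_id, body):
        # New and updated documents only ever go into the in-memory buffer.
        self.buffer = [(d, b) for d, b in self.buffer if d != doc_id]
        self.buffer.append((doc_id, body))

    def refresh(self):
        # Write the buffer as a new segment and tombstone older copies.
        if not self.buffer:
            return
        new_ids = {d for d, _ in self.buffer}
        for seg_no, segment in enumerate(self.segments):
            for d, _ in segment:
                if d in new_ids:
                    self.tombstones.add((seg_no, d))
        self.segments.append(tuple(self.buffer))
        self.buffer = []

    def search(self):
        # Only refreshed, non-tombstoned copies are visible.
        return {d: b
                for seg_no, segment in enumerate(self.segments)
                for d, b in segment
                if (seg_no, d) not in self.tombstones}

    def merge(self):
        # Rewrite all live documents into one segment; tombstoned copies vanish.
        live = tuple((d, b)
                     for seg_no, segment in enumerate(self.segments)
                     for d, b in segment
                     if (seg_no, d) not in self.tombstones)
        self.segments = [live] if live else []
        self.tombstones = set()

    def stored_docs(self):
        return sum(len(segment) for segment in self.segments)


shard = ToyShard()
shard.index(1, "v1")
print(shard.search())        # nothing visible before the first refresh
shard.refresh()
shard.index(1, "v2")
shard.refresh()
print(shard.search(), shard.stored_docs())   # one live doc, two stored copies
shard.merge()
print(shard.stored_docs())                   # the old copy is gone after the merge
```

The gap between the number of live documents and the number of stored copies is the disk space that only a merge can reclaim.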

Each Elasticsearch shard is a collection of immutable Lucene segments — new documents go into an in-memory buffer, become searchable after a ~1-second refresh flush, and tombstoned deletes are physically purged only during background segment merges.

Section 6

Mapping & Analyzers — Teaching Elasticsearch Your Language

You can have the perfect cluster setup, but if your mappings are wrong, searches return nothing for queries that obviously should match — or return too much noise. Mapping and analyzers are the "language layer" of Elasticsearch: they define how text is interpreted, stored, and matched. Getting them right is the difference between a search feature users love and one they ignore.

What a mapping tells Elasticsearch

A mapping declares what data type each field holds and how it should be indexed. The most important distinction is between text and keyword:

text: the field is analyzed (tokenized, lowercased and, depending on the analyzer, stemmed). Used for full-text search. Searching "blue" matches "Blue" and "BLUE", and with a stemming analyzer also "blues". Cannot be used for exact-match aggregations by default.
keyword: the field is stored as-is, as one token. Good for filtering, sorting, aggregations. category: "accessories" must match exactly; searching "Accessories" (capital A) returns nothing.

Other common field types: integer, float, double, date (ISO 8601 or epoch ms), boolean, geo_point (lat/lon for location search), nested (for arrays of objects that need independent querying).

The four most common analyzer types

standard

The default. Splits on Unicode word boundaries, lowercases, removes punctuation. Works well for most English content. Doesn't stem — "running" and "run" are different tokens.

Use when: general full-text search where you want consistent, predictable tokenization without stemming surprises.

keyword

No analysis — the entire field value is one token. "iPhone 15 Pro" stays as exactly "iPhone 15 Pro." Used for fields you'll filter, sort, or aggregate on — SKUs, categories, status codes, email addresses.

Use when: exact-match filtering, aggregations (counts by category), or sorting. Never use for free-text search — searching "iphone" won't match "iPhone 15 Pro" because the tokens don't match.

ngram / edge_ngram

Generates substrings of each token. With min_gram 1 and max_gram 3, "foo" → ["f","fo","foo","o","oo","o"] (ngram: substrings starting at every position) or ["f","fo","foo"] (edge_ngram: leading substrings only). This is how autocomplete / "search as you type" works: as the user types "ipho," the edge_ngram index already has a token "ipho" → Doc42.

Use when: autocomplete, prefix search, or typo tolerance (ngrams at larger min/max values cover near-typos). Trades index size (it's much larger) for query speed.

language-specific

ES ships analyzers for 30+ languages: english, spanish, french, german, etc. They use language-aware stemming (English: "running"→"run", Spanish: "corriendo"→"corr"), remove language-appropriate stop words ("the","a" in English; "el","la" in Spanish), and handle language quirks.

Use when: you know the content language. The english analyzer produces dramatically better search quality than standard for English text.
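To make the ngram versus edge_ngram distinction from the list above concrete, here is a small Python sketch of the two substring schemes. The function names are illustrative; min_gram and max_gram mirror the tokenizer settings of the same name.

```python
def edge_ngrams(token, min_gram, max_gram):
    # Leading substrings only: what a prefix / autocomplete index stores.
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def ngrams(token, min_gram, max_gram):
    # Substrings of every allowed length, starting at every position.
    grams = []
    for start in range(len(token)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(token):
                grams.append(token[start:start + n])
    return grams


print(edge_ngrams("foo", 1, 3))
print(ngrams("foo", 1, 3))
print(edge_ngrams("iphone", 2, 15))
```

The ngram output grows much faster than the edge_ngram output as tokens get longer, which is the index-size cost mentioned above.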

Mapping and analyzer code examples

This mapping defines a products index with fields for both analyzed text search and exact-match keyword fields. The text type on name and description enables full-text search. The keyword type on category and sku enables exact filtering and aggregations.

PUT /products

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "standard"
      },
      "category": {
        "type": "keyword"
      },
      "sku": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      },
      "created_at": {
        "type": "date",
        "format": "strict_date_optional_time"
      },
      "in_stock": {
        "type": "boolean"
      }
    }
  }
}

Why english on name but standard on description? The name field benefits from stemming ("running shoes" → matches "run" and "shoe"). The description field is longer and stemming can sometimes over-match, so standard is a safer conservative choice for longer content.

For autocomplete you need an edge_ngram analyzer. It indexes only the leading substrings of each token — so typing "ipho" already matches "iphone" in the index. We use a custom analyzer defined in the index settings, then reference it in the mapping.

PUT /products-autocomplete

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete_index": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete_index",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}

Notice the split: analyzer (used at index time) generates all the ngrams; search_analyzer (used at query time) is the plain standard tokenizer. You don't want to ngram-expand the search term — you just want to match the user's raw input against all the pre-computed ngram tokens.

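A common pattern: you want both full-text search (user types partial words) AND exact-match aggregation (facets, counts by category) on the same field. Elasticsearch supports this via multi-fields: one logical field with multiple index representations. Note that the request below references the autocomplete_index and autocomplete_search analyzers, so the settings.analysis block from the previous example must also be included when creating this index; otherwise the request fails because those analyzers are not defined.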

PUT /products-multifield

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          },
          "suggest": {
            "type": "text",
            "analyzer": "autocomplete_index",
            "search_analyzer": "autocomplete_search"
          }
        }
      }
    }
  }
}

// Query full-text search:  { "match": { "name": "running shoes" } }
// Exact-match aggregation: { "terms": { "field": "name.keyword" } }
// Autocomplete:            { "match": { "name.suggest": "runn" } }

The name field stores the original value once but generates three different index entries: one analyzed for full-text search, one keyword for exact/aggregation, one edge_ngram for autocomplete. Storage cost is higher (three inverted indexes for one field) but search flexibility is maximised.

Mappings define field types (text for analyzed full-text, keyword for exact-match); analyzers define how text is tokenized, lowercased, and stemmed — getting this layer right is what makes searches feel intelligent rather than brittle.

Section 7

Cluster Architecture

Think of an Elasticsearch cluster like a company with specialised employees. Some people run meetings and make big decisions (master nodes), some people store and retrieve files in the warehouse (data nodes), some people stand at the front desk and direct visitors (coordinating nodes), and some people process incoming packages before they hit the shelves (ingest nodes). Nobody does everything — specialisation is what lets the company scale.

Every node in the cluster holds a copy of the cluster state, which the elected master publishes to all other nodes whenever it changes. There is always exactly one active master node (elected from the master-eligible pool) that is the single authority for the cluster's global state: which indexes exist, how many shards they have, which node holds which shard, and what the current mappings look like. If that master disappears, the remaining master-eligible nodes hold a quick election and pick a new one, usually within seconds.

The 5 Node Roles

Every node can wear one or more "hats". In small clusters a single node often plays all roles. In large production clusters you separate them — because the work they do is fundamentally different and competing for the same CPU/memory causes problems.

Master-eligible

Participates in the leader election. At any moment exactly one is the active master, which owns and distributes the cluster state: index settings, mapping definitions, shard routing table. The others are warm standbys ready to take over. You want an odd number (3 or 5) so a quorum (majority) can always be formed — this prevents split-brain, where two isolated halves of the cluster both think they are the master.

Data

The workhorses. They store the actual Lucene segments (shards) on disk and handle the indexing and searching work. A data node with high disk I/O can lag cluster elections if it is also a master-eligible node — which is exactly why dedicated masters exist.

Coordinating-only

A "dumb router." It accepts a search request, figures out which shards hold relevant data, fans the request out to those shards, collects partial results, merges and sorts them, then returns the final response. It stores nothing. Useful when your application sends huge aggregation queries that require heavy in-memory merging — offload that work to coordinating nodes so data nodes stay focused on disk I/O.

Ingest

Runs ingest pipelines — ordered processors that transform documents before they are indexed. Common examples: parse a timestamp, look up an IP address's geolocation, rename a field, convert a string to lowercase. Logstash does similar work but runs as a separate process; ingest nodes bake that capability directly into Elasticsearch.

Machine Learning

Part of the commercial Elastic Stack. Runs anomaly detection jobs (e.g. "alert when request latency spikes 3 standard deviations above baseline"). Separated from data nodes so ML CPU spikes do not slow down normal searches.
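The "odd number of master-eligible nodes" advice above comes down to majority arithmetic, sketched here in Python. The helper names are made up for illustration, and the sketch ignores how Elasticsearch manages its voting configuration internally.

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (3, 4, 5):
    print(n, "nodes: quorum", quorum(n), "tolerates", tolerated_failures(n), "failure(s)")
```

Going from 3 to 4 nodes raises the quorum without tolerating any more failures, which is why 3 or 5 is the usual choice. Two isolated halves of a cluster can never both reach a majority, so only one side can elect a master.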

An Elasticsearch cluster is a team of specialised nodes — one elected master owns cluster state, data nodes hold shards, and coordinating nodes route queries — so each role can scale independently without competing for the same resources.

Section 8

Sharding & Replication

Imagine your index as a big filing cabinet. Sharding is the act of cutting that cabinet into roughly equal-sized drawers and spreading the drawers across multiple rooms (nodes). Replication is making photocopies of each drawer and storing those copies in different rooms so that if one room burns down you still have the data. While you're at it, the copies also answer read requests, which raises your read throughput.

When you create an index you declare number_of_shards (how many drawers) and number_of_replicas (how many copies of each drawer). The key constraint: the primary shard count is fixed at creation time. Elasticsearch uses a deterministic hash of the document's routing value (the _id by default) to decide which shard it belongs to. Change the shard count and those hash-to-shard assignments no longer hold, so in general you would need to reindex into a new index.

4 Design Rules for Sharding

Shard count is fixed at index creation

Elasticsearch routes documents via (in simplified form) shard = hash(_routing) % number_of_shards, where _routing defaults to the document's _id. Change that divisor and most existing documents would route to a different shard. So the number of primary shards cannot be edited in place. The _split and _shrink APIs can change it only by a whole-number factor and only on a write-blocked index; otherwise, if you get it badly wrong, you must reindex: create a new index with the right shard count and copy all data across. Elasticsearch 7.0 changed the default from 5 to 1 primary shard to discourage over-sharding on small indexes.

Replica count is flexible

Unlike primaries, replicas can be increased or decreased at any time with a simple settings update. Need more read throughput? Bump replicas from 1 to 2. Need to free up disk? Drop to 0. Elasticsearch rebalances automatically. This makes replicas a useful operational lever for handling traffic spikes without downtime.

Routing keeps related documents together

By default every document goes to a shard based on its _id hash, which scatters documents randomly across shards. But you can pass a custom routing parameter at index time: POST /orders/_doc?routing=customer_42. All of customer 42's orders land on the same shard, so a query filtered to customer 42 only hits one shard instead of all of them — much faster for tenant-scoped queries.

Shard size sweet spot: 10–50 GB

Shards that are too small waste overhead — each shard is its own Lucene instance with open file handles, memory maps, and merge threads. Shards that are too large slow recovery because when a node restarts it must re-replicate or rebuild entire shards over the network. The community-accepted guideline is roughly 10–50 GB per shard, with most teams aiming around 20–30 GB for a comfortable middle ground.
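Two of the rules above, the fixed shard count and custom routing, both follow from the hash-modulo formula, and a short Python sketch shows why. One loud assumption: zlib.crc32 is used purely as a stand-in hash (Elasticsearch uses Murmur3), and shard_for is a hypothetical helper, so the exact shard numbers mean nothing; only the behavior matters.

```python
import zlib


def shard_for(routing_value: str, number_of_shards: int) -> int:
    # crc32 is only a stand-in hash for this sketch; Elasticsearch uses Murmur3.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


# 1) Changing the shard count re-routes existing documents.
doc_ids = ["order-%04d" % i for i in range(1000)]
moved = sum(shard_for(doc_id, 5) != shard_for(doc_id, 6) for doc_id in doc_ids)
print("%d of %d documents would land on a different shard" % (moved, len(doc_ids)))

# 2) Custom routing: every order indexed with routing=customer_42 shares one shard.
orders = [
    ("order-0001", "customer_42"),
    ("order-0002", "customer_42"),
    ("order-0003", "customer_42"),
]
shards_hit = {shard_for(routing, 5) for _doc_id, routing in orders}
print("shards holding customer_42's orders:", shards_hit)
```

Since the routing value rather than the _id is hashed in the second case, a query that passes the same routing value only needs to visit that one shard.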

Too many small shards overloads your master. Every shard, primary and replica, is tracked in the master's cluster state. If you have 10,000 shards (common with per-day time-series indexes that were never cleaned up), the master spends much of its time publishing a massive cluster state to every node. Cluster operations slow to a crawl and recovery after a node failure becomes painfully slow. A common production mistake is creating a daily index with 5 shards and forgetting to clean up: after two years you have 730 indexes and 3,650 primary shards, or 7,300 shards in total with one replica each. Use Index Lifecycle Management (ILM) and keep total cluster shard count under a few thousand.
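The shard growth from daily indexes is simple arithmetic. A small sketch, using a hypothetical helper total_shards and assuming one index per day with no clean-up:

```python
def total_shards(days, primaries_per_index, replicas):
    # One index per day; every primary shard has `replicas` extra copies.
    indexes = days
    primaries = indexes * primaries_per_index
    all_shards = primaries * (1 + replicas)
    return indexes, primaries, all_shards


indexes, primaries, all_shards = total_shards(days=730, primaries_per_index=5, replicas=1)
print(indexes, "indexes,", primaries, "primary shards,", all_shards, "shards in total")
```

Dropping to 1 primary shard per daily index, or rolling over by size instead of by day, cuts these numbers accordingly.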

Primary shards split your index at creation (fixed count), replicas are copies on different nodes for fault tolerance and read scale — and the 10–50 GB per shard rule keeps cluster state manageable.

Section 9

The Query DSL

You talk to Elasticsearch via JSON. Every search is a JSON object describing what you want to find, how you want it scored, and which fields to return. Elasticsearch calls this the Query DSL (Domain-Specific Language) — it's not SQL, but once you get the mental model it feels equally expressive and much more powerful for search-specific needs.

The single most important concept in the Query DSL is the difference between a query context and a filter context. In a query context, Elasticsearch computes a relevance score: "how well does this document match?" In a filter context, it answers a binary yes/no question ("does this document match?") and, crucially, frequently used filters can be cached. The set of matching documents is held as a bitset in memory and reused for subsequent requests. That is why filters are usually faster when you do not need ranking.

5 Core Query Types

match

The go-to for full-text search. The query text is run through the same analyzer as the indexed field: it gets tokenised, lowercased and, depending on the analyzer, stemmed and stripped of stop words. Then Elasticsearch looks up each token in the inverted index. match: { "title": "running shoes" } finds docs mentioning "run", "shoe", "shoes", "running" only if title uses a stemming analyzer such as english; the standard analyzer does not stem, so it would match just the exact tokens "running" and "shoes".

term

Exact value search — no analysis applied. Used for structured data: IDs, status codes, tags. term: { "status": "published" } only matches documents where status is the exact string "published". If the field was analysed at index time and stored as "publish" after stemming, a term query for "published" would miss it. This is a common gotcha.

range

Numerical and date ranges. range: { "price": { "gte": 10, "lte": 50 } }. Works on any numeric or date field. Combine with filter context so the range check is cached — otherwise every query re-evaluates the range condition from scratch.

bool

The combinator. Four clauses: must (document must match; AND logic, contributes to score), should (match is preferred but not required; OR logic, boosts score if matched), filter (must match; AND logic, does NOT affect score, cacheable), must_not (must not match; NOT logic, does not affect score, cacheable). You nest these freely to build arbitrarily complex conditions.

multi_match

Like match but across multiple fields at once. multi_match: { query: "elasticsearch", fields: ["title^2", "body"] } — the ^2 boosts title matches to twice the weight of body matches. Useful for e-commerce where a match in the product name matters more than a match buried in the description.

Query Examples

A basic full-text search. The query text "running shoes" is analyzed (split into tokens and, if the title field uses a stemming analyzer, stemmed), then looked up in the inverted index of the title field.

GET /products/_search

{
  "query": {
    "match": {
      "title": "running shoes"
    }
  },
  "size": 10,
  "_source": ["title", "price", "brand"]
}

What happens inside, assuming title is mapped with a stemming analyzer such as english: "running shoes" → analyzer → ["run", "shoe"] → inverted index lookup → scored list of matching doc IDs → top 10 returned.

A bool query with a must clause for the full-text match (affects scoring) and a filter clause for a date range (binary gate, cached, does not affect score). This is the correct pattern for "search for a term, but only within the last 30 days."

GET /articles/_search

{
  "query": {
    "bool": {
      "must": [
        { "match": { "body": "elasticsearch tutorial" } }
      ],
      "filter": [
        {
          "range": {
            "published_at": {
              "gte": "now-30d/d",
              "lte": "now/d"
            }
          }
        },
        { "term": { "status": "published" } }
      ]
    }
  }
}

The date range and status term checks are in filter, so they do not affect scoring and are eligible for caching: when a filter is reused often enough, Elasticsearch can keep its matching documents as a bitset and skip re-evaluating it on later requests.

Search across multiple fields at once. The ^2 on title means title matches are weighted twice as heavily as description matches when computing the relevance score.

GET /products/_search

{
  "query": {
    "multi_match": {
      "query": "wireless noise cancelling headphones",
      "fields": ["title^2", "description", "brand^1.5"],
      "type": "best_fields",
      "minimum_should_match": "75%"
    }
  }
}

type: "best_fields" takes the score from whichever field matched best (as opposed to most_fields which sums scores across all fields). minimum_should_match: "75%" means at least 3 of the 4 tokens must match — helps filter out marginally relevant results.
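The "3 of the 4 tokens" arithmetic can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch's implementation: the function name required_matches is hypothetical, and it only covers the two simplest forms of the parameter (a positive percentage and a positive integer), assuming percentages round down to a whole number of terms.

```python
def required_matches(num_terms: int, spec: str) -> int:
    """How many query terms must match for a spec such as "75%" or "2".

    Simplified sketch: negative values and combined specs are not handled.
    """
    if spec.endswith("%"):
        pct = int(spec[:-1])
        # Integer division rounds the percentage down to whole terms.
        return (num_terms * pct) // 100
    return int(spec)


if __name__ == "__main__":
    # "wireless noise cancelling headphones" has 4 tokens.
    print(required_matches(4, "75%"))
```

With three tokens the same 75% setting would require only two matches, which is worth keeping in mind for short queries.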

Filters are typically much faster than scoring queries, often cited as 10–100×, though the actual gain depends on the data and the clause. The reason is two-fold: (1) filters skip the scoring computation entirely, and (2) Elasticsearch can cache frequently used filter results as memory-resident bitsets that are very cheap to AND together. When users repeatedly search with "status=published AND date in last 30 days", those two filter bitsets are likely to be in memory already. The text search query then only scores documents that pass the filters, not the entire index. Rule of thumb: if you don't need a relevance score for a condition, put it in filter.

The Query DSL is JSON-based; queries (must/should) score documents while filters (filter/must_not) are cached binary gates — use filters for date ranges and status checks to get 10–100× faster results.

Section 10

Aggregations

Search finds documents. Aggregations analyse them. Think of aggregations as Elasticsearch's answer to SQL's GROUP BY — except more powerful because they can be nested (a grouping inside a grouping), pipelined (one aggregation feeding another), and run in parallel across all shards simultaneously. When you look at a Kibana dashboard with a bar chart of "requests per minute by HTTP status code" — that's an aggregation. The entire Kibana visualisation layer is just a UI on top of Elasticsearch aggregations.

4 Aggregation Types

Bucket aggregations — Group documents

Bucket aggs are the GROUP BY equivalent. They put documents into buckets based on a criterion. Each bucket is a group, and you get the count of documents in each plus optionally a sub-aggregation on the group. The most common examples: terms (top N values of a field — "which categories have the most products?"), date_histogram (group by time interval — "how many orders per day this week?"), range (group by value ranges — "0–$50, $50–$200, $200+ price bands").

Metric aggregations — Calculate statistics

Metric aggs calculate a single number (or small set of numbers) over a set of documents or a bucket. Common examples: avg, sum, min, max, value_count, cardinality (approximate distinct count using HyperLogLog — very memory-efficient), and percentiles (p50, p95, p99 latencies). These almost always live inside a bucket agg to compute stats per group.

Pipeline aggregations — Operate on aggregation results

Pipeline aggs take the output of other aggregations as input rather than operating on raw documents. Example: you have a date_histogram with a sum(revenue) per day. A derivative pipeline agg calculates the day-over-day change in that sum. A moving_fn pipeline agg (the replacement for the older moving_avg, which was removed in 8.0) smooths the trend line. They make Elasticsearch capable of time-series analytics that would require several SQL CTEs to express.
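To make the idea concrete, here is a minimal Python sketch of what a derivative and a smoothing step compute over the per-day sums from a date_histogram. The function names and the sample revenue figures are made up for illustration, and the trailing mean is a plain average that includes the current bucket, which is an assumption rather than the exact windowing Elasticsearch applies.

```python
def derivative(daily_sums):
    """Day-over-day change; the first bucket has no predecessor, so it gets None."""
    if not daily_sums:
        return []
    changes = [None]
    for prev, cur in zip(daily_sums, daily_sums[1:]):
        changes.append(cur - prev)
    return changes


def trailing_mean(values, window):
    """Average of the current value and up to window-1 values before it."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    revenue_per_day = [100, 120, 90, 150]  # hypothetical bucket values
    print(derivative(revenue_per_day))
    print(trailing_mean(revenue_per_day, window=2))
```

Both functions consume bucket values only, never raw documents, which is the defining property of a pipeline aggregation.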

Nested aggregations — Aggs inside aggs

Any aggregation can contain sub-aggregations. Terms bucket by category → inside each category, avg price metric. Then inside each category, another date_histogram of orders per day. ES runs all of this in a single request, in parallel across shards. The coordinator merges partial results from each shard. This is what makes Kibana dashboards possible — a dashboard might run a query with 6 nested aggregation levels and get back a full analytics result in under a second.

Aggregation Examples

Find the top 5 categories by document count. This is the Elasticsearch equivalent of SELECT category, COUNT(*) FROM products GROUP BY category ORDER BY COUNT(*) DESC LIMIT 5.

GET /products/_search

{
  "size": 0,
  "aggs": {
    "top_categories": {
      "terms": {
        "field": "category.keyword",
        "size": 5,
        "order": { "_count": "desc" }
      }
    }
  }
}

"size": 0 at the top level means return no raw documents — we only want the aggregation result. category.keyword uses the not-analyzed keyword sub-field for exact grouping (as opposed to the analyzed category text field which would split "Running Shoes" into tokens).

Count events per day over the last 7 days. This is the backbone of every time-series chart in Kibana.

GET /logs-*/_search

{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": { "gte": "now-7d/d", "lte": "now/d" }
    }
  },
  "aggs": {
    "events_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day",
        "time_zone": "UTC",
        "min_doc_count": 0
      }
    }
  }
}

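min_doc_count: 0 makes empty days between the first and last populated bucket appear as zero-count buckets (so the chart line doesn't have gaps); to force empty buckets at the edges of the range as well, add extended_bounds. calendar_interval: "day" aligns buckets to calendar day boundaries in the given time_zone, which handles DST correctly when a non-UTC zone is used.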

For each category, compute the average price. One request, all categories, all averages — equivalent to multiple SQL queries or one complex CTE.

GET /products/_search

{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 20
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "price_percentiles": {
          "percentiles": {
            "field": "price",
            "percents": [50, 95, 99]
          }
        }
      }
    }
  }
}

Each shard computes its own partial by_category term counts and avg_price sums. The coordinating node merges: for avg, it takes sum / count from all shards. For percentiles, Elasticsearch uses the TDigest algorithm — an approximate data structure that computes percentiles in a single pass, mergeably, without storing all values.
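The merge step for avg can be sketched as follows. This is an illustrative model, not Elasticsearch code: the shard result shape (a dict of category to (doc_count, price_sum)) and the function name are assumptions chosen to show why shards ship sums and counts rather than finished averages.

```python
from collections import defaultdict


def merge_shard_results(shard_results):
    """Combine per-shard partials into final per-category counts and averages.

    shard_results: list of dicts mapping category -> (doc_count, price_sum).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for shard in shard_results:
        for category, (count, price_sum) in shard.items():
            totals[category][0] += count
            totals[category][1] += price_sum
    return {
        category: {"doc_count": count, "avg_price": price_sum / count}
        for category, (count, price_sum) in totals.items()
        if count > 0
    }


if __name__ == "__main__":
    shard_a = {"shoes": (2, 100.0)}                     # shard-local avg 50.0
    shard_b = {"shoes": (1, 80.0), "hats": (1, 20.0)}   # shard-local avg 80.0
    print(merge_shard_results([shard_a, shard_b]))
```

Averaging the two shard-local averages for shoes would give 65.0, while the correct value from the combined sum and count is 60.0, which is why the partials are kept separate until the final merge.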

Aggregations are Elasticsearch's analytics engine — bucket aggs group documents (like GROUP BY), metric aggs compute stats (avg, percentiles), and they nest freely, running in parallel across shards so a full analytics query returns in milliseconds.

Section 11

Relevance & Scoring

When you type "iphone case leather" into an e-commerce site, you expect the most relevant results first. But what does "most relevant" even mean? Elasticsearch's answer is a number called the relevance score — computed for every matching document, with higher scores appearing first in results. Getting this number right is what separates a search engine that feels magical from one that feels broken.

The scoring algorithm Elasticsearch uses by default is called BM25 ("Best Match 25", one of a numbered series of ranking functions developed for the Okapi retrieval system). It replaced the older TF-IDF formula as the default in Elasticsearch 5.0. The core idea: a term is more significant if it appears often in this particular document (term frequency) but rarely across the whole collection (inverse document frequency). BM25 adds two refinements over TF-IDF: term frequency saturation (the 100th mention of "iphone" barely counts more than the 10th) and field length normalisation (matching "iphone" in a two-word title is stronger than matching it in a 500-word description).
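Saturation and length normalisation are easier to see with numbers. The sketch below uses the common textbook form of BM25 for a single term with the usual defaults k1 = 1.2 and b = 0.75; the function name and the sample corpus statistics are hypothetical, and Lucene's production formula may differ in constant factors, so treat the absolute values as illustrative only.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score."""
    # Rare terms (small docs_with_term) get a larger idf.
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Longer-than-average documents are penalised through this norm.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + norm)


if __name__ == "__main__":
    stats = dict(avg_doc_len=100, num_docs=1000, docs_with_term=10)
    for tf in (1, 10, 100):
        print(tf, round(bm25_term_score(tf, doc_len=100, **stats), 3))
    # Same single occurrence, short title vs long description:
    print(round(bm25_term_score(1, doc_len=2, **stats), 3))
    print(round(bm25_term_score(1, doc_len=500, **stats), 3))
```

Going from 1 to 10 occurrences raises the score far more than going from 10 to 100, and a single hit in a two-word field outscores the same hit in a 500-word field.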

4 Score-Tuning Patterns

Field boosting

By appending ^N to a field name in a multi_match query (or using the boost parameter in a match query), you multiply that field's relevance contribution. title^3 means a match in the title is 3× more valuable than a match in the body. This is one of the fastest things to tune when results feel off — a "title match" is almost always more relevant than a "body match" in a product catalogue or article search.

Fuzzy matching

Users make typos. "fuzziness": "AUTO" allows Elasticsearch to match terms within a certain edit distance — essentially the number of single-character insertions, deletions, or substitutions needed to convert one term to another (Levenshtein distance). "iphne" (typo) still matches "iphone" with edit distance 1. AUTO picks the right threshold based on word length: distance 0 for 1–2 char words, 1 for 3–5 chars, 2 for 6+ chars. Combine with prefix_length: 2 to anchor the first two characters — prevents "bat" matching "cat" via one edit which feels wrong to users.
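The AUTO thresholds and the effect of prefix_length can be modelled in a short Python sketch. This is an approximation for intuition only: it uses plain Levenshtein distance as described above (no special handling of transposed characters), and the helper names are hypothetical rather than part of any Elasticsearch client.

```python
def auto_fuzziness(term: str) -> int:
    """Allowed edits for a term under the AUTO rule described above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(query_term: str, indexed_term: str, prefix_length: int = 0) -> bool:
    """True if indexed_term is within the AUTO edit budget of query_term."""
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False  # the anchored prefix must match exactly
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


if __name__ == "__main__":
    print(fuzzy_match("iphne", "iphone", prefix_length=2))
    print(fuzzy_match("bat", "cat"))
    print(fuzzy_match("bat", "cat", prefix_length=2))
```

Without a prefix anchor "bat" matches "cat" through one substitution; with prefix_length set, the differing first character rules the match out before any distance is computed.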

Function score

Sometimes pure text relevance is not enough. You want to blend in signals like recency (a newer article should rank higher all else being equal), popularity (a product with 10,000 reviews should beat one with 5), or distance (a nearby restaurant ranks above a far one). The function_score query wraps your regular query and multiplies or adds custom functions into the score. The gauss decay function is especially useful for recency: it gives a smooth, bell-shaped decay as articles get older, rather than a hard cutoff.

Synonym expansion

A user searching "tv" expects to find documents mentioning "television." Elasticsearch handles this at analysis time with a synonym token filter in the analyzer. You define a synonyms file: tv, television, telly. At query time, "tv" expands to all three terms and the match is OR-ed together. You can also do one-way synonyms: "iphone" → "apple iphone, ios phone" to broaden results without polluting the other direction.

GET /articles/_search

{
  "query": {
    "multi_match": {
      "query": "machine learning tutorial",
      "fields": [
        "title^3",
        "subtitle^2",
        "body^0.5",
        "tags"
      ],
      "type": "best_fields"
    }
  }
}

A match in title is worth 3× a match in body. The 0.5 on body actually reduces its contribution — useful when body is long and noisy but you don't want to exclude body-only matches entirely.

GET /products/_search

{
  "query": {
    "match": {
      "title": {
        "query": "wireles hedphones",
        "fuzziness": "AUTO",
        "prefix_length": 2,
        "minimum_should_match": "2"
      }
    }
  }
}

"wireles" (typo, missing s) still finds "wireless" with edit distance 1. prefix_length: 2 means the first two characters must match exactly — "wi" must match "wi" — which prevents absurd fuzzy matches like "cat" matching "bat."

GET /articles/_search

{
  "query": {
    "function_score": {
      "query": { "match": { "title": "elasticsearch" } },
      "functions": [
        {
          "gauss": {
            "published_at": {
              "origin": "now",
              "scale": "30d",
              "offset": "7d",
              "decay": 0.5
            }
          }
        },
        {
          "field_value_factor": {
            "field": "view_count",
            "modifier": "log1p",
            "factor": 0.1
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

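Two scoring functions: a Gaussian decay on published_at and a log-scaled view count signal. With offset: "7d" and scale: "30d", articles up to 7 days old keep the full boost, and the boost falls to the decay value (0.5) at 37 days, that is, offset plus scale. modifier: "log1p" prevents a viral article with a million views from completely dominating a very relevant but less popular one. boost_mode: "multiply" multiplies the text relevance score by the combined function score, which here is the sum of the two functions because score_mode is "sum".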
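The numbers behind this query can be checked with a small Python sketch of the two functions. It follows the documented shape of the Gaussian decay (no decay inside the offset, value equal to decay at offset plus scale) and of log1p (common logarithm of one plus factor times the field value); the function names and the sample inputs are illustrative assumptions.

```python
import math


def gauss_decay(age_days, scale_days=30.0, offset_days=7.0, decay=0.5):
    """Gaussian decay multiplier for a document age_days away from the origin."""
    distance = max(0.0, abs(age_days) - offset_days)
    # sigma^2 is chosen so the curve equals `decay` at distance == scale.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


def log1p_factor(view_count, factor=0.1):
    """field_value_factor with modifier log1p: log10(1 + factor * value)."""
    return math.log10(1.0 + factor * view_count)


def final_score(text_score, age_days, view_count):
    """score_mode "sum" adds the functions; boost_mode "multiply" applies them."""
    return text_score * (gauss_decay(age_days) + log1p_factor(view_count))


if __name__ == "__main__":
    for age in (3, 7, 37, 90):
        print(age, round(gauss_decay(age), 3))
    print(round(log1p_factor(1_000_000), 3))
```

A million views contributes only about 5 to the sum, while the recency term moves between 1 and 0, so with these settings popularity can still outweigh recency; adjusting factor is one way to rebalance them.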

Most "bad search results" problems are scoring or analyzer problems. Before concluding that your data is wrong, run "explain": true on your query. Elasticsearch returns the full BM25 calculation for each result, showing exactly why Document A scored 8.4 and Document B scored 2.1. This almost always reveals the real issue: a field was mapped as text when it should be keyword, an analyzer is stripping terms you need, or you forgot to boost the title field.

BM25 scores documents by rewarding terms that appear often in this document but rarely across the corpus — boost fields, add fuzzy matching for typos, and use function_score to blend in recency or popularity signals.

Section 12

Logs & Observability (ELK Stack)

Imagine you have 500 servers all running your application. Something breaks at 2 a.m. You need to answer: Which servers threw errors? When did it start? What was the error message? Was it correlated with a memory spike? Without a centralised log system you would SSH into 500 machines one by one — that's absurd. The ELK stack (Elasticsearch + Logstash + Kibana) was built to solve exactly this problem, and for over a decade it has been the canonical answer for engineering teams that need to search and visualise logs at scale.

The acronym has expanded over the years. The official name is now the Elastic Stack, with Beats added as a fourth component. But "ELK" still sticks in conversation because it was the original, and "BELK" never caught on.

The 4 Stack Components

Filebeat / Metricbeat (Beats)

Beats are ultra-lightweight agents — they're written in Go and designed to run on every single server in your infrastructure without consuming meaningful CPU or memory. Filebeat tails log files (like a persistent tail -f) and ships new lines to Logstash or directly to Elasticsearch. Metricbeat collects system metrics: CPU, memory, disk, network throughput. The "lightweight" part matters: if your log shipper consumes 500 MB of RAM on every server, you have a problem. Filebeat typically uses under 50 MB.

Logstash

The heavy-duty transformer. Logstash uses a pipeline model: INPUT plugins receive data (from Beats, Kafka, TCP sockets, S3, etc.), FILTER plugins transform it (the famous grok plugin parses unstructured log lines with regex patterns into structured JSON fields), and OUTPUT plugins send the transformed events downstream. Logstash is JVM-based and can be memory-intensive. For simpler transformations, Elasticsearch's built-in ingest pipelines can replace it entirely, saving you a component to operate.

Elasticsearch

The storage and search core. In the ELK context, logs are stored in time-series indexes — typically one index per day (e.g. logs-2025.05.09). Index Lifecycle Management (ILM) automates the full lifecycle: new indexes start on fast "hot" nodes (NVMe SSDs), age into "warm" nodes (spinning disk, read-only), then into "cold" storage (lower-spec), and finally are deleted or archived. This keeps your storage costs manageable — you're not paying NVMe prices for three-month-old logs you barely query.

Kibana

The human interface. Kibana talks to Elasticsearch over its REST API, issues aggregation queries, and renders the results as charts, tables, heat maps, and geographic maps. The Discover tab lets engineers search raw logs in real time ("show me all ERROR logs from the payment service in the last 15 minutes"). Dashboards are collections of saved visualisations that auto-refresh. Alerting rules fire when aggregation results cross thresholds — e.g. "alert Slack if p99 latency exceeds 500 ms for any service."

Modern Alternatives

The observability space has fragmented significantly, especially after Elastic's 2021 license change from Apache 2.0 to SSPL/Elastic License (restricting cloud providers from offering hosted Elasticsearch as a service). Here is how the landscape looks today:

OpenSearch (Apache 2.0)

Amazon forked Elasticsearch 7.10.2 (the last Apache-licensed version) and created OpenSearch in 2021. It is API-compatible with Elasticsearch for most operations, actively maintained, and fully open source under Apache 2.0. AWS and several other providers offer managed OpenSearch. If you are starting a new project and want to avoid the Elastic License restrictions, OpenSearch is the most direct drop-in alternative.

Grafana Loki

A fundamentally different model. Instead of indexing the full text of every log line (expensive), Loki only indexes labels (e.g. service=payment, env=prod). Log content is stored compressed. This makes Loki dramatically cheaper at scale but means you can only filter by labels — you cannot do a full-text search across all log content. Best for Kubernetes-native environments where you already have good label discipline. Often paired with Prometheus (metrics) and Tempo (traces) in the Grafana observability stack.

License timeline worth knowing. Elasticsearch was Apache 2.0 until January 2021. Elastic then switched to a dual SSPL/Elastic License, blocking Amazon (and others) from selling it as a managed service. Amazon forked the last open-source version as OpenSearch. In 2024, Elastic added AGPLv3 as a third licensing option alongside SSPL and Elastic License, partially addressing community concerns. When choosing between Elasticsearch and OpenSearch today, the code bases are diverging more every month, but the core Query DSL and API are still largely compatible.

The ELK stack routes logs from every server through lightweight Beats agents → Logstash transformation → Elasticsearch storage/search → Kibana dashboards, with ILM managing retention — and OpenSearch (Apache 2.0 fork) is the main open-source alternative after Elastic's 2021 license change.

Section 13

Index Lifecycle & Retention (ILM)

Log indexes grow without limit unless something automatically cleans them up. Index Lifecycle Management (ILM) defines a series of phases — hot, warm, cold, frozen, delete — and automatically moves data through them based on age or size rules, keeping storage costs sane and queries fast.

Imagine your application logs as a river. Without anything managing that river, water just pools up — one enormous lake (index) that gets bigger every day. Querying that lake gets slower as it grows, and storage costs spiral. ILM is like a series of reservoirs along the river: active water stays in the fast first reservoir, older water flows to slower cheaper ones, and eventually it drains away entirely.

The big win: you configure the rules once and Elasticsearch handles everything automatically. No cron jobs, no manual `DELETE` calls, no overnight maintenance scripts.

The Five ILM Phases

Each phase is a trade-off between speed and cost. You move data to slower, cheaper storage as it ages because recent data is queried constantly while month-old data is almost never touched.

Hot — Active Writes

Data lands here first. The index is accepting new documents every second. Elasticsearch puts hot indices on your fastest hardware (NVMe SSDs) because every write goes through a refresh cycle, and every query on today's logs hits here. Typical duration: 1–7 days.

Warm — Read-Only

Once an index stops receiving writes, it moves to warm. You can still search it freely, but it lives on slower (and cheaper) hardware. A common optimization: force-merge the index to a single Lucene segment, which shrinks its size and speeds up future queries. Typical duration: 7–30 days.

Cold — Rarely Searched

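Old data that's legally or operationally required but almost never queried. You often drop replicas here: when the cold index is backed by a searchable snapshot, a failed node's data can be recovered from the snapshot repository (e.g. S3) instead of from a replica. Queries still work but are noticeably slower. Typical duration: 30–90 days.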

Frozen — S3 Partial Mount

The index lives entirely on object storage (S3/GCS). When you search it, Elasticsearch streams only the pieces it needs — called a "partial mount." Storage cost drops dramatically (S3 vs local SSD), but each query takes seconds not milliseconds. Worth it for audit logs you search once a year.

Delete — Gone

The index is permanently removed. Set this to whatever your retention policy requires — GDPR typically mandates no longer than necessary, while security compliance often mandates at least 90 days. ILM fires the delete automatically so you never have to remember.

Index Aliases and Rollover

ILM works hand-in-hand with two related features. An index alias is just a pointer with a stable name — your application writes to logs-current, which points to whatever the actual index is today (e.g. logs-000042). When a rollover condition fires (index is 30 GB, or 7 days old, or contains 50 million docs), ILM creates a fresh new index (logs-000043) and silently redirects the alias. Your application never notices the switch.

Why bother with aliases? Because without them, you'd have to update your application's target index name every time a rollover happens — which could be daily. Aliases give you a stable write endpoint forever.
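A policy along these lines could be expressed as the request body below, built here as a Python dict so it can be inspected or serialised. This is a sketch under assumptions: the rollover thresholds come from the example above, while the warm and delete ages (7d and 90d), the function name, and the idea of sending it to a policy called logs-policy are illustrative choices, not recommendations.

```python
import json


def build_ilm_policy(rollover_age="7d", rollover_size="30gb",
                     rollover_docs=50_000_000,
                     warm_after="7d", delete_after="90d"):
    """Return an ILM policy body with hot, warm, and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over when any one of these conditions is met.
                        "rollover": {
                            "max_age": rollover_age,
                            "max_size": rollover_size,
                            "max_docs": rollover_docs,
                        }
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Body for a request such as: PUT _ilm/policy/logs-policy (hypothetical name)
    print(json.dumps(build_ilm_policy(), indent=2))
```

The policy only takes effect once an index template attaches it to new indices and names the rollover alias, so the two pieces are normally created together.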

ILM is THE answer to "logs grow forever." Without it, a single index becomes a 5 TB monster after a few months — slow to query, impossible to maintain, and horrifying to delete safely. With ILM configured from day one, Elasticsearch manages its own disk usage automatically.

Section 14

Vector Search & Embeddings

Traditional keyword search matches exact words. Vector search matches meaning — it converts text (or images) into a list of numbers called an embedding, then finds the documents whose embeddings are mathematically closest to your query's embedding. Elasticsearch added native dense vector support in v7.x and HNSW approximate-nearest-neighbor search in v8.x.

Classic text search is brilliant at finding "the exact words you typed." But what happens when a user searches "running shoes" and the matching product is labeled "athletic footwear"? Keyword search fails. The words never overlap. Vector search solves this by working with meaning rather than characters.

Here is the intuition. A machine learning model trained on billions of sentences has learned that "running shoes" and "athletic footwear" mean basically the same thing. When you pass both phrases through that model, you get two lists of numbers (vectors) that are very close to each other in mathematical space. Vector search finds the closest vectors — so it finds the match even though no words were shared.
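"Close in mathematical space" usually means a high cosine similarity. The sketch below shows the computation on tiny hand-made three-dimensional vectors; real embeddings have hundreds of dimensions and come from a model, so the vectors, labels, and function names here are purely illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, docs, k=1):
    """Exact (brute-force) top-k document names by cosine similarity."""
    ranked = sorted(docs, key=lambda name: cosine_similarity(query_vec, docs[name]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up embeddings: no words are shared between query and best match.
    docs = {
        "athletic footwear": [0.9, 0.1, 0.0],
        "garden hose": [0.0, 0.2, 0.9],
    }
    running_shoes = [0.8, 0.2, 0.1]
    print(nearest(running_shoes, docs))
```

This exact scan compares the query against every document, which is fine for a handful of vectors and is the cost that approximate indexes such as HNSW are designed to avoid.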

Four Big Use Cases

Semantic Search

A user types "affordable smartphones with good cameras." Your product catalog has items described as "budget-friendly mobile with premium optics." Keyword search misses it entirely. Vector search finds it because both phrases map to nearby embedding vectors. This is the most common production use case today.

RAG — Retrieval-Augmented Generation

When you chat with an AI assistant powered by your own documents (internal wiki, legal contracts, support tickets), the system first runs a vector search to find the 5–10 most relevant chunks of text, then feeds those chunks to a large language model (LLM) as context. Without the retrieval step, the LLM could only answer from its training data, which does not include your documents, so it would be prone to hallucinating. Elasticsearch is often the retrieval engine in this pattern.

Image Search

Vision models (like CLIP) produce embedding vectors for images. Once you index images by their vectors, a user can upload a photo and find visually similar products in your catalog — no text query needed. Same HNSW search, different modality.

Recommendation

If you embed every article, product, or video, then "find items similar to this one" becomes a nearest-neighbor search in vector space. The embedding captures subtle similarity — two news articles about different politicians but the same policy topic will be nearby, even if they share zero keywords.

Working with Dense Vectors in Elasticsearch

First define the mapping. The dims must match whatever embedding model you use (384 for MiniLM, 768 for BERT-base, 1536 for OpenAI ada-002).

PUT /products
{
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "description": { "type": "text" },
      "embedding": {
        "type":       "dense_vector",
        "dims":       768,
        "index":      true,
        "similarity": "cosine"
      }
    }
  }
}

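Setting "index": true tells Elasticsearch to build the HNSW graph on this field, which is what enables fast approximate-nearest-neighbor queries through the knn search option. Without it, the vectors are only stored, and similarity search has to fall back to an exact brute-force scan (for example via a script_score query), which is far too slow at scale.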

When indexing, run each document through your embedding model (in your application code or an inference pipeline) and attach the resulting vector alongside regular text fields.

POST /products/_doc/1
{
  "title":       "Athletic Footwear — Trail Running Series",
  "description": "Lightweight shoes for trail and road running",
  "embedding":   [0.23, -0.71, 0.55, 0.12, -0.33, ...]
}
// The embedding array has 768 numbers — one per dimension.
// Your ML model produces this from the title+description text.

In production, you typically generate embeddings in bulk at index time using a batch inference job, or pipe documents through Elasticsearch's built-in inference processor (available in Elastic Cloud) which calls a hosted model automatically.

At query time, embed the user's query string and ask for the 10 nearest vectors. You can also combine this with keyword filters — that is where Elasticsearch's hybrid search shines over dedicated vector databases.

POST /products/_search
{
  "knn": {
    "field":        "embedding",
    "query_vector": [0.21, -0.68, 0.59, 0.14, ...],
    "k":            10,
    "num_candidates": 100,
    "filter": {
      "term": { "category": "footwear" }
    }
  }
}
// knn block finds the 10 semantically closest products.
// filter inside knn restricts the candidates to the footwear category.
// A top-level "query" next to "knn" would not act as a filter: its hits are
// combined with the kNN hits (hybrid search), so put hard constraints in knn.filter.

The num_candidates parameter controls the HNSW search breadth — higher values mean more accurate results but slightly slower queries. A typical value is 5–10x your k.

Elasticsearch vs dedicated vector databases (Pinecone, Weaviate, Qdrant): The dedicated systems are often faster for pure vector search at very large scale (billions of vectors). But Elasticsearch's advantage is that you can combine vector search + keyword search + structured filters (date range, category, price) in a single query. If you already have Elasticsearch for search and logs, adding vector search costs near zero operationally.

Section 15

Performance & Tuning

Elasticsearch performance comes down to five main levers: JVM heap sizing, how fields store sort/aggregation data, the refresh interval, batching writes via the bulk API, and managing segments on cold indices. Getting these right can mean the difference between a cluster that handles 50 K queries per second and one that falls over at 5 K.

Most Elasticsearch performance problems fall into a small set of categories that experienced operators have seen over and over. The good news: they are all preventable if you know the rules before you deploy.

The Five Performance Levers

1 — Heap Size (JVM)

Elasticsearch runs on the JVM. The heap is where all the in-memory work happens — caches, aggregation results, filter contexts. The rule of thumb is 50% of system RAM for heap, with a hard cap around 31 GB. Why 31 GB? The JVM uses "compressed ordinary object pointers" below roughly 32 GB — a trick that lets it address more objects with 32-bit pointers instead of 64-bit ones. Cross that boundary and memory efficiency degrades noticeably. Leave the other 50% of RAM for the OS file cache, which Lucene uses heavily for segment reads.
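The sizing rule reduces to a one-line calculation, sketched here in Python. The function names are hypothetical and the 31 GB cap is the rule-of-thumb figure from the paragraph above; the exact compressed-pointer threshold depends on the JVM, so verify it on your own nodes before relying on it.

```python
def recommended_heap_gb(system_ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of RAM, but never above the compressed-oops cap."""
    return min(system_ram_gb / 2.0, cap_gb)


def jvm_heap_flags(system_ram_gb: float) -> str:
    """Matching -Xms/-Xmx lines; min and max heap are set to the same value."""
    heap = int(recommended_heap_gb(system_ram_gb))
    return f"-Xms{heap}g\n-Xmx{heap}g"


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
    print(jvm_heap_flags(64))
```

On a 128 GB machine the heap still stops at 31 GB; the remaining memory is not wasted because the OS file cache uses it for Lucene segment reads.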

2 — doc_values vs fielddata

When you sort by a field or run an aggregation on it, Elasticsearch needs a way to look up "what value does document #4321 have for this field?" There are two mechanisms. doc_values is an on-disk column-oriented structure built at index time — fast, efficient, and the default for numeric/keyword fields. fielddata loads the entire field's values into heap memory — fast after the first load, but it eats heap and can cause GC pressure. Never enable fielddata on high-cardinality text fields in production unless you have no other option.

3 — Refresh Interval

Every second by default, Elasticsearch "refreshes" the current in-memory write buffer into a searchable Lucene segment. This gives near-real-time search. But each refresh creates a new tiny segment, and tiny segments eventually need to be merged — a CPU and I/O intensive process. If you are ingesting millions of logs per second and don't need sub-second freshness, raise the refresh interval to 30 seconds. This batches writes into larger segments and dramatically reduces merge overhead.

4 — Bulk Indexing API

Single-document indexing sends one HTTP request per document. At 100 K docs/sec, that is 100 K HTTP requests per second — the overhead alone would saturate the cluster. The _bulk API lets you send thousands of documents in a single request. A reasonable batch size is around 5–10 MB per request (roughly 1 K–10 K documents depending on document size). Tune batch size empirically — too small wastes HTTP overhead, too large increases memory pressure and latency spikes.
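Batching by payload size can be sketched as a small Python generator that emits newline-delimited _bulk bodies. It assumes documents arrive as (id, source) pairs and that each body would be POSTed to the _bulk endpoint by an HTTP client of your choice; the function name, the index name, and the tiny size limit in the demo are illustrative only.

```python
import json


def bulk_batches(index, docs, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bodies for the _bulk API, each at most max_bytes when possible.

    docs: iterable of (doc_id, source_dict) pairs.
    A single entry larger than max_bytes is still emitted, alone in its batch.
    """
    batch, size = [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        entry = action + "\n" + json.dumps(source) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield "".join(batch)  # every body ends with the required trailing newline


if __name__ == "__main__":
    sample = [(str(i), {"msg": "x" * 10}) for i in range(10)]
    for body in bulk_batches("logs", sample, max_bytes=200):
        print(len(body.encode("utf-8")), "bytes,", body.count("\n") // 2, "docs")
```

Measuring bytes rather than counting documents keeps request sizes stable when document sizes vary widely, which is the usual case for logs.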

5 — Force Merge After Rollover

A Lucene index is composed of multiple immutable segments. Over time, segments accumulate from refreshes. Each query must scan all segments. When an index rolls over to warm (read-only), the optimal action is to force-merge it to a single segment. One segment means fastest possible query time, smallest possible disk footprint, and minimal merge overhead forever after. ILM can do this automatically as part of the warm phase action.

Rough performance ballpark: A well-tuned Elasticsearch cluster running on modern hardware can typically handle somewhere in the range of 10 K–100 K search queries per second and index 100 K–1 M documents per second, depending heavily on document size, query complexity, and hardware. These are indicative figures — your real-world numbers will vary significantly based on your specific workload.

Section 16

Operations & Maintenance

Running Elasticsearch in production requires six ongoing disciplines: regular backups via snapshots, active monitoring of cluster health, careful rolling upgrades, security hardening, index templates for consistent settings, and understanding the three cluster health states (green / yellow / red) — especially the dangerous habit of ignoring yellow.

Elasticsearch is not a "set it and forget it" system. It rewards teams that invest in operational discipline and punishes those that don't. The good news: most of the tooling you need is built in — you just have to know it exists and use it.

Snapshots — Your Backup Safety Net

Elasticsearch has a built-in snapshot API that takes incremental backups of indices to an external repository — usually S3, GCS, or Azure Blob Storage. A snapshot captures shard state at a point in time. Restoring is as simple as calling the restore API.

Schedule snapshots at minimum daily. For critical indices, hourly. The _snapshot API makes it easy to automate via Snapshot Lifecycle Management (SLM) policies or a simple cron job calling the REST endpoint. Without snapshots, a disk failure on all replicas simultaneously means permanent data loss.

Monitoring — What to Watch

Elasticsearch ships with Stack Monitoring (visible in Kibana) that tracks all the key metrics out of the box. If you prefer Prometheus/Grafana, the elasticsearch_exporter is the standard community tool. The six metrics that matter most in production:

Search latency (p50, p95, p99)
Indexing rate (docs/sec)
JVM heap used (%)
GC pause duration
Thread pool queue sizes
Unassigned shards (any > 0 needs attention)

Upgrades — Rolling Not Restarting

Elasticsearch supports rolling upgrades: you upgrade one node at a time while the cluster keeps serving traffic. The usual procedure is to temporarily disable replica shard allocation, stop the node, upgrade it, restart it, re-enable allocation, and wait for the cluster to recover before moving to the next node. Rolling upgrades across a major version are generally supported only from the last minor release of the previous major (for example 7.17 → 8.x); clusters on older releases must first upgrade to that release or plan for a full cluster restart, though Elastic provides migration tools and cross-cluster replication paths to minimize downtime. Always read the upgrade notes: breaking changes in mappings or APIs have bitten many teams.

Security — Built-in Since 6.8

For years, Elasticsearch defaulted to no authentication, a disaster waiting to happen, and one that did happen when thousands of clusters were exposed to the internet. Since version 6.8 (May 2019), the core Elastic Stack Security features (TLS, native realm users, RBAC) ship free in the Basic tier; since version 8.0 (Feb 2022), security is enabled by default in new installations. The free tier provides TLS encryption for node-to-node and client-to-node traffic and role-based access control (RBAC) at the index level; finer-grained field- and document-level permissions and audit logging are also available but require a paid subscription. There is no excuse for running an unsecured cluster anymore.

Index Templates — Consistency at Scale

When you create an index manually, you define its mappings and settings yourself. But in production, dozens of indices get created automatically (by ILM rollover, by Logstash, by application code). Index templates let you define "if the index name matches this pattern, apply these settings and mappings automatically." This ensures every logs-* index gets the right number of shards, the right ILM policy, and the right field mappings — without anyone having to remember to configure them.

Cluster Health States

Elasticsearch reports one of three cluster health states at any moment:

Green — all primary and replica shards are assigned and active. Fully operational.
Yellow — all primary shards are active but one or more replica shards are unassigned. Data is safe, but you have lost fault tolerance.
Red — one or more primary shards are unassigned. Some data is unavailable or lost.

Yellow status is not "fine — just replicas." Yellow means you are one node failure away from red. Many teams see yellow in their dashboard and shrug because "search still works." But yellow is a warning that your fault tolerance has been compromised. Investigate and resolve every yellow status promptly. Common causes: a node went down, a disk filled, or the cluster simply doesn't have enough nodes to place all replicas.
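The three states boil down to a simple decision rule. The sketch below models it with two hypothetical inputs (counts of unassigned primary and replica shards); the real GET _cluster/health API reports the status for you, so this is only meant to make the rule explicit.

```python
def cluster_status(unassigned_primaries, unassigned_replicas):
    """Simplified model of the green/yellow/red rule described above."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # data is reachable, fault tolerance is reduced
    return "green"


print(cluster_status(0, 0))  # green
print(cluster_status(0, 3))  # yellow
print(cluster_status(1, 3))  # red
```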

Section 17

Common Pitfalls

Production Elasticsearch has a well-known set of traps. Every one of these has taken teams days or weeks to diagnose because the symptoms (slow queries, cluster instability, memory pressure) look like hardware problems when they are actually configuration problems. Knowing them in advance saves enormous pain.

The six pitfalls below are not hypothetical — they show up repeatedly in postmortems from companies running Elasticsearch at scale. The pattern is always the same: a cluster starts fine, grows over months, and one day starts degrading in ways that are hard to trace without knowing these failure modes.

The Six Pitfalls in Detail

Pitfall 1 — Mapping Explosion

By default, Elasticsearch uses "dynamic mapping" — if you index a document with a new field, it automatically adds that field to the mapping. This sounds convenient until someone stores user-generated JSON with hundreds of arbitrary keys. Suddenly your mapping has 1 000+ fields, the cluster slows down, and you hit Elasticsearch's index.mapping.total_fields.limit (default: 1 000). Fix: define explicit mappings and set "dynamic": "strict" on indices where field explosion is a risk.

Pitfall 2 — Wrong Shard Count

The number of primary shards is fixed at index creation. Changing it means building a new index, either by reindexing or with the Split and Shrink APIs, which create a new index from a read-only source index. Too many shards: the master node has to track every shard's state, and at 10 000+ shards master elections become slow and cluster state updates lag. Too few shards: you can't spread load across nodes, and hot spots form. A rough guide is around 10–50 GB per shard; for shard count, Elastic's long-standing rule of thumb was to stay below roughly 20 shards per GB of JVM heap on each node. Treat both as starting points, because the right number depends on document size and query patterns.

Pitfall 3 — Heap > 32 GB

The JVM uses a technique called compressed ordinary object pointers (compressed oops) that lets it address objects with 32-bit references instead of 64-bit ones — halving the pointer overhead. This optimization kicks in only below roughly 32 GB. Set heap to 33 GB and you cross the threshold: object references become 64-bit, effective memory capacity drops, and GC pressure increases. The safe maximum is around 30–31 GB. If you need more capacity than one node's 31 GB heap can serve, add more nodes rather than raising heap.

Pitfall 4 — Default Refresh for Heavy Ingest

The 1-second refresh interval is designed for near-real-time search use cases (think product search where users expect instant results). For log ingestion where a 30-second or even 60-second delay is fine, it is a performance disaster: under steady ingest each shard can produce up to 60 tiny segments per minute (hundreds across a multi-shard index), and they must constantly be merged, burning CPU and I/O. Set "refresh_interval": "30s" on hot log indices. You can always reduce it later if you discover you need fresher data.
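A quick back-of-the-envelope sketch of why the interval matters. It assumes steady ingest, so every refresh finds new documents and produces one segment per shard; the helper names are made up for this example. The second function builds the body you would send to PUT /<index>/_settings.

```python
def max_refresh_segments_per_minute(refresh_interval_s):
    """Upper bound on segments one shard creates per minute from refreshes."""
    return 60 // refresh_interval_s


def refresh_settings(interval="30s"):
    """Request body for PUT /<index>/_settings."""
    return {"index": {"refresh_interval": interval}}


print(max_refresh_segments_per_minute(1))   # 60
print(max_refresh_segments_per_minute(30))  # 2
print(refresh_settings())
```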

Pitfall 5 — Deep Pagination

In SQL, OFFSET 50000 LIMIT 10 is inefficient but possible. In Elasticsearch, "from": 50000, "size": 10 is even worse — every shard must collect and sort the top 50 010 results and send them to the coordinator, which then merges and discards all but 10. At from + size > 10 000, Elasticsearch rejects the query by default. The right approach: use search_after (cursor-based, uses last page's sort values) for real pagination, or the Scroll API for bulk data export.
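The cost of from/size can be worked out with simple arithmetic. The sketch below is an upper-bound model (it assumes every shard holds at least from + size matching documents) and the function name is hypothetical; the 10 000 limit is the default value of index.max_result_window.

```python
MAX_RESULT_WINDOW = 10_000  # default index.max_result_window


def from_size_cost(from_, size, shards):
    """Return (entries per shard, entries merged by coordinator, discarded)."""
    window = from_ + size
    if window > MAX_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {window} exceeds the {MAX_RESULT_WINDOW} window"
        )
    merged = window * shards        # every shard returns its own top `window`
    return window, merged, merged - size


print(from_size_cost(9_990, 10, shards=5))  # (10000, 50000, 49990)
```

With five shards, fetching ten hits at the edge of the window means the coordinator merges 50 000 entries and throws away all but ten of them.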

Pitfall 6 — No ILM on Time-Series

This one is deceptively slow-moving. On day 1, the index is tiny and fast. After 6 months of logs, it is 5 TB. After a year, 10 TB. Queries that took 20 ms now take 2 000 ms. The fix — enable ILM — is trivial to set up before the problem starts and extremely painful to apply retroactively (requires reindexing 10 TB of data). Set up ILM on every time-series index from day one.

Section 18

OpenSearch & the License Drama

In January 2021, Elastic NV switched Elasticsearch and Kibana from Apache 2.0 to a dual SSPL/Elastic License — preventing cloud providers from offering managed Elasticsearch without buying a commercial license. AWS responded by forking the last Apache-licensed version (7.10.2) as OpenSearch and OpenSearch Dashboards. Both products have since diverged and are now genuinely different projects.

To understand why this matters, you need a bit of business context. Elasticsearch became incredibly popular, and Amazon Web Services built a managed offering called "Amazon Elasticsearch Service" that let AWS customers use Elasticsearch without any revenue going to Elastic NV (the company). Elastic felt that was unfair — AWS was profiting enormously from their open-source work while contributing relatively little back. The license change was their response.

The SSPL (Server Side Public License) has a clause that essentially says: if you offer this software as a service, you must open-source your entire service stack. For AWS, that was a non-starter. So AWS forked.

Four Key Differences Today

License

OpenSearch is Apache 2.0: fully open source, permissive, no restrictions on commercial use or offering as a managed service. Elasticsearch is available under the Elastic License 2.0 (source available, with restrictions on managed service offerings) or SSPL. In 2024, Elastic added AGPLv3 as a third option for the Elasticsearch and Kibana source code. For most enterprises using Elasticsearch internally, these licenses are not a concern. They matter primarily if you intend to build and sell a managed Elasticsearch-compatible service.

Cloud Support

AWS naturally invested heavily in OpenSearch — it powers Amazon OpenSearch Service. Azure and GCP also offer managed Elasticsearch (because Elastic explicitly licenses them). For teams on AWS who want a fully managed, AWS-native search service with tight IAM integration, Amazon OpenSearch Service is the path of least resistance. For teams who want the original Elasticsearch product with commercial support, Elastic Cloud is the canonical choice and runs on all three major cloud providers.

Vector Search & ML Features

Both projects have added vector search, hybrid search, and ML inference capabilities. The implementations differ in their APIs and feature sets. Elasticsearch's HNSW vector search (8.0+) and Elastic Learned Sparse EncodeR (ELSER) for semantic search are generally considered more mature as of early 2026. OpenSearch has caught up significantly and added its own neural search framework. For new greenfield projects, both are viable — compare specific features against your use case.

Client & API Compatibility

When the fork happened, OpenSearch and Elasticsearch initially shared the same REST APIs. Over time they have diverged. The official Elasticsearch client libraries now detect the server type and refuse connections to OpenSearch in some versions. OpenSearch maintains its own forked client libraries. If you plan to run a system that works with both (e.g., a product that customers can deploy on either), you need to test carefully — there are known breaking differences in a handful of APIs, particularly around security and cross-cluster replication.

As of 2026: OpenSearch is mature, widely deployed, and battle-tested at AWS scale. Elasticsearch remains the larger ecosystem with more third-party integrations and a longer track record for enterprise features like Elastic Security and Observability. The practical advice: if you are on AWS and want a fully managed service with minimal operational overhead, Amazon OpenSearch Service is excellent. If you are running your own cluster or want Elastic Cloud's managed offering with commercial support, Elasticsearch is the natural choice. Both are fine technical choices — license and operational model should drive the decision more than technical capability.

Section 19

Tools & Clients — Your Elasticsearch Toolbox

Elasticsearch has a rich ecosystem of tools built around it. Whether you are a developer writing search queries, an ops engineer monitoring cluster health, or an analyst building dashboards, there is a purpose-built tool waiting for you. The six tools below cover 90% of what you will reach for day-to-day, followed by working code samples for the most common tasks.

Kibana

The official UI for everything Elasticsearch. Open it at http://localhost:5601 and you get the Discover tab for ad-hoc log exploration, Dashboard for building charts and metrics panels, Dev Tools (the built-in REST console where you type queries and see responses), and Stack Monitoring for watching node health, JVM heap, and indexing throughput in real time. If you only install one extra tool alongside Elasticsearch, it is Kibana — it alone replaces half a dozen separate utilities.

Official Client Libraries

Elastic maintains first-party clients for Python (elasticsearch-py), Java (Java REST Client / new Java API Client), JavaScript/TypeScript, Go, Ruby, PHP, and .NET. All of them wrap the same underlying HTTP/REST API but handle connection pooling, automatic node sniffing (discovering new cluster nodes), exponential backoff on retries, and serialisation for you. Using the official client instead of raw requests/axios saves you from re-implementing cluster fault tolerance in every service. The Python and JS clients are the most commonly used in web applications.

elasticdump

A command-line Node.js tool for bulk exporting and importing Elasticsearch data. You point it at an index and it streams documents to a JSON file (or directly to another ES instance). The most common use cases are: migrating an index to a new cluster, making a portable snapshot of a small index for local testing, and seeding a development environment with production-like data. It works over the HTTP API, so it is cluster-agnostic and does not require filesystem access — unlike official snapshot/restore which needs the nodes to reach the same shared storage.

Kibana Dev Tools (Console)

Originally a standalone browser plugin called "Elasticsearch Sense," this REST query playground is now built directly into Kibana under Dev Tools → Console. You type raw HTTP requests in a simplified format (GET /my-index/_search followed by a JSON body) and Kibana handles the auth headers, base URL, and content-type for you. It auto-completes field names from your mapping, which is invaluable when you are learning the query DSL. Think of it as the equivalent of Postman but purpose-built for Elasticsearch — faster to use than curl, and the results are syntax-highlighted JSON.

Curator (legacy) → ILM

Curator was the original Python tool for managing index lifecycle operations — deleting indices older than 30 days, force-merging cold indices, snapshotting before deletion. It worked well for years but required external cron jobs and configuration files. Since Elasticsearch 6.6, Index Lifecycle Management (ILM) is the built-in replacement. ILM runs inside the cluster itself, automatically transitions indices through hot/warm/cold/frozen/delete phases based on age or size, and needs no external scheduler. New deployments should use ILM. You will still encounter Curator in older systems, but do not start new projects with it.

Cerebro / Elasticvue

Open-source cluster administration GUIs — think of them as a friendlier alternative to making raw API calls when you want to see cluster state visually. Cerebro is a Scala/Play web app that shows shard allocation across nodes, lets you change settings, and runs index operations via a point-and-click interface. Elasticvue is a more modern Vue.js-based tool available as a browser extension or standalone web app. Both are useful for operations teams who want a visual overview of which shards are green/red/relocating without typing API calls — especially handy during cluster recovery or rebalancing.

Client Code Examples

from elasticsearch import Elasticsearch

# One client instance per application — it manages the connection pool
es = Elasticsearch("http://localhost:9200")

# Full-text search across the "articles" index
resp = es.search(
    index="articles",
    query={
        "match": {
            "body": "inverted index performance"
        }
    },
    size=10  # return top 10 hits
)

# Each hit has _score (BM25 relevance), _source (original document)
for hit in resp["hits"]["hits"]:
    print(f"[{hit['_score']:.2f}] {hit['_source']['title']}")

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

# Build 1000 documents — bulk helper expects an iterable of action dicts
def generate_docs(n=1000):
    for i in range(n):
        yield {
            "_index": "products",
            "_id": str(i),
            "_source": {
                "name": f"Product {i}",
                "price": round(10 + i * 0.5, 2),
                "category": "electronics" if i % 2 == 0 else "books",
            },
        }

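# bulk() sends documents in chunks (chunk_size defaults to 500), so these
# 1000 documents go out as 2 HTTP requests instead of 1000.
# Why bulk? Each individual index call carries its own HTTP overhead (~1 ms),
# which the bulk API amortises across the whole chunk.
# raise_on_error=False collects per-document failures in `errors`
# instead of raising BulkIndexError when a chunk contains a failure.
success, errors = bulk(es, generate_docs(), raise_on_error=False)
print(f"Indexed {success} docs, {len(errors)} errors")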

// GET /products/_search
// Aggregation: average price per category
// "size: 0" means — give me no documents, only the aggregation result
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "max_price": {
          "max": { "field": "price" }
        }
      }
    }
  }
}
// Response shape:
// aggregations.by_category.buckets[].key         → "electronics"
// aggregations.by_category.buckets[].avg_price.value → 259.5
// aggregations.by_category.buckets[].doc_count   → 500

Kibana (UI + Dev Tools), official language clients (Python/Java/JS/Go), elasticdump (portable bulk export), Dev Tools Console (query playground), ILM over Curator (built-in lifecycle management), and Cerebro/Elasticvue (visual cluster admin) cover every stage from development to production operations.

Section 20

Common Misconceptions

Elasticsearch is one of the most misused tools in modern infrastructure — not because it is poorly designed, but because it is easy to get started with and hard to know when you are using it wrong. These six misconceptions are the root cause of most Elasticsearch outages and poor architectures. Clear them up now and you will avoid months of painful debugging later.

1. "Elasticsearch is a database — I can use it as my primary store."

This is the most dangerous misconception. Elasticsearch is a search engine built on top of Apache Lucene. It is optimised for full-text search and aggregations, not for being the authoritative record of your data. Why does this matter? Several reasons. First, its durability guarantees are weaker than a transactional database's. With the default index.translog.durability of request, an acknowledged write has been fsynced to the translog on the primary and its in-sync replicas, but ingest-heavy clusters are often switched to async, where a node crash can lose the last few seconds of acknowledged writes (the default sync interval is 5 seconds). Second, there are no foreign key constraints or referential integrity checks. Third, complex updates (read-modify-write) are awkward, and there is no atomicity across documents. The right architecture: keep your data in a durable primary store (PostgreSQL, S3, DynamoDB) and treat Elasticsearch as a secondary search index that you rebuild or resync if it ever diverges. You write to Postgres first; Elasticsearch is derived.

2. "Just use Elasticsearch for everything — it handles any query!"

Elasticsearch is a bad fit for relational data, multi-entity transactions, and low-latency point lookups by primary key. If you need to JOIN two datasets (e.g. "find orders with their matching customer details"), you should denormalise at index time, because there is no runtime JOIN. If you need to atomically update two documents together, there is no multi-document transaction. And if you need to fetch a known document by its ID as fast as possible (single-digit milliseconds, thousands of times per second), a key-value store like Redis or DynamoDB will outperform Elasticsearch, because even a GET by ID pays for an HTTP/JSON round trip, routing to the shard that owns the document, and a lookup in Lucene's data structures. Use Elasticsearch for what it does best: ranked full-text search, log analytics, and complex aggregations over large datasets.

3. "Mapping is optional — Elasticsearch will figure it out."

Dynamic mapping is a convenience feature for prototyping: the first time Elasticsearch sees a field, it guesses the type from that value. But in production this causes a serious problem called mapping explosion: if your JSON documents have variable or user-controlled keys (common in logs or metrics), Elasticsearch creates a new field in the mapping for every unique key it sees. Eventually you hit index.mapping.total_fields.limit (1000 by default), and any document that would add yet another field is rejected. Beyond the field limit problem, dynamic mapping can also guess types wrong: a field that holds a number in the first document is mapped as long, and a later document carrying a non-numeric string in that field fails with a mapping error. Always define explicit mappings before going to production, and set "dynamic": "strict" so unexpected fields are rejected immediately rather than silently ingested with a wrong type.

4. "More shards = better performance."

This is backwards. Every shard is a separate Lucene instance with its own file handles, memory overhead, and JVM objects. The master node tracks every shard's state in the cluster state, so more shards mean a larger cluster state and more work for the master. During a node restart, every shard needs to be recovered and replicated, so more shards means slower cluster recovery. And every query fans out to all shards in the target indices; the coordinating node then merges all the partial results. Ten shards means ten partial result sets to merge for every search, which adds latency. The right shard size is roughly 10–50 GB per shard. An index storing 100 GB of data needs 2–10 shards, not hundreds. Start with fewer shards: if you outgrow them, the Split API can create a new index with a multiple of the original primary count (the source index must be made read-only first).
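The 10–50 GB guideline translates directly into a range of primary shard counts. A minimal sketch (the function name and its rounding choices are assumptions made for this example, not an official formula):

```python
import math


def primary_shard_range(index_gb, min_shard_gb=10, max_shard_gb=50):
    """Return (fewest, most) primary shards that keep shards in the target size."""
    fewest = max(1, math.ceil(index_gb / max_shard_gb))
    most = max(fewest, int(index_gb // min_shard_gb))
    return fewest, most


print(primary_shard_range(100))  # (2, 10)
print(primary_shard_range(500))  # (10, 50)
print(primary_shard_range(5))    # (1, 1)
```

Picking a value near the low end of the range leaves room to grow into the shard size before a split or reindex becomes necessary.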

5. "Replica shards are just backups — they don't do anything during normal operation."

Replicas absolutely serve work during normal operation: they handle read (search) requests. When a search query comes in, the coordinating node picks one copy of each shard, primary or replica, to serve it (plain round-robin in older versions; since 7.0, adaptive replica selection favours the copies that have been responding fastest). So adding a replica, on nodes with capacity to host it, roughly doubles your read throughput for that index. The backup aspect is secondary: replicas also protect against data loss if a node fails (the replica gets promoted to primary). Engineers who think replicas are dormant backups often under-provision them and then wonder why search latency spikes when traffic doubles: they had only one copy of each shard serving all reads when they could have had two or three.

6. "Elasticsearch is real-time — documents are searchable the moment they are indexed."

Not quite. The default index.refresh_interval is 1 second. A document is written to a Lucene in-memory buffer first. Every second, a "refresh" makes that buffer visible to searches by creating a new Lucene segment. So there is a roughly 0–1 second gap between indexing and searchability — the document exists in the translog (durable) but is not yet in any segment (not yet searchable). You can force an immediate refresh with ?refresh=true on an index call, but this has a real cost: frequent refreshes create many tiny segments, which degrades search performance until a background merge catches up. For truly time-sensitive use cases, you can reduce refresh_interval to 100ms, but expect higher indexing overhead. "Near real-time" is the accurate description, not "real-time."

Six truths to internalize: ES is a secondary search index, not a primary database; it is bad for joins/transactions/point lookups; always use explicit mappings in production; fewer larger shards outperform many small shards; replicas serve read traffic actively; and indexing has a ~1 second refresh latency before a document is searchable.

Section 21

Real-World Disasters & Lessons

Every disaster below happened in a real production system. The patterns are embarrassingly common — most Elasticsearch outages trace back to one of five root causes. Understanding these stories in advance costs you nothing. Learning them the hard way can cost you a job or a company.

Disaster 1 — Mapping Explosion from Dynamic Mapping

A startup ingested JSON application logs using dynamic mapping. The logs contained user-provided metadata keys — things like session identifiers, custom event attributes, and A/B test variant names. Every unique key became a new field. Within 8 weeks, the index had over 1000 fields. Elasticsearch started refusing new documents with 400 Limit of total fields exceeded. The team could not easily fix it because re-indexing required downtime — and the primary store was Elasticsearch itself (first mistake).

Lesson: Set "dynamic": "strict" in production mappings so unexpected fields are rejected immediately, not silently stored. For truly variable JSON, use a single flattened field type — it stores arbitrary sub-keys without creating individual mappings. Define your schema before you ingest, not after the explosion.
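One way to combine the two recommendations is to keep a strict mapping for the fields you know and route everything else into a single flattened field. The sketch below assumes a hypothetical log schema (@timestamp, level, message) and a hypothetical catch-all field called labels; it only reshapes documents locally before they would be indexed.

```python
KNOWN_FIELDS = {"@timestamp", "level", "message"}  # hypothetical schema

MAPPING = {
    "dynamic": "strict",
    "properties": {
        "@timestamp": {"type": "date"},
        "level": {"type": "keyword"},
        "message": {"type": "text"},
        # One mapped field, however many sub-keys show up inside it.
        "labels": {"type": "flattened"},
    },
}


def to_strict_doc(event):
    """Move every key outside the schema under the flattened 'labels' field."""
    doc = {k: v for k, v in event.items() if k in KNOWN_FIELDS}
    extras = {k: v for k, v in event.items() if k not in KNOWN_FIELDS}
    if extras:
        doc["labels"] = extras
    return doc


event = {
    "@timestamp": "2024-01-01T00:00:00Z",
    "level": "info",
    "message": "checkout started",
    "ab_variant": "B",
    "session": "s-123",
}
doc = to_strict_doc(event)
print(doc)
```

The trade-off is that values inside a flattened field are indexed as keywords, so they support exact-match filtering but not full-text analysis or numeric range semantics.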

Disaster 2 — Open Clusters Ransomwared (2017+)

Before Elasticsearch 6.8, security features (authentication, TLS, role-based access control) were part of the paid X-Pack commercial add-on. Many teams ran Elasticsearch bound to 0.0.0.0 — listening on all network interfaces — without realizing security was off by default. Attackers used Shodan to find thousands of open Elasticsearch clusters, exfiltrated the data, deleted the indices, and left ransom notes. Hundreds of companies lost production data. The attack required zero exploitation — the cluster was simply open to the internet.

Lesson: Since Elasticsearch 8.0, security is enabled by default on new installations; it can still be switched off explicitly, so verify that nobody has done so. Since 6.8, the basic security features are free. If you are running anything older, enable security immediately or at minimum bind to localhost and use network-level controls. Never expose Elasticsearch directly to the internet.

Disaster 3 — Heap > 32 GB and Compressed Oops Regression

An engineering team saw their Elasticsearch nodes struggling with heap GC pauses and decided to give each node 64 GB of heap. Performance got worse — search latency increased 30% and GC pauses became more severe, not less. The explanation is a JVM internals quirk: the JVM uses compressed ordinary object pointers (compressed oops) to represent 64-bit heap addresses in 32 bits when heap is below ~32 GB. Above ~32 GB, the JVM must use full 64-bit pointers, which doubles the memory bandwidth needed to traverse object graphs and increases GC scanning time significantly. The team had unknowingly disabled an optimization that was saving them enormous memory.

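Lesson: Keep Elasticsearch heap at 31 GB or below. The exact compressed-oops cutoff varies slightly between JVMs, so verify rather than assume: start the node with -Xmx31g and check the startup log, where Elasticsearch reports whether compressed ordinary object pointers are enabled. If you need more memory, run two nodes on the same machine rather than giving one node more than 32 GB. Elasticsearch's documentation explicitly warns about this.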

Disaster 4 — Cluster Split-Brain Before 7.0

In Elasticsearch versions prior to 7.0, cluster consensus used Zen Discovery. The critical setting was discovery.zen.minimum_master_nodes, which had to be set to N/2 + 1 with the division rounded down (a strict majority), where N is the number of master-eligible nodes; for 3 nodes that is 2. Many teams either left it at its default of 1 or calculated it wrong. On a 3-node cluster with minimum_master_nodes=1, a network partition could cause two independent masters to form, each believing the other's nodes were dead. Both masters would accept writes, creating two divergent data sets: a split-brain. When the network healed, one side's writes would be silently discarded.
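The quorum rule is easy to get wrong by one, so here is a small sketch of the arithmetic. The second function is an illustrative check (not an Elasticsearch API): a split-brain is possible whenever two disjoint groups of nodes can each reach the configured minimum.

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible // 2 + 1


def split_brain_possible(master_eligible, configured_minimum):
    """True if two disjoint partitions could both elect a master."""
    return 2 * configured_minimum <= master_eligible


print(minimum_master_nodes(3))     # 2
print(split_brain_possible(3, 1))  # True: the default of 1 is unsafe
print(split_brain_possible(3, 2))  # False
```

Note that an even node count buys no extra tolerance: four master-eligible nodes need a minimum of three, so the cluster still survives the loss of only one.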

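Lesson: Elasticsearch 7.0 replaced Zen Discovery with a new, Raft-like cluster coordination layer that works out the voting quorum itself. The only related setting left is cluster.initial_master_nodes, which is used once, when a brand-new cluster bootstraps, and should be removed afterwards. Split-brain from a misconfigured quorum is prevented by design in 7.0+. If you are on a version before 7.0, set minimum_master_nodes correctly. If you are on 7.0+, this problem is solved for you, which is one of the best reasons to upgrade.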

Disaster 5 — Snapshots to Same Region as Cluster

A team diligently configured automated snapshots to an S3 bucket — but the bucket was in the same AWS region as their Elasticsearch cluster. A regional AWS outage took down both the cluster and the snapshot storage simultaneously. They had backups, but could not access them. Recovery required waiting for the region to come back online, or a manual, error-prone process of copying snapshots to another region first.

Lesson: Always configure cross-region snapshot replication. The S3 bucket for your snapshots should be in a different AWS region than your Elasticsearch cluster. This is an extra step that feels unnecessary until the moment you need it. Use S3 Cross-Region Replication (CRR) or configure a second snapshot repository in a different region and snapshot to both.

Five production disasters that keep repeating: mapping explosion from dynamic mapping (use strict + flattened), open clusters ransomwared (security is free since 6.8, on by default since 8.0), heap above 32 GB breaking compressed oops (keep at 31 GB), split-brain before 7.0 (solved in 7.0+ Raft protocol), and same-region snapshots (always cross-region).

Section 22

Performance & Best Practices Recap

This section distils eight rules that cover the vast majority of Elasticsearch performance issues. None of them require deep Lucene internals knowledge — they are practical decisions that any engineer running ES in production should have already made. If your cluster is struggling, check each rule below first before looking elsewhere.

Explicit mappings always

Before ingesting a single document, define your mapping. Specify which fields are text (analyzed for full-text search), which are keyword (for exact-match filtering and aggregations), which are date, integer, or float. Set "dynamic": "strict" so any field not in your mapping causes an immediate, visible error rather than silently ballooning your field count. This is a five-minute task that prevents weeks of re-indexing pain.
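As a rough illustration, the helper below builds a strict mapping from a name-to-type dictionary, and a second helper approximates locally which top-level keys such a mapping would refuse. The field names are invented for the example, and the local check is a simplification of what Elasticsearch does (it ignores nested objects).

```python
def strict_mapping(fields):
    """fields: field name -> Elasticsearch type, e.g. {"title": "text"}."""
    return {
        "dynamic": "strict",
        "properties": {name: {"type": t} for name, t in fields.items()},
    }


def rejected_fields(mapping, doc):
    """Top-level keys that a strict mapping would reject (local approximation)."""
    return sorted(set(doc) - set(mapping["properties"]))


mapping = strict_mapping({
    "title": "text",          # analyzed for full-text search
    "category": "keyword",    # exact match, aggregations
    "published_at": "date",
    "price": "float",
})

print(rejected_fields(mapping, {"title": "Lucene internals", "colour": "red"}))
# ['colour']
```

Running a check like this in the ingest path surfaces schema drift in your own logs before Elasticsearch starts returning 400 responses.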

Right shard size: 10–50 GB

Why 10–50 GB? Smaller shards have disproportionate metadata overhead — the master node tracks every shard's routing, state, and allocation. Larger shards mean slower recovery after a node failure (the whole shard must be replicated across the network before the cluster goes green). 10–50 GB is the sweet spot where Lucene merge overhead, network recovery time, and master state overhead are all reasonable. For a 500 GB index, aim for 10–20 shards, not 500.

Heap ≤ 31 GB

Set the JVM heap with -Xms31g -Xmx31g (setting min and max equal prevents resize pauses). Above roughly 32 GB, the JVM cannot use compressed object pointers — every heap reference goes from 4 bytes to 8 bytes, doubling GC scan time and cache pressure. If a single node needs more RAM than 31 GB of heap, allocate the remainder to the OS file cache (which speeds up Lucene segment reads) or run a second node on the same host.
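A small sizing sketch under two assumptions: the 31 GB cap described above, and the commonly cited guideline (not stated in this section) of giving the heap no more than half of the machine's RAM so the rest is left for the OS file cache. The function names are hypothetical.

```python
def recommended_heap_gb(ram_gb, cap_gb=31):
    """Half of RAM, capped below the compressed-oops threshold."""
    return max(1, min(ram_gb // 2, cap_gb))


def jvm_heap_flags(ram_gb):
    """Equal -Xms/-Xmx so the heap never resizes at runtime."""
    heap = recommended_heap_gb(ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_flags(16))   # ['-Xms8g', '-Xmx8g']
print(jvm_heap_flags(128))  # ['-Xms31g', '-Xmx31g']
```

On the 128 GB host the remaining 97 GB is not wasted: Lucene reads segments through the OS page cache, so that memory directly speeds up searches.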

ILM for time-series

Index Lifecycle Management is the built-in scheduler that automatically moves indices through hot (fast SSD, actively written), warm (searchable but no new writes), cold (infrequent access, cheaper storage), and delete phases based on age or index size. Without ILM, you need external cron jobs, custom scripts, and manual discipline to manage retention. With ILM, you define the policy once and the cluster enforces it forever. For any log, metric, or time-series workload this is non-negotiable.
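For orientation, a policy covering the four phases above might look like the sketch below, which builds the body for PUT _ilm/policy/<name>. Every threshold (50 GB, 1 day, 7/30/90 days) is a placeholder to adapt, and max_primary_shard_size assumes a reasonably recent version (7.13 or later).

```python
def build_ilm_policy(rollover_size="50gb", rollover_age="1d",
                     warm_after="7d", cold_after="30d", delete_after="90d"):
    """Return a request body for PUT _ilm/policy/<name> (values are examples)."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": rollover_size,
                            "max_age": rollover_age,
                        }
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = build_ilm_policy()
print(list(policy["policy"]["phases"]))  # ['hot', 'warm', 'cold', 'delete']
```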

Filters over queries when you can

Elasticsearch has two contexts for clauses: query context (calculates a relevance score — used for full-text search) and filter context (yes/no — does this doc match or not). Filter context results are cached in a bitset at the shard level, so the second time you ask "is status = 'published'?", it is a bitset AND, not a re-scan. Use bool.filter for date ranges, status fields, category tags, and any condition without a relevance requirement. Reserve bool.must / bool.should for actual free-text search where scoring matters.
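A sketch of how the two contexts sit side by side in one request body. The field names (body, status, published_at) are hypothetical; the point is that only the match clause contributes to the score, while the term and range clauses are yes/no filters.

```python
def published_search(text, status="published", since="now-30d/d"):
    """Build a search body: scored full-text clause plus unscored filters."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": [{"match": {"body": text}}],
                # Filter context: yes/no, cacheable, no scoring.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"published_at": {"gte": since}}},
                ],
            }
        }
    }


body = published_search("inverted index performance")
print(body["query"]["bool"]["filter"])
```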

Bulk indexing + force merge

Use the _bulk API for all high-throughput indexing — send 5–10 MB batches, tune the number of concurrent bulk threads to match your CPU count, and avoid ?refresh=true per document (let the 1-second refresh cycle handle it). For cold indices that are no longer written to, use POST /index/_forcemerge?max_num_segments=1 to collapse all Lucene segments into one. This produces the smallest possible disk footprint and the fastest possible read performance, because Lucene never needs to merge results from multiple segment files.
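The Python bulk helper batches by document count by default; if you want to batch by payload size instead, the idea can be sketched in plain Python. This is an illustrative batching function, not part of any client library: it groups action/source pairs into newline-delimited JSON payloads suitable for POST /_bulk, each at most max_bytes unless a single document is larger than that on its own.

```python
import json


def bulk_batches(actions, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON payloads from (action_metadata, source) pairs."""
    batch, size = [], 0
    for meta, source in actions:
        pair = json.dumps(meta) + "\n" + json.dumps(source) + "\n"
        pair_bytes = len(pair.encode("utf-8"))
        if batch and size + pair_bytes > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(pair)
        size += pair_bytes
    if batch:
        yield "".join(batch)


actions = [
    ({"index": {"_index": "products", "_id": str(i)}}, {"name": f"Product {i}"})
    for i in range(100)
]
payloads = list(bulk_batches(actions, max_bytes=1024))
print(len(payloads), "payloads")
```

The tiny 1024-byte limit is only there to make the batching visible with 100 small documents; in practice you would leave the default in the 5–10 MB range.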

Eight rules: explicit mappings + dynamic strict, 10–50 GB shard size, heap at 31 GB max, ILM for all time-series, filter context over query context for non-scored conditions, bulk indexing in 5–10 MB batches, force merge cold indices to 1 segment, and cross-region snapshots as the backup baseline.

Section 23

Frequently Asked Questions

These are the questions that come up in interviews, architecture reviews, and Slack channels whenever Elasticsearch is on the table. Each answer is written for someone who understands databases generally but is still learning where Elasticsearch fits in the ecosystem.

Q1: Elasticsearch or OpenSearch — which should I use?

This is a licensing question as much as a technical one. In 2021, Elastic changed the Elasticsearch license from Apache 2.0 to a dual SSPL/Elastic License. AWS forked the last Apache 2.0 version and created OpenSearch, which remains Apache 2.0 — meaning you can embed it in commercial products, host it as a managed service, and modify it without restrictions. For most self-hosted use cases, the technical differences are small today (OpenSearch has diverged in security features and some ML capabilities, but core search functionality is equivalent). The decision comes down to: do you need the commercial Elastic features (ELSER neural search, ML inference), or do you need Apache 2.0 freedom? AWS environments default to OpenSearch; teams wanting the full Elastic product stack use Elasticsearch. Either will serve 95% of search and observability use cases.

Q2: When does Elasticsearch make sense vs. when is it overkill?

Elasticsearch makes sense for: full-text search (e-commerce, docs, knowledge bases), log analytics and observability (the ELK/EFK stack), time-series metrics with complex aggregations, and any use case that needs faceted search (filter by category + price range + brand simultaneously). It is overkill for: a simple LIKE query on a few thousand rows (use Postgres full-text search with tsvector), a single-field exact-match lookup (use Redis or DynamoDB), or a system that needs ACID transactions (use a relational database). A common rule of thumb: if your search needs would be well-served by PostgreSQL's built-in full-text search, stay in Postgres — every additional system has operational overhead. Bring in Elasticsearch when query complexity or data volume has actually outgrown what Postgres FTS can handle.

Q3: How big can an Elasticsearch cluster realistically get?

Very large. Yelp, Walmart, and eBay run clusters with hundreds of nodes and petabytes of indexed data. Elasticsearch scales horizontally by adding nodes: data shards distribute across the cluster, and read throughput grows roughly in proportion to replica count as long as you add nodes to host the extra copies. The practical limit is usually organizational (cost, ops complexity) rather than technical. Architectural note: dedicated master nodes (nodes that only maintain cluster state, never hold data) are worth having well before a cluster gets large; at very large scale (500+ nodes), dedicated coordinating nodes (query routing only) and data-node tiers (hot, warm, cold, frozen) also become important to keep cluster management overhead manageable. You would likely use Elastic Cloud or a managed service at that scale rather than self-hosting.

Q4: How much extra storage does Elasticsearch use compared to raw data size?

Plan for roughly 2–5× your raw data size for a typical configuration. The overhead comes from three sources: (1) inverted indexes, which store every token's posting list; (2) doc values, which store field values in a column-oriented format for sorting and aggregations; and (3) replicas, where each replica is a full copy of every primary shard. A 100 GB raw dataset with 1 replica and typical analyzers might use 250–400 GB on disk. Reducing stored fields, disabling doc values on fields you never sort/aggregate, and using best_compression codec can cut this significantly. For cold archive data, frozen tier + searchable snapshots can reduce live storage to near zero (snapshots stay on S3; data is fetched on demand).
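The arithmetic behind the 100 GB example can be sketched in a few lines. This is a back-of-the-envelope estimate, not a sizing tool: the function name and the `per_copy_factor` parameter are hypothetical, and the 1.25 and 2.0 values are simply the per-copy overheads implied by the 250–400 GB figure above.

```python
def estimate_disk_gb(raw_gb, per_copy_factor, replicas):
    """Rough on-disk size in GB.

    per_copy_factor: assumed size of one indexed copy relative to the raw
    data (inverted index + doc values + stored source, after compression).
    replicas: number of replica copies per primary shard.
    """
    copies = 1 + replicas  # the primary plus each replica holds the full index
    return raw_gb * per_copy_factor * copies


# 100 GB of raw data with 1 replica, using the assumed per-copy factors
low = estimate_disk_gb(100, 1.25, 1)
high = estimate_disk_gb(100, 2.0, 1)
print(f"{low:.0f}-{high:.0f} GB")
```

Because replicas multiply the whole per-copy footprint, dropping from 1 replica to 0 on cold data halves the estimate, which is why replica count is usually the first lever to look at.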

Q5: How do I paginate deep results without running out of memory?

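The default from/size pagination (skip N, take M) becomes expensive for deep pages because every shard must collect its top from + size hits and send them to the coordinating node, which merges them and throws most of them away. At page 1000 with size 10 (from = 9,990), that is 10,000 hits from each shard, of which all but 10 are discarded after the merge. To prevent this, Elasticsearch rejects any request where from + size exceeds index.max_result_window (10,000 by default). For deep pagination, use search_after: you pass the sort values of the last document you saw, and ES efficiently finds the next page starting from that exact point, with no skipping and no discard. Include a unique tiebreaker in your sort to guarantee a stable order: either a keyword field holding the document ID, or the _shard_doc tiebreaker that Elasticsearch adds automatically when you search with a Point In Time (PIT). For exporting all documents (not interactive pagination), use PIT + search_after, which gives every page the same consistent view of the index; the older scroll API still works, but Elastic no longer recommends it for deep pagination.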
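A minimal sketch of how search_after request bodies are built, assuming a hypothetical index with a `created_at` date field and a unique `order_id` keyword field used as the tiebreaker. The helper name and the sample sort values are illustrative only; the body keys (`size`, `query`, `sort`, `search_after`) are the standard search request fields.

```python
def search_after_body(query, sort, size, last_sort_values=None):
    """Build a search request body for one page.

    last_sort_values: the `sort` array returned with the last hit of the
    previous page, or None when requesting the first page.
    """
    body = {"size": size, "query": query, "sort": sort}
    if last_sort_values is not None:
        # Resume right after the last document seen; nothing is skipped.
        body["search_after"] = last_sort_values
    return body


query = {"match": {"title": "laptop"}}
# Newest first, with a unique field as tiebreaker so the order is stable
sort = [{"created_at": "desc"}, {"order_id": "asc"}]

first_page = search_after_body(query, sort, size=10)

# Assume the last hit of the first page came back with these sort values
last_hit_sort = [1700000000000, "o-1042"]
second_page = search_after_body(query, sort, size=10, last_sort_values=last_hit_sort)
```

The query and sort must stay identical from page to page; only the `search_after` values change.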

Q6: Can I do JOINs in Elasticsearch?

Not in the relational sense. Elasticsearch is designed to work with denormalized data. There are two limited join mechanisms: the nested field type (related objects are embedded in a parent document, stored together with it, and queried together without cross-document joins) and the join field type (a parent-child relationship where parent and child documents live on the same shard and can be queried together). Both have caveats: each nested object is indexed as a separate hidden Lucene document, so updating one nested object means reindexing the whole parent; parent-child queries are noticeably slower than ordinary queries and require children to be routed to the parent's shard. The pragmatic answer is to denormalize at index time. If an order needs customer data, embed the customer's name and email into the order document when indexing, so the search result is self-contained. The cost is that a change to the customer record must be propagated to every document that embeds it. True normalized relational joins belong in your primary store, not in ES.
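Denormalizing at index time can be as simple as the sketch below. The record shapes and field names (`customer_id`, `name`, `email`, `total`) are hypothetical stand-ins for rows read from the primary database.

```python
def denormalize_order(order, customers):
    """Return a self-contained order document with customer fields embedded.

    order: a dict from the primary store, containing a customer_id.
    customers: a dict mapping customer_id to the customer record.
    """
    customer = customers[order["customer_id"]]
    doc = dict(order)  # shallow copy so the source record is left untouched
    doc["customer"] = {
        "name": customer["name"],
        "email": customer["email"],
    }
    return doc


customers = {"c-7": {"name": "Ada Lovelace", "email": "ada@example.com", "tier": "gold"}}
order = {"order_id": "o-1042", "customer_id": "c-7", "total": 129.90}

doc = denormalize_order(order, customers)
```

Only the fields that searches actually need are copied in, which keeps documents small and limits how much has to be re-synced when a customer record changes.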

Q7: How fresh is data after indexing — true real-time or delayed?

The default refresh interval is 1 second: a document indexed now will be searchable within ~1 second, not instantly. (One exception: when refresh_interval is left at its default, a shard that has received no searches for 30 seconds stops refreshing until the next search arrives.) This is called "near real-time" (NRT) in Elasticsearch's documentation. The document is written to the translog immediately (durable against node crashes) but is only promoted to a searchable Lucene segment after a refresh. You can reduce the interval to 100ms for more freshness, but each refresh creates a new small segment; too-frequent refreshes cause segment proliferation until background merges clean up, which temporarily degrades search performance. For most applications 1-second lag is fine. If one specific write must be visible to the very next search, index it with refresh=wait_for, which holds the response until a refresh has made the document searchable. If you genuinely need sub-second search freshness across the board, consider whether Elasticsearch is the right tool or whether the data should live in Redis / a true real-time store.
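The refresh interval is an index-level setting changed through the index settings endpoint (`PUT /<index>/_settings`). The sketch below only builds the request body; the helper name is hypothetical and the HTTP call itself is left out.

```python
import json


def refresh_settings(interval):
    """Body for PUT /<index>/_settings that changes the refresh interval.

    interval: a time value such as "100ms", "1s" or "30s"; "-1" disables
    periodic refresh; None resets the setting to its default.
    """
    return {"index": {"refresh_interval": interval}}


fresher = refresh_settings("100ms")  # more freshness, more small segments
relaxed = refresh_settings("30s")    # fewer segments, staler search results
default = refresh_settings(None)     # serialized as JSON null

print(json.dumps(fresher))
```

A longer interval is a common choice for write-heavy log indices where a few seconds of search lag is acceptable.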

Q8: Can I combine keyword search and vector (semantic) search in one query?

Yes. Since Elasticsearch 8.x, hybrid search combining BM25 keyword scoring and k-nearest-neighbor (kNN) vector similarity is a first-class feature. You store a dense vector embedding alongside your document fields (using a dense_vector field), then issue a single query that blends BM25 relevance and vector similarity using Reciprocal Rank Fusion (RRF) or a custom linear combination. This gives you "semantic + exact keyword" results: documents that are both semantically similar to the query (via the embedding) and contain the specific terms the user typed (via BM25). This pattern is increasingly important for AI-powered search and Retrieval-Augmented Generation (RAG) applications, where vector similarity alone misses keyword-critical queries and keyword search alone misses paraphrase-style queries.
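To see what Reciprocal Rank Fusion does with the two result lists, here is a small standalone sketch of the formula: each document scores 1 / (rank_constant + rank) in every list it appears in, and the scores are summed. This is an illustration of the technique in plain Python, not Elasticsearch's own code; the document IDs are made up, and 60 is used as the rank constant.

```python
def rrf(rankings, rank_constant=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    rankings: a list of lists, each ordered best-first.
    Returns doc IDs ordered by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by doc ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]  # keyword ranking
knn_hits = ["c", "a", "d"]   # vector-similarity ranking

fused = rrf([bm25_hits, knn_hits])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents that rank well in both lists ("a" and "c") rise above documents that appear in only one, and because only ranks are used, the BM25 and vector scores never need to be put on a common scale.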

Elasticsearch vs. OpenSearch is primarily a licensing choice; both cover 95% of use cases. ES makes sense for full-text search, log analytics, and complex aggregations — overkill for simple LIKE queries. Storage overhead is typically 2–5× raw size. Use search_after for deep pagination. Denormalize for joins. 1-second NRT refresh is the baseline. Hybrid BM25 + kNN search is built in since 8.x.

Elasticsearch — Distributed Search at Scale

TL;DR — Elasticsearch in Plain English

Why You Need This — The Problem SQL Can't Solve

The SQL attempt

The Elasticsearch version

Mental Model — The Inverted Index

Forward index vs. inverted index

Why this matters so much

Four design heuristics that follow from this

Core Concepts — The Six Building Blocks

Document — The Unit of Everything

Index — The Logical Container

Shard — The Physical Storage Unit

Replica — The Safety Net

Mapping — The Schema Definition

Analyzer — The Text Processing Pipeline

Lucene Under the Hood — Segments, Refresh & Merge

The Shard → Segments relationship

Three key behaviors that follow from this design

Refresh Interval — Why "Near Real-Time" Not "Real-Time"

Segments Are Immutable — Why Updates Are "Delete + Re-Index"

Merge Policy — Keeping Segment Count Manageable

Mapping & Analyzers — Teaching Elasticsearch Your Language

What a mapping tells Elasticsearch

The four most common analyzer types

standard

keyword

ngram / edge_ngram

language-specific

Mapping and analyzer code examples

Cluster Architecture

The 5 Node Roles

Master-eligible

Data

Coordinating-only

Ingest

Machine Learning

Sharding & Replication

4 Design Rules for Sharding

Shard count is fixed at index creation

Replica count is flexible

Routing keeps related documents together

Shard size sweet spot: 10–50 GB

The Query DSL

5 Core Query Types

match

term

range

bool

multi_match

Query Examples

Aggregations

4 Aggregation Types

Bucket aggregations — Group documents

Metric aggregations — Calculate statistics

Pipeline aggregations — Operate on aggregation results

Nested aggregations — Aggs inside aggs

Aggregation Examples

Relevance & Scoring

4 Score-Tuning Patterns

Field boosting

Fuzzy matching

Function score

Synonym expansion

Logs & Observability (ELK Stack)

The 4 Stack Components

Filebeat / Metricbeat (Beats)

Logstash

Elasticsearch

Kibana

Modern Alternatives

OpenSearch (Apache 2.0)

Grafana Loki

Index Lifecycle & Retention (ILM)

The Five ILM Phases

Hot — Active Writes

Warm — Read-Only

Cold — Rarely Searched

Frozen — S3 Partial Mount

Delete — Gone