Логотип Workflow

Article

Updated at:

Indexes

Stage 6. Indexes: Why One Query Is Fast and Another Is Slow

An index is a separate data structure that helps the database find rows quickly without scanning the whole table. Without an index, the database often performs a full table scan: it reads many rows and checks the condition row by row. On a table with 20 rows this is fine. On a table with millions of orders, users, or log events, it can turn one simple request into a slow operation.

The basic idea is similar to an index in a book. If you need one topic, you do not read every page from the beginning. You open the index, find the term, and jump to the right page. A database index does something similar: it stores indexed column values together with pointers to the table rows where those values live.

Index search concept

SituationWithout indexWith index
Search by emailReads many rowsFast targeted lookup
JOIN by user_idHeavy read volumeFaster join path
Sort by created_atExpensive sortIndex may support order

What an index stores

A table stores the full rows: all selected columns, business data, timestamps, status values, and so on. An index stores only the values needed for fast lookup, plus a reference to the matching row. For example, an index on users(email) contains email values in a structure optimized for search. When the query asks for one email, the database can search the index first, find the row reference, and then read the matching row from the table.

This is why indexes are useful for:

  • frequent filters in WHERE;
  • joins where one table points to another table by key;
  • sorting with ORDER BY;
  • range searches such as dates or numeric intervals;
  • uniqueness rules, for example UNIQUE(email).

An index is usually stored separately from the table data. It may be cached in memory, but it still takes storage and must be maintained. When a row is inserted, deleted, or updated, the database must also update every affected index. This is the main tradeoff: indexes speed up reads, but they add cost to writes and storage.

How the database uses an index

Consider a query that loads recent orders for one user:

Useful example index:

CREATE INDEX idx_orders_user_id_created_at
ON orders (user_id, created_at DESC);

Good for:

  • “latest orders for one user”;
  • joins by user_id;
  • user filter + time sorting.
SELECT id, amount, status, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

With no suitable index, the database may need to scan all rows in orders, keep only rows where user_id = 42, sort them by created_at, and then return 20 rows. With the index (user_id, created_at DESC), the database can jump directly to the part of the index where user_id = 42 and already read the newest rows first. This can remove both the full scan and the expensive sort.

The exact behavior depends on the database engine, table size, data distribution, and query plan. That is why EXPLAIN matters: it shows whether the database really uses the index or chooses another path.

B-tree: the default index to understand first

The most common index type in relational databases is the B-tree index. You do not need to implement B-trees to use SQL, but you should understand the practical model. A B-tree keeps keys ordered in a balanced tree. The database starts near the root, compares the searched value with stored keys, follows the correct branch, and reaches the leaf area where matching row references are stored.

B-tree indexes are strong for equality checks, ranges, and ordering:

WHERE email = '[email protected]'
WHERE created_at >= '2026-05-01'
ORDER BY created_at DESC

They are widely used for strings, numbers, dates, primary keys, foreign keys, and composite indexes. A B-tree stays balanced as data changes, so lookup remains efficient even when the table grows. The cost is maintenance: inserts, deletes, and updates may require the tree structure to be adjusted.

Other index types at a beginner level

Different databases support different index types. For a beginner, the goal is not to memorize every implementation detail, but to know why more than one type exists.

Index typeBest forImportant limitation
B-treeCommon equality, range, join, and sort patternsNot always best for text search or special data
Hash indexExact equality lookup such as id = 42Poor for ranges and ordering
Full-text indexSearch inside large text fieldsNeeds language/tokenization rules
Bitmap indexColumns with few distinct values in analyticsOften weak for high-write transactional tables
GiST / spatial-like indexesGeometric, location, range, and custom searchesMore specialized and more complex

A hash index maps a value through a hash function into a bucket. It can be very fast for exact matches, but it does not preserve order, so it is not useful for ORDER BY created_at or WHERE amount BETWEEN 100 AND 500.

A full-text index breaks text into tokens, often removes common words, may normalize word forms, and lets the database find rows containing relevant terms. It is suitable for article search, product descriptions, comments, and documents. It is not the same as a normal index on a whole TEXT column.

Bitmap indexes represent values with compact bitmaps. They can be effective in analytical workloads, especially for columns with low cardinality, such as status, gender, or is_active. They are usually a poor default choice for heavily updated tables because maintaining them can be expensive.

GiST and similar generalized indexes are used for more complex data and operators, for example geographic coordinates, ranges, and specialized searches. You do not normally start with GiST for a simple users.email or orders.created_at query.

Selectivity and column order

Selectivity means how much a filter narrows the result. A condition like email = '[email protected]' is highly selective because it usually returns one row. A condition like status = 'PAID' may be less selective if half of all orders are paid. Highly selective filters usually benefit from indexes more clearly.

Composite index order matters. Index (status, created_at) is strong for a status filter plus date sorting:

WHERE status = 'PAID'
ORDER BY created_at DESC

But it may not help much for a query that filters only by created_at, because the first indexed column is status. A good composite index starts with columns that match the common query pattern. If most requests are “latest paid orders for one user,” then (user_id, status, created_at DESC) may be more useful than three separate single-column indexes.

Costs and common mistakes

Indexes are not free. More indexes mean slower writes (INSERT/UPDATE/DELETE) and more storage. Every extra index is another structure the database must update when table data changes.

Common mistakes:

  1. Indexing everything “just in case”.
  2. Ignoring column order in composite index.
  3. Not checking plan with EXPLAIN.
  4. Adding an index for a rare query while slowing down frequent writes.
  5. Expecting one index type to solve every search problem.

Practical flow:

  1. Find slow query.
  2. Inspect filters/sorting.
  3. Add targeted index.
  4. Compare plan before/after.
  5. Keep only useful indexes.
Decision questionIf yes
Query is frequent?Consider index
Filter is selective?Index likely helps
Table is large?Index often critical
Write rate is high?Avoid extra indexes

Good indexing is about workload-based choices, not quantity. Start with the queries your application runs most often, check what columns they filter, join, and sort by, then create the smallest useful index for that pattern.

Practice Before the Next Lesson

Re-run the examples from this article on a tiny dataset of 5-10 rows. This step matters because SQL becomes much easier once you execute queries yourself and inspect real tabular output. Start with a plain version of the query, then change one thing at a time and compare behavior or execution plan. If results look wrong, validate each part: filters, selected columns, join conditions, and expected NULL handling.

Quick self-check:

  1. I can explain the lesson in one simple sentence.
  2. I can write a basic SQL example from memory.
  3. I can read query output and explain each column.
  4. I can name one common mistake and how to avoid it.

Also verify write-side impact after adding an index. If insert/update latency increases too much, redesign the index set instead of keeping every experimental index.

Extra Scenario: Index for a Hot Endpoint

Assume your hottest endpoint is “latest paid orders for one user.” If it runs hundreds of times per minute, a targeted index usually pays off quickly.

CREATE INDEX idx_orders_user_status_created
ON orders (user_id, status, created_at DESC);

Before adding it, verify this endpoint is truly high-traffic and that status filter is always present. After deployment, compare plan and latency. If impact is weak, revise column order.