How indexes work

B-tree structure, heap vs index scans, index-only scans, and why SELECT * can force unnecessary heap fetches.

SQL looks declarative: you say what you want, not how to get it. Behind the scenes the database engine is doing a remarkable amount of work — choosing access paths, maintaining indexes, writing to disk in a way that survives crashes, and serving dozens of concurrent queries without corrupting each other's data. This lesson opens the hood on the mechanisms that make all of that possible.

How B-tree indexes work

The default index type in virtually every relational database (PostgreSQL, MySQL, SQLite, SQL Server) is the B-tree (balanced tree). A B-tree stores key-value pairs in sorted order across a tree of fixed-size pages (typically 8 KB). Each internal node holds multiple keys and pointers to child pages; the leaves hold the actual keys paired with pointers to the heap rows.

                [30 | 60]
               /    |    \
        [10|20]  [40|50]  [70|80]
           |        |        |
         heap     heap     heap
         rows     rows     rows

Because the tree is balanced, every search path from root to leaf has the same length. For a table with n rows, the tree height is O(log_b n) where b is the branching factor (number of keys per node). A B-tree with 8 KB pages, 8-byte keys, and 8-byte pointers can fit roughly 500 key-pointer pairs per node. A billion-row table needs a tree of height log_500(1,000,000,000) ≈ 3.2 — four pages to access any row. Those four pages can fit in memory; the lookup is extremely fast even without caching.

A range scan (e.g., WHERE created_at BETWEEN '2024-01-01' AND '2024-03-31') also uses the B-tree efficiently. The engine descends to the first matching leaf, then follows the linked list of leaf pages (leaves are doubly-linked in a B+ tree) until the range is exhausted — without re-traversing the tree.

The cost of an index is the maintenance overhead: every INSERT, UPDATE, or DELETE must also update every index on that table. A table with ten indexes pays roughly ten times the write cost of a table with one index. Indexes are not free; add them where queries need them and remove those that are never used.

Heap scans vs index scans

The heap is where actual row data is stored — an unordered collection of pages containing the full rows. An index is a separate data structure that provides sorted access to a subset of columns plus pointers back to the heap.

A heap scan (also called a sequential scan or full table scan) reads every page of the heap in order. It is simple and predictable. When it is faster: if the query needs most of the rows (a low-selectivity condition like WHERE status != 'deleted' on a table where 90% of rows match), reading the heap sequentially is cheaper than bouncing around the heap following index pointers.

An index scan descends the B-tree to find matching key values, then fetches the heap rows by their pointers. Each pointer lookup is a random I/O into the heap. When most of the heap is cached this is fine; when the heap is larger than RAM, each random access may cause a disk seek.

A bitmap index scan (PostgreSQL, others) is a hybrid: traverse the index to build a bitmap of matching heap page numbers, sort the page numbers, then read the heap in order. This converts random I/O to sequential I/O at the cost of some memory for the bitmap.

An index-only scan reads data directly from the index leaf pages, never touching the heap — but only if every column in the query is stored in the index. This is where SELECT * hurts: adding a wildcard column that is not in the index forces the engine to fetch the heap, turning a fast index-only scan into a slower index-then-heap scan.

-- Index on (user_id) covers this query — index-only scan possible
SELECT user_id FROM orders WHERE user_id = 42;

-- SELECT * forces a heap fetch even though user_id is indexed
SELECT * FROM orders WHERE user_id = 42;

-- Covering index makes SELECT * equivalent-ish for read-heavy queries
CREATE INDEX idx_orders_user_covering ON orders (user_id) INCLUDE (status, total);

The query planner makes the decision between heap scan and index scan based on statistics it maintains about the data distribution. ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) updates these statistics. If a planner is choosing a bad plan, stale statistics are the first thing to check.

Knowledge check

1.
A B-tree index on a billion-row table typically requires how many page reads to locate a single row?
Proportional to the number of rows (O(n))Around 3 to 5 page readsA single page readExactly 1000 page reads
2.
A query matches 85% of rows in a large table. The query planner is likely to choose:
An index scan — indexes are always faster than heap scansA heap scan — reading the heap sequentially is cheaper than following index pointers to almost every rowA bitmap index scan — always used for high-selectivity queriesAn index-only scan — SELECT * triggers this path automatically

Finished reading? Mark it complete to track your progress.

How B-tree indexes work

Heap scans vs index scans

Knowledge check

On this page