Code of the Day

How indexes work

B-tree structure, heap vs index scans, index-only scans, and why SELECT * can force unnecessary heap fetches.

Database Internals7 min read
By the end of this lesson you will be able to:
  • Explain how a B-tree index is structured and why it keeps lookups to O(log n)
  • Contrast a heap scan with an index scan and identify when each is preferred
  • Explain why SELECT * hurts index-only scan performance

SQL looks declarative: you say what you want, not how to get it. Behind the scenes the database engine is doing a remarkable amount of work — choosing access paths, maintaining indexes, writing to disk in a way that survives crashes, and serving dozens of concurrent queries without corrupting each other's data. This lesson opens the hood on the mechanisms that make all of that possible.

How B-tree indexes work

The default type in virtually every relational database (PostgreSQL, MySQL, SQLite, SQL Server) is the B-tree (balanced tree). A B-tree stores key-value pairs in sorted order across a tree of fixed-size pages (typically 8 KB). Each internal node holds multiple keys and pointers to child pages; the leaves hold the actual keys paired with pointers to the heap rows.

                [30 | 60]
               /    |    \
        [10|20]  [40|50]  [70|80]
           |        |        |
         heap     heap     heap
         rows     rows     rows

Because the tree is balanced, every search path from root to leaf has the same length. For a table with n rows, the tree height is where b is the branching factor (number of keys per node). A B-tree with 8 KB pages, 8-byte keys, and 8-byte pointers can fit roughly 500 key-pointer pairs per node. A billion-row table needs a tree of height log_500(1,000,000,000) ≈ 3.2 — four pages to access any row. Those four pages can fit in memory; the lookup is extremely fast even without caching.

A range scan (e.g., WHERE created_at BETWEEN '2024-01-01' AND '2024-03-31') also uses the B-tree efficiently. The engine descends to the first matching leaf, then follows the linked list of leaf pages (leaves are doubly-linked in a B+ tree) until the range is exhausted — without re-traversing the tree.

The cost of an index is the maintenance overhead: every INSERT, UPDATE, or DELETE must also update every index on that table. A table with ten indexes pays roughly ten times the write cost of a table with one index. Indexes are not free; add them where queries need them and remove those that are never used.

Heap scans vs index scans

The heap is where actual row data is stored — an unordered collection of pages containing the full rows. An index is a separate data structure that provides sorted access to a subset of columns plus pointers back to the heap.

A heap scan (also called a sequential scan or full table scan) reads every page of the heap in order. It is simple and predictable. When it is faster: if the query needs most of the rows (a low-selectivity condition like WHERE status != 'deleted' on a table where 90% of rows match), reading the heap sequentially is cheaper than bouncing around the heap following index pointers.

An index scan descends the B-tree to find matching key values, then fetches the heap rows by their pointers. Each pointer lookup is a random I/O into the heap. When most of the heap is cached this is fine; when the heap is larger than RAM, each random access may cause a disk seek.

A bitmap index scan (PostgreSQL, others) is a hybrid: traverse the index to build a bitmap of matching heap page numbers, sort the page numbers, then read the heap in order. This converts random I/O to sequential I/O at the cost of some memory for the bitmap.

An index-only scan reads data directly from the index leaf pages, never touching the heap — but only if every column in the query is stored in the index. This is where SELECT * hurts: adding a wildcard column that is not in the index forces the engine to fetch the heap, turning a fast index-only scan into a slower index-then-heap scan.

-- Index on (user_id) covers this query — index-only scan possible
SELECT user_id FROM orders WHERE user_id = 42;

-- SELECT * forces a heap fetch even though user_id is indexed
SELECT * FROM orders WHERE user_id = 42;

-- Covering index makes SELECT * equivalent-ish for read-heavy queries
CREATE INDEX idx_orders_user_covering ON orders (user_id) INCLUDE (status, total);

The makes the decision between heap scan and index scan based on statistics it maintains about the data distribution. ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) updates these statistics. If a planner is choosing a bad plan, stale statistics are the first thing to check.

Knowledge check

  1. 1.
    A B-tree index on a billion-row table typically requires how many page reads to locate a single row?
  2. 2.
    A query matches 85% of rows in a large table. The query planner is likely to choose:
Finished reading? Mark it complete to track your progress.

On this page