How indexes work
B-tree structure, heap vs index scans, index-only scans, and why SELECT * can force unnecessary heap fetches.
- Explain how a B-tree index is structured and why it keeps lookups to O(log n)
- Contrast a heap scan with an index scan and identify when each is preferred
- Explain why SELECT * hurts index-only scan performance
SQL looks declarative: you say what you want, not how to get it. Behind the scenes the database engine is doing a remarkable amount of work — choosing access paths, maintaining indexes, writing to disk in a way that survives crashes, and serving dozens of concurrent queries without corrupting each other's data. This lesson opens the hood on the mechanisms that make all of that possible.
How B-tree indexes work
The default index type in virtually every relational database (PostgreSQL, MySQL, SQLite, SQL Server) is the B-tree (balanced tree). A B-tree stores key-value pairs in sorted order across a tree of fixed-size pages (typically 8 KB). Each internal node holds multiple keys and pointers to child pages; the leaves hold the actual keys paired with pointers to the heap rows.
[30 | 60]
/ | \
[10|20] [40|50] [70|80]
| | |
heap heap heap
rows rows rowsBecause the tree is balanced, every search path from root to leaf has the same length. For a table with n rows, the tree height is O(log_b n) where b is the branching factor (number of keys per node). A B-tree with 8 KB pages, 8-byte keys, and 8-byte pointers can fit roughly 500 key-pointer pairs per node. A billion-row table needs a tree of height log_500(1,000,000,000) ≈ 3.2 — four pages to access any row. Those four pages can fit in memory; the lookup is extremely fast even without caching.
A range scan (e.g., WHERE created_at BETWEEN '2024-01-01' AND '2024-03-31')
also uses the B-tree efficiently. The engine descends to the first matching leaf,
then follows the linked list of leaf pages (leaves are doubly-linked in a B+ tree)
until the range is exhausted — without re-traversing the tree.
The cost of an index is the maintenance overhead: every INSERT, UPDATE, or
DELETE must also update every index on that table. A table with ten indexes pays
roughly ten times the write cost of a table with one index. Indexes are not free;
add them where queries need them and remove those that are never used.
Heap scans vs index scans
The heap is where actual row data is stored — an unordered collection of pages containing the full rows. An index is a separate data structure that provides sorted access to a subset of columns plus pointers back to the heap.
A heap scan (also called a sequential scan or full table scan) reads every
page of the heap in order. It is simple and predictable. When it is faster: if
the query needs most of the rows (a low-selectivity condition like
WHERE status != 'deleted' on a table where 90% of rows match), reading the heap
sequentially is cheaper than bouncing around the heap following index pointers.
An index scan descends the B-tree to find matching key values, then fetches the heap rows by their pointers. Each pointer lookup is a random I/O into the heap. When most of the heap is cached this is fine; when the heap is larger than RAM, each random access may cause a disk seek.
A bitmap index scan (PostgreSQL, others) is a hybrid: traverse the index to build a bitmap of matching heap page numbers, sort the page numbers, then read the heap in order. This converts random I/O to sequential I/O at the cost of some memory for the bitmap.
An index-only scan reads data directly from the index leaf pages, never
touching the heap — but only if every column in the query is stored in the index.
This is where SELECT * hurts: adding a wildcard column that is not in the index
forces the engine to fetch the heap, turning a fast index-only scan into a slower
index-then-heap scan.
-- Index on (user_id) covers this query — index-only scan possible
SELECT user_id FROM orders WHERE user_id = 42;
-- SELECT * forces a heap fetch even though user_id is indexed
SELECT * FROM orders WHERE user_id = 42;
-- Covering index makes SELECT * equivalent-ish for read-heavy queries
CREATE INDEX idx_orders_user_covering ON orders (user_id) INCLUDE (status, total);The query planner makes the decision between heap scan and index scan based on
statistics it maintains about the data distribution. ANALYZE (PostgreSQL) or
ANALYZE TABLE (MySQL) updates these statistics. If a planner is choosing a
bad plan, stale statistics are the first thing to check.
Knowledge check
- 1.A B-tree index on a billion-row table typically requires how many page reads to locate a single row?
- 2.A query matches 85% of rows in a large table. The query planner is likely to choose:
Database Internals
What happens inside a database engine — B-tree indexes, heap vs index scans, ACID guarantees, write-ahead logging, and MVCC concurrency.
ACID and write-ahead logging
The four durability guarantees every SQL database makes, and how write-ahead logging implements crash recovery efficiently.