Schema design and normalization

How you structure tables matters as much as how you query them. Good schema design is mostly one idea: each fact should live in exactly one place.

The problem with duplication

Imagine cramming everything into one wide table — every order row repeating the customer's name and country. That invites anomalies:

Update anomaly: a customer changes country; you must fix every one of their order rows, and miss one → inconsistent data.
Insertion anomaly: you can't record a customer until they place an order.
Deletion anomaly: deleting their last order erases the customer entirely.

The shop schema avoids all three by splitting facts: customer details live once in customers, and orders just references a customer by id (the primary key).

Normalization

Normalization is the process of splitting data to remove that redundancy. Practical rules of thumb (the first three "normal forms" in plain terms):

One value per cell — no comma-separated lists in a column.
Every non-key column depends on the whole primary key.
Columns depend on the key, not on other non-key columns (e.g. don't store a customer's country in the orders table — it belongs with the customer).

Follow these and each fact has a single home; you join to bring facts together.

Normalization optimises for correctness; sometimes you trade a little back for read performance — deliberately duplicating data to avoid expensive joins on a hot path (the caching trade-off from the fundamentals track). Do it consciously, and only when measurement says you need it — normalized-by-default is the safe starting point.

Where to go next

A normalized schema relies on keys and rules to stay consistent. Next: constraints and keys that the database enforces for you.

Finished reading? Mark it complete to track your progress.

Schema design and normalization

The problem with duplication

Normalization

When to denormalize

Where to go next

On this page