Schema design and normalization
Organise tables so each fact lives in exactly one place.
- Explain why duplicated data causes anomalies
- Apply normalization to split data into focused tables
- Recognise when to denormalize deliberately
How you structure tables matters as much as how you query them. Good schema design is mostly one idea: each fact should live in exactly one place.
The problem with duplication
Imagine cramming everything into one wide table — every order row repeating the customer's name and country. That invites anomalies:
- Update anomaly: a customer changes country; you must fix every one of their order rows, and miss one → inconsistent data.
- Insertion anomaly: you can't record a customer until they place an order.
- Deletion anomaly: deleting their last order erases the customer entirely.
The shop schema avoids all three by splitting facts: customer details live once in
customers, and orders just references a customer by id (the primary key).
Normalization
Normalization is the process of splitting data to remove that redundancy. Practical rules of thumb (the first three "normal forms" in plain terms):
- One value per cell — no comma-separated lists in a column.
- Every non-key column depends on the whole primary key.
- Columns depend on the key, not on other non-key columns (e.g. don't store a customer's country in the orders table — it belongs with the customer).
Follow these and each fact has a single home; you join to bring facts together.
When to denormalize
Normalization optimises for correctness; sometimes you trade a little back for read performance — deliberately duplicating data to avoid expensive joins on a hot path (the caching trade-off from the fundamentals track). Do it consciously, and only when measurement says you need it — normalized-by-default is the safe starting point.
Where to go next
A normalized schema relies on keys and rules to stay consistent. Next: constraints and keys that the database enforces for you.