Cache locality in practice

Temporal and spatial locality, the matrix traversal example, and how struct layout choices change real-world performance.

Why cache locality matters in your code

Cache locality is the single most actionable architectural concept for application developers. It comes in two flavours:

Temporal locality — if you access a value, you are likely to access it again soon. Keep hot data in variables (registers/L1) rather than re-loading it from memory. Loop invariants pulled outside a loop exploit temporal locality.

Spatial locality — if you access one address, you are likely to access nearby addresses soon. Processing data in memory order exploits the fact that the entire cache line was already fetched.

The matrix traversal example

The classic demonstration is matrix traversal. A matrix stored in row-major order has rows laid out contiguously in memory. Iterating row-by-row (inner loop over columns) is cache-friendly: you march through memory sequentially. Iterating column-by-column (inner loop over rows) skips num_columns × element_size bytes between accesses — every access is likely a cache miss.

# Row-major traversal — cache friendly (accesses elements in memory order)
total = 0
for row in matrix:
    for val in row:
        total += val

# Column-major traversal — cache hostile (jumps across rows in memory)
total = 0
for col in range(len(matrix[0])):
    for row in range(len(matrix)):
        total += matrix[row][col]

On a large matrix this difference can be 5–10x in wall-clock time — not because the algorithm changes, but because the access pattern changes how many cache misses occur.

Struct-of-arrays vs array-of-structs

Data structure layout also matters. A struct-of-arrays (SoA) layout groups all values of one field together; an array-of-structs (AoS) layout groups all fields of one record together. If a hot loop reads only one field, SoA gives better spatial locality. If a loop reads all fields of each record in turn, AoS is better. Game engines and high-performance numerical code choose SoA or AoS deliberately for exactly this reason.

// Array-of-Structs (AoS) — all fields of each entity together
struct Entity { float x, y, z; int health; float speed; };
Entity entities[1000];

// Struct-of-Arrays (SoA) — each field packed with its peers
float x[1000], y[1000], z[1000];
int health[1000];
float speed[1000];

A physics loop that only updates x, y, z runs faster on SoA because those three arrays are dense in memory. The health and speed fields — not touched by the physics loop — stay out of the cache entirely.

The CPU you are writing for today has out-of-order execution, hardware prefetchers, and speculative execution — mechanisms that mask some of the penalties described here. Do not guess: measure with a profiler before restructuring data layouts. Premature optimisation around cache misses in code that is not actually bottlenecked on memory wastes time and reduces readability.

Where to go next

Understanding how the CPU sees your code makes OS-level concepts click into place — processes, threads, context switching, and virtual memory all interact directly with the register and cache state described here. The Operating Systems track continues the story from where the hardware leaves off.

Knowledge check

1.
Which of the following programming practices improve cache locality?
Iterating over an array in sequential orderStoring frequently accessed data in widely separated memory locationsHoisting loop-invariant values into variables outside the loopChoosing a struct-of-arrays layout when a hot loop reads only one field
2.
A physics loop only updates the x, y, z position fields of 10 000 entities — it never reads health or speed. Which layout gives better cache performance?
Array-of-Structs (AoS) — all fields of each entity togetherStruct-of-Arrays (SoA) — each field packed with its peersIt makes no difference — the CPU handles it automaticallyAoS is always faster because fewer pointer dereferences are needed

Finished reading? Mark it complete to track your progress.

Why cache locality matters in your code

The matrix traversal example

Struct-of-arrays vs array-of-structs

Where to go next

Knowledge check

On this page