Pipelining and branch prediction
How CPUs overlap instruction stages to do more per clock, and the cost when a branch guess is wrong.
- Explain what pipelining is and why branch mispredictions are costly
Pipelining
To get more work done per clock, every modern CPU uses a pipeline: it overlaps the fetch, decode, and execute stages of multiple instructions. While instruction N is executing, instruction N+1 is being decoded and instruction N+2 is being fetched. A typical desktop pipeline has 14–20 stages, meaning up to 20 instructions are in flight simultaneously.
The speedup is dramatic: a 20-stage pipeline can theoretically execute 20 times more instructions per second than a single-stage design running at the same clock frequency. The catch is that the pipeline assumes a steady stream of sequential instructions — branches break that assumption.
Branch prediction
The pipeline breaks down at branch instructions (if, while, loop
conditions). When the CPU fetches the instruction after a branch, it doesn't yet
know which way the branch will go — the condition hasn't been evaluated yet. The
solution is branch prediction: the CPU guesses which direction the branch
will take based on historical patterns, and starts executing speculatively along
the predicted path.
If the prediction is correct, no time is lost. If it is wrong, the speculatively executed instructions must be discarded (pipeline flush), and execution restarts from the correct path. A misprediction on a 20-stage pipeline throws away roughly 15 cycles of work. At a billion iterations per second, a 1% misprediction rate costs tens of millions of wasted cycles per second.
// Branch-heavy — predictor struggles with random data
for (int i = 0; i < N; i++) {
if (data[i] > 128) sum += data[i]; // hard to predict if data is random
}
// Branch-free equivalent — always faster on random data
for (int i = 0; i < N; i++) {
sum += (data[i] > 128) ? data[i] : 0; // modern CPUs can often make this branch-free
}Compilers know about this and generate branch-free sequences using conditional
move instructions (cmov on x86) when they can prove it's safe.
Knowledge check
- 1.A branch misprediction forces a "pipeline flush." What does this mean?
- 2.Pipelining is most effective when the instruction stream is sequential with few branches.