Code of the Day

Pipelining and branch prediction

How CPUs overlap instruction stages to do more per clock, and the cost when a branch guess is wrong.

Computer Architecture5 min read
By the end of this lesson you will be able to:
  • Explain what pipelining is and why branch mispredictions are costly

Pipelining

To get more work done per clock, every modern CPU uses a pipeline: it overlaps the fetch, decode, and execute stages of multiple instructions. While instruction N is executing, instruction N+1 is being decoded and instruction N+2 is being fetched. A typical desktop pipeline has 14–20 stages, meaning up to 20 instructions are in flight simultaneously.

The speedup is dramatic: a 20-stage pipeline can theoretically execute 20 times more instructions per second than a single-stage design running at the same clock frequency. The catch is that the pipeline assumes a steady stream of sequential instructions — branches break that assumption.

Branch prediction

The pipeline breaks down at branch instructions (if, while, loop conditions). When the CPU fetches the instruction after a branch, it doesn't yet know which way the branch will go — the condition hasn't been evaluated yet. The solution is branch prediction: the CPU guesses which direction the branch will take based on historical patterns, and starts executing speculatively along the predicted path.

If the prediction is correct, no time is lost. If it is wrong, the speculatively executed instructions must be discarded (pipeline flush), and execution restarts from the correct path. A misprediction on a 20-stage pipeline throws away roughly 15 cycles of work. At a billion iterations per second, a 1% misprediction rate costs tens of millions of wasted cycles per second.

// Branch-heavy — predictor struggles with random data
for (int i = 0; i < N; i++) {
    if (data[i] > 128) sum += data[i];  // hard to predict if data is random
}

// Branch-free equivalent — always faster on random data
for (int i = 0; i < N; i++) {
    sum += (data[i] > 128) ? data[i] : 0;  // modern CPUs can often make this branch-free
}

know about this and generate branch-free sequences using conditional move instructions (cmov on x86) when they can prove it's safe.

Knowledge check

  1. 1.
    A branch misprediction forces a "pipeline flush." What does this mean?
  2. 2.
    Pipelining is most effective when the instruction stream is sequential with few branches.
Finished reading? Mark it complete to track your progress.

On this page