The bias-variance tradeoff
Why every model sits on a spectrum between too simple and too complex — and how to read the signs.
- Define bias as systematic error from a model that is too simple
- Define variance as sensitivity to noise from a model that is too complex
- Describe the tradeoff between the two
- Identify high-bias and high-variance symptoms from learning curves
Every supervised model makes errors. Understanding why it makes them — and which category of error you are facing — determines the correct remedy. The bias-variance framework provides that diagnosis.
Bias: the error of oversimplification
Bias is the error introduced by approximating a complex problem with a model that is too simple. A linear model fit to non-linear data will always miss the curve, no matter how much data you add. The model has a systematic, structural blind spot. We say it underfits.
Symptoms of high bias:
- Training error is high.
- Test error is similarly high.
- Adding more training data barely helps — the model cannot use it, because the architectural limitation is the bottleneck, not data quantity.
The fix for high bias is a more expressive model: more features, a deeper tree, a polynomial expansion, or a more flexible algorithm.
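The symptoms above can be reproduced in a few lines. The sketch below is illustrative only: the quadratic target, the noise level, the sample sizes, and the helper names (`make_data`, `fit_line`, `bias_demo`) are all assumptions chosen for the demo. It fits a straight line to data drawn from y = x² plus noise, then refits with x² as the feature, which is a one-term polynomial expansion.

```python
import random


def make_data(n, rng, noise_sd=0.5):
    """Draw n points from y = x^2 + Gaussian noise, with x uniform on [-3, 3]."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(features, ys):
    """Closed-form least squares for y = a + b * feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(ys) / n
    sff = sum((f - mean_f) ** 2 for f in features)
    sfy = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, ys))
    b = sfy / sff
    return mean_y - b * mean_f, b


def mse(features, ys, a, b):
    """Mean squared error of the fitted line on the given points."""
    return sum((a + b * f - y) ** 2 for f, y in zip(features, ys)) / len(ys)


def bias_demo(n_train, seed=0):
    """Return ((train, test) MSE for the line, (train, test) MSE after expansion)."""
    rng = random.Random(seed)
    x_tr, y_tr = make_data(n_train, rng)
    x_te, y_te = make_data(2000, rng)

    # Too simple: a straight line in x cannot follow the parabola.
    a, b = fit_line(x_tr, y_tr)
    linear = (mse(x_tr, y_tr, a, b), mse(x_te, y_te, a, b))

    # More expressive: the same fitting routine on the expanded feature x^2.
    sq_tr = [x * x for x in x_tr]
    sq_te = [x * x for x in x_te]
    a2, b2 = fit_line(sq_tr, y_tr)
    expanded = (mse(sq_tr, y_tr, a2, b2), mse(sq_te, y_te, a2, b2))
    return linear, expanded


for n in (50, 5000):
    linear, expanded = bias_demo(n)
    print(f"n={n:5d}  line train/test = {linear[0]:.2f}/{linear[1]:.2f}  "
          f"x^2 train/test = {expanded[0]:.2f}/{expanded[1]:.2f}")
```

Under these assumptions the straight line's training and test error should both stay far above the noise floor (0.5² = 0.25) at either sample size, while the expanded model should land close to it.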
Variance: the error of overspecialisation
Variance is the error introduced by a model that is overly sensitive to the specific noise in its training data. A deep decision tree can memorise every training example exactly, but the noise it memorised does not generalise. On new data, its predictions swing widely depending on which training examples it happened to see. We say it overfits.
Symptoms of high variance:
- Training error is very low (often near zero).
- Test error is substantially higher.
- The gap between training and test error is large.
- Adding more training data helps — noise averages out as the dataset grows.
The fix for high variance is to constrain the model through regularisation (limit tree depth, add an L2 penalty, use dropout) or to gather more data.
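To see memorisation and its remedy side by side, here is a minimal sketch that swaps in k-nearest-neighbour regression for the deep tree, because it is short to write in plain Python. With k = 1 the model reproduces every training target exactly; raising k plays the role of a constraint, much like limiting tree depth. The sine target, the noise level, and the function names are assumptions made for the demo.

```python
import math
import random


def make_data(n, rng, noise_sd=0.5):
    """Draw n points from y = sin(x) + Gaussian noise, with x uniform on [0, 6]."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN predictor on the points (xs, ys)."""
    errors = [(knn_predict(x, train_x, train_y, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def variance_demo(k, n_train=200, n_test=500, seed=0):
    """Return (training MSE, test MSE) for a k-NN regressor."""
    rng = random.Random(seed)
    train_x, train_y = make_data(n_train, rng)
    test_x, test_y = make_data(n_test, rng)
    return (mse(train_x, train_y, train_x, train_y, k),
            mse(test_x, test_y, train_x, train_y, k))


for k in (1, 15):
    train_err, test_err = variance_demo(k)
    print(f"k={k:2d}  train={train_err:.3f}  test={test_err:.3f}  "
          f"gap={test_err - train_err:.3f}")
```

With k = 1 the training error is zero because each point is its own nearest neighbour, yet the test error is not. With k = 15 the training error rises, and under these settings the test error and the train/test gap should both shrink.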
The tradeoff
The two errors pull in opposite directions. Make the model simpler: bias rises, variance falls. Make it more complex: variance rises, bias falls. There is an optimal point of minimum total error, and it lies somewhere between the two extremes.
Total error = Bias² + Variance + Irreducible noise
Irreducible noise is the floor: the randomness in the problem itself that no model can remove.
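Each term in the decomposition can be estimated by simulation when the true function is known. The following sketch makes several assumptions: a sine target, Gaussian noise with a standard deviation of 0.5, and k-nearest-neighbour regression as the model family, where a small k means a more complex model. It retrains on many fresh training sets and measures how far the average prediction sits from the truth (bias²) and how much individual predictions scatter around that average (variance). The names `true_fn`, `knn_predict`, and `decompose` are illustrative.

```python
import math
import random

NOISE_SD = 0.5  # assumed noise level; irreducible error is NOISE_SD ** 2


def true_fn(x):
    """Assumed ground-truth function; known only because this is a simulation."""
    return math.sin(x)


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def decompose(k, n_train=100, n_repeats=100, seed=0):
    """Estimate (bias^2, variance, noise) for k-NN, averaged over a test grid."""
    rng = random.Random(seed)
    test_x = [0.1 + 0.2 * i for i in range(30)]  # fixed grid from 0.1 to 5.9
    preds = [[] for _ in test_x]  # preds[j] collects predictions at test_x[j]

    for _ in range(n_repeats):
        # A fresh training set each round: same process, different noise.
        train_x = [rng.uniform(0.0, 6.0) for _ in range(n_train)]
        train_y = [true_fn(x) + rng.gauss(0.0, NOISE_SD) for x in train_x]
        for j, x in enumerate(test_x):
            preds[j].append(knn_predict(x, train_x, train_y, k))

    bias_sq = 0.0
    variance = 0.0
    for x, p in zip(test_x, preds):
        mean_p = sum(p) / len(p)
        # Systematic miss of the average prediction.
        bias_sq += (mean_p - true_fn(x)) ** 2
        # Spread of predictions across training sets.
        variance += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias_sq / len(test_x), variance / len(test_x), NOISE_SD ** 2


for k in (1, 10, 100):
    b2, var, noise = decompose(k)
    print(f"k={k:3d}  bias^2={b2:.3f}  variance={var:.3f}  "
          f"noise={noise:.3f}  total={b2 + var + noise:.3f}")
```

In this setup k = 1 should be dominated by variance and k = 100 (which averages the whole training set) by bias², with the intermediate k giving the lowest total. The noise term is the same in every row, since no choice of k can reduce it.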
Reading learning curves
A learning curve plots training error and validation error as a function of training set size. It is the most direct diagnostic:
High bias (underfit): training error and validation error converge to a similarly high value. More data does not help.
High variance (overfit): training error is low; validation error is much higher. The gap narrows as more data is added, but only gradually.
Good fit: training error is low; validation error tracks it closely. Any gap is small.
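The patterns above can be generated numerically rather than plotted. As a rough sketch, the code below computes learning curves for a piecewise-constant model that predicts the mean target within fixed-width bins. The bin counts, the sine target, the noise level, and the function names are assumptions made for illustration. One bin stands in for a model that is too simple; 40 bins stands in for a flexible model that overfits small datasets.

```python
import math
import random


def sample(n, rng, noise_sd=0.5):
    """Draw n points from y = sin(x) + Gaussian noise, with x uniform on [0, 6]."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_binned(xs, ys, n_bins):
    """Piecewise-constant model: predict the mean target of the bin x falls in."""
    width = 6.0 / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x / width), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for bins with no training points
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x / width), n_bins - 1)]


def mse(predict, xs, ys):
    """Mean squared error of a fitted predictor on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def learning_curve(n_bins, sizes=(50, 200, 800, 3200), seed=0):
    """Return [(n, training MSE, validation MSE)] for growing training sets."""
    rng = random.Random(seed)
    val_x, val_y = sample(1000, rng)
    pool_x, pool_y = sample(max(sizes), rng)
    rows = []
    for n in sizes:
        train_x, train_y = pool_x[:n], pool_y[:n]
        predict = fit_binned(train_x, train_y, n_bins)
        rows.append((n, mse(predict, train_x, train_y), mse(predict, val_x, val_y)))
    return rows


for label, n_bins in (("1 bin (too simple)", 1), ("40 bins (flexible)", 40)):
    print(label)
    for n, train_err, val_err in learning_curve(n_bins):
        print(f"  n={n:5d}  train={train_err:.3f}  val={val_err:.3f}")
```

Under these assumptions the one-bin rows should show both errors sitting together at a high value regardless of n. The 40-bin rows should start with a low training error and a wide gap, then close that gap as n grows.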
A common mistake is to add complexity to fix a large train/test gap. That makes variance worse. Diagnose first: if training error is itself high, the problem is bias; if training error is low but test error is much higher, the problem is variance. The remedies are opposite.
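That diagnostic order can be written down as a small rule of thumb. The function below is a hypothetical helper, not a standard API. Its two thresholds, `target_err` and `gap_tol`, are assumptions you would have to choose for your own problem, and the numbers in the example calls are made up for illustration.

```python
def diagnose(train_err, val_err, target_err, gap_tol):
    """Rule-of-thumb reading of one point on a learning curve.

    target_err: the training error you would consider acceptable (assumption).
    gap_tol: the largest validation-minus-training gap you would tolerate (assumption).
    """
    if train_err > target_err:
        # The model cannot even fit the data it has seen.
        return "high bias: increase model capacity"
    if val_err - train_err > gap_tol:
        # It fits the training data but does not carry over to unseen data.
        return "high variance: regularise or gather more data"
    return "good fit"


# Illustrative numbers only.
print(diagnose(train_err=0.77, val_err=0.78, target_err=0.30, gap_tol=0.10))
print(diagnose(train_err=0.02, val_err=0.50, target_err=0.30, gap_tol=0.10))
print(diagnose(train_err=0.24, val_err=0.26, target_err=0.30, gap_tol=0.10))
```

Training error is checked first on purpose: when the model cannot fit its own training data, the size of the gap says little until that is addressed.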
Where to go next
With the tradeoff understood, the next lesson covers the model lifecycle — the full sequence of stages from raw data to a deployed, monitored model, and why each stage gates the next.