Linear regression in sklearn
Fit LinearRegression, inspect coefficients, compute R², and read residuals — the full diagnostic workflow.
- Fit LinearRegression on a dataset and inspect coef_ and intercept_
- Compute R² and interpret what it measures
- Print residuals and explain what a systematic pattern in them means
Linear regression is rarely the final model in a production system, but it is almost always the first one — fast to fit, easy to inspect, and a reliable baseline against which more complex models must justify their extra complexity.
What the numbers tell you
Coefficients (coef_) are the partial slopes: each one is how much the
prediction changes when its feature increases by 1 unit, holding all others
fixed. Here the true slopes are 3.0 and 1.5, so the fitted values should be
close. The gap between true and fitted values grows with the noise level and
shrinks as the sample size increases.
Intercept (intercept_) is the prediction when all features are zero. In
this synthetic example, the true intercept is zero, so the fitted value should
be near zero.
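A minimal sketch of the fit described above. Only the true slopes (3.0 and 1.5) and the zero intercept come from the text; the sample size, noise level, random seed, and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed synthetic setup: 200 samples, two features, Gaussian noise (sd 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)

# True slopes 3.0 and 1.5, true intercept 0.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + noise

model = LinearRegression().fit(X, y)

print("coef_:", model.coef_)            # should land near [3.0, 1.5]
print("intercept_:", model.intercept_)  # should land near 0.0
```

Raising the noise scale or lowering the sample size in this sketch should push the fitted values further from the true ones.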
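R² (coefficient of determination) measures the fraction of variance in the target that the model explains: 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of y. An R² of 1.0 is a perfect fit; 0.0 means the model does no better than predicting the mean of y for every sample; a negative R² means the model is worse than the mean predictor, which can happen on held-out data but not on the training data of a least-squares fit with an intercept.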
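The sketch below computes R² by hand and compares it with `r2_score` and `LinearRegression.score`, then shows the 0.0 and negative cases. The synthetic data (200 samples, noise sd 0.5, seed 0) is an assumption, and the sign-flipped predictions are a contrived way to produce a predictor worse than the mean.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed synthetic data: true slopes 3.0 and 1.5, intercept 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R² = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)    # squared error left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # squared error of the mean predictor
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y, y_pred)      # same quantity via sklearn.metrics
r2_from_score = model.score(X, y)     # same quantity via the estimator

# Predicting the mean of y for every sample gives R² = 0.
r2_mean = r2_score(y, np.full_like(y, y.mean()))

# Contrived bad predictor (sign-flipped predictions) gives a negative R².
r2_bad = r2_score(y, -y_pred)

print(r2_manual, r2_sklearn, r2_from_score)
print(r2_mean, r2_bad)
```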
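Residuals are the errors on individual predictions (y_true - y_pred).
On the training data of a least-squares fit with an intercept, their mean is
zero by construction, so check the mean on held-out data: a systematic
non-zero mean there is a sign of miscalibration. Their distribution tells you
more: if residuals show a systematic pattern when plotted against a feature,
such as a U-shaped curve, that feature has a non-linear relationship with the
target that a linear model cannot capture. The plain linear correlation
between training residuals and each feature is also zero by construction, so
look at the shape, not the correlation coefficient.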
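One caveat worth seeing in numbers: on training data, least squares forces both the residual mean and the linear correlation between residuals and each feature to zero, so a missed non-linearity shows up as a shape rather than as a correlation. The sketch below assumes a hypothetical target with a quadratic effect in the second feature; the data, seed, and variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed data: the second feature affects y quadratically, not linearly.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)  # y_true - y_pred

# Zero (up to floating point) on training data when an intercept is fitted.
mean_resid = residuals.mean()

# Also zero by construction: residuals are orthogonal to each fitted feature.
corr_linear = np.corrcoef(residuals, X[:, 1])[0, 1]

# The missed curvature appears once residuals are compared with x1 squared.
corr_squared = np.corrcoef(residuals, X[:, 1] ** 2)[0, 1]

print("mean residual:", mean_resid)
print("corr(residuals, x1):", corr_linear)
print("corr(residuals, x1**2):", corr_squared)
```

A scatter plot of `residuals` against `X[:, 1]` would show the same thing visually: a U-shaped band instead of a flat cloud around zero.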
R² looks impressive on training data but can be misleading if you never check residuals. A model can achieve high R² while still making large errors on specific subgroups. Always look at the residual distribution and plot predicted vs actual when the stakes matter.
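As a rough illustration of that warning, the sketch below builds a dataset where a small subgroup follows a shifted relationship. The subgroup flag, its size (50 of 1000 rows), the shift of 5.0, and all names are hypothetical; the flag is deliberately withheld from the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(scale=2.0, size=n)

# Hypothetical subgroup: the first 50 rows. The model never sees this flag.
is_minority = np.zeros(n, dtype=bool)
is_minority[:50] = True

y = 3.0 * x + rng.normal(scale=0.5, size=n)
y[is_minority] += 5.0  # the subgroup's target is shifted upward

X = x.reshape(-1, 1)  # sklearn expects a 2D feature matrix
model = LinearRegression().fit(X, y)

r2_overall = model.score(X, y)
abs_err = np.abs(y - model.predict(X))
mae_majority = abs_err[~is_minority].mean()
mae_minority = abs_err[is_minority].mean()

print("overall R²:", r2_overall)
print("MAE, majority:", mae_majority)
print("MAE, minority:", mae_minority)
```

Under these assumptions the overall R² stays high while the mean absolute error in the subgroup is several times larger than elsewhere, which a single summary score would hide.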
Where to go next
Next: decision trees — a completely different model family that splits the feature space recursively, requires no scaling, and exposes its logic visually.
- fit(), predict(), transform(): the three-method contract that makes every sklearn estimator composable, and why that uniformity matters at scale.
- Decision trees: how trees split data, why max_depth is the single most important hyperparameter, and how to recognise overfitting before it reaches production.