The model lifecycle

The six stages every ML project moves through — and what goes wrong when you skip one.

A trained model is not a product. It is the output of one stage in a longer process. Understanding the full lifecycle prevents a common pattern: spending weeks on modelling, then discovering that the data preparation stage was flawed, or that there is no way to get predictions to users.

The six stages

1. Data preparation

Output: clean, versioned, split dataset.

Raw data arrives with missing values, wrong types, inconsistent encodings, and features that leak the label. Preparation resolves all of these. The output is a reproducible dataset with a documented schema, not a notebook that was run once.

This stage gates everything else. A model trained on leaky or incorrectly labelled data produces unreliable metrics at best and confidently wrong predictions at worst.
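As a rough illustration of what a reproducible preparation step might look like, the sketch below cleans a tiny made-up table, splits it, and computes a content hash that can serve as a dataset version. The column names (age, country, refund_issued, churned), the prepare and fingerprint helpers, and the choice to drop rows instead of imputing are all assumptions made for the example, not part of any particular project.

```python
import hashlib

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical raw export; every column name and value here is made up.
raw = pd.DataFrame({
    "age": ["34", "n/a", "51", "27", "45", "38", "62", "29"],
    "country": [" uk", "UK", "de ", "De", "fr", "FR ", "uk", "de"],
    "refund_issued": [1, 0, 0, 1, 0, 0, 1, 0],  # recorded after churn, so it leaks the label
    "churned": [1, 0, 0, 1, 0, 0, 1, 0],
})


def prepare(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw frame."""
    df = frame.copy()
    # Wrong types: parse age as a number; unparseable entries become NaN.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    # Missing values: drop the rows here. Imputation would belong in a
    # pipeline fitted on the training partition only.
    df = df.dropna(subset=["age"])
    # Inconsistent encodings: normalise whitespace and case.
    df["country"] = df["country"].str.strip().str.upper()
    # Leakage: remove the column that is only known after the outcome.
    df = df.drop(columns=["refund_issued"])
    return df.reset_index(drop=True)


def fingerprint(frame: pd.DataFrame) -> str:
    """Hash the frame's CSV form so the same data always gets the same ID."""
    return hashlib.sha256(frame.to_csv(index=False).encode("utf-8")).hexdigest()


clean = prepare(raw)
# A fixed random_state makes the split repeatable from run to run.
train, test = train_test_split(clean, test_size=0.25, random_state=42)

print(clean.dtypes)
print("dataset fingerprint:", fingerprint(clean))
print("train rows:", len(train), "test rows:", len(test))
```

Because prepare is a plain function and the split is seeded, rerunning the script on the same raw data yields the same fingerprint and the same partitions, which is the property a run-once notebook lacks.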

2. Training

Output: trained model artefact.

Fit the chosen algorithm on the training partition, and on the training partition only: touching the test set at this stage is data leakage. The output is a serialised model (e.g. a joblib-saved sklearn estimator) plus training-run metadata: algorithm, hyperparameters, training set fingerprint.
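One possible shape for this stage is sketched below: fit on the training partition, then save the estimator together with a small metadata file. The synthetic dataset, the RandomForestClassifier choice, the file names, and the metadata keys are assumptions for illustration only.

```python
import hashlib
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a prepared dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
model.fit(X_train, y_train)  # the test partition is not used here

# Training-run metadata: algorithm, hyperparameters, training set fingerprint.
metadata = {
    "algorithm": type(model).__name__,
    "hyperparameters": model.get_params(),
    "training_set_sha256": hashlib.sha256(
        X_train.tobytes() + y_train.tobytes()
    ).hexdigest(),
    "n_training_rows": int(X_train.shape[0]),
}

# Hypothetical artefact layout: a temporary directory with two files.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "model.joblib"
joblib.dump(model, str(model_path))
(out_dir / "metadata.json").write_text(
    json.dumps(metadata, indent=2, default=str)
)

print("saved:", sorted(p.name for p in out_dir.iterdir()))
```

Storing the fingerprint next to the model makes it possible to tell later which version of the data a given artefact was trained on.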

3. Evaluation

Output: metric report on held-out data.

Run the trained model on the test partition and compute the metrics relevant to the problem (accuracy, F1, RMSE, etc.). This is the honest estimate of generalisation performance. Any metric computed on training data is a measure of fit, not generalisation.
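The gap between fit and generalisation can be made visible with a short sketch like the following. It assumes a synthetic classification dataset and an unconstrained decision tree, chosen only because such a tree tends to memorise its training data; the report helper is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# No depth limit, so the tree is free to fit the training data very closely.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)


def report(estimator, features, labels):
    """Compute the metrics of interest on one partition."""
    predicted = estimator.predict(features)
    return {
        "accuracy": accuracy_score(labels, predicted),
        "f1": f1_score(labels, predicted),
    }


train_report = report(model, X_train, y_train)  # measure of fit
test_report = report(model, X_test, y_test)     # estimate of generalisation

print("train:", train_report)
print("test: ", test_report)
```

Only the second report belongs in the metric report for this stage; the first is useful mainly as a diagnostic for overfitting.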

4. Tuning

Output: tuned model artefact, ideally with improved generalisation metrics.

Adjust hyperparameters such as max_depth, regularisation strength, or learning rate, and re-evaluate each candidate. The important constraint: tuning uses cross-validation on the training set, so candidates are compared on validation folds and never on the test partition. The held-out test set remains untouched until final evaluation, because using it to select hyperparameters inflates the final metric.
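A minimal sketch of this constraint, assuming a synthetic dataset, a decision tree, and an arbitrary small grid: GridSearchCV sees only the training partition, and the test partition is scored once at the very end.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# Illustrative grid; the values are arbitrary.
param_grid = {
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 20],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid=param_grid,
    cv=5,           # 5-fold cross-validation inside the training partition
    scoring="f1",
)
search.fit(X_train, y_train)  # the test partition is not passed in

best_params = search.best_params_
cv_f1 = search.best_score_  # mean F1 across the validation folds

# Final evaluation: the only point at which the test partition is used.
final_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))

print("best params:", best_params)
print("cross-validated F1:", round(cv_f1, 3))
print("held-out F1:", round(final_f1, 3))
```

By default GridSearchCV refits the best configuration on the whole training partition, so best_estimator_ is the tuned artefact this stage produces.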

5. Deployment

Output: a model serving predictions as a batch job, an API endpoint, or an embedded function.

The model must be packaged so that the same preprocessing steps applied during training are applied at inference time. Sklearn pipelines handle this by bundling the scaler, encoder, and estimator into one object that is serialised and deserialised together.
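The bundling described above might look like the sketch below. The columns (age, income, plan), the synthetic data, the LogisticRegression estimator, and the file name are assumptions for illustration; the point is that raw rows go in at inference time and the stored pipeline applies the fitted scaler and encoder itself.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training data with two numeric columns and one categorical column.
rng = np.random.default_rng(0)
n_rows = 200
train = pd.DataFrame({
    "age": rng.integers(18, 70, size=n_rows),
    "income": rng.normal(40_000, 12_000, size=n_rows),
    "plan": rng.choice(["free", "basic", "pro"], size=n_rows),
})
target = (train["income"] > 40_000).astype(int)

preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["age", "income"]),
    # handle_unknown="ignore" keeps inference from failing on unseen categories.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train, target)

# Serialise scaler, encoder, and estimator as one object.
artefact_path = Path(tempfile.mkdtemp()) / "pipeline.joblib"
joblib.dump(pipeline, str(artefact_path))

# Inference side: load the artefact and pass raw, unscaled rows.
served = joblib.load(str(artefact_path))
new_rows = pd.DataFrame({
    "age": [30, 55],
    "income": [25_000.0, 80_000.0],
    "plan": ["pro", "enterprise"],  # "enterprise" never appeared in training
})
predictions = served.predict(new_rows)
print(predictions)
```

Since the preprocessing lives inside the artefact, the serving code has no scaling or encoding logic of its own that could drift out of sync with training.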

6. Monitoring

Output: ongoing metric dashboards and alerting.

Data distributions change over time, a phenomenon called data drift. A model that performed well on last year's data may degrade as the world changes. Monitoring tracks input and prediction distributions and, where labels become available with delay, actual performance. Without monitoring, model degradation is invisible until a business outcome fails.
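One common way to quantify a shift between a reference sample and a recent sample is the population stability index (PSI). The sketch below is an illustrative choice, not the only drift measure, and the helper name, the bin count, and the synthetic data are assumptions. It bins both samples using the reference quantiles and sums (current - reference) * ln(current / reference) over the bins.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10):
    """PSI between two 1-D samples, using quantile bins from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges from the reference quantiles; the two outer bins
    # are open-ended, so values outside the reference range still count.
    interior_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]

    ref_counts = np.bincount(
        np.searchsorted(interior_edges, reference, side="right"),
        minlength=n_bins,
    )
    cur_counts = np.bincount(
        np.searchsorted(interior_edges, current, side="right"),
        minlength=n_bins,
    )

    # Clip to a small floor so empty bins do not produce log(0).
    floor = 1e-6
    ref_frac = np.clip(ref_counts / len(reference), floor, None)
    cur_frac = np.clip(cur_counts / len(current), floor, None)

    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
last_year = rng.normal(0.0, 1.0, size=5000)   # reference window
this_week = rng.normal(0.0, 1.0, size=5000)   # same distribution
after_shift = rng.normal(1.0, 1.0, size=5000) # mean shifted by one unit

psi_stable = population_stability_index(last_year, this_week)
psi_drifted = population_stability_index(last_year, after_shift)
print("stable: ", round(psi_stable, 4))
print("drifted:", round(psi_drifted, 4))
```

The same function can be applied to a single input feature or to the model's predicted scores. Any alerting threshold is a judgement call that should be calibrated per feature; values around 0.1 and 0.25 are sometimes quoted as rules of thumb.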

In practice, the lifecycle is a loop. Monitoring reveals degradation, which triggers a new data preparation cycle with fresher data. The stages are sequential within a cycle, but the cycle itself repeats.

Where to go next

With the lifecycle mapped out, you can start making concrete model choices. Next: choosing a model — a practical decision guide with code to compare three algorithms on the same dataset.
