Code of the Day

Lab: build a complete ML pipeline

Ingest, clean, engineer features, split, scale, train, evaluate, and persist — each step as a tested, reproducible function.

Lab · optional · Data Science · Advanced · 35 min
By the end of this lesson you will be able to:
  • Implement each pipeline stage as a pure function with an associated test
  • Assemble the stages into an end-to-end pipeline that runs reproducibly
  • Evaluate the final model and produce a summary report
  • Persist the fitted pipeline with joblib

A notebook is not a pipeline. A pipeline is a sequence of composable, tested, deterministic functions that can be run again from scratch and produce the same output. This lab builds one, step by step.
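As a minimal sketch of what "composable, tested, deterministic" can mean in practice, the snippet below chains two small pure functions and checks that a second run gives the same result. The stage names (`drop_missing`, `add_doubled`), the `run_stages` helper, and the toy data are illustrative assumptions, not part of the lab's own code.

```python
from functools import reduce
from typing import Callable, Sequence

import pandas as pd

# A stage is any function that takes a DataFrame and returns a new one.
Stage = Callable[[pd.DataFrame], pd.DataFrame]


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without rows that contain missing values."""
    return df.dropna().reset_index(drop=True)


def add_doubled(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a derived column; the input frame is not modified."""
    return df.assign(doubled=df["value"] * 2)


def run_stages(df: pd.DataFrame, stages: Sequence[Stage]) -> pd.DataFrame:
    """Apply each stage in order, feeding the output of one into the next."""
    return reduce(lambda frame, stage: stage(frame), stages, df)


# Hypothetical toy input, used only to exercise the functions.
raw = pd.DataFrame({"value": [1.0, None, 3.0]})
first = run_stages(raw, [drop_missing, add_doubled])
second = run_stages(raw, [drop_missing, add_doubled])

pd.testing.assert_frame_equal(first, second)  # same input, same output
assert raw.shape == (3, 1)  # the raw frame was left untouched
```

Because every stage returns a new frame instead of mutating its input, the stages can be tested in isolation and re-run from the raw data at any time.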

Stage 1 — ingest and inspect

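One possible shape for this stage is sketched below. The lab's actual dataset is not shown here, so the inline CSV, its column names, and the `ingest` and `inspect` helpers are hypothetical stand-ins chosen to match the churn theme of the persisted file name.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lab's dataset; the columns are assumptions.
RAW_CSV = """customer_id,tenure_months,monthly_charges,contract,churned
1,12,29.5,monthly,0
2,,80.0,yearly,1
3,40,55.25,monthly,0
3,40,55.25,monthly,0
"""


def ingest(source) -> pd.DataFrame:
    """Read the raw CSV as-is; cleaning is left to the next stage."""
    return pd.read_csv(source)


def inspect(df: pd.DataFrame) -> dict:
    """Summarise shape, missing values per column, and duplicate rows."""
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "missing": df.isna().sum().to_dict(),
        "n_duplicates": int(df.duplicated().sum()),
    }


raw = ingest(io.StringIO(RAW_CSV))
report = inspect(raw)
print(report)
```

Returning the summary as a plain dictionary, instead of only printing it, means the inspection step can itself be asserted on in a test.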

Stage 2 — clean

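A cleaning stage could look roughly like the sketch below. The rules it applies (drop duplicates, normalise the `contract` text, drop rows without a label or with a negative tenure) and the column names are assumptions for illustration; the rules that fit the real dataset may differ.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw frame without modifying the input."""
    out = df.drop_duplicates().copy()
    # Normalise free-text categories so " Monthly" and "monthly " match.
    out["contract"] = out["contract"].str.strip().str.lower()
    # Rows without a label cannot be used for supervised training.
    out = out.dropna(subset=["churned"])
    out["churned"] = out["churned"].astype(int)
    # Negative tenure is treated as a data-entry error (assumed rule).
    out = out[out["tenure_months"] >= 0]
    return out.reset_index(drop=True)


def test_clean() -> None:
    # Hypothetical raw rows covering each rule above.
    raw = pd.DataFrame({
        "customer_id": [1, 2, 3, 3, 4, 5],
        "tenure_months": [12.0, 5.0, 40.0, 40.0, 7.0, -1.0],
        "contract": [" Monthly", "yearly", "monthly ", "monthly ", "Yearly", "monthly"],
        "churned": [0.0, None, 0.0, 0.0, 1.0, 1.0],
    })
    cleaned = clean(raw)
    assert cleaned["customer_id"].tolist() == [1, 3, 4]
    assert cleaned["contract"].tolist() == ["monthly", "monthly", "yearly"]
    assert cleaned["churned"].tolist() == [0, 0, 1]
    assert len(raw) == 6  # the input frame is unchanged


test_clean()
```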

Stage 3 — feature engineering and split

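The sketch below shows one way to keep feature engineering and splitting as separate, seeded functions. The derived features, the `engineer_features` and `split` helpers, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add row-wise derived columns; nothing here is learned from the data."""
    return df.assign(
        tenure_years=df["tenure_months"] / 12,
        is_new_customer=(df["tenure_months"] < 6).astype(int),
    )


def split(df: pd.DataFrame, target: str = "churned",
          test_size: float = 0.25, seed: int = 42):
    """Stratified train/test split with a fixed seed for reproducibility."""
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )


# Synthetic data, generated only so the example runs end to end.
rng = np.random.default_rng(0)
n = 40
data = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, size=n),
    "monthly_charges": rng.uniform(20, 100, size=n).round(2),
    "churned": np.tile([0, 0, 0, 1], n // 4),
})

features = engineer_features(data)
X_train, X_test, y_train, y_test = split(features)

# The same seed must give the same split, and the two sets must not overlap.
X_train2, X_test2, _, _ = split(features)
assert X_test.index.equals(X_test2.index)
assert set(X_train.index).isdisjoint(X_test.index)
```

Only row-wise features are computed before the split. Anything that learns statistics from the data, such as scaling, is better fitted on the training set alone so that no information leaks from the test set.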

Stage 4 — build pipeline, tune, and evaluate

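A sketch of this stage, under stated assumptions, follows. The lesson mentions a ColumnTransformer, scalers, encoders, and a fitted tree, so the example uses those pieces, but the synthetic data, the parameter grid, the scoring choice, and the variable names other than `best` and `X_test` are illustrative guesses at the original code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

SEED = 42

# Synthetic data with a known rule, used only so the example is runnable.
rng = np.random.default_rng(SEED)
n = 200
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, size=n),
    "monthly_charges": rng.uniform(20, 100, size=n),
    "contract": rng.choice(["monthly", "yearly"], size=n),
})
y = ((X["tenure_months"] < 24) & (X["contract"] == "monthly")).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)

# Preprocessing lives inside the pipeline, so it is fitted on training folds only.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure_months", "monthly_charges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", DecisionTreeClassifier(random_state=SEED)),
])

# Tune tree depth with cross-validation; the grid values are assumptions.
search = GridSearchCV(
    pipeline, {"model__max_depth": [2, 4, None]}, cv=3, scoring="f1"
)
search.fit(X_train, y_train)
best = search.best_estimator_  # refitted on the full training set

# Evaluate once on the held-out test set.
y_pred = best.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Best parameters:", search.best_params_)
print(classification_report(y_test, y_pred))
```

A decision tree does not need scaled inputs; the scaler is kept here only because the lesson lists it as part of the bundled pipeline, and it makes swapping in a scale-sensitive model straightforward.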

Stage 5 — persist the pipeline

Once you have the fitted pipeline, serialise it for later use. In a real project, this step writes to a versioned artefact store. Here it writes to a temporary file and verifies the predictions are identical after loading:

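import joblib
import numpy as np

# `best` is the fitted pipeline and `X_test` the held-out features from Stage 4.
path = "/tmp/churn_pipeline_v1.pkl"

# Save the whole fitted pipeline (preprocessing and model) as one artefact.
joblib.dump(best, path)

# Load it back and verify that the predictions match exactly.
# array_equal checks for identical labels; allclose would only check closeness.
loaded = joblib.load(path)
assert np.array_equal(best.predict(X_test), loaded.predict(X_test))
print("Serialisation verified.")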

The complete pipeline (ColumnTransformer, scalers, encoders, and fitted tree) is bundled in a single object. Loading it on a different machine produces the same predictions, provided the scikit-learn and numpy versions there are compatible with the ones used to save it, ideally the same versions. That is the minimal requirement for a reproducible deployment.
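Since the result depends on library versions, one option is to store them next to the model and check them on load. The sketch below assumes a simple dictionary bundle; the `save_artifact` and `load_artifact` helpers, the bundle keys, and the tiny model are hypothetical, not part of joblib or of the lab's code.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import sklearn
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def save_artifact(model, path: str) -> None:
    """Persist the model together with the library versions used to fit it."""
    bundle = {
        "model": model,
        "sklearn_version": sklearn.__version__,
        "numpy_version": np.__version__,
    }
    joblib.dump(bundle, path)


def load_artifact(path: str):
    """Load the model, refusing to continue on a scikit-learn version mismatch."""
    bundle = joblib.load(path)
    if bundle["sklearn_version"] != sklearn.__version__:
        raise RuntimeError(
            f"Saved with scikit-learn {bundle['sklearn_version']}, "
            f"running {sklearn.__version__}"
        )
    return bundle["model"]


# Tiny stand-in model so the example is self-contained.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "pipeline.joblib")
    save_artifact(model, path)
    restored = load_artifact(path)

assert np.array_equal(model.predict(X), restored.predict(X))
```

Failing loudly on a version mismatch is a deliberately strict choice; a project could instead log a warning and re-validate the loaded model on a reference dataset.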

Where to go next

You have completed the Advanced Data Science track. The skills built across these five modules — model selection, sklearn pipelines, rigorous evaluation, time series analysis, and pipeline engineering — form the core toolkit for taking analysis from notebook to production.
