Docker concepts
Images are frozen environments, containers are running instances, and layers make rebuilds fast — understand the model before writing a single Dockerfile.
- Explain the difference between an image and a container
- Describe the layer model and why it makes rebuilds fast
- Articulate why containers solve the reproducibility problem for automation scripts
"It works on my machine" is a statement that automation pipelines cannot afford. A script that ran fine on your laptop may fail in production because of a different Python version, a missing system library, or a subtly different locale setting. Docker solves this by packaging the environment alongside the code.
Images vs containers
A Docker image is a read-only, self-contained snapshot of everything a process needs to run: the operating system base, system libraries, the Python interpreter, your installed packages, and your source code. An image does not run; it is an artifact — like a shipping container before it is loaded onto a vessel.
A container is a running instance of an image. Starting a container from an image is fast (typically a fraction of a second) because no image files need to be copied: the container layers a thin, writable filesystem on top of the read-only image. Remove the container and that writable layer disappears; the image is untouched. Merely stopping a container keeps its writable layer until the container is removed.
You can run ten containers from the same image simultaneously, each isolated from the others.
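The relationship can be sketched in plain Python. This is only an analogy, not how Docker is implemented: a read-only mapping stands in for the image, and each container is a fresh writable dict stacked on top of it. The file paths and the `start_container` helper are hypothetical names chosen for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical image contents: path -> file content.
# MappingProxyType makes the mapping read-only, like an image.
IMAGE = MappingProxyType({
    "/app/pipeline.py": "print('run')",
    "/etc/locale": "C.UTF-8",
})


def start_container(image):
    """Return a container view: a fresh writable dict layered over the image."""
    return ChainMap({}, image)


a = start_container(IMAGE)
b = start_container(IMAGE)

# Writes land only in the container's own top layer.
a["/tmp/output.csv"] = "id,value"
a["/etc/locale"] = "de_DE.UTF-8"

print(a["/etc/locale"])        # de_DE.UTF-8 (container a sees its own change)
print(b["/etc/locale"])        # C.UTF-8 (container b is isolated from a)
print("/tmp/output.csv" in b)  # False
print(IMAGE["/etc/locale"])    # C.UTF-8 (the image itself is untouched)
```

Discarding `a` throws away its top dict and nothing else, which mirrors what happens to a container's writable layer when the container is removed.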
The layer model
Instructions that change the filesystem, such as RUN, COPY, and ADD, each produce a new layer; instructions such as ENV or CMD only record metadata. A layer is a diff against the previous state: only the files changed by that instruction.
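```dockerfile
# Layer 1: base OS + Python interpreter
FROM python:3.12-slim
# Layer 2: set working directory (tiny)
WORKDIR /app
# Layer 3: add requirements file
COPY requirements.txt .
# Layer 4: install packages
RUN pip install -r requirements.txt
# Layer 5: add source code
COPY . .
```

Comments sit on their own lines because Docker only treats `#` as a comment marker at the start of a line; after an instruction it is parsed as an argument.

Docker caches layers. When you rebuild, Docker reuses every cached layer up to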
the first instruction whose inputs changed. If you only changed pipeline.py,
Docker reuses layers 1–4 (the expensive pip install) and rebuilds only layer 5.
This is why the standard pattern copies requirements.txt before copying the
rest of the source: the package install layer changes rarely, so it stays cached
across most rebuilds.
The cache key for a COPY instruction is derived from a checksum of the files being copied.
Changing a single byte in requirements.txt invalidates layer 3 and everything
after it, triggering a full pip install. This is correct, because you changed the
dependencies. If you revert the change, the next build is fast again as long as
the earlier layers are still in the build cache.
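The invalidation rule can be illustrated with a small simulation. This is a simplified model under stated assumptions, not Docker's actual cache implementation: each layer's key is a SHA-256 over its parent's key, the instruction text, and the bytes of any copied files. The file contents and the `requests` pin are made up for the example.

```python
import hashlib

# Each step is (instruction text, files it copies), mirroring the
# five-instruction Dockerfile above.
STEPS = [
    ("FROM python:3.12-slim", []),
    ("WORKDIR /app", []),
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "pipeline.py"]),
]


def layer_keys(files):
    """Compute one cache key per step, chaining each key to its parent."""
    keys, parent = [], ""
    for instruction, copied in STEPS:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for name in copied:
            h.update(name.encode())
            h.update(files[name])
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(cache, files):
    """Return the 1-based numbers of layers whose key is not in the cache."""
    return [i for i, key in enumerate(layer_keys(files), start=1)
            if key not in cache]


# Hypothetical project contents.
files = {
    "requirements.txt": b"requests==2.32.0\n",
    "pipeline.py": b"print('v1')\n",
}
cache = set(layer_keys(files))           # the first build populates the cache

files["pipeline.py"] = b"print('v2')\n"  # source-only change
print(rebuilt_layers(cache, files))      # [5]

files["requirements.txt"] = b"requests==2.32.1\n"  # dependency change
print(rebuilt_layers(cache, files))      # [3, 4, 5]
```

Because every key includes its parent's key, a miss at layer 3 forces misses at layers 4 and 5 even though the `RUN pip install` instruction text itself did not change.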
Why this solves the reproducibility problem
A built Docker image freezes every dependency: the Python version, the exact
version of every package installed from requirements.txt, the C libraries linked
by those packages, and the OS baseline. The same image (strictly, the same image
digest, since a tag can be moved to point at a different image) produces the same behaviour on:
- Your laptop (macOS or Linux).
- A CI runner (Ubuntu 22.04).
- A production server (Debian 12).
- A colleague's machine with a different Python installation.
For automation pipelines this matters especially, because pipelines run unattended.
A Python 3.9 behaviour difference in datetime.fromisoformat caused a production
pipeline to fail silently for months before anyone noticed. A containerised
pipeline pinned to 3.12 would have run on the same interpreter in development, CI,
and production, so the version mismatch could not have arisen in the first place.
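A script can also refuse to start on an unexpected interpreter, which turns a silent behaviour difference into an immediate, visible error. The sketch below is one possible guard, not part of Docker or of the pipeline described here; `check_python` and its parameters are hypothetical names.

```python
import sys


def check_python(expected, actual=None):
    """Raise RuntimeError unless the major.minor version equals `expected`.

    `actual` defaults to the running interpreter; passing it explicitly
    makes the check easy to exercise in tests.
    """
    actual = tuple((actual or sys.version_info)[:2])
    expected = tuple(expected)
    if actual != expected:
        raise RuntimeError(
            f"pipeline expects Python {expected[0]}.{expected[1]}, "
            f"got {actual[0]}.{actual[1]}"
        )
    return actual


check_python((3, 12), actual=(3, 12))  # passes silently

try:
    check_python((3, 12), actual=(3, 9))
except RuntimeError as err:
    print(err)  # pipeline expects Python 3.12, got 3.9
```

In a real script the call would be `check_python((3, 12))` near the top of the entry point. Inside an image built from `python:3.12-slim` the check always passes, so it only fires when someone runs the script outside the container.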
Key terms
| Term | Meaning |
|---|---|
| Image | Read-only, built artifact — the frozen environment |
| Container | Running instance of an image |
| Layer | One instruction's diff in the image filesystem |
| Registry | Storage for images (Docker Hub, GHCR, ECR) |
| Tag | Human-readable label for an image version (myapp:1.4.2) |
| Dockerfile | Recipe that builds an image from instructions |
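To see how the registry and tag terms fit together in a single image reference, here is a simplified parsing sketch. It assumes the common `registry/repository:tag` shape, ignores `@sha256:...` digests, and does not reproduce every normalisation the Docker CLI performs; `split_image_ref` and the `acme` organisation are hypothetical.

```python
def split_image_ref(ref):
    """Split an image reference into (registry, repository, tag).

    Simplified: no digest support, and no expansion of short
    Docker Hub names.
    """
    # A ':' counts as the tag separator only if it comes after the last '/',
    # so a registry port such as localhost:5000 is not mistaken for a tag.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        name, tag = ref[:colon], ref[colon + 1:]
    else:
        name, tag = ref, "latest"  # Docker's default tag

    # The first path component is a registry host only if it looks like one.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return "docker.io", name, tag


print(split_image_ref("myapp:1.4.2"))
# ('docker.io', 'myapp', '1.4.2')
print(split_image_ref("ghcr.io/acme/myapp:1.4.2"))
# ('ghcr.io', 'acme/myapp', '1.4.2')
print(split_image_ref("localhost:5000/myapp"))
# ('localhost:5000', 'myapp', 'latest')
```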
Where to go next
Next: writing a Dockerfile — the minimal Dockerfile for a Python automation
script, with best practices for layer ordering and the difference between CMD
and ENTRYPOINT.