Writing a Dockerfile
Write a production-quality Dockerfile for a Python automation script — from base image selection to ENTRYPOINT — with correct layer ordering to maximise cache reuse.
- Write a Dockerfile using FROM, WORKDIR, COPY, RUN, and ENTRYPOINT
- Order COPY and RUN instructions to maximise Docker layer cache reuse
- Explain the difference between ENTRYPOINT and CMD and when to use each
A Dockerfile is a recipe. Each instruction adds a layer to the image. The goal is to produce an image that is: small enough to pull quickly, deterministic (same inputs → same image), and ordered so that the expensive steps are cached aggressively.
A minimal Dockerfile for a workflow script
# syntax=docker/dockerfile:1
# ── Base image ────────────────────────────────────────────────────────────────
# python:3.12-slim is Debian-based with only the Python runtime — no build tools.
# Pin to a digest in production: python:3.12-slim@sha256:<digest>
FROM python:3.12-slim
# ── Working directory ─────────────────────────────────────────────────────────
# All subsequent COPY / RUN commands execute relative to this path.
WORKDIR /app
# ── Dependencies — copied first to exploit layer caching ─────────────────────
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# ── Source code ───────────────────────────────────────────────────────────────
COPY . .
# ── Non-root user (security best practice) ───────────────────────────────────
RUN useradd --system --no-create-home pipeline
USER pipeline
# ── Entry point ───────────────────────────────────────────────────────────────
ENTRYPOINT ["python", "pipeline.py"]
Instruction by instruction
FROM python:3.12-slim selects the base image. slim variants omit compilers,
development headers and documentation, reducing image size significantly. Use
alpine-based images only if you are confident your packages do not need C
extensions. Many do, and because Alpine uses musl rather than glibc, pip may
not find a compatible prebuilt wheel and will try to compile from source, which
is slow and fails when no compiler is installed.
WORKDIR /app creates the directory and sets it as the working directory for
every subsequent instruction. Prefer this over RUN mkdir && cd: a cd inside a
RUN lasts only for that one instruction, whereas WORKDIR persists and is
explicit and idempotent.
COPY requirements.txt . copies only the requirements file first.
RUN pip install --no-cache-dir -r requirements.txt installs dependencies.
--no-cache-dir prevents pip from writing its HTTP cache to the layer, keeping
the image smaller. This layer is cached until requirements.txt changes.
COPY . . copies all remaining source files. This is placed after the pip
install so that code changes do not invalidate the dependency layer.
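The ordering argument can be made concrete with a toy model of the layer cache. This is only an illustrative sketch, not Docker's actual implementation: it assumes each layer's cache key is derived from its parent's key, the instruction text, and a hash of any copied files. The content hashes (r1, s1, s2) and the helper names are hypothetical.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction.

    Each key chains the parent key, the instruction text and, for COPY,
    the hashes of the copied files, so a change invalidates every later layer.
    """
    keys = []
    parent = "base"
    for instruction, copied in instructions:
        content = "".join(file_hashes[name] for name in copied)
        raw = f"{parent}|{instruction}|{content}".encode()
        parent = hashlib.sha256(raw).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, before, after):
    """List the instructions whose cache key differs between two builds."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [ins for (ins, _), a, b in zip(instructions, old, new) if a != b]


# Hypothetical content hashes: only the source code changes between builds.
before = {"requirements.txt": "r1", "src": "s1"}
after = {"requirements.txt": "r1", "src": "s2"}

deps_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "src"]),
]
code_first = [
    ("COPY . .", ["requirements.txt", "src"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
]

print(rebuilt_layers(deps_first, before, after))  # only the final COPY
print(rebuilt_layers(code_first, before, after))  # the COPY and the pip install
```

In this model, copying requirements.txt first means a code change rebuilds only the final COPY layer, while copying everything first forces the pip install to run again.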
USER pipeline drops root privileges before the process starts. Running
automation scripts as root inside a container is a security risk — if the script
is compromised, it has root in the container namespace.
ENTRYPOINT ["python", "pipeline.py"] uses the exec form (JSON array), which
makes python PID 1 and ensures signals are delivered correctly.
ENTRYPOINT vs CMD
ENTRYPOINT sets the executable. CMD provides default arguments that can be
overridden at docker run time. Use both together when you want the script to
accept optional arguments:
ENTRYPOINT ["python", "pipeline.py"]
CMD ["--env", "production"]
Running docker run myimage --env staging overrides CMD but keeps the
ENTRYPOINT, resulting in python pipeline.py --env staging.
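On the Python side, the script has to accept whatever Docker appends after the ENTRYPOINT. The following is a minimal sketch of how pipeline.py might do that with argparse. The set of allowed values and the parse_args helper are assumptions for illustration, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments Docker appends after the ENTRYPOINT."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    # --env is the flag used in the CMD example; the choices are assumptions.
    parser.add_argument(
        "--env",
        choices=["production", "staging"],
        default="production",
        help="target environment",
    )
    return parser.parse_args(argv)


# Equivalent to the CMD default: python pipeline.py --env production
print(parse_args(["--env", "production"]).env)
# Equivalent to: docker run myimage --env staging
print(parse_args(["--env", "staging"]).env)
```

Giving the flag a default inside the script as well means the container still behaves sensibly if CMD is omitted or overridden with an empty argument list.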
Avoid the shell form of ENTRYPOINT: ENTRYPOINT python pipeline.py. In the
shell form, a wrapper shell (/bin/sh -c) becomes PID 1 and Python runs as a
child process. Signals like SIGTERM go to the shell, which does not reliably
forward them to Python, so your graceful shutdown handlers may never fire.
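For context, a graceful shutdown handler in the script could look roughly like the sketch below. It assumes the pipeline is a sequence of steps and that it is acceptable to finish the current step before stopping. The names handle_sigterm, run, and demo_step are hypothetical, and the demo raises SIGTERM in-process to stand in for docker stop.

```python
import signal
import threading

# Set by the handler; the work loop checks it between units of work.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request instead of exiting mid-step."""
    stop_requested.set()


def run(steps, do_step):
    """Run steps in order, stopping early once SIGTERM has arrived."""
    completed = []
    for step in steps:
        if stop_requested.is_set():
            break
        do_step(step)
        completed.append(step)
    return completed


# Must be registered from the main thread.
signal.signal(signal.SIGTERM, handle_sigterm)


def demo_step(step):
    # Simulate a stop request arriving while step "b" is running.
    if step == "b":
        signal.raise_signal(signal.SIGTERM)


# Step "b" finishes, then the loop stops before "c".
print(run(["a", "b", "c"], demo_step))
```

A handler like this only helps if Python actually receives the signal, which is what the exec form of ENTRYPOINT ensures.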
Building and running
# Build the image and tag it
docker build -t my-pipeline:latest .
# Run with a volume mount for output files
docker run --rm \
  -v "$(pwd)/output:/app/output" \
  -e API_KEY="$API_KEY" \
  my-pipeline:latest
The -v flag mounts ./output on the host into /app/output in the container,
so output files survive after the container exits. -e passes the API_KEY
environment variable without baking it into the image.
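Inside the container, the script sees these two flags as an ordinary environment variable and an ordinary directory. A minimal sketch of how pipeline.py might consume them is shown below. The helper names and the report.txt file name are hypothetical, and the default path simply mirrors the /app/output mount used above.

```python
import os
from pathlib import Path


def load_api_key(environ=os.environ):
    """Return API_KEY from the environment, failing fast if it is missing."""
    api_key = environ.get("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set; pass it with docker run -e")
    return api_key


def write_report(text, output_dir="/app/output"):
    """Write a result file into the mounted output directory."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "report.txt"  # hypothetical file name
    path.write_text(text)
    return path
```

Because the container runs as the non-root pipeline user, the host directory behind the mount needs to be writable by that user, otherwise write_report will fail with a permission error.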
Where to go next
Next: Docker Compose workflows — when your pipeline needs a database, a file server, or any other service alongside it, Compose defines the whole environment in a single YAML file.