Code of the Day

Writing a Dockerfile

Write a production-quality Dockerfile for a Python automation script — from base image selection to ENTRYPOINT — with correct layer ordering to maximise cache reuse.

By the end of this lesson you will be able to:
  • Write a Dockerfile using FROM, WORKDIR, COPY, RUN, and ENTRYPOINT
  • Order COPY and RUN instructions to maximise Docker layer cache reuse
  • Explain the difference between ENTRYPOINT and CMD and when to use each

A Dockerfile is a recipe. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer to the image; the others only record metadata. The goal is to produce an image that is small enough to pull quickly, deterministic (same inputs → same image), and ordered so that the expensive steps are cached aggressively.

A minimal Dockerfile for a workflow script

# syntax=docker/dockerfile:1

# ── Base image ────────────────────────────────────────────────────────────────
# python:3.12-slim is Debian-based with only the Python runtime — no build tools.
# Pin to a digest in production: python:3.12-slim@sha256:<digest>
FROM python:3.12-slim

# ── Working directory ─────────────────────────────────────────────────────────
# All subsequent COPY / RUN commands execute relative to this path.
WORKDIR /app

# ── Dependencies — copied first to exploit layer caching ─────────────────────
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ── Source code ───────────────────────────────────────────────────────────────
COPY . .

# ── Non-root user (security best practice) ───────────────────────────────────
RUN useradd --system --no-create-home pipeline
USER pipeline

# ── Entry point ───────────────────────────────────────────────────────────────
ENTRYPOINT ["python", "pipeline.py"]

Instruction by instruction

FROM python:3.12-slim selects the base image. slim variants omit compilers, development headers, and documentation, reducing image size significantly. Use alpine-based images only if you are confident your packages do not need C extensions. Alpine uses musl instead of glibc, so any package without a musl-compatible wheel has to be compiled from source during pip install, which is slow and fails unless you install a compiler toolchain first.

WORKDIR /app creates the directory if it does not exist and sets it as the working directory for every subsequent instruction. Prefer this over RUN mkdir /app && cd /app: a cd inside RUN lasts only for that one instruction, whereas WORKDIR persists and is explicit.

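COPY requirements.txt . copies only the requirements file first, so the layers up to and including the dependency install depend on that one file rather than on the whole source tree.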

RUN pip install --no-cache-dir -r requirements.txt installs dependencies. --no-cache-dir stops pip from writing its download cache into the layer, keeping the image smaller. This layer is reused from cache until requirements.txt (or an earlier layer, such as the base image) changes.

COPY . . copies the rest of the build context, that is, everything not excluded by a .dockerignore file. It is placed after the pip install so that code changes do not invalidate the dependency layer.
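The effect of this ordering can be sketched with a toy model of the build cache. This is an illustration only, not Docker's actual algorithm: it assumes each layer's key is a hash of the parent key, the instruction text, and the contents of any copied files. The file contents and the layer_keys and reused helpers are hypothetical.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step in a toy model of the build cache.

    Each key hashes the previous key, the instruction text, and the
    contents of any files the instruction copies, so a change invalidates
    that layer and every layer after it.
    """
    keys = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for name in sorted(copied):
            h.update(name.encode())
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


# Hypothetical build context: file name -> contents.
before = {"requirements.txt": "requests==2.32.3\n", "pipeline.py": "print('v1')\n"}
after = {**before, "pipeline.py": "print('v2')\n"}  # only the code changed

deps_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "pipeline.py"]),
]
code_first = [
    ("COPY . .", ["requirements.txt", "pipeline.py"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
]


def reused(steps):
    """Return, per step, whether the layer key survives the code change."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [a == b for a, b in zip(old, new)]


print(reused(deps_first))  # [True, True, False]: the pip install layer is reused
print(reused(code_first))  # [False, False]: pip install runs again
```

In the model, copying the source first makes every later layer depend on it, which is why the Dockerfile above installs dependencies before COPY . . runs.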

USER pipeline drops root privileges before the process starts. Running automation scripts as root inside a container is a security risk — if the script is compromised, it has root in the container namespace.
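As a defensive check, the script itself can notice when it has been started as root, for example because someone ran the image with a different user setting. The following is a minimal sketch, assuming a POSIX system; running_as_root is a hypothetical helper, not part of any library.

```python
import os


def running_as_root():
    """Return True if the effective UID is 0 (always False where geteuid is missing)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


if running_as_root():
    # Only a warning in this sketch; a stricter script could exit instead.
    print("warning: running as root; expected the 'pipeline' user from the Dockerfile")
```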

ENTRYPOINT ["python", "pipeline.py"] uses the exec form (JSON array), which makes python PID 1, so signals such as the SIGTERM sent by docker stop reach the Python process directly.

ENTRYPOINT vs CMD

ENTRYPOINT sets the executable. CMD provides default arguments that can be overridden at docker run time. Use both together when you want the script to accept optional arguments:

ENTRYPOINT ["python", "pipeline.py"]
CMD ["--env", "production"]

Running docker run myimage --env staging overrides CMD but keeps the ENTRYPOINT, resulting in python pipeline.py --env staging.
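On the Python side, the arguments from CMD or docker run arrive as ordinary command-line arguments. A minimal sketch of how pipeline.py might read them, assuming argparse and an illustrative set of environment names (the lesson does not show the script itself):

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments Docker appends after the ENTRYPOINT."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument(
        "--env",
        default="production",
        choices=["production", "staging"],  # illustrative values
        help="target environment",
    )
    return parser.parse_args(argv)


# CMD ["--env", "production"] supplies the default argument list ...
print(parse_args(["--env", "production"]).env)  # production
# ... and `docker run myimage --env staging` replaces it.
print(parse_args(["--env", "staging"]).env)  # staging
```

Giving --env a default in the script as well means the image still behaves sensibly if CMD is removed later.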

Avoid the shell form of ENTRYPOINT: ENTRYPOINT python pipeline.py. In the shell form, a wrapper shell (/bin/sh -c) becomes PID 1 and Python runs as a child process. Signals like SIGTERM go to the shell, which does not forward them to Python, so your graceful shutdown handlers never fire. The shell form also ignores CMD and any arguments passed to docker run.
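The exec form only helps if the script actually handles SIGTERM. On Linux, a process running as PID 1 does not get the default terminate action, so without a handler the signal is ignored and docker stop falls back to SIGKILL after its grace period. A minimal sketch of a handler, assuming the pipeline works through a list of items; run, handle_sigterm, and stop_requested are hypothetical names:

```python
import signal
import threading

# Set when the container is asked to stop; the main loop checks it.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request so the loop can finish its current item."""
    stop_requested.set()


# `docker stop` sends SIGTERM first, then SIGKILL after a grace period.
signal.signal(signal.SIGTERM, handle_sigterm)


def run(items, process):
    """Process items in order, stopping early once a stop is requested."""
    done = []
    for item in items:
        if stop_requested.is_set():
            break  # leave the remaining items for the next run
        done.append(process(item))
    return done
```

Setting a flag rather than exiting inside the handler lets the current item finish and any cleanup run before the process ends.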

Building and running

# Build the image and tag it
docker build -t my-pipeline:latest .

# Run with a volume mount for output files
docker run --rm \
  -v "$(pwd)/output:/app/output" \
  -e API_KEY="$API_KEY" \
  my-pipeline:latest

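The -v flag mounts ./output on the host into /app/output in the container, so output files survive after the container exits. Because the container runs as the non-root pipeline user, the host directory must be writable by that user; create ./output yourself before the first run, since Docker creates a missing bind-mount source owned by root. -e passes the API_KEY environment variable at run time without baking it into the image.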
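Inside the container, the script sees these two flags as an environment variable and an ordinary directory. A minimal sketch of the receiving side, assuming the script writes to output relative to WORKDIR /app; load_api_key, write_report, and report.txt are hypothetical names:

```python
import os
from pathlib import Path


def load_api_key(environ=os.environ):
    """Return API_KEY from the environment, failing early if it is missing."""
    key = environ.get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY is not set; pass it with docker run -e")
    return key


def write_report(text, output_dir="output", name="report.txt"):
    """Write a result file under the output directory and return its path."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    path.write_text(text, encoding="utf-8")
    return path
```

Failing early on a missing key gives a clear error at startup instead of a confusing failure at the first API call.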

Where to go next

Next: Docker Compose workflows — when your pipeline needs a database, a file server, or any other service alongside it, Compose defines the whole environment in a single YAML file.
