Idempotency in practice
A script is production-ready only when running it twice leaves the world in the same state as running it once. Learn what idempotency means and the two patterns that enforce it.
- Define idempotency precisely — f(f(x)) = f(x) — and explain why it matters for automation
- Apply checkpoint files to skip steps that already completed
- Use atomic writes to prevent partial outputs from being read by downstream steps
Automation scripts run unattended, often on a schedule, sometimes on retry after a
crash. The property that makes them safe to re-run without supervision is
idempotency: applying the same operation multiple times produces the same result
as applying it once. In mathematical notation, f(f(x)) = f(x).
A script that is not idempotent will double-count rows, re-send emails, or corrupt output files if it is interrupted and restarted. Building idempotency in from the start is far easier than retrofitting it later.
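To make the definition concrete, the sketch below contrasts an operation that is not idempotent with one that is. The function names and the in-memory "warehouse" are hypothetical, chosen only for illustration.

```python
def append_row(rows: list[dict], row: dict) -> list[dict]:
    """Not idempotent: each call adds another copy of the row."""
    return rows + [row]


def upsert_row(rows_by_id: dict[int, dict], row: dict) -> dict[int, dict]:
    """Idempotent: the row is keyed by its id, so a repeat call overwrites it."""
    return {**rows_by_id, row["id"]: row}


row = {"id": 1, "amount": 40}

# Running the append twice (for example, after a retry) double-counts the row.
appended_once = append_row([], row)
appended_twice = append_row(appended_once, row)

# Running the upsert twice leaves the same state as running it once.
upserted_once = upsert_row({}, row)
upserted_twice = upsert_row(upserted_once, row)
```

Keying writes by a stable identifier is one common way to get this property; whether it fits depends on what the destination system supports.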
Why idempotency matters
Consider a pipeline with three steps: fetch data from an API, transform it, upload the result to a data warehouse. If the upload fails halfway through, what happens when the pipeline restarts?
Without idempotency: the fetch and transform run again from scratch, duplicate rows enter the warehouse, and you spend hours cleaning up. With idempotency: the pipeline detects that fetch and transform already completed, skips to the upload, and retries only the failed step.
The two patterns that enforce idempotency in practice are checkpoints and atomic writes.
Checkpoint files
A checkpoint is a marker that records the completion of a step. Before each step, check for its marker. If it exists, skip the step. If it does not, run the step and create the marker on success.
    from pathlib import Path

    CHECKPOINT_DIR = Path(".checkpoints")
    CHECKPOINT_DIR.mkdir(exist_ok=True)


    def step_done(name: str) -> bool:
        """Return True if the marker file for this step exists."""
        return (CHECKPOINT_DIR / f"{name}.done").exists()


    def mark_done(name: str) -> None:
        """Create the marker file; call this only after the step succeeds."""
        (CHECKPOINT_DIR / f"{name}.done").touch()


    def run_pipeline() -> None:
        # fetch_data, transform_data, and upload_results are the pipeline's
        # own step functions, defined elsewhere.
        if not step_done("fetch"):
            fetch_data()
            mark_done("fetch")
        if not step_done("transform"):
            transform_data()
            mark_done("transform")
        if not step_done("upload"):
            upload_results()
            mark_done("upload")

The checkpoint directory should be in .gitignore. Include a timestamp or run ID
in the filename if you want separate checkpoints per run.
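One way to scope checkpoints to a single run is to put the run ID in the marker filename. In the sketch below, the `run_id` parameter, the date-based ID, and the `<run_id>.<step>.done` filename layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def marker_path(checkpoint_dir: Path, run_id: str, step: str) -> Path:
    """Build a marker path such as .checkpoints/2024-05-01.fetch.done."""
    return checkpoint_dir / f"{run_id}.{step}.done"


def step_done(checkpoint_dir: Path, run_id: str, step: str) -> bool:
    return marker_path(checkpoint_dir, run_id, step).exists()


def mark_done(checkpoint_dir: Path, run_id: str, step: str) -> None:
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    marker_path(checkpoint_dir, run_id, step).touch()


# Demonstration in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    checkpoints = Path(tmp) / ".checkpoints"
    mark_done(checkpoints, "2024-05-01", "fetch")

    same_run = step_done(checkpoints, "2024-05-01", "fetch")  # marker found
    next_run = step_done(checkpoints, "2024-05-02", "fetch")  # different run ID
```

With a daily job, using the date as the run ID means a retry on the same day skips completed steps, while the next day's run starts fresh.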
Atomic writes
Even a step that runs only once can produce corrupt output if the process is killed while writing. A downstream step that reads a half-written CSV will fail or, worse, silently process garbage data.
The fix is the write-then-rename pattern. Write to a temporary file alongside
the destination; once the write completes, rename the temp file to the final
filename. On POSIX filesystems, rename() is atomic — it either succeeds entirely
or fails with the original file untouched.
    import os
    import tempfile
    from pathlib import Path


    def write_atomically(destination: Path, content: str) -> None:
        dir_ = destination.parent
        # dir=dir_ puts the temp file in the destination's directory, so the
        # rename stays on one filesystem (required for it to be atomic).
        # delete=False keeps the file on disk after the with-block closes it.
        with tempfile.NamedTemporaryFile(
            mode="w", dir=dir_, suffix=".tmp", delete=False
        ) as tmp:
            tmp.write(content)
            tmp_path = tmp.name
        # The temp file is closed and flushed by this point.
        os.replace(tmp_path, destination)  # atomic on POSIX; overwrites if exists

os.replace() is preferable to os.rename() because it overwrites an existing
destination on every platform, whereas os.rename() raises FileExistsError on
Windows when the destination exists. That matters when you are overwriting
output from a previous partial run.
On Windows, os.replace() is not guaranteed atomic, but it is the closest
available approximation. For cross-platform pipelines that require strict
atomicity, write to a temporary directory on the same filesystem, then move.
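The sketch below illustrates the benefit on a re-run: when writing fails partway, the destination still holds its previous complete contents. The function `write_lines_atomically`, the file name `report.csv`, and the simulated failure are all hypothetical, for illustration only.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_lines_atomically(destination: Path, lines: Iterable[str]) -> None:
    """Write lines to a temp file, then rename it over the destination."""
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
        os.replace(tmp_name, destination)
    except BaseException:
        # Remove the partial temp file so failed runs do not leave debris.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def failing_rows():
    """Simulate an error partway through producing the output."""
    yield "id,amount"
    raise RuntimeError("simulated failure mid-write")


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.csv"
    write_lines_atomically(report, ["id,amount", "1,40"])

    try:
        write_lines_atomically(report, failing_rows())
    except RuntimeError:
        pass  # the second write failed before the rename happened

    contents_after_failure = report.read_text()
    leftover_files = sorted(p.name for p in Path(tmp).iterdir())
```

The cleanup branch only runs when Python gets a chance to handle the error. If the process is killed outright, a stray .tmp file may remain, but the destination is still untouched.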
Where to go next
Next: checkpoints and atomic writes — putting both patterns together in a runnable example you can adapt to your own pipeline.