Lab: DAG pipeline
Express a four-step pipeline as both a Makefile and a Prefect flow, run both, and compare the developer experience of each approach.
- Express a four-step pipeline as a Makefile with file-based dependencies
- Express the same pipeline as a Prefect flow with four tasks
- Compare the two approaches across incremental rebuilds, retries, and observability
The same pipeline can be modelled in many ways. This lab builds a fetch → clean → aggregate → report pipeline in both Make and Prefect, then asks you to reflect on which model fits which context. Understanding both gives you the vocabulary to choose deliberately rather than by default.
The pipeline
Four steps with clear file inputs and outputs:
| Step | Input | Output |
|---|---|---|
| fetch | API URL | data/raw.json |
| clean | data/raw.json | data/clean.json |
| aggregate | data/clean.json | data/agg.json |
| report | data/agg.json | reports/summary.txt |
Part 1 — Makefile
The Makefile below encodes the full dependency graph. Read it, then answer the questions in Checkpoint 1.
# Makefile — fetch → clean → aggregate → report pipeline
# Recipe lines must begin with a tab character, not spaces.
DATA := data
REPORTS := reports

.PHONY: all clean

all: $(REPORTS)/summary.txt

# Step 1: fetch raw data
$(DATA)/raw.json:
	mkdir -p $(DATA)
	python scripts/fetch.py --output $@

# Step 2: clean and validate
$(DATA)/clean.json: $(DATA)/raw.json
	python scripts/clean.py --input $< --output $@

# Step 3: aggregate
$(DATA)/agg.json: $(DATA)/clean.json
	python scripts/aggregate.py --input $< --output $@

# Step 4: generate report
$(REPORTS)/summary.txt: $(DATA)/agg.json
	mkdir -p $(REPORTS)
	python scripts/report.py --input $< --output $@

clean:
	rm -rf $(DATA) $(REPORTS)

Checkpoint 1 — incremental rebuilds
- Run make all (assuming the scripts exist). All four steps execute.
- Touch data/clean.json to simulate a manual edit: touch data/clean.json.
- Run make all again. Which steps re-run? Which are skipped?
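Expected: raw.json already exists and has no prerequisites, so fetch is skipped.
clean.json is newer than raw.json, so clean is skipped too. But clean.json is now
newer than agg.json, so aggregate re-runs, and the rebuilt agg.json in turn forces
report to re-run. This is incremental rebuilding, one of Make's core strengths.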
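The rule Make applies here can be sketched in a few lines of Python. This is only an illustration of the timestamp comparison, not how Make is implemented: the `RULES` dictionary mirrors the Makefile above, while `is_stale` and the numeric timestamps are hypothetical stand-ins for the modification times Make reads from the filesystem.

```python
# Dependency graph from the Makefile: target -> prerequisites.
RULES = {
    "data/raw.json": [],
    "data/clean.json": ["data/raw.json"],
    "data/agg.json": ["data/clean.json"],
    "reports/summary.txt": ["data/agg.json"],
}


def is_stale(target: str, mtimes: dict[str, float]) -> bool:
    """Return True if the recipe for `target` would need to re-run.

    A target is stale when it does not exist, when any prerequisite is
    newer than it, or when any prerequisite is itself stale (that
    prerequisite is about to be rebuilt and will then be newer).
    """
    if target not in mtimes:
        return True
    target_mtime = mtimes[target]
    return any(
        is_stale(dep, mtimes) or mtimes[dep] > target_mtime
        for dep in RULES[target]
    )


# Checkpoint 1 with assumed timestamps: a full build at t=100..103,
# followed by `touch data/clean.json` at t=200.
mtimes = {
    "data/raw.json": 100.0,
    "data/clean.json": 200.0,
    "data/agg.json": 102.0,
    "reports/summary.txt": 103.0,
}
rerun = {target: is_stale(target, mtimes) for target in RULES}
```

Under these assumed timestamps, `rerun` flags only data/agg.json and reports/summary.txt, which matches the expected answer above.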
Part 2 — Prefect flow
The Prefect version of the same pipeline is runnable in the demo below.
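As a rough stand-in for reading alongside this part, the sketch below writes the four steps as plain Python functions. In the Prefect version each step would carry the `@task` decorator and `pipeline` would carry `@flow`, as in the Checkpoint 3 snippet; the decorators are left out here so the code runs without Prefect installed. The record shape (`region`, `revenue`) and the sample payload are assumptions for illustration, not the lesson's data, and `fetch` does not contact the URL.

```python
import json
from collections import defaultdict

# Hypothetical sample payload standing in for the API response.
SAMPLE_PAYLOAD = json.dumps([
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.5},
    {"region": "north", "revenue": None},  # invalid: dropped by clean()
    {"region": "south", "revenue": 19.5},
])


def fetch(url: str) -> list:
    """Return raw records. A real step would request `url`; this sketch does not."""
    return json.loads(SAMPLE_PAYLOAD)


def clean(raw: list) -> list:
    """Keep only records whose revenue is a number."""
    return [r for r in raw if isinstance(r.get("revenue"), (int, float))]


def aggregate(records: list) -> dict:
    """Sum revenue per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["revenue"]
    return dict(totals)


def report(totals: dict) -> str:
    """Render one line per region, sorted by region name."""
    return "\n".join(
        f"{region}: {total:.2f}" for region, total in sorted(totals.items())
    )


def pipeline(url: str = "https://api.example.com/revenue") -> str:
    # Dependencies are expressed by passing return values, not file paths.
    raw = fetch(url)
    validated = clean(raw)
    aggregated = aggregate(validated)
    return report(aggregated)
```

Note the contrast with the Makefile: each step depends on the value returned by the previous one, so nothing here knows or cares whether an output file is up to date.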
Checkpoint 2 — add a retry to fetch
Modify fetch to raise ConnectionError on the first call (use a module-level
counter as in the previous lesson), and add retries=2 to the @task decorator.
Re-run and confirm the retry fires and the pipeline completes.
In the Makefile version, adding retry behaviour to a single step would require wrapping the script invocation in a shell retry loop — significantly more awkward.
Checkpoint 3 — add a parallel branch
Add a fifth task, export_to_csv, that also depends on the clean output but is
independent of aggregate. In Prefect:
from prefect import flow, task

# fetch, clean, aggregate, and report are the tasks defined in Part 2.

@task
def export_to_csv(records: list) -> str:
    # write records to CSV string
    ...

@flow(name="revenue-pipeline-v2")
def pipeline(url: str = "https://api.example.com/revenue") -> str:
    raw = fetch(url)
    validated = clean(raw)
    # These two calls are independent of each other. Called directly they run
    # one after the other; submitted with .submit(), the flow's task runner
    # can run them concurrently.
    aggregated = aggregate(validated)
    csv_path = export_to_csv(validated)
    return report(aggregated)

In the Makefile, you would add a new target with $(DATA)/clean.json as its
prerequisite and add it to the all target. That is equally clean, but without the
Python ecosystem for any logic inside the step.
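The shape of this fan-out, two steps that both consume the clean output and do not depend on each other, can be tried without Prefect using a standard-library thread pool. This is a stand-in for a task runner, not Prefect's API. The bodies of `aggregate` and `export_to_csv` and the `region`/`revenue` fields are assumptions, one possible filling-in of the elided task body above.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def aggregate(records: list) -> dict:
    """Sum revenue per region."""
    totals = {}
    for record in records:
        totals[record["region"]] = totals.get(record["region"], 0.0) + record["revenue"]
    return totals


def export_to_csv(records: list) -> str:
    """Write records to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["region", "revenue"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def fan_out(validated: list) -> tuple:
    """Run both branches concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        agg_future = pool.submit(aggregate, validated)
        csv_future = pool.submit(export_to_csv, validated)
        return agg_future.result(), csv_future.result()
```

Neither branch reads the other's result, which is exactly the property that lets an orchestrator (or make -j) schedule them at the same time.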
Comparison
| Concern | Makefile | Prefect |
|---|---|---|
| Incremental rebuilds | First-class (timestamp comparison) | Not built-in |
| Per-step retries | Shell loop workaround | Declarative |
| Run history | None | Full UI |
| Parametrised runs | Make variables (make VAR=value) or environment variables | Typed Python args |
| Parallel execution | -j flag | Task runner (tasks submitted with .submit()) |
| Dependencies | File timestamps | Data flow |
| Infrastructure needed | None — Make ships everywhere | None for local; server for cloud |
Choose Make when: your pipeline is file-to-file, incremental rebuilds matter, and you want zero Python dependencies in the orchestration layer.
Choose Prefect when: you need retries, observability, scheduling, or parametrised runs, and your team works primarily in Python.
These tools are not mutually exclusive. A common pattern is to use Make for the data-heavy file transformation stages (where incremental rebuilds save hours) and wrap the whole Makefile in a Prefect flow for scheduling, retries, and alerting.
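A minimal sketch of the wrapping step, assuming make is on the PATH and a Makefile sits in the working directory: the function below is the kind of body a Prefect task could call. `run_step` and `run_makefile` are hypothetical helper names, and the Prefect decorators are omitted so the sketch stays standard-library only.

```python
import subprocess
import sys


def run_step(command: list) -> str:
    """Run an external command and return its stdout.

    check=True turns a non-zero exit status into CalledProcessError,
    so the caller (for example an orchestrator task) sees the failure
    as an exception and can retry or alert on it.
    """
    completed = subprocess.run(
        command, check=True, capture_output=True, text=True
    )
    return completed.stdout


def run_makefile(target: str = "all") -> str:
    """Invoke `make <target>` in the current working directory."""
    return run_step(["make", target])


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs without a Makefile.
    print(run_step([sys.executable, "-c", "print('build ok')"]), end="")
```

The design point is the exception: Make keeps deciding what to rebuild, while the surrounding flow only needs to know whether the build succeeded.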
Where to go next
Module complete. Next up: Containerised Workflows — packaging your hardened, tested, orchestrated pipeline into a Docker image so it runs identically in development, CI, and production.