Lab: DAG pipeline

Express a four-step pipeline as both a Makefile and a Prefect flow, run both, and compare the developer experience of each approach.

Lab · optional · Workflow · Advanced · 35 min
By the end of this lesson you will be able to:
  • Express a four-step pipeline as a Makefile with file-based dependencies
  • Express the same pipeline as a Prefect flow with four tasks
  • Compare the two approaches across incremental rebuilds, retries, and observability

The same pipeline can be modelled in many ways. This lab builds a fetch → clean → aggregate → report pipeline in both Make and Prefect, then asks you to reflect on which model fits which context. Understanding both gives you the vocabulary to choose deliberately rather than by default.

The pipeline

Four steps with clear file inputs and outputs:

Step      | Input           | Output
fetch     | API URL         | data/raw.json
clean     | data/raw.json   | data/clean.json
aggregate | data/clean.json | data/agg.json
report    | data/agg.json   | reports/summary.txt

Part 1 — Makefile

The Makefile below encodes the full dependency graph. Read it, then answer the questions in Checkpoint 1.

# Makefile — fetch → clean → aggregate → report pipeline

DATA    := data
REPORTS := reports

.PHONY: all clean

all: $(REPORTS)/summary.txt

# Step 1: fetch raw data
$(DATA)/raw.json:
	mkdir -p $(DATA)
	python scripts/fetch.py --output $@

# Step 2: clean and validate
$(DATA)/clean.json: $(DATA)/raw.json
	python scripts/clean.py --input $< --output $@

# Step 3: aggregate
$(DATA)/agg.json: $(DATA)/clean.json
	python scripts/aggregate.py --input $< --output $@

# Step 4: generate report
$(REPORTS)/summary.txt: $(DATA)/agg.json
	mkdir -p $(REPORTS)
	python scripts/report.py --input $< --output $@

clean:
	rm -rf $(DATA) $(REPORTS)
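
The Makefile assumes that every script accepts --input and --output paths (fetch takes only --output). As a rough illustration of that contract, here is a minimal sketch of what scripts/clean.py could look like. The record schema (a list of objects with an "amount" field) and the validation rule are assumptions made for this example, not something the lab prescribes.

```python
import argparse
import json
from pathlib import Path


def main(argv=None) -> None:
    # Same flags the Makefile passes: --input $< --output $@
    parser = argparse.ArgumentParser(description="Drop records without a numeric amount.")
    parser.add_argument("--input", required=True, help="path to the raw JSON file")
    parser.add_argument("--output", required=True, help="path for the cleaned JSON file")
    args = parser.parse_args(argv)

    # Assumed schema: a JSON list of objects, each with an "amount" field.
    records = json.loads(Path(args.input).read_text())
    cleaned = [r for r in records if isinstance(r.get("amount"), (int, float))]

    # Create the output directory if needed, then write the cleaned records.
    out = Path(args.output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(cleaned, indent=2))

# In a real script file, finish with:
#     if __name__ == "__main__":
#         main()
```

Because the script only reads its input file and writes its output file, Make can treat it as a pure file-to-file step and decide from timestamps alone whether it needs to run.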

Checkpoint 1 — incremental rebuilds

  1. Run make all (assuming scripts exist). All four steps execute.
  2. Simulate a manual edit by updating the file's modification time: touch data/clean.json.
  3. Run make all again. Which steps re-run? Which are skipped?

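Expected: raw.json already exists and has no prerequisites, so fetch is skipped. clean.json is newer than raw.json, so clean is skipped too. But clean.json is now newer than agg.json, so aggregate re-runs; that makes agg.json newer than summary.txt, so report re-runs as well.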

This is incremental rebuilding — one of Make's core strengths.
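
The rule Make applies can be sketched in a few lines of Python: a target is out of date if it does not exist or if any prerequisite has a newer modification time. The function name is_stale is hypothetical, and this is a simplification of Make's behaviour (it ignores phony targets and similar details), not its actual implementation.

```python
from pathlib import Path


def is_stale(target, prerequisites) -> bool:
    """Return True if target is missing or older than any prerequisite."""
    target = Path(target)
    if not target.exists():
        return True  # nothing built yet, so the recipe must run
    target_mtime = target.stat().st_mtime
    # Rebuild only when some prerequisite is strictly newer than the target.
    return any(Path(p).stat().st_mtime > target_mtime for p in prerequisites)
```

Applied to Checkpoint 1 after the touch, is_stale("data/clean.json", ["data/raw.json"]) would be False, while is_stale("data/agg.json", ["data/clean.json"]) would be True. Once agg.json has been rebuilt, the same check makes reports/summary.txt stale in turn.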

Part 2 — Prefect flow

The Prefect version of the same pipeline is runnable in the demo below.
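
A minimal sketch of what such a flow could look like is shown here. The sample records, the cleaning rule, the report format, and the flow name "revenue-pipeline" are all assumptions made for illustration. The try/except fallback is also an assumption of this sketch: it only exists so the task bodies can be called as plain functions where Prefect is not installed.

```python
try:
    from prefect import flow, task
except ImportError:
    # Sketch-only fallback: do-nothing decorators when Prefect is missing.
    def task(fn=None, **_options):
        return fn if fn is not None else (lambda f: f)
    flow = task


@task
def fetch(url: str) -> list:
    # Hypothetical stand-in for an HTTP call: returns fixed sample records.
    return [
        {"region": "north", "amount": 120.0},
        {"region": "south", "amount": None},  # invalid: missing amount
        {"region": "north", "amount": 80.0},
    ]


@task
def clean(raw: list) -> list:
    # Keep only records that carry a numeric amount.
    return [r for r in raw if isinstance(r.get("amount"), (int, float))]


@task
def aggregate(records: list) -> dict:
    # Sum amounts per region.
    totals: dict = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


@task
def report(totals: dict) -> str:
    # One line per region, sorted so the output is stable.
    return "\n".join(f"{region}: {total:.2f}" for region, total in sorted(totals.items()))


@flow(name="revenue-pipeline")
def pipeline(url: str = "https://api.example.com/revenue") -> str:
    raw = fetch(url)
    validated = clean(raw)
    aggregated = aggregate(validated)
    return report(aggregated)
```

Calling pipeline() runs the four tasks in order. Unlike the Makefile, the dependencies here are expressed by passing return values from one task to the next rather than by file timestamps.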

Checkpoint 2 — add a retry to fetch

Modify fetch to raise ConnectionError on the first call (use a module-level counter as in the previous lesson), and add retries=2 to the @task decorator. Re-run and confirm the retry fires and the pipeline completes.

In the Makefile version, adding retry behaviour to a single step would require wrapping the script invocation in a shell retry loop — significantly more awkward.

Checkpoint 3 — add a parallel branch

Add a fifth task, export_to_csv, that also depends on the clean output but is independent of aggregate. In Prefect:

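from prefect import flow, task

# fetch, clean, aggregate, and report are the tasks from the Part 2 flow.

@task
def export_to_csv(records: list) -> str:
    # write records to CSV string
    ...

@flow(name="revenue-pipeline-v2")
def pipeline(url: str = "https://api.example.com/revenue") -> str:
    raw        = fetch(url)
    validated  = clean(raw)
    # aggregate and export_to_csv are independent of each other. Calling a
    # task directly blocks until it finishes, so use .submit() to hand both
    # to the task runner, which can then run them concurrently.
    aggregated = aggregate.submit(validated)
    csv_future = export_to_csv.submit(validated)
    summary    = report(aggregated)  # the future is resolved before report runs
    csv_future.wait()                # let the export finish before returning
    return summary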

In the Makefile, you would add a new target with $(DATA)/clean.json as its prerequisite and add it to the all target — equally clean, but without the Python ecosystem for any logic inside the step.

Comparison

Concern               | Makefile                                            | Prefect
Incremental rebuilds  | First-class (timestamp comparison)                  | Not built-in
Per-step retries      | Shell loop workaround                               | Declarative (retries=N on @task)
Run history           | None                                                | Full UI
Parametrised runs     | Environment variables                               | Typed Python args
Parallel execution    | -j flag                                             | Via .submit() and a task runner
Dependencies          | File timestamps                                     | Data flow
Infrastructure needed | None (Make is preinstalled on most Unix-like systems) | None for local; server for cloud

Choose Make when: your pipeline is file-to-file, incremental rebuilds matter, and you want zero Python dependencies in the orchestration layer.

Choose Prefect when: you need retries, observability, scheduling, or parametrised runs, and your team works primarily in Python.

These tools are not mutually exclusive. A common pattern is to use Make for the data-heavy file transformation stages (where incremental rebuilds save hours) and wrap the whole Makefile in a Prefect flow for scheduling, retries, and alerting.
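
One way to sketch that pattern is a small helper that shells out to Make and fails loudly on a non-zero exit code. The names run_command and run_make are hypothetical. In a Prefect project you would decorate run_make with @task(retries=...) and call it from a @flow; that wiring is left out here so the sketch stays runnable without Prefect.

```python
import subprocess


def run_command(cmd: list) -> str:
    """Run a command and return its stdout; raise CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def run_make(target: str = "all") -> str:
    # Make still decides which steps are stale; the caller only sees pass/fail.
    return run_command(["make", target])
```

Because check=True turns a failed build into an exception, an orchestrator wrapping run_make can treat the whole Makefile as a single retryable step while Make keeps handling the incremental rebuilds inside it.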

Where to go next

Module complete. Next up: Containerised Workflows — packaging your hardened, tested, orchestrated pipeline into a Docker image so it runs identically in development, CI, and production.
