Code of the Day

Lab: test suite

Write a complete test suite for a pipeline script — covering config loading, data transformation, mocked HTTP, file output, and error paths.

Lab · optional · Workflow · Advanced · 35 min
By the end of this lesson you will be able to:
  • Write a test for config loading that catches missing required keys
  • Test a data transformation function with known inputs and expected outputs
  • Test an HTTP fetch function with a mocked transport
  • Test file output with a temporary directory
  • Test an error path to confirm exceptions surface rather than being swallowed

A pipeline without tests is a time bomb. This lab provides a small but realistic pipeline module and guides you through writing five tests — one for each layer that matters. By the end, every function in the module has at least one test, and every error path is covered.

The pipeline module under test

Read through the module carefully before writing tests. Note:

  • load_config reads from a dict (simulating os.environ) and raises on missing keys.
  • transform_records is pure — it only computes.
  • fetch_records calls an external URL.
  • write_report writes JSON to disk.
  • run is the orchestrator — it calls all four in sequence.
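
A module matching this description can be sketched as follows. It is a minimal stand-in, not necessarily the lab's exact code: the API_URL key, the report.json file name, the opener parameter, and the choice of KeyError for missing keys are all assumptions made for illustration.

```python
import json
import urllib.request
from pathlib import Path

# Assumed key names; only OUTPUT_DIR is mentioned in the lab text.
REQUIRED_KEYS = ("API_URL", "OUTPUT_DIR")


def load_config(env):
    """Return the required settings from env, raising if any are missing."""
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise KeyError(f"missing required config keys: {', '.join(missing)}")
    return {"api_url": env["API_URL"], "output_dir": Path(env["OUTPUT_DIR"])}


def transform_records(raw):
    """Pure function: parse values as integers and drop negative ones."""
    parsed = [{"id": r["id"], "value": int(r["value"])} for r in raw]
    return [r for r in parsed if r["value"] >= 0]


def fetch_records(url, opener=urllib.request.urlopen):
    """Fetch a JSON list from url; opener is the transport a test can replace."""
    with opener(url) as response:
        return json.loads(response.read())


def write_report(records, output_dir):
    """Write records as JSON to output_dir/report.json and return the path."""
    dest = Path(output_dir) / "report.json"
    dest.write_text(json.dumps(records))
    return dest


def run(env):
    """Orchestrator: load config, fetch, transform, write, in that order."""
    config = load_config(env)
    raw = fetch_records(config["api_url"])
    return write_report(transform_records(raw), config["output_dir"])
```

Passing the environment in as a plain dict and the transport in as a parameter keeps each layer testable on its own: a test can hand in a small dict instead of touching os.environ, and a fake opener instead of the network.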

Checkpoint 1 — break a test intentionally

Change the transform_records function so it does not drop negative values. Run the tests. test_transform_records should fail with a clear message. Restore the filter and confirm all tests pass again.

This is the most important habit: verify that tests actually catch the bug they claim to catch, not just that they pass when the code is correct.
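
The same check can be rehearsed outside the editor by running one manual test against both the correct function and a deliberately broken copy. This is a sketch: the function bodies are an assumed reading of transform_records (parse values as integers, drop negatives), and check_transform is a hypothetical helper name.

```python
def transform_records(raw):
    """Assumed behaviour: parse values as integers and drop negative ones."""
    parsed = [{"id": r["id"], "value": int(r["value"])} for r in raw]
    return [r for r in parsed if r["value"] >= 0]


def transform_records_broken(raw):
    """The intentional bug from this checkpoint: the negative filter is gone."""
    return [{"id": r["id"], "value": int(r["value"])} for r in raw]


def check_transform(transform):
    """Manual if/else check; returns (passed, message)."""
    raw = [{"id": "a", "value": "10"}, {"id": "b", "value": "-5"}]
    expected = [{"id": "a", "value": 10}]
    result = transform(raw)
    if result == expected:
        return True, "PASS test_transform_records"
    return False, f"FAIL test_transform_records: expected {expected}, got {result}"


# The first line printed should be a PASS, the second a FAIL.
for candidate in (transform_records, transform_records_broken):
    passed, message = check_transform(candidate)
    print(message)
```

If the broken copy also passed, the test would not be checking the filter at all, which is exactly the situation this checkpoint is meant to expose.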

Checkpoint 2 — test the orchestrator

run calls four functions in sequence. Add a test that:

  1. Mocks fetch_records to return a list of two records.
  2. Uses a tmp_path-style temporary directory for OUTPUT_DIR.
  3. Calls run(env) with appropriate env values.
  4. Asserts the output file exists and contains the transformed data.

This is an integration test within the module — it exercises all four functions together without hitting the real network.

An integration test that mocks only the network boundary (not the filesystem) is often the most valuable test you can write. It proves that the pieces fit together, not just that each piece works in isolation.
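
The four steps above can be sketched as a single test. The module functions here are compact stand-ins with assumed shapes (the API_URL key and the report.json file name are hypothetical), and because everything lives in one file the test swaps the name in globals(). With the pipeline in its own module, `mock.patch("pipeline.fetch_records")` would be the usual form, where `pipeline` is an assumed module name.

```python
import json
import tempfile
from pathlib import Path
from unittest import mock


# Compact stand-ins for the module under test (assumed shapes).
def load_config(env):
    return {"api_url": env["API_URL"], "output_dir": Path(env["OUTPUT_DIR"])}


def fetch_records(url):
    raise RuntimeError("real network call; tests must replace this")


def transform_records(raw):
    parsed = [{"id": r["id"], "value": int(r["value"])} for r in raw]
    return [r for r in parsed if r["value"] >= 0]


def write_report(records, output_dir):
    dest = Path(output_dir) / "report.json"
    dest.write_text(json.dumps(records))
    return dest


def run(env):
    config = load_config(env)
    raw = fetch_records(config["api_url"])
    return write_report(transform_records(raw), config["output_dir"])


def test_run_writes_transformed_report():
    # Step 1: the mocked fetch returns two records, one of them negative.
    fake_fetch = mock.Mock(return_value=[
        {"id": "a", "value": "10"},
        {"id": "b", "value": "-5"},
    ])
    # Step 2: a real temporary directory, so the filesystem is not mocked.
    with tempfile.TemporaryDirectory() as tmp:
        env = {"API_URL": "https://example.invalid/records", "OUTPUT_DIR": tmp}
        # Step 3: replace the module-level name that run() looks up at call time.
        with mock.patch.dict(globals(), {"fetch_records": fake_fetch}):
            dest = run(env)
        # Step 4: the file exists and holds the transformed data.
        fake_fetch.assert_called_once_with("https://example.invalid/records")
        assert dest.exists()
        assert json.loads(dest.read_text()) == [{"id": "a", "value": 10}]


test_run_writes_transformed_report()
```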

Checkpoint 3 — add a permission error path

Write the test before you write the code: red first, then green. The test should:

  1. Call write_report(records, Path("/root")).
  2. Assert that PermissionError is raised.

Run the test before making any change. Then modify write_report to raise PermissionError if output_dir is /root (normally unwritable for non-root users on a standard Linux system, though not for root itself) and run the test again until it passes. Revert the change once it does.
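
A sketch of the guard and its test, written in the same manual-check style as the rest of the lab. The body of write_report and the report.json file name are assumptions; only the /root guard comes from this checkpoint.

```python
import json
from pathlib import Path


def write_report(records, output_dir):
    """Assumed shape of write_report, plus the temporary guard from this checkpoint."""
    output_dir = Path(output_dir)
    if output_dir == Path("/root"):
        # Temporary guard for the exercise; revert it afterwards.
        raise PermissionError(f"cannot write to {output_dir}")
    dest = output_dir / "report.json"
    dest.write_text(json.dumps(records))
    return dest


def test_write_report_rejects_root():
    records = [{"id": "a", "value": 42}]
    try:
        write_report(records, Path("/root"))
    except PermissionError:
        print("PASS test_write_report_rejects_root")
        return True
    print("FAIL test_write_report_rejects_root: no PermissionError raised")
    return False


test_write_report_rejects_root()
```

Because the guard raises before any file is opened, the test never touches the real /root directory and behaves the same whichever user runs it.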

Converting to real pytest

When you run these as a proper pytest suite, replace the manual if/else checks with assert statements and let pytest handle the output formatting:

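import json

# Assumes the functions live in a module named pipeline (hypothetical name).
from pipeline import transform_records, write_report

def test_transform_records():
    # One valid record and one negative record; only the valid one should survive.
    raw = [{"id": "a", "value": "10"}, {"id": "b", "value": "-5"}]
    result = transform_records(raw)
    assert len(result) == 1
    assert result[0]["value"] == 10

def test_write_report_file_contents(tmp_path):
    # tmp_path is a pathlib.Path to a fresh directory that pytest creates per test.
    records = [{"id": "a", "value": 42}]
    dest = write_report(records, tmp_path)
    assert dest.exists()
    assert json.loads(dest.read_text()) == records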

Note how tmp_path comes in as a pytest fixture — no tempfile.TemporaryDirectory context manager needed.
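
The HTTP layer converts the same way. As a sketch, assuming fetch_records accepts an injectable opener (a hypothetical parameter, shown here with a stand-in implementation), a pytest-style test can pass in a unittest.mock.Mock as the transport and use plain assert statements:

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_records(url, opener=urllib.request.urlopen):
    """Assumed signature: opener is the transport, replaceable in tests."""
    with opener(url) as response:
        return json.loads(response.read())


def test_fetch_records_uses_mocked_transport():
    payload = [{"id": "a", "value": "10"}]
    # BytesIO stands in for the HTTP response: it has read() and
    # works as a context manager, which is all fetch_records needs.
    fake_opener = mock.Mock(return_value=io.BytesIO(json.dumps(payload).encode()))
    result = fetch_records("https://example.invalid/records", opener=fake_opener)
    fake_opener.assert_called_once_with("https://example.invalid/records")
    assert result == payload
```

No request leaves the machine: the mock records the URL it was called with and returns the canned response.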

Where to go next

Module complete. Next up: Workflow Orchestration — once your pipelines are tested and hardened, the next step is expressing their dependencies explicitly as DAGs and running them with a proper orchestration tool.
