Streaming in practice
Rewrite a list-buffering file processor as a generator pipeline and measure the memory difference.
- Rewrite a list-buffering function as a generator
- Verify constant memory usage with a large generated input
- Compose generator stages into a pipeline
Theory is easy to accept; the memory numbers make it concrete. The demo below generates 100,000 lines in memory, runs both the buffering and streaming versions, and shows the size difference directly.
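Here is the rewrite itself, before any measurement. This is a minimal sketch that assumes a simple per-line transformation (strip whitespace, then upper-case). The names `transform_all`, `transform_stream`, and the `CountingSource` helper are hypothetical and do not come from the original demo. The counter shows when work actually happens: the buffering version processes all 100,000 lines before returning, while the generator processes only the lines the caller asks for.

```python
from itertools import islice

def transform_all(lines):
    """List-buffering version (hypothetical name): builds every result before returning."""
    result = []
    for line in lines:
        result.append(line.strip().upper())
    return result

def transform_stream(lines):
    """Generator version (hypothetical name): yields each result as soon as it is computed."""
    for line in lines:
        yield line.strip().upper()

class CountingSource:
    """Hypothetical helper: an iterable of synthetic lines that counts how many were pulled."""
    def __init__(self, n):
        self.n = n
        self.pulled = 0

    def __iter__(self):
        for i in range(self.n):
            self.pulled += 1
            yield f"  line {i}  \n"

buffered_src = CountingSource(100_000)
first_buffered = transform_all(buffered_src)[:3]
print(buffered_src.pulled)   # 100000: every line was processed up front

streamed_src = CountingSource(100_000)
first_streamed = list(islice(transform_stream(streamed_src), 3))
print(streamed_src.pulled)   # 3: only the lines actually requested
```

Both calls produce the same first three results (`['LINE 0', 'LINE 1', 'LINE 2']`). Only the amount of work done to get them differs.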
Side-by-side comparison
Run both versions and observe the allocation sizes:
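A self-contained sketch of the comparison follows. It assumes the same hypothetical `transform_all` / `transform_stream` pair (strip and upper-case each line) and 100,000 synthetic input lines. It reports two numbers per version: `sys.getsizeof` of the returned object, and the peak memory `tracemalloc` traces while the caller consumes every item. `sys.getsizeof` on a list counts only its internal array of pointers, not the strings those pointers reference, so the traced peak is the fairer measure of what buffering costs. Exact figures vary by Python version and platform.

```python
import sys
import tracemalloc

def transform_all(lines):
    """List-buffering version (hypothetical name)."""
    return [line.strip().upper() for line in lines]

def transform_stream(lines):
    """Generator version (hypothetical name)."""
    for line in lines:
        yield line.strip().upper()

# 100,000 synthetic input lines, held in memory and shared by both runs.
lines = [f"  record {i:06d}: some payload text  \n" for i in range(100_000)]

def measure(func):
    """Return (getsizeof of the returned object, peak bytes traced while consuming it)."""
    tracemalloc.start()              # trace only allocations made from here on
    result = func(lines)
    obj_size = sys.getsizeof(result)
    total = 0
    for item in result:              # consume every item, as a real caller would
        total += len(item)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return obj_size, peak

list_size, list_peak = measure(transform_all)
gen_size, gen_peak = measure(transform_stream)

print(f"list:      getsizeof={list_size:>10,} B  peak={list_peak:>12,} B")
print(f"generator: getsizeof={gen_size:>10,} B  peak={gen_peak:>12,} B")
```

Tracing starts after the input list already exists, so both peaks measure only the cost of the transformation. That cost is exactly where the two versions differ.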
The generator object itself stays at roughly a hundred bytes (the exact figure depends on the Python version), regardless of input size. The list version allocated tens of megabytes to hold every transformed string before the caller could read a single one.
Building a pipeline
Generators compose by passing one into another. Each stage is a function that takes an iterable and yields transformed values:
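The sketch below builds a three-stage pipeline. It assumes the input is any iterable of text lines, and an open file object would work the same way. The stage names `strip_lines`, `drop_blank`, and `number_lines` are hypothetical. Each stage holds at most one line at a time, and nothing runs until something starts pulling values from the last stage.

```python
import io

def strip_lines(lines):
    """Stage 1: remove surrounding whitespace, including the trailing newline."""
    for line in lines:
        yield line.strip()

def drop_blank(lines):
    """Stage 2: skip lines that are empty after stripping."""
    for line in lines:
        if line:
            yield line

def number_lines(lines, start=1):
    """Stage 3: prefix each remaining line with a running line number."""
    for n, line in enumerate(lines, start):
        yield f"{n}: {line}"

# io.StringIO stands in for a real file; iterating it yields one line at a time.
source = io.StringIO("alpha\n\n  beta  \n\ngamma\n")

pipeline = number_lines(drop_blank(strip_lines(source)))
result = list(pipeline)
print(result)  # ['1: alpha', '2: beta', '3: gamma']
```

`list()` appears here only to display the output. A real caller would loop over `pipeline` directly, so only one line is ever in flight.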
Python's built-in map() and filter() are also lazy — they return iterators,
not lists. map(str.upper, lines) is a streaming equivalent of the list
comprehension [line.upper() for line in lines]. Use them when the
transformation is a single function call; use explicit generators when you need
multiple statements or conditional logic.
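The short illustration below makes that laziness visible. `map()` and `filter()` return iterator objects immediately, and the functions passed to them run only when values are pulled. The `noisy_upper` helper is hypothetical and exists only to record when each call happens.

```python
calls = []

def noisy_upper(s):
    """Hypothetical helper: records each call so we can see when work happens."""
    calls.append(s)
    return s.upper()

lines = ["alpha", "", "beta", "gamma"]

# Building the pipeline does no work yet: both calls return lazy iterators.
upper_lines = map(noisy_upper, lines)
non_blank = filter(None, upper_lines)   # filter(None, ...) drops falsy items such as ""
assert calls == []

first = next(non_blank)   # pulls "alpha" through map, then through filter
print(first, calls)       # ALPHA ['alpha']

rest = list(non_blank)
print(rest)               # ['BETA', 'GAMMA']
```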
When to reach for itertools
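The itertools module in the standard library provides streaming combinators:
chain() (concatenate iterables), islice() (take the first N items),
groupby() (group consecutive items that share a key), and tee() (split one
iterator into independent copies, two by default). Reaching for these before
writing your own loop often produces code that is both cleaner and more
memory-efficient. The one caveat is tee(): it must buffer every item that one
copy has consumed and another has not, so copies that drift far apart cost memory.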
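The following brief tour runs the four combinators on small in-memory inputs, and the data is made up for illustration. `count()` is an infinite iterator, and `islice()` is what makes it safe to use: it stops pulling after five items.

```python
from itertools import chain, count, groupby, islice, tee

# chain(): iterate over several sources as if they were one.
merged = list(chain(["a1", "a2"], ["b1"], ("c1", "c2")))
print(merged)          # ['a1', 'a2', 'b1', 'c1', 'c2']

# islice(): take the first N items, even from an infinite iterator.
first_five = list(islice(count(10), 5))
print(first_five)      # [10, 11, 12, 13, 14]

# groupby(): group *consecutive* items with the same key.
# Input must already be ordered by the key for each key to form a single group.
log = ["ERROR disk", "ERROR net", "INFO ok", "ERROR cpu"]
groups = [(level, len(list(items)))
          for level, items in groupby(log, key=lambda s: s.split()[0])]
print(groups)          # [('ERROR', 2), ('INFO', 1), ('ERROR', 1)]

# tee(): fork one iterator into independent copies (two by default).
evens_src, odds_src = tee(range(10))
evens = [n for n in evens_src if n % 2 == 0]
odds = [n for n in odds_src if n % 2 == 1]
print(evens, odds)     # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]
```

In the `tee()` example, `evens_src` is exhausted before `odds_src` is touched, so tee holds all ten items in its internal buffer. With a large input, consuming the copies in lockstep avoids that cost.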
Where to go next
Next: memory profiling — using tracemalloc to find the specific lines
responsible for peak allocations, so you know exactly where to apply the
streaming refactor.