Code of the Day

Profiling in practice

Run tracemalloc on a buffering function and its generator replacement, and compare the top allocations.

By the end of this lesson you will be able to:
  • Use tracemalloc to identify the largest allocations by source line
  • Compare peak memory between a buffering function and a generator
  • Interpret tracemalloc statistics output

The goal is to see tracemalloc output for both the buffering and streaming versions of the same function, and learn to read that output confidently.

Profiling the buffering version

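The following is a minimal sketch of what a buffering version might look like, not the lesson's original listing. The names make_records and process_buffered are hypothetical, as is the record format, so the line numbers and sizes it reports will differ from those quoted in this lesson.

```python
import tracemalloc


def make_records(n):
    # Hypothetical data source: n short text records.
    return (f"record-{i:05d}" for i in range(n))


def process_buffered(n):
    # First full list: every record is held in memory at once.
    raw = [record for record in make_records(n)]
    # Second full list: a transformed copy of every record.
    cleaned = [record.upper() for record in raw]
    return cleaned  # raw is released when the function returns


tracemalloc.start()
cleaned = process_buffered(50_000)
# Read the counters before taking the snapshot, so the snapshot's own
# bookkeeping cannot influence the numbers.
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current={current / 2**20:.2f} MiB, peak={peak / 2**20:.2f} MiB")
top = snapshot.statistics("lineno")[:3]
for stat in top:
    print(stat)
```

In this sketch the first list is released when the function returns, so the snapshot attributes most live memory to the second comprehension, while the peak figure still reflects the moment both lists were alive.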

The top entries will point directly to lines 7 and 8, the two list comprehensions. Each one allocates a full copy of the 50,000-item dataset. Peak memory is roughly twice the current memory because both lists existed simultaneously at the moment of peak allocation, which means one of them had already been released by the time current memory was read.

Profiling the streaming version

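A streaming counterpart could be sketched as below. This is an illustration under assumptions rather than the lesson's original code: read_records and clean are hypothetical generator stages, and the records are the same made-up strings used for illustration.

```python
import tracemalloc


def read_records(n):
    # Hypothetical record source: yields one record at a time.
    for i in range(n):
        yield f"record-{i:05d}"


def clean(records):
    # Generator stage: transforms each record lazily, one at a time.
    for record in records:
        yield record.upper()


tracemalloc.start()
# Only one record is alive at any moment; nothing accumulates in a list.
total_chars = sum(len(record) for record in clean(read_records(50_000)))
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"processed {total_chars} characters")
print(f"current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")
```

Consuming the pipeline with sum() keeps the example honest: wrapping it in list() would reintroduce the buffering that the generators were meant to remove.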

Peak memory drops dramatically. The allocations that tracemalloc reports are now the generator machinery and the loop variable — a few kilobytes, not megabytes. The same 50,000 records were processed; none of them accumulate.

Reading the statistics output

A typical stat line:

<string>:7: size=4.6 MiB, count=50000, average=96 B
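  • size: total size of the memory blocks allocated from this line that are still alive at snapshot time.
  • count: number of memory blocks still alive. For small objects this usually matches the number of objects, although a single object can own more than one block.
  • average: size divided by count, the mean block size; useful for distinguishing "one big allocation" from "many small ones".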

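When count is close to your input size (50,000 records → 50,000 objects), you have probably found the buffering point. When average is small, on the order of tens to a few hundred bytes, the objects are likely short strings or similarly small values. tracemalloc reports sizes, not types, so check the source line to confirm. Both together point to a list-of-strings accumulation.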
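The same fields are available as attributes on each statistic, which makes this check easy to script. The sketch below assumes a deliberately buffered list of hypothetical records and rebuilds a stat line by hand from size, count, and traceback.

```python
import tracemalloc

tracemalloc.start()
# Deliberate buffering point: 50,000 small strings kept alive in a list.
kept = [f"record-{i:05d}" for i in range(50_000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Statistics are sorted by size, largest first.
top = snapshot.statistics("lineno")[0]
frame = top.traceback[0]           # the source line that made the allocations
average = top.size / top.count     # the same value the "average" field reports

print(
    f"{frame.filename}:{frame.lineno}: "
    f"size={top.size} B, count={top.count}, average={average:.0f} B"
)
```

A count at or just above 50,000 combined with a small average is the signature described above: many small blocks created by a single line.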

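Use snapshot.statistics("traceback") instead of "lineno" when you need the full call stack for an allocation. For the stack to contain more than one frame, start tracing with a larger frame limit, for example tracemalloc.start(25), because only the most recent frame is stored by default. Storing more frames adds memory and CPU overhead, but it is essential when the allocation happens inside a library function and you need to know which of your call sites triggered it.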
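A small sketch of the traceback view follows. Here library_helper stands in for a third-party function and my_call_site for your own code; both names are hypothetical, and the frame limit of 10 is an arbitrary choice.

```python
import tracemalloc


def library_helper(n):
    # Stand-in for a library function that allocates internally.
    return [bytes(1024) for _ in range(n)]


def my_call_site():
    # The caller we want to find in the traceback.
    return library_helper(200)


tracemalloc.start(10)  # store up to 10 frames per allocation
data = my_call_site()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by their full call stack rather than by a single line.
top = snapshot.statistics("traceback")[0]
print(f"{top.count} blocks, {top.size / 1024:.1f} KiB")
for line in top.traceback.format():
    print(line)
```

The printed frames include the line inside library_helper as well as the line in my_call_site that called it, which is the information a "lineno" grouping cannot provide.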

Where to go next

Next: lab — optimise utility — a provided file processor with a memory hotspot. Profile it, identify the culprit, convert to a generator pipeline, and confirm it can handle 1 million lines without running out of memory.
