Python subprocess pipeline

Chain subprocess calls, process the output in Python between steps, and write the final result to a file — a complete worked example.

By the end of this lesson you will be able to:
  • Chain two subprocess calls with Python processing in between
  • Filter and transform captured output using standard Python
  • Write the final processed output to a file

The previous lesson explained when to use Python between subprocess calls instead of direct shell pipes. This lesson shows the pattern in a complete, runnable example: get a directory listing, filter it with Python logic, and write the result to a file.

The pattern

The structure of any Python-mediated pipeline is the same:

  1. Run the first command and capture its output.
  2. Process the output in Python (filter, transform, aggregate).
  3. Either write to a file or pass the result to the next command.

The advantage over a shell pipe is that step 2 can use dictionaries, regular expressions, external libraries, conditionals, and anything else Python offers.
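The three steps above can be sketched in a few lines. This is a minimal illustration rather than a canonical implementation: the child process is the current Python interpreter printing three fixed lines (a stand-in for any external command, chosen so the sketch runs anywhere), and the output file name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Step 1: run a command and capture its output.
# Assumption: the current interpreter stands in for a real external command.
result = subprocess.run(
    [sys.executable, "-c", "print('alpha'); print('beta'); print('gamma')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit status
)

# Step 2: process the captured text with ordinary Python.
lines = result.stdout.splitlines()
kept = [line.upper() for line in lines if line.startswith(("a", "g"))]

# Step 3: write the result to a file (hypothetical name, placed in the temp dir).
out_path = os.path.join(tempfile.gettempdir(), "pattern_output.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write("\n".join(kept) + "\n")

print(kept)
# ['ALPHA', 'GAMMA']
```

Swapping the command in step 1 or the destination in step 3 leaves step 2 untouched, which is the main reason to keep the three steps separate.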

A worked example

The example below:

  • Uses os.listdir() to get directory contents (roughly what ls -A prints, though in arbitrary order)
  • Keeps only the .py files with a list comprehension (the "in between" step)
  • Writes the filtered list to a file

import os

# Step 1: Get directory contents
# In a real script this might be
# subprocess.run(["ls"], capture_output=True, text=True, check=True).stdout.splitlines()
# Here os.listdir() stands in for it: the data flow is the same (a list of names to filter)
files = os.listdir(".")

# Step 2: Filter in Python — keep only .py files
py_files = [f for f in sorted(files) if f.endswith(".py")]

# Step 3: Write to a file
with open("py_files.txt", "w") as f:
    for name in py_files:
        f.write(name + "\n")

print(f"Found {len(py_files)} Python files")

The filtering step is the key: arbitrary Python logic decides what passes through. You can check file sizes, look up metadata, or apply regex patterns in one place, all of which quickly become awkward in a bare shell pipe.
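As an illustration of a richer filter, the sketch below combines a regex on the file name with a size check. The directory contents, the test_ naming convention, and the 100-byte threshold are all made up for the example.

```python
import os
import re
import tempfile

# Build a throwaway directory so the sketch is self-contained.
workdir = tempfile.mkdtemp()
samples = {
    "test_alpha.py": "x = 1\n" * 50,  # a few hundred bytes
    "test_beta.py": "",               # empty file
    "helper.py": "y = 2\n" * 50,      # large enough, but wrong name
    "notes.txt": "not Python\n",
}
for name, body in samples.items():
    with open(os.path.join(workdir, name), "w", encoding="utf-8") as f:
        f.write(body)

TEST_FILE = re.compile(r"^test_\w+\.py$")  # hypothetical naming convention
MIN_BYTES = 100                            # hypothetical size threshold

selected = []
for name in sorted(os.listdir(workdir)):
    path = os.path.join(workdir, name)
    if not TEST_FILE.match(name):
        continue  # name does not look like a test module
    if os.path.getsize(path) < MIN_BYTES:
        continue  # too small to be worth listing
    selected.append(name)

print(selected)
# ['test_alpha.py']
```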

Try it

The runner below chains an actual subprocess call to Python processing and writes to an in-memory buffer:
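A minimal sketch of such a chain, under stated assumptions: the child process is the current Python interpreter printing a fixed, made-up listing (standing in for a command such as ls), and io.StringIO plays the role of the output file.

```python
import io
import subprocess
import sys

# Step 1: run a real child process and capture its output.
# Assumption: the file names are invented; the interpreter stands in for ls.
child_code = "for name in ['setup.py', 'README.md', 'app.py', 'data.csv']: print(name)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

# Step 2: filter in Python, keeping only .py files.
py_files = sorted(n for n in result.stdout.splitlines() if n.endswith(".py"))

# Step 3: write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
for name in py_files:
    buffer.write(name + "\n")

print(buffer.getvalue(), end="")
# app.py
# setup.py
```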


Notice the three steps are clearly separated in the code. If something goes wrong, the structure tells you exactly where to look: did the command fail (step 1), did the filter produce wrong results (step 2), or did the file write fail (step 3)?
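One way to make that diagnosis explicit is to give each step its own error handling. The function below is a sketch, not a prescribed structure: run_pipeline and its return labels are hypothetical, and the child commands are the current interpreter standing in for real tools.

```python
import os
import subprocess
import sys
import tempfile


def run_pipeline(cmd, out_path):
    """Run cmd, keep .py names, write them to out_path; return a status label."""
    # Step 1: the command itself.
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        return "step 1: command not found"
    except subprocess.CalledProcessError as exc:
        return f"step 1: command exited with status {exc.returncode}"

    # Step 2: the filter.
    kept = [line for line in result.stdout.splitlines() if line.endswith(".py")]
    if not kept:
        return "step 2: filter matched nothing"

    # Step 3: the file write.
    try:
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(kept) + "\n")
    except OSError as exc:
        return f"step 3: write failed ({exc.strerror})"
    return "ok"


good_path = os.path.join(tempfile.mkdtemp(), "out.txt")
bad_path = os.path.join(tempfile.mkdtemp(), "missing_dir", "out.txt")

failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
no_match = [sys.executable, "-c", "print('notes.txt')"]
working = [sys.executable, "-c", "print('app.py')"]

status_fail = run_pipeline(failing, good_path)
status_empty = run_pipeline(no_match, good_path)
status_write = run_pipeline(working, bad_path)
status_ok = run_pipeline(working, good_path)

print(status_fail)   # step 1: command exited with status 3
print(status_empty)  # step 2: filter matched nothing
print(status_ok)     # ok
```

The write to bad_path fails in step 3 because its parent directory does not exist; the exact wording of that message depends on the operating system.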

Passing Python output to a second command

When the destination is another subprocess rather than a file, pass the text through the input argument of subprocess.run:

import subprocess

# Step 1: generate data in Python
data = "\n".join(["cherry", "apple", "banana", "date"])

# Step 2: pipe the Python string into an external command
result = subprocess.run(
    ["sort"],
    input=data,
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout)
# apple
# banana
# cherry
# date

input=data is roughly the Python counterpart of echo "$data" | sort: the child process receives the string on its stdin. Because text=True is set, input must be a str; without text=True, pass bytes instead.
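Putting both halves together gives the full shape of two subprocess calls with Python in between. In this sketch both commands are the current interpreter running one-liners (stand-ins for something like ls and sort, chosen for portability), and the file names are invented.

```python
import subprocess
import sys

# First command: emits unsorted names (stand-in for a listing command).
produce = [
    sys.executable, "-c",
    "for n in ['b.py', 'notes.txt', 'a.py', 'c.md']: print(n)",
]
first = subprocess.run(produce, capture_output=True, text=True, check=True)

# In between: ordinary Python filtering on the captured text.
py_names = [n for n in first.stdout.splitlines() if n.endswith(".py")]

# Second command: reads lines on stdin and writes them back sorted
# (stand-in for sort).
sort_cmd = [
    sys.executable, "-c",
    "import sys; sys.stdout.writelines(sorted(sys.stdin))",
]
second = subprocess.run(
    sort_cmd,
    input="\n".join(py_names) + "\n",  # the filtered text becomes the child's stdin
    capture_output=True,
    text=True,
    check=True,
)

print(second.stdout, end="")
# a.py
# b.py
```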

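The input argument cannot be combined with an explicit stdin argument: both are ways of supplying stdin, and subprocess.run raises ValueError if you pass the two together. Likewise, capture_output=True cannot be combined with explicit stdout or stderr arguments. input is the high-level convenience for data you already hold in memory; stdin=subprocess.PIPE with subprocess.Popen is the lower-level route for streaming data incrementally.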
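Both routes can be seen in a short sketch. The child command here is a hypothetical stand-in (the current interpreter upper-casing whatever arrives on stdin), and the chunks written to it are made up.

```python
import subprocess
import sys

# Stand-in child: reads all of stdin and writes it back upper-cased.
upper_cmd = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Passing both input= and stdin= is rejected before any process is started.
conflict = None
try:
    subprocess.run(upper_cmd, input="hello\n", stdin=subprocess.PIPE, text=True)
except ValueError as exc:
    conflict = str(exc)
print("ValueError raised:", conflict is not None)

# Lower-level route: Popen with stdin=PIPE lets you write in pieces.
# For large volumes prefer proc.communicate(), which avoids pipe deadlocks;
# writing by hand is only safe here because the data is tiny.
with subprocess.Popen(
    upper_cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
) as proc:
    for chunk in ["one\n", "two\n"]:
        proc.stdin.write(chunk)
    proc.stdin.close()            # signal end of input to the child
    streamed = proc.stdout.read()

print(streamed, end="")
# ONE
# TWO
```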

Where to go next

Next: lab — subprocess pipeline — a longer exercise building a script that generates file metadata with subprocess, aggregates it in Python, and writes a JSON summary report.
