Subprocess fundamentals
Python's subprocess module lets your scripts run external commands — understanding the difference between run() and Popen, and why shell=True is usually wrong, keeps your scripts safe and predictable.
- Explain the difference between subprocess.run() and subprocess.Popen
- Understand what capture_output=True does vs passing output through to the terminal
- Explain why shell=True introduces risk and when it should be avoided
Every operating system ships with useful command-line tools: git, ffmpeg, grep,
image converters, database CLI clients. Python's subprocess module is the bridge that
lets your scripts invoke these tools without leaving Python. You get the best of both
worlds — Python for data processing and control flow, existing tools for what they do best.
subprocess.run(): run and wait
The standard call for most situations:
import subprocess
result = subprocess.run(["echo", "hello"])
print(result.returncode)  # 0 means success
subprocess.run() launches the process, waits for it to finish, and returns a
CompletedProcess object. Your Python code is blocked until the command exits. For the
vast majority of automation tasks — running a compiler, converting a file, invoking a
CLI tool — this is exactly the behaviour you want.
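To make the CompletedProcess object concrete, here is a minimal sketch. It uses sys.executable (the running Python interpreter) as a stand-in for an external tool so that it behaves the same on every platform; the exit code 3 is an arbitrary value chosen for illustration.

```python
import subprocess
import sys

# sys.executable stands in for any CLI tool; the child exits with an
# arbitrary non-zero status (3) chosen purely for illustration.
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# run() has already waited for the child, so the result is complete here.
print(type(result).__name__)  # CompletedProcess
print(result.args)            # the command list that was executed
print(result.returncode)      # 3 (non-zero signals failure by convention)
```

Note that run() does not raise on a non-zero exit code by default; it simply records it in returncode.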
subprocess.Popen(): launch and continue
Popen is the lower-level primitive. It launches a process and returns immediately
without waiting. Your Python code continues executing while the subprocess runs in
parallel. You call .wait() or .communicate() when you need the result:
proc = subprocess.Popen(["sleep", "2"])
print("Subprocess started — Python keeps running")
proc.wait()
print("Subprocess finished")
Use Popen when you need to run processes in parallel, stream output as it arrives, or
build a pipeline between two processes (covered in the pipelines lesson). For everything
else, run() is simpler.
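The parallel case can be sketched as follows. The workload is hypothetical: each child is just a Python one-liner that sleeps for half a second, launched through sys.executable so the example does not depend on a sleep command being installed.

```python
import subprocess
import sys
import time

# Hypothetical workload: each child sleeps for half a second.
command = [sys.executable, "-c", "import time; time.sleep(0.5)"]

start = time.monotonic()
# Popen returns immediately, so all three children are running at once.
procs = [subprocess.Popen(command) for _ in range(3)]
# wait() blocks until that child exits and returns its exit code.
return_codes = [proc.wait() for proc in procs]
elapsed = time.monotonic() - start

print(return_codes)  # [0, 0, 0]
print(f"elapsed: {elapsed:.2f}s")
```

Because the children overlap, the elapsed time should usually be closer to one sleep than to three, although interpreter startup adds some overhead.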
Capturing output
By default, a subprocess's stdout and stderr go straight to the terminal — the same
place Python prints to. If you want to capture that output and do something with it in
Python, add capture_output=True:
result = subprocess.run(
    ["echo", "hello"],
    capture_output=True,
    text=True,  # decode bytes to str automatically
)
print(result.stdout)  # "hello\n"
Without text=True, result.stdout is a bytes object. Adding text=True decodes it using the
locale's default encoding; pass encoding="utf-8" instead to choose the encoding explicitly. Either way, you avoid calling .decode() by hand.
capture_output=True is shorthand for stdout=subprocess.PIPE, stderr=subprocess.PIPE. The longer form is useful when you want to capture stdout
but let stderr pass through to the terminal for debugging.
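A minimal sketch of the longer form, capturing stdout while leaving stderr attached to the terminal. The child process is a hypothetical one-liner that writes one line to each stream.

```python
import subprocess
import sys

# Hypothetical child: writes "data" to stdout and "warning" to stderr.
child_code = "import sys; print('data'); print('warning', file=sys.stderr)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,  # capture stdout only
    text=True,               # decode the captured bytes to str
)

print(repr(result.stdout))  # 'data\n'
print(result.stderr)        # None, because stderr was not captured
```

The "warning" line still appears in the terminal, since stderr was inherited from the parent rather than redirected to a pipe.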
Why shell=True is usually wrong
You might see code like this:
subprocess.run("echo hello", shell=True)  # avoid this
With shell=True, Python passes the entire string to the system shell (/bin/sh on POSIX systems, cmd.exe by default on Windows),
which interprets it. This introduces two problems.
First, injection risk: if any part of the command string comes from user input or
an external source, a malicious value can run arbitrary commands. shell=True with
constructed strings is a classic security vulnerability.
Second, portability: shell behaviour differs between sh, bash, and Windows
cmd.exe. The list form is unambiguous.
Pass a list of strings instead:
subprocess.run(["echo", "hello"])  # safe and portable
Each list element is passed directly to the OS without shell interpretation. Spaces, quotes, and special characters in arguments are handled correctly because there is no shell to misinterpret them.
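To see why the list form resists injection, consider this sketch. The value of user_input is a hypothetical hostile string; the child is a Python one-liner that prints back the single argument it received.

```python
import subprocess
import sys

# Hypothetical untrusted input, e.g. a filename supplied by a user.
user_input = "report.txt; echo INJECTED"

# The child prints the one argument it was given, exactly as received.
show_arg = "import sys; print(sys.argv[1])"

result = subprocess.run(
    [sys.executable, "-c", show_arg, user_input],
    capture_output=True,
    text=True,
)

print(result.stdout)  # report.txt; echo INJECTED
```

The semicolon and everything after it arrive as plain text inside one argument. Had the same value been interpolated into a string and run with shell=True, the shell would have treated "echo INJECTED" as a second command.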
The main legitimate use of shell=True is a short, fully hardcoded command that
relies on a shell feature, such as a pipeline, that the list API cannot express
directly. Even then, think twice: the pipelines lesson shows how to replicate pipes
in Python without the shell.
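As a brief preview, and only a sketch under assumptions, two Popen objects can be connected so that one feeds the other with no shell involved. Both commands here are hypothetical Python one-liners standing in for real tools: the first prints three words, the second keeps lines containing "an".

```python
import subprocess
import sys

# Hypothetical stand-ins for two command-line tools.
producer_code = "print('apple'); print('banana'); print('cherry')"
filter_code = (
    "import sys; "
    "sys.stdout.writelines(line for line in sys.stdin if 'an' in line)"
)

producer = subprocess.Popen(
    [sys.executable, "-c", producer_code],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c", filter_code],
    stdin=producer.stdout,  # the producer's output becomes the consumer's input
    stdout=subprocess.PIPE,
    text=True,
)
# Close the parent's copy of the pipe so only the consumer holds the read end.
producer.stdout.close()

output, _ = consumer.communicate()
producer.wait()

print(output)  # banana
```

This mirrors what a shell does for "producer | consumer", but each command remains an argument list, so nothing is parsed by a shell.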
Where to go next
Next: subprocess in practice — a runnable example showing check=True,
.stdout, .returncode, and what happens when a command fails.