Exit codes and errors
Exit codes are a program's last word to its caller — and a program that lies about success is dangerous in any pipeline.
- Explain what exit codes are and what 0 vs. non-zero means
- Describe when to exit with an error versus continuing
- Understand why silent failures are especially dangerous in automated pipelines
When a program finishes, it leaves behind a single integer: its exit code. The operating system, the shell, and any script or pipeline that invoked the program use that number to decide what to do next. It is the program's last word to its caller, and it is the only channel left once stdout and stderr are closed.
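In Python, a program sets this integer through `sys.exit()` or simply by finishing. The sketch below runs a few throwaway snippets in a child interpreter via `subprocess.run` and shows the number the caller receives. The helper name `exit_code_of` is made up for illustration.

```python
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit code."""
    # capture_output keeps the child's stdout/stderr out of our terminal.
    return subprocess.run([sys.executable, "-c", snippet], capture_output=True).returncode


print(exit_code_of("print('hi')"))                   # 0
print(exit_code_of("import sys; sys.exit(3)"))       # 3
print(exit_code_of("import sys; sys.exit('boom')"))  # 1
```

Reaching the end of the script means success (0). An integer passed to `sys.exit()` becomes the exit code. A string is printed to stderr and the exit code is 1.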
The convention
The convention is simple and nearly universal:
- 0 means success. The program did what it was supposed to do.
- Any non-zero value means failure. Something went wrong.
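The shell exposes the last exit code as `$?`, and scripts can branch on it directly: `if grep -q "ERROR" log.txt; then ...; fi`. The `&&` and `||` operators use it to decide whether to run the next command: `a && b` runs `b` only if `a` exits 0, and `a || b` runs `b` only if `a` fails. A script run with `set -e` aborts the moment a command exits non-zero. Note that a pipeline's status is normally that of its last command, so a failure earlier in `a | b` goes unnoticed unless `set -o pipefail` is also set.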
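A Python caller can make the same decisions by inspecting `returncode`. The following is a rough sketch of `&&`, `||`, and `set -e` behavior using `subprocess.run`. `step()` is a hypothetical stand-in for a real pipeline stage that just exits with a chosen code, and `check=True` is only loosely analogous to `set -e`: it raises `CalledProcessError` rather than ending the script on its own.

```python
import subprocess
import sys


def step(exit_code: int) -> list[str]:
    """Command for a hypothetical pipeline stage that only exits with exit_code."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


# Like `first && second`: run the second stage only if the first exited 0.
ran_second = False
if subprocess.run(step(0)).returncode == 0:
    subprocess.run(step(0))
    ran_second = True

# Like `first || fallback`: run the fallback only if the first stage failed.
ran_fallback = False
if subprocess.run(step(1)).returncode != 0:
    subprocess.run(step(0))
    ran_fallback = True

# Loosely like `set -e`: check=True raises on any non-zero exit code.
aborted_with = None
try:
    subprocess.run(step(2), check=True)
except subprocess.CalledProcessError as exc:
    aborted_with = exc.returncode
    print(f"aborting: stage exited {exc.returncode}", file=sys.stderr)

print(ran_second, ran_fallback, aborted_with)  # True True 2
```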
Common conventions for the non-zero values:
| Code | Meaning |
|---|---|
| 1 | General error — something went wrong |
| 2 | Usage error — bad arguments, missing required input |
| 127 | Command not found (set by the shell, not your program) |
You are not required to use 1 and 2 specifically, but they are widely understood. Staying close to convention makes your tool easier to integrate into other people's scripts.
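Python's own `argparse` module follows the usage-error convention. When a required argument is missing, `parse_args` prints a usage message to stderr and raises `SystemExit(2)`. A quick way to see this, with a placeholder program name:

```python
import argparse

parser = argparse.ArgumentParser(prog="mytool")  # "mytool" is a placeholder name
parser.add_argument("path")  # a required positional argument

try:
    parser.parse_args([])  # simulate invoking `mytool` with no arguments
except SystemExit as exc:
    usage_exit_code = exc.code  # argparse has already printed usage to stderr

print(usage_exit_code)  # 2
```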
When to exit with an error
Exit non-zero whenever the program cannot produce correct output:
- A required input file does not exist
- The input is malformed in a way the program cannot recover from
- A required argument is missing or invalid
- An unexpected exception occurs
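The checklist above might look like this in a small utility. This is a sketch under assumptions: the tool name `sumnums`, its one-integer-per-line input format, and the named constants are all hypothetical.

```python
import sys

# Hypothetical names for the conventional values in the table.
EXIT_OK, EXIT_FAILURE, EXIT_USAGE = 0, 1, 2


def main(argv: list[str]) -> int:
    """Sum one integer per line from a file (for the hypothetical `sumnums` tool)."""
    # A required argument is missing (or there are extra ones): usage error.
    if len(argv) != 1:
        print("usage: sumnums FILE", file=sys.stderr)
        return EXIT_USAGE
    path = argv[0]

    # A required input file does not exist or cannot be read.
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except OSError as e:
        print(f"sumnums: cannot read {path}: {e}", file=sys.stderr)
        return EXIT_FAILURE

    # Malformed input the program cannot recover from.
    total = 0
    for lineno, line in enumerate(lines, start=1):
        try:
            total += int(line)
        except ValueError:
            print(f"sumnums: {path}:{lineno}: not an integer: {line!r}", file=sys.stderr)
            return EXIT_FAILURE

    # Only now is the output trustworthy.
    print(total)
    return EXIT_OK
```

A script would end with `sys.exit(main(sys.argv[1:]))`. Unexpected exceptions are deliberately not caught, so they still end the program with a traceback and a non-zero status.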
Exit zero only when the program finished and the output is trustworthy. If you caught an exception, printed a message, and returned something that might be garbage — that is still an error. Exit non-zero.
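Continuing past a bad record can be a reasonable choice, as long as the exit code still reports that the output is incomplete. Here is a minimal sketch, assuming a hypothetical job that converts string records to integers:

```python
import sys


def convert_all(records: list[str]) -> tuple[list[int], int]:
    """Convert each record to int, skipping bad ones but counting them."""
    results: list[int] = []
    failures = 0
    for i, rec in enumerate(records):
        try:
            results.append(int(rec))
        except ValueError:
            # Report the bad record on stderr and keep going...
            print(f"record {i}: not an integer: {rec!r}", file=sys.stderr)
            failures += 1
    return results, failures


def main(records: list[str]) -> int:
    results, failures = convert_all(records)
    for value in results:
        print(value)
    # ...but the output is incomplete, so the exit code must say so.
    return 1 if failures else 0
```

Whether to stop at the first bad record or keep going is a design choice. Reporting the failure through the exit code is not.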
Silent failures are the worst kind
A silent failure happens when a program encounters an error, prints nothing (or prints the error to stdout rather than stderr), and exits 0. It looks like success, so everything downstream proceeds and eventually produces wrong results with no trace of why.
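The difference is visible from the caller's side. This sketch runs two hypothetical child programs that hit the same error. One reports it on stdout and exits 0. The other reports it on stderr and exits 1.

```python
import subprocess
import sys

# Two hypothetical child programs that hit the same error.
SILENT = "print('error: config missing')"  # error on stdout, exit 0
LOUD = "import sys; sys.stderr.write('error: config missing\\n'); sys.exit(1)"


def run(code: str) -> subprocess.CompletedProcess:
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)


silent = run(SILENT)
loud = run(LOUD)

# The silent child looks like success, and its error text sits in stdout,
# where a downstream step would treat it as real data.
print(silent.returncode, repr(silent.stdout))  # 0 'error: config missing\n'
# The loud child keeps stdout clean and signals failure through the exit code.
print(loud.returncode, repr(loud.stdout))  # 1 ''
```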
The most common source of silent failures is an overly broad exception handler:
```python
try:
    result = process(data)
except Exception:
    pass  # <- this is a lie to every caller you will ever have
```

That `pass` swallows the error, discards the result, and tells the caller everything went fine. In an automated pipeline running unattended, this can corrupt data for hours before anyone notices.
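Here is one way that corruption happens, as a small runnable sketch with a hypothetical `process()` that fails on malformed records. Because the exception is swallowed, `result` keeps the previous record's value, and the bad record is silently replaced by a duplicate.

```python
def process(record: str) -> int:
    """Hypothetical processing step that fails on malformed input."""
    return int(record) * 2


outputs = []
result = None
for record in ["1", "oops", "3"]:
    try:
        result = process(record)
    except Exception:
        pass  # error swallowed: `result` still holds the previous record's value
    outputs.append(result)

# The bad record silently became a copy of the first one.
print(outputs)  # [2, 2, 6]
```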
The correct pattern: catch only the exceptions you know how to handle, report them on stderr, and exit non-zero. Let everything else propagate.
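```python
import sys

# process() and data stand in for your program's own logic and input.
try:
    result = process(data)
except ValueError as e:
    sys.stderr.write(f"bad input: {e}\n")
    sys.exit(1)
```

Never write `except: pass` or `except Exception: pass` in a utility. If something goes wrong that you did not anticipate, let the program crash loudly. Python prints the traceback to stderr and exits with status 1, so callers still see the failure. A crash with a traceback is far easier to debug than a silent wrong answer.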
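To confirm that crashing is still an honest signal, this sketch runs a hypothetical child snippet that raises an uncaught `KeyError` and inspects what the caller receives:

```python
import subprocess
import sys

# Hypothetical child code that hits an error nobody anticipated and does not catch it.
CRASHING = "data = {}; print(data['missing'])"

proc = subprocess.run([sys.executable, "-c", CRASHING], capture_output=True, text=True)

print(proc.returncode)                      # 1
print(proc.stderr.startswith("Traceback"))  # True
print("KeyError" in proc.stderr)            # True
print(repr(proc.stdout))                    # ''
```

The caller gets a non-zero exit code, an empty stdout, and a traceback on stderr. Every one of these signals points at the real problem.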
Where to go next
Next: building a CLI tool — putting argparse, stdout/stderr, and exit codes together into a complete, working utility.