Lab: build a utility
Build a CSV column extractor from scratch — read structured input, apply a transformation, write clean output with proper error handling.
- Build a utility from scratch using the stdin/stdout/argparse/exit-code pattern
- Parse CSV input and extract named columns
- Write clean, composable output with proper error handling
This is an optional lab. No new concepts — just hands-on practice applying everything from the Utility Thinking module. Work through it at your own pace. Each checkpoint has a Check button so you can test as you go.
You are going to build a CSV column extractor: a utility that takes CSV text as input, extracts one or more named columns, and writes the values as plain lines to stdout. It is small enough to finish in a session and real enough that you could actually use it.
Here is what it will do when finished:
# Input (stdin):
name,score,city
alice,91,london
bob,74,paris
carol,88,london
# Command:
python extract.py score
# Output (stdout):
score
91
74
88
One column name as a positional argument. Clean lines out. Errors to stderr. Exit 1 if the column does not exist.
Warm up: parse one CSV row
Before the graded work, make sure you can split a CSV line. Python's str.split
is not safe for CSV (it breaks on commas inside quoted fields), but for simple
data without embedded commas, line.split(",") is fine. For real CSV, use the
csv module.
Play with this before moving on:
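A minimal sketch to experiment with, assuming the sample data from the top of this lab. The variable names (csv_text, tricky) and the quoted-field row are illustrative assumptions, added only to show where a plain split goes wrong and how the csv module handles the same line:

```python
import csv
import io

# Assumption: the sample data from the introduction, held in a string.
csv_text = "name,score,city\nalice,91,london\nbob,74,paris\ncarol,88,london"

lines = csv_text.splitlines()
header = lines[0].split(",")                    # column names
rows = [line.split(",") for line in lines[1:]]  # one list of values per data row

print(header)   # ['name', 'score', 'city']
print(rows[0])  # ['alice', '91', 'london']

# Find the column's position in the header, then pull that position from each row.
idx = header.index("score")
scores = [row[idx] for row in rows]
print(scores)   # ['91', '74', '88']

# Hypothetical row with a comma inside a quoted field.
tricky = 'name,city\n"smith, j",london'
naive = tricky.splitlines()[1].split(",")
proper = list(csv.reader(io.StringIO(tricky)))[1]
print(naive)    # ['"smith', ' j"', 'london']  (split in the wrong place)
print(proper)   # ['smith, j', 'london']
```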
Notice how the header gives you the column names and each data row is a parallel list of values. To extract column "score", you find its index in the header, then pull that index from every data row.
Checkpoint 1 — find a column index
Write column_index(header, name) that returns the integer index of name in
the header list, or -1 if it is not found.
column_index(["name","score","city"], "score") → 1
column_index(["name","score","city"], "age") → -1
The built-in list.index() raises a ValueError if the item is absent, so you
will need to handle that or use a different approach. A try/except around
list.index() is one clean option.
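One possible shape for the try/except option, offered as a sketch rather than the expected answer. Try your own version first:

```python
def column_index(header, name):
    """Return the position of name in header, or -1 if it is absent."""
    try:
        return header.index(name)  # raises ValueError when name is missing
    except ValueError:
        return -1


print(column_index(["name", "score", "city"], "score"))  # 1
print(column_index(["name", "score", "city"], "age"))    # -1
```

Returning -1 keeps the "not found" case as an ordinary value, so the caller can branch on it without its own exception handling.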
Checkpoint 2 — extract a column
Write extract_column(csv_text, column_name) that returns a list of values for
the named column. The first element should be the column name itself (the header),
followed by one value per data row. If the column does not exist, return an empty
list.
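A sketch of one way to meet this spec, here using the csv module mentioned in the warm-up so that quoted fields are handled too. Skipping blank or short rows is an assumption of this sketch, not a requirement of the checkpoint:

```python
import csv
import io


def extract_column(csv_text, column_name):
    """Return [header, value, value, ...] for one column, or [] if it is missing."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or column_name not in rows[0]:
        return []
    idx = rows[0].index(column_name)
    # Assumption: rows too short to contain the column (including blank lines) are skipped.
    return [row[idx] for row in rows if len(row) > idx]


sample = "name,score,city\nalice,91,london\nbob,74,paris\ncarol,88,london\n"
print(extract_column(sample, "score"))  # ['score', '91', '74', '88']
print(extract_column(sample, "age"))    # []
```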
extract_column(csv, "score") → ["score", "91", "74", "88"]
extract_column(csv, "age") → []
Checkpoint 3 — wire up the utility
Now put it all together. Write the complete run_extractor(csv_text, column_name)
function that:
- Returns the extracted lines joined by newlines when the column exists
- Returns None when the column does not exist (the caller should then exit 1)
run_extractor(csv, "score") → "score\n91\n74\n88"
run_extractor(csv, "age") → None
Putting it all together
Here is a complete utility that uses your functions. Notice how short main()
is — all the real work is in the helper functions that you just tested:
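A sketch of how the pieces might fit together. The helper bodies are one possible set of checkpoint answers, and SAMPLE_CSV, the main(args, stdin) signature, and the error message wording are assumptions made for this illustration. The demo at the bottom passes a fixed argument list and an in-memory stream in place of the real command line and stdin:

```python
import argparse
import io
import sys

# Assumption: sample data copied from the example at the top of this lab.
SAMPLE_CSV = "name,score,city\nalice,91,london\nbob,74,paris\ncarol,88,london\n"


def column_index(header, name):
    """Return the position of name in header, or -1 if it is absent."""
    try:
        return header.index(name)
    except ValueError:
        return -1


def extract_column(csv_text, column_name):
    """Return [header, value, value, ...] for one column, or [] if missing."""
    # Simple split: assumes no quoted fields or embedded commas.
    rows = [line.split(",") for line in csv_text.splitlines() if line.strip()]
    if not rows:
        return []
    idx = column_index(rows[0], column_name)
    if idx == -1:
        return []
    # Rows too short to contain the column are skipped (a choice made for this sketch).
    return [row[idx] for row in rows if len(row) > idx]


def run_extractor(csv_text, column_name):
    """Return the column as newline-joined text, or None if it does not exist."""
    values = extract_column(csv_text, column_name)
    return "\n".join(values) if values else None


def parse_args(argv=None):
    """Parse one positional argument; argv=None means use the real command line."""
    parser = argparse.ArgumentParser(
        description="Extract one named column from CSV read on stdin."
    )
    parser.add_argument("column", help="name of the column to extract")
    return parser.parse_args(argv)


def main(args, stdin):
    """Read CSV from stdin, write the column to stdout, return an exit code."""
    result = run_extractor(stdin.read(), args.column)
    if result is None:
        print(f"error: column not found: {args.column}", file=sys.stderr)
        return 1
    print(result)
    return 0


if __name__ == "__main__":
    # Demo: a fixed argument list and sample data stand in for argv and stdin.
    args = parse_args(["score"])
    exit_code = main(args, io.StringIO(SAMPLE_CSV))
    print(f"exit code: {exit_code}", file=sys.stderr)
    # On a real command line you would instead run:
    #     sys.exit(main(parse_args(), sys.stdin))
```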
Try changing parse_args(["score"]) to parse_args(["city"]) or
parse_args(["age"]) to see the success and failure paths.
Where to go next
Your utility works — and it is composable. Next, the packaging and sharing module covers how to turn this script into an installable command that you can run by name from anywhere.