Parsing JSON and CSV
Python's built-in json and csv modules turn raw data strings into dictionaries you can filter, transform, and write back out.
- Parse a JSON string with json.loads() into a Python dict or list
- Parse a CSV string with csv.DictReader into a list of dicts
- Write Python data back to a JSON string with json.dumps()
- Recognise that the same logical data looks different in each format
Python ships with modules for the two formats you will meet most often: json for
hierarchical data and csv for tabular data. Neither requires installation. Both
follow the same basic pattern: pass in text, get back a Python data structure.
Parsing JSON
json.loads() converts a JSON string into a Python object. Objects become dicts;
arrays become lists:
import json
raw = '{"name": "Alice", "score": 42, "tags": ["fast", "reliable"]}'
data = json.loads(raw)
print(data["name"]) # Alice
print(data["tags"][0]) # fast
print(type(data)) # <class 'dict'>

json.dumps() goes the other direction: Python dict to JSON string. The
indent=2 argument makes the output human-readable:
result = {"status": "done", "count": 7}
print(json.dumps(result, indent=2))

{
  "status": "done",
  "count": 7
}

Parsing CSV
csv.DictReader reads CSV from a file or any iterable of lines (a string can be
wrapped in io.StringIO) and yields each row as a dictionary keyed by the column
headers:
import csv
import io
raw = """name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done"""
reader = csv.DictReader(io.StringIO(raw))
for row in reader:
    print(row)
# {'name': 'Alice', 'score': '42', 'status': 'done'}
# {'name': 'Bob', 'score': '35', 'status': 'pending'}
# {'name': 'Carol', 'score': '51', 'status': 'done'}

Notice that every value is a string, because CSV has no type information. If you
need score as an integer for arithmetic, convert it explicitly: int(row["score"]).
csv.DictReader gives you strings for every field by default. This trips up almost
everyone the first time: in Python 3, row["score"] > 40 raises a TypeError because
a str cannot be compared with an int, and row["score"] > "40" compares strings
lexicographically, not numerically. Convert to int or float before comparing numbers.
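The string-versus-number pitfall can be seen in a short sketch that reuses the three rows from this section. The threshold of 40 and the variable names rows and high_scorers are assumptions made for illustration:

```python
import csv
import io

raw = """name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Every field arrives as a string, including the numeric-looking ones.
print(type(rows[0]["score"]))  # <class 'str'>

# String comparison works character by character, so "9" sorts after "40".
print("9" > "40")  # True

# Convert before comparing numbers. The threshold of 40 is illustrative.
high_scorers = [row["name"] for row in rows if int(row["score"]) > 40]
print(high_scorers)  # ['Alice', 'Carol']
```

Converting at the point of comparison keeps the original rows untouched; converting once, right after parsing, is usually tidier when the same field is used in several places.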
The same data in both formats
Here is the same three-record dataset expressed first as CSV, then as JSON. Both represent identical information; the format changes only the shape of the text:
CSV:
name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done

JSON:
[
  {"name": "Alice", "score": 42, "status": "done"},
  {"name": "Bob", "score": 35, "status": "pending"},
  {"name": "Carol", "score": 51, "status": "done"}
]

CSV is more compact for simple tables. JSON is unambiguous about types (42 is a
number, not a string) and extends naturally if you later need to add nested fields.
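The type difference can be checked directly by parsing one record from each format. This is a minimal sketch; the names csv_text, json_text, csv_row, and json_row are chosen here for illustration:

```python
import csv
import io
import json

csv_text = "name,score,status\nAlice,42,done\n"
json_text = '[{"name": "Alice", "score": 42, "status": "done"}]'

# Take the first (and only) record from each source.
csv_row = next(csv.DictReader(io.StringIO(csv_text)))
json_row = json.loads(json_text)[0]

print(repr(csv_row["score"]))   # '42'  (a string)
print(repr(json_row["score"]))  # 42    (an int)

# After an explicit conversion the two records compare equal.
csv_row["score"] = int(csv_row["score"])
print(csv_row == json_row)  # True
```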
Try it
Parse CSV, filter rows, and serialise the results as JSON:
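One way this could look, as a minimal sketch reusing the three-record dataset from above. The filter condition (status equal to "done") and the choice of output fields are assumptions made for illustration:

```python
import csv
import io
import json

raw = """name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done"""

# Parse: each row becomes a dict of strings.
rows = list(csv.DictReader(io.StringIO(raw)))

# Filter and transform: keep finished records and convert score to int.
done = [
    {"name": row["name"], "score": int(row["score"])}
    for row in rows
    if row["status"] == "done"
]

# Serialise: turn the filtered records into a JSON string.
output = json.dumps(done, indent=2)
print(output)
```

The printed JSON array contains two objects, one for Alice and one for Carol, each with score as a number rather than a string.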
This is the core loop of data wrangling: parse, filter/transform, serialise. The specific operations change; the loop stays the same.
Where to go next
Next: the wrangle lab, where you read a CSV of product inventory, filter by stock level, and write the filtered results to JSON.