Parsing JSON and CSV

Python's built-in json and csv modules turn raw text into dictionaries and lists you can filter, transform, and write back out.

Python ships with modules for the two formats you need most: json for hierarchical data and csv for tabular data. Both are part of the standard library, so neither requires installation. Both follow the same basic pattern: hand over text, get back a Python data structure.

Parsing JSON

json.loads() converts a JSON string into a Python object. Objects become dicts; arrays become lists:

import json

raw = '{"name": "Alice", "score": 42, "tags": ["fast", "reliable"]}'
data = json.loads(raw)

print(data["name"])      # Alice
print(data["tags"][0])   # fast
print(type(data))        # <class 'dict'>
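
The other JSON types map just as directly. As a minimal sketch (the payload below is invented for illustration and is not part of the lesson's dataset), numbers become int or float, true and false become bool, and null becomes None. Malformed input raises json.JSONDecodeError:

```python
import json

# Hypothetical payload, made up for this illustration.
payload = '{"count": 3, "ratio": 0.5, "active": true, "owner": null}'
parsed = json.loads(payload)

print(type(parsed["count"]))   # <class 'int'>
print(type(parsed["ratio"]))   # <class 'float'>
print(parsed["active"])        # True
print(parsed["owner"])         # None

# Malformed JSON raises json.JSONDecodeError (a subclass of ValueError).
error_message = None
try:
    json.loads('{"count": 3,}')   # a trailing comma is not valid JSON
except json.JSONDecodeError as err:
    error_message = str(err)
    print("Could not parse:", error_message)
```

Catching the exception is worth the extra lines whenever the JSON comes from outside your program, such as a file someone else wrote or a network response.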

json.dumps() goes the other direction, from a Python object to a JSON string. The indent=2 argument makes the output human-readable:

result = {"status": "done", "count": 7}
print(json.dumps(result, indent=2))

{
  "status": "done",
  "count": 7
}
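
Because loads() and dumps() are inverses for plain data, a round trip should hand back an equal object. The sketch below illustrates this with a made-up dict; the sort_keys=True argument is an optional extra that was not used above:

```python
import json

# Hypothetical record, made up for this illustration.
original = {"status": "done", "count": 7, "tags": ["fast", "reliable"]}

text = json.dumps(original)     # dict -> JSON string
restored = json.loads(text)     # JSON string -> dict

print(type(text))               # <class 'str'>
print(restored == original)     # True

# sort_keys=True gives a stable key order, handy when diffing output.
print(json.dumps(original, sort_keys=True))
# {"count": 7, "status": "done", "tags": ["fast", "reliable"]}
```

The round trip holds for dicts with string keys, lists, strings, numbers, booleans, and None. Tuples come back as lists, and non-string dict keys come back as strings.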

Parsing CSV

csv.DictReader reads CSV from a file object (or any iterable of lines) and yields each row as a dictionary keyed by the column headers. To parse a string, wrap it in io.StringIO first:

import csv
import io

raw = """name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done"""

reader = csv.DictReader(io.StringIO(raw))
for row in reader:
    print(row)
# {'name': 'Alice', 'score': '42', 'status': 'done'}
# {'name': 'Bob', 'score': '35', 'status': 'pending'}
# {'name': 'Carol', 'score': '51', 'status': 'done'}

Notice that every value is a string: CSV carries no type information. If you need score as an integer for arithmetic, convert it explicitly: int(row["score"]).

This trips up almost everyone the first time. row["score"] > "40" compares strings lexicographically, not numerically, so "9" > "40" is True. row["score"] > 40 does not work at all: Python 3 raises a TypeError when a str is compared with an int. Convert to int or float before comparing numbers.
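
To make the pitfall concrete, here is a small sketch. Bob's score is changed to 9 purely for illustration, because a single-digit value makes the lexicographic ordering visibly wrong:

```python
import csv
import io

# Illustrative data: Bob's score is set to 9 to expose the problem.
raw = """name,score
Alice,42
Bob,9"""

rows = list(csv.DictReader(io.StringIO(raw)))
bob = rows[1]

# String comparison goes character by character: "9" sorts after "4".
print(bob["score"] > "40")       # True, which is numerically wrong

# Converting first gives the numeric answer.
print(int(bob["score"]) > 40)    # False

# Comparing a str with an int is an error in Python 3.
try:
    bob["score"] > 40
except TypeError as err:
    print("TypeError:", err)
```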

The same data in both formats

Here is the same three-record dataset expressed first as CSV, then as JSON. Both represent identical information; the format changes only the shape of the text:

CSV:

name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done

JSON:

[
  {"name": "Alice", "score": 42, "status": "done"},
  {"name": "Bob",   "score": 35, "status": "pending"},
  {"name": "Carol", "score": 51, "status": "done"}
]

CSV is more compact for simple tables. JSON is unambiguous about types (42 is a number, not a string) and extends naturally if you later need to add nested fields.
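
One way to check that claim is to parse both texts and compare the results. In this sketch the variable names (csv_text, json_text, typed) are chosen for illustration only. The parsed structures differ until the score column gets its type back:

```python
import csv
import io
import json

csv_text = """name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done"""

json_text = """[
  {"name": "Alice", "score": 42, "status": "done"},
  {"name": "Bob",   "score": 35, "status": "pending"},
  {"name": "Carol", "score": 51, "status": "done"}
]"""

from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)

print(from_csv == from_json)    # False: '42' (str) is not 42 (int)

# Restore the type that CSV dropped, then compare again.
typed = [{**row, "score": int(row["score"])} for row in from_csv]
print(typed == from_json)       # True
```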

Try it

Parse CSV, filter rows, and serialise the results as JSON:
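
The interactive exercise itself is not reproduced here. As a stand-in, the following sketch runs the three steps on the dataset from this page. The filter condition (keep rows whose status is "done") and the choice of output fields are assumptions made for this example:

```python
import csv
import io
import json

raw = """name,score,status
Alice,42,done
Bob,35,pending
Carol,51,done"""

# 1. Parse: each row becomes a dict of strings.
rows = csv.DictReader(io.StringIO(raw))

# 2. Filter and transform: keep "done" rows, convert score to int.
done = [
    {"name": row["name"], "score": int(row["score"])}
    for row in rows
    if row["status"] == "done"
]

# 3. Serialise: write the filtered records back out as JSON text.
output = json.dumps(done, indent=2)
print(output)
```

Try changing the condition, for example to int(row["score"]) > 40, and see how the JSON output changes.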


This is the core loop of data wrangling: parse, filter/transform, serialise. The specific operations change; the loop stays the same.

Where to go next

Next: the wrangle lab. Read a CSV of product inventory, filter by stock level, and write the filtered results to JSON.
