Lab: practical matching
Apply the full beginner toolkit — character classes, quantifiers, anchors, groups, flags, and common patterns — to realistic text challenges.
- Extract structured fields from a realistic text blob
- Combine anchors, classes, quantifiers, and flags in a single pattern
- Adapt known patterns to slight variations in format
Optional lab. This is the capstone lab for the beginner tier. Work through the checkpoints in order — later ones build on earlier patterns. Experiment freely; you can reset the code at any time.
The scenario
You are processing entries from a raw event log. Here is a sample line:
2024-03-15T08:42:11Z [WARN] user.profile: login attempt for bob@example.org from 10.0.1.23 (attempt 3 of 5)
Each line has an ISO timestamp, a log level, a component name, a message, and embedded structured data. Your job is to write functions that extract pieces of it reliably.
Warm up — explore the data
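One way to explore the data before writing any extractor is to run a few throwaway patterns against the sample line. The sketch below is only an illustration: the variable names (sampleLine, startsWithYear, firstBracket, digitRuns) are hypothetical and not part of the lab's starter code.

```javascript
// Sample line copied from the scenario above.
const sampleLine =
  '2024-03-15T08:42:11Z [WARN] user.profile: login attempt for bob@example.org from 10.0.1.23 (attempt 3 of 5)';

// test() answers a yes/no question: does the pattern match anywhere?
// Here the ^ anchor restricts the match to the start of the string.
const startsWithYear = /^\d{4}-/.test(sampleLine); // true

// match() without the g flag returns the first match and its position.
const firstBracket = sampleLine.match(/\[[A-Z]+\]/);
// firstBracket[0] is '[WARN]' and firstBracket.index is 21

// match() with the g flag returns every match as a plain array of strings.
const digitRuns = sampleLine.match(/\d+/g);

console.log(startsWithYear, firstBracket[0], firstBracket.index, digitRuns);
```

Printing every digit run shows that digits appear in the timestamp, the IP address, and the attempt counter, which is why the later checkpoints need more specific patterns than a bare \d+.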
Checkpoint 1 — extract the timestamp
Parse the ISO timestamp from each log line. The format is YYYY-MM-DDTHH:MM:SSZ.
Write getTimestamp(line) returning the ISO timestamp string (like "2024-03-15T08:42:11Z") from a log line, or null if not found.
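A minimal sketch of one possible solution, assuming the timestamp may appear anywhere in the line and that only its shape needs checking. The pattern does not validate ranges, so a month of 99 would still match.

```javascript
// Possible solution sketch: match the fixed YYYY-MM-DDTHH:MM:SSZ shape.
function getTimestamp(line) {
  // \d{4} = four digits, \d{2} = two digits; T and Z are literal characters.
  const match = line.match(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z/);
  // match() returns null when nothing matches; otherwise index 0 holds the full match.
  return match ? match[0] : null;
}
```

If timestamps are guaranteed to start the line, prefixing the pattern with ^ would reject lines where a timestamp-like string appears only inside the message.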
getTimestamp('2024-03-15T08:42:11Z [WARN] msg') → '2024-03-15T08:42:11Z'
Checkpoint 2 — extract the log level
Pull the log level (INFO, WARN, ERROR, or DEBUG) from inside the square brackets.
Write getLevel(line) returning the log level string ("INFO", "WARN", "ERROR", or "DEBUG") from a line like "... [WARN] ...", or null if not found.
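One possible approach, sketched under the assumption that levels are always uppercase and wrapped in literal square brackets. It matches the brackets together with the level and then trims them off with slice, so it does not rely on reading a captured group.

```javascript
// Possible solution sketch: match "[LEVEL]" and strip the brackets.
function getLevel(line) {
  // \[ and \] are escaped because bare brackets would start a character class.
  // The group with | lists the four accepted levels.
  const match = line.match(/\[(INFO|WARN|ERROR|DEBUG)\]/);
  // match[0] is the full match, e.g. '[WARN]'; slice(1, -1) drops both brackets.
  return match ? match[0].slice(1, -1) : null;
}
```

Listing the levels explicitly means a bracketed word such as [TRACE] returns null instead of being reported as a level.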
getLevel('[WARN] login attempt') → 'WARN'
Checkpoint 3 — extract the email address
Pull the email address from the log line. Emails contain @ and at least one dot in the domain.
Write getEmail(line) returning the email address found in the line, or null if none. Use a pragmatic email pattern.
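A pragmatic sketch, not a full validator: it assumes the local part uses only word characters, dots, plus signs, and hyphens, and that the domain has at least one dot. Real-world address syntax is broader than this.

```javascript
// Possible solution sketch: a deliberately loose email pattern.
function getEmail(line) {
  // [\w.+-]+      local part: word characters, dot, plus, hyphen
  // @             literal at sign
  // [\w-]+        first domain label
  // (\.[\w-]+)+   one or more ".label" parts, so the domain must contain a dot
  const match = line.match(/[\w.+-]+@[\w-]+(\.[\w-]+)+/);
  return match ? match[0] : null;
}
```

Because every dot in the domain must be followed by at least one label character, a sentence-ending period after the address is left out of the match.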
getEmail('login for bob@example.org from') → 'bob@example.org'
Checkpoint 4 — extract the IP address
Pull the IPv4 address. Each octet is 1–3 digits, separated by dots.
Write getIP(line) returning the IPv4 address found in the line, or null if none.
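A minimal sketch of one possible solution, assuming the checkpoint only cares about the dotted-quad shape. It does not check that each octet is at most 255, so 999.1.1.1 would still match.

```javascript
// Possible solution sketch: four groups of 1 to 3 digits separated by literal dots.
function getIP(line) {
  // \. is an escaped dot; an unescaped . would match any character.
  // \b on both ends keeps the match from starting or ending inside a longer digit run.
  const match = line.match(/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/);
  return match ? match[0] : null;
}
```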
getIP('from 10.0.1.23 attempt') → '10.0.1.23'
Checkpoint 5 — count WARN and ERROR lines
Given an array of log lines, return the count of lines whose level is WARN or ERROR.
Write countAlerts(lines) returning the number of lines whose log level is WARN or ERROR.
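One way to approach this, sketched under the assumption that the level always appears in square brackets as in the sample line. The name alertPattern is hypothetical.

```javascript
// Possible solution sketch: keep the lines that contain [WARN] or [ERROR], then count them.
function countAlerts(lines) {
  // No g flag here: a global regex keeps state in lastIndex between test() calls,
  // which can make repeated tests against different strings give surprising results.
  const alertPattern = /\[(WARN|ERROR)\]/;
  return lines.filter((line) => alertPattern.test(line)).length;
}
```

Requiring the brackets means a message that merely mentions the word ERROR in its text is not counted as an alert.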
countAlerts(['[WARN] a', '[ERROR] b', '[INFO] c']) → 2
Done?
All five green? You have extracted structured data from raw text using only the beginner regex toolkit. That skill, combined with a text editor's search-and-replace, can process log files, config dumps, and data exports that would take hours with manual methods.
Next: the intermediate tier introduces capturing groups, named groups, backreferences, and lookarounds — the tools for extraction and transformation that go beyond simple matching.