Parsing log files
Build a complete Apache/Nginx access log parser step by step using named groups, then extract multiple fields with matchAll.
- Read and understand the structure of a standard access log line
- Write named-group patterns for IP address, HTTP method, path, status code, and response bytes
- Compose those patterns into a single combined parser
- Use matchAll to process multiple log lines and collect structured results
Log files are one of the most common places developers reach for regex. A single day of web traffic can produce millions of lines, each in a well-known but not-quite-CSV format. Parsing them with split is fragile; a dedicated log parser is heavy. A well-crafted regex with named groups threads the needle: precise, readable, and fast enough for batch processing.
The access log format
A standard Apache/Nginx combined log line looks like this:
192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"
The fields in order:
| Position | Field | Example |
|---|---|---|
| 1 | Client IP | 192.168.1.42 |
| 2 | Ident (almost always -) | - |
| 3 | Auth user (- if none) | frank |
| 4 | Timestamp in brackets | [12/Jun/2024:13:45:22 +0000] |
| 5 | Request line in quotes | "GET /api/users HTTP/1.1" |
| 6 | Status code | 200 |
| 7 | Response bytes | 1523 |
| 8 | Referrer in quotes | "https://example.com/dashboard" |
| 9 | User agent in quotes | "Mozilla/5.0" |
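To make the earlier point about split concrete, here is a small sketch that splits the sample line on spaces. The variable names are illustrative only; the point is that nine logical fields come back as twelve tokens, because the timestamp and the quoted request line contain spaces of their own.

```javascript
// The sample line from this section, built in pieces for readability.
const sampleLine =
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] ' +
  '"GET /api/users HTTP/1.1" 200 1523 ' +
  '"https://example.com/dashboard" "Mozilla/5.0"';

// Naive approach: split on single spaces.
const tokens = sampleLine.split(" ");

console.log(tokens.length); // 12 tokens for 9 logical fields
console.log(tokens[3], tokens[4]); // the timestamp is cut in two
console.log(tokens[5], tokens[6], tokens[7]); // the request line is cut in three
```

A realistic user agent string contains several more spaces, so the token count also varies from line to line.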
Building the pattern piece by piece
Step 1 — IP address
An IPv4 address is four groups of 1–3 digits separated by dots:
const ipPattern = /(?<ip>\d{1,3}(?:\.\d{1,3}){3})/;
The named group (?<ip>…) lets you access match.groups.ip later.
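As a quick check, the sketch below runs the IP pattern against the start of the sample line and reads the result through match.groups. The second test is an illustrative input of my own: it shows that the pattern checks the shape of an address only, not that each part is in the 0–255 range.

```javascript
const ipPattern = /(?<ip>\d{1,3}(?:\.\d{1,3}){3})/;

// First part of the sample line from this lesson.
const sampleLine = '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000]';

const match = ipPattern.exec(sampleLine);
// exec returns null when nothing matches, so guard before reading groups.
const ip = match ? match.groups.ip : null;
console.log(ip); // "192.168.1.42"

// Shape check only: out-of-range parts still match.
const loose = ipPattern.test("999.999.999.999");
console.log(loose); // true
```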
Step 2 — HTTP method and path
The request line is quoted: "METHOD /path HTTP/version". We want the method and the path:
const requestPattern = /"(?<method>[A-Z]+)\s+(?<path>[^\s"]+)\s+HTTP\/[\d.]+"/;
[A-Z]+ matches the verb (GET, POST, etc.). [^\s"]+ matches the path: "anything that isn't whitespace or a quote".
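A short sketch of the request pattern in use. The query string in the input and the "-" request line are illustrative inputs, not taken from the lesson; the second one shows that a request line without the three-part shape simply fails to match.

```javascript
const requestPattern = /"(?<method>[A-Z]+)\s+(?<path>[^\s"]+)\s+HTTP\/[\d.]+"/;

// A well-formed request line (hypothetical input with a query string).
const requestLine = '"GET /api/users?page=2 HTTP/1.1"';
const requestMatch = requestPattern.exec(requestLine);
const { method, path } = requestMatch.groups;
console.log(method); // "GET"
console.log(path); // "/api/users?page=2"

// A request line that lacks the METHOD /path HTTP/version shape.
const malformed = requestPattern.exec('"-"');
console.log(malformed); // null
```

Because exec returns null on failure, real code should check the result before destructuring groups.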
Step 3 — status code and response bytes
Both are integers separated by a space:
const statusPattern = /(?<status>\d{3})\s+(?<bytes>\d+|-)/;
The |- handles the - that some servers write when no bytes were transferred.
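Captured groups are always strings, so a parser usually converts them. The helper below is a hypothetical sketch (parseStatusAndBytes is not part of the lesson), and treating the - placeholder as 0 bytes is an assumption; some pipelines may prefer null.

```javascript
const statusPattern = /(?<status>\d{3})\s+(?<bytes>\d+|-)/;

// Hypothetical helper: takes just the "status bytes" fragment of a line.
function parseStatusAndBytes(fragment) {
  const match = statusPattern.exec(fragment);
  if (!match) return null;
  const { status, bytes } = match.groups;
  return {
    status: Number(status),
    // Assumption: "-" means nothing was sent, so record 0.
    bytes: bytes === "-" ? 0 : Number(bytes),
  };
}

const ok = parseStatusAndBytes("200 1523");
const notModified = parseStatusAndBytes("304 -");
console.log(ok); // { status: 200, bytes: 1523 }
console.log(notModified); // { status: 304, bytes: 0 }
```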
Step 4 — referrer
The referrer is quoted (and may be - inside the quotes if there is none):
const referrerPattern = /"(?<referrer>[^"]*)"/;
[^"]* matches anything that isn't a quote, so there is no risk of the pattern crossing into the user agent field.
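One caveat worth seeing in code: on its own, this pattern matches the first quoted string it finds. The sketch below (variable names are illustrative) applies it to the tail of the sample line and then to the whole line, where the first quoted string is the request line rather than the referrer. The piece only means "referrer" once it sits in the right position inside a larger pattern.

```javascript
const referrerPattern = /"(?<referrer>[^"]*)"/;

// Applied to the tail of the line, it finds the referrer.
const tail = '"https://example.com/dashboard" "Mozilla/5.0"';
const fromTail = referrerPattern.exec(tail).groups.referrer;
console.log(fromTail); // "https://example.com/dashboard"

// Applied to the whole line, the first quoted string is the request line.
const fullLine =
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] ' +
  '"GET /api/users HTTP/1.1" 200 1523 ' + tail;
const fromFullLine = referrerPattern.exec(fullLine).groups.referrer;
console.log(fromFullLine); // "GET /api/users HTTP/1.1"
```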
Step 5 — composing the full pattern
Combine the pieces, accounting for the fixed-format fields in between:
const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +           // IP
  "\\s+\\S+\\s+\\S+\\s+" +                        // ident, auth (skip)
  "\\[[^\\]]+\\]\\s+" +                           // timestamp (skip)
  '"(?<method>[A-Z]+)\\s+' +                      // method
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +    // path
  "(?<status>\\d{3})\\s+" +                       // status
  '(?<bytes>\\d+|-)\\s+' +                        // bytes
  '"(?<referrer>[^"]*)"',                         // referrer
  "g"
);
Putting it together with matchAll
The String.prototype.matchAll method returns an iterator of all matches, each with a groups object, which makes it a good fit for processing a batch of lines. The pattern must carry the g flag; matchAll throws a TypeError without it.
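A minimal sketch of that batch step, assuming the composed pattern from Step 5. The second and third log lines are made-up samples, and converting status and bytes to numbers (with - recorded as 0) is a choice made for this example rather than something the format requires.

```javascript
const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +
  "\\s+\\S+\\s+\\S+\\s+" +
  "\\[[^\\]]+\\]\\s+" +
  '"(?<method>[A-Z]+)\\s+' +
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +
  "(?<status>\\d{3})\\s+" +
  "(?<bytes>\\d+|-)\\s+" +
  '"(?<referrer>[^"]*)"',
  "g"
);

// Hypothetical batch: the lesson's sample line plus two invented ones.
const logText = [
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"',
  '10.0.0.7 - - [12/Jun/2024:13:45:23 +0000] "POST /api/login HTTP/1.1" 401 87 "-" "curl/8.5.0"',
  '10.0.0.9 - - [12/Jun/2024:13:45:24 +0000] "GET /missing HTTP/1.1" 404 - "-" "Mozilla/5.0"',
].join("\n");

const entries = [];
for (const match of logText.matchAll(LOG_PATTERN)) {
  const { ip, method, path, status, bytes, referrer } = match.groups;
  entries.push({
    ip,
    method,
    path,
    status: Number(status),
    bytes: bytes === "-" ? 0 : Number(bytes), // assumption: "-" means 0 bytes
    referrer,
  });
}

console.log(entries.length); // 3
console.log(entries[1].method, entries[1].path, entries[1].status); // POST /api/login 401
```

Lines that do not match the pattern are skipped silently by matchAll, so comparing entries.length with the number of input lines is a cheap sanity check.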
Filtering for errors
With structured matches, filtering for 4xx and 5xx errors is straightforward: keep the entries whose status is 400 or higher.
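One way this filter might look, again assuming the composed pattern from Step 5 and a few invented log lines. Splitting the errors into client and server groups by the first digit of the status is an illustrative extra.

```javascript
const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +
  "\\s+\\S+\\s+\\S+\\s+" +
  "\\[[^\\]]+\\]\\s+" +
  '"(?<method>[A-Z]+)\\s+' +
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +
  "(?<status>\\d{3})\\s+" +
  "(?<bytes>\\d+|-)\\s+" +
  '"(?<referrer>[^"]*)"',
  "g"
);

// Hypothetical sample data: one success, two client errors, one server error.
const logText = [
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"',
  '10.0.0.7 - - [12/Jun/2024:13:45:23 +0000] "POST /api/login HTTP/1.1" 401 87 "-" "curl/8.5.0"',
  '10.0.0.9 - - [12/Jun/2024:13:45:24 +0000] "GET /missing HTTP/1.1" 404 - "-" "Mozilla/5.0"',
  '10.0.0.9 - - [12/Jun/2024:13:45:25 +0000] "GET /api/report HTTP/1.1" 503 0 "-" "Mozilla/5.0"',
].join("\n");

// Spread the iterator into an array so map and filter are available.
const errors = [...logText.matchAll(LOG_PATTERN)]
  .map((match) => match.groups)
  .filter((groups) => Number(groups.status) >= 400);

// status is still a string here, so its first character gives the class.
const clientErrors = errors.filter((groups) => groups.status[0] === "4");
const serverErrors = errors.filter((groups) => groups.status[0] === "5");

console.log(errors.length); // 3
console.log(clientErrors.map((groups) => groups.path)); // [ '/api/login', '/missing' ]
console.log(serverErrors.map((groups) => groups.path)); // [ '/api/report' ]
```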
Tips for production log parsing
- Compile once: if you process many lines in a loop, compile the regex with new RegExp(…) or a literal outside the loop. Re-compilation on every iteration is a hidden performance cost.
- Validate field widths: \d{3} for a status code is safer than \d+ because it rejects malformed lines early rather than producing a surprising match.
- Handle the - placeholder: log formats use - for missing optional fields (no referrer, no auth user). Build the |- alternative into fields that can be absent.
- Test against real samples: copy 10–20 real lines from your actual log into regex101 before deploying a parser. Edge cases (unusual user agents, paths with spaces encoded as %20, IPv6 addresses) can break a pattern that looks complete on toy examples.
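The first and last tips can be combined in a small test harness. The sketch below is an assumption-laden illustration: partitionLines and LINE_PATTERN are hypothetical names, the pattern is the literal form of the Step 5 pattern without the g flag (each line is tested on its own), and the sample lines are invented. It collects lines the pattern fails to match so they can be inspected instead of silently dropped.

```javascript
// Compiled once, outside any loop. No g flag, so exec keeps no lastIndex
// state between calls.
const LINE_PATTERN =
  /(?<ip>\d{1,3}(?:\.\d{1,3}){3})\s+\S+\s+\S+\s+\[[^\]]+\]\s+"(?<method>[A-Z]+)\s+(?<path>[^\s"]+)\s+HTTP\/[\d.]+"\s+(?<status>\d{3})\s+(?<bytes>\d+|-)\s+"(?<referrer>[^"]*)"/;

// Hypothetical helper: separate lines the pattern understands from the rest.
function partitionLines(lines) {
  const parsed = [];
  const unmatched = [];
  for (const line of lines) {
    const match = LINE_PATTERN.exec(line);
    if (match) parsed.push({ ...match.groups });
    else unmatched.push(line);
  }
  return { parsed, unmatched };
}

// Invented samples covering two of the edge cases named in the tips.
const samples = [
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "-" "Mozilla/5.0"',
  '192.168.1.42 - - [12/Jun/2024:13:45:27 +0000] "GET /files/my%20report.pdf HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
  '2001:db8::1 - - [12/Jun/2024:13:45:26 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
];

const { parsed, unmatched } = partitionLines(samples);
console.log(parsed.length); // 2
console.log(unmatched.length); // 1 (the IPv6 line)
```

The %20 path parses without trouble, while the IPv6 client address is rejected because the IP piece only describes dotted IPv4.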
Where to go next
The next lesson, Data extraction pipelines, applies similar techniques to unstructured product data and addresses, showing how to compose patterns into a multi-step extraction workflow.