
Parsing log files

Build a complete Apache/Nginx access log parser step by step using named groups, then extract multiple fields with matchAll.

Regular Expressions | Advanced | 12 min read
By the end of this lesson you will be able to:
  • Read and understand the structure of a standard access log line
  • Write named-group patterns for IP address, HTTP method, path, status code, and response bytes
  • Compose those patterns into a single combined parser
  • Use matchAll to process multiple log lines and collect structured results

Log files are one of the most common places developers reach for regex. A single day of web traffic can produce millions of lines, each in a well-known but not-quite-CSV format. Parsing them with split is fragile; a dedicated log parser is heavy. A well-crafted regex with named groups threads the needle: precise, readable, and fast enough for batch processing.

The access log format

A standard Apache/Nginx combined log line looks like this:

192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"

The fields in order:

Position  Field                     Example
1         Client IP                 192.168.1.42
2         Ident (almost always -)   -
3         Auth user (- if none)     frank
4         Timestamp in brackets     [12/Jun/2024:13:45:22 +0000]
5         Request line in quotes    "GET /api/users HTTP/1.1"
6         Status code               200
7         Response bytes            1523
8         Referrer in quotes        "https://example.com/dashboard"
9         User agent in quotes      "Mozilla/5.0"

Building the pattern piece by piece

Step 1 — IP address

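An IPv4 address is four groups of 1–3 digits separated by dots. The pattern below checks that shape only; it does not verify that each group falls in the range 0–255: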

const ipPattern = /(?<ip>\d{1,3}(?:\.\d{1,3}){3})/;

The named group (?<ip>…) lets you access match.groups.ip later.
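As a quick check of the Step 1 pattern, the sketch below runs it against the start of the sample line from the format section. The variable names sampleLine, ipMatch, and ip are illustrative only.

```javascript
// Same pattern as in Step 1.
const ipPattern = /(?<ip>\d{1,3}(?:\.\d{1,3}){3})/;

// Sample line taken from the format section (truncated after the byte count).
const sampleLine =
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523';

const ipMatch = sampleLine.match(ipPattern);

// match() returns null when nothing matches, so guard before reading groups.
const ip = ipMatch ? ipMatch.groups.ip : null;
console.log(ip); // 192.168.1.42

// The pattern checks shape only, not range: octets above 255 still match.
console.log(ipPattern.test("999.999.999.999")); // true
```

If strict validation matters, a range check on each octet after the match is usually simpler than encoding 0–255 in the regex itself.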

Step 2 — HTTP method and path

The request line is quoted: "METHOD /path HTTP/version". We want the method and the path:

const requestPattern = /"(?<method>[A-Z]+)\s+(?<path>[^\s"]+)\s+HTTP\/[\d.]+"/;

[A-Z]+ matches the verb (GET, POST, etc.). [^\s"]+ matches the path — "anything that isn't whitespace or a quote".
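A small sketch of the request pattern in use. The sample string (including its query string) is made up for illustration.

```javascript
// Same pattern as in Step 2.
const requestPattern = /"(?<method>[A-Z]+)\s+(?<path>[^\s"]+)\s+HTTP\/[\d.]+"/;

// Illustrative input: a request line followed by status and bytes.
const requestSample = '"GET /api/users?page=2 HTTP/1.1" 200 1523';
const requestMatch = requestSample.match(requestPattern);

if (requestMatch) {
  const { method, path } = requestMatch.groups;
  console.log(method); // GET
  console.log(path);   // /api/users?page=2
}

// A lowercase verb does not satisfy [A-Z]+, so the whole pattern fails.
console.log(requestPattern.test('"get /x HTTP/1.1"')); // false
```

Note that the path group keeps the query string, because ? and = are neither whitespace nor quotes.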

Step 3 — status code and response bytes

The status code is a three-digit integer, followed by whitespace and the response size in bytes:

const statusPattern = /(?<status>\d{3})\s+(?<bytes>\d+|-)/;

The |- handles the - that some servers write when no bytes were transferred.
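Captured groups are always strings, so a parser typically converts them. The sketch below uses a hypothetical helper, parseStatusAndBytes, and assumes that a - byte count should be treated as 0; returning null instead would be an equally reasonable choice.

```javascript
// Same pattern as in Step 3.
const statusPattern = /(?<status>\d{3})\s+(?<bytes>\d+|-)/;

// Hypothetical helper: turn the captured strings into numbers.
function parseStatusAndBytes(text) {
  const m = text.match(statusPattern);
  if (!m) return null;
  return {
    status: Number(m.groups.status),
    // Assumption: the "-" placeholder means zero bytes were sent.
    bytes: m.groups.bytes === "-" ? 0 : Number(m.groups.bytes),
  };
}

console.log(parseStatusAndBytes("200 1523")); // { status: 200, bytes: 1523 }
console.log(parseStatusAndBytes("304 -"));    // { status: 304, bytes: 0 }
```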

Step 4 — referrer

The referrer is quoted (and may be - inside the quotes if there is none):

const referrerPattern = /"(?<referrer>[^"]*)"/;

[^"]* matches anything that isn't a quote — no risk of the pattern crossing into the user agent field.
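One caveat is worth seeing in code: on its own, this pattern matches the first quoted field it finds. The sketch below, using made-up variable names, shows it working on the referrer field in isolation and then picking up the request line when given a whole log line. That is why the referrer piece only becomes reliable once it is placed after the status and bytes in a combined pattern.

```javascript
// Same pattern as in Step 4.
const referrerPattern = /"(?<referrer>[^"]*)"/;

// Applied to the referrer field alone, the group captures the URL or "-".
const fromField = '"https://example.com/dashboard"'.match(referrerPattern);
console.log(fromField.groups.referrer); // https://example.com/dashboard

const fromPlaceholder = '"-"'.match(referrerPattern);
console.log(fromPlaceholder.groups.referrer); // -

// Applied to a whole line, the first quoted field wins: the request line.
const wholeLine =
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard"';
const fromLine = wholeLine.match(referrerPattern);
console.log(fromLine.groups.referrer); // GET /api/users HTTP/1.1
```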

Step 5 — composing the full pattern

Combine the pieces, accounting for the fixed-format fields in between:

const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +  // IP
  "\\s+\\S+\\s+\\S+\\s+" +               // ident, auth (skip)
  "\\[[^\\]]+\\]\\s+" +                   // timestamp (skip)
  '"(?<method>[A-Z]+)\\s+' +             // method
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' + // path
  "(?<status>\\d{3})\\s+" +              // status
  '(?<bytes>\\d+|-)\\s+' +              // bytes
  '"(?<referrer>[^"]*)"',               // referrer
  "g"
);
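To sanity-check the combined pattern on a single line, the sketch below repeats the Step 5 definition so it runs on its own. Because a regex with the g flag keeps state in lastIndex between exec() and test() calls, the sketch builds a non-global copy (singleLinePattern, a name chosen here) for the one-off match.

```javascript
// LOG_PATTERN repeated from Step 5 so this snippet is self-contained.
const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +
  "\\s+\\S+\\s+\\S+\\s+" +
  "\\[[^\\]]+\\]\\s+" +
  '"(?<method>[A-Z]+)\\s+' +
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +
  "(?<status>\\d{3})\\s+" +
  '(?<bytes>\\d+|-)\\s+' +
  '"(?<referrer>[^"]*)"',
  "g"
);

// The sample line from the format section.
const line =
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"';

// Non-global copy built from the same source: no lastIndex state to manage.
const singleLinePattern = new RegExp(LOG_PATTERN.source);
const lineMatch = line.match(singleLinePattern);

console.log(lineMatch.groups.ip);       // 192.168.1.42
console.log(lineMatch.groups.method);   // GET
console.log(lineMatch.groups.path);     // /api/users
console.log(lineMatch.groups.status);   // 200
console.log(lineMatch.groups.bytes);    // 1523
console.log(lineMatch.groups.referrer); // https://example.com/dashboard
```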

Putting it together with matchAll

The String.prototype.matchAll method returns an iterator over every match in a string, each with its own groups object, which suits processing a batch of lines. It requires a regex with the g flag, as LOG_PATTERN has, and throws a TypeError otherwise.
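A minimal sketch of batch processing with matchAll follows. The three log lines are invented for illustration, and the choices to convert status and bytes to numbers and to map a - referrer to null are assumptions rather than part of the lesson's pattern.

```javascript
// LOG_PATTERN repeated from Step 5 so this snippet is self-contained.
const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +
  "\\s+\\S+\\s+\\S+\\s+" +
  "\\[[^\\]]+\\]\\s+" +
  '"(?<method>[A-Z]+)\\s+' +
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +
  "(?<status>\\d{3})\\s+" +
  '(?<bytes>\\d+|-)\\s+' +
  '"(?<referrer>[^"]*)"',
  "g"
);

// Made-up sample lines, joined into one string as if read from a file.
const logText = [
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"',
  '10.0.0.7 - - [12/Jun/2024:13:45:23 +0000] "POST /api/login HTTP/1.1" 401 - "-" "curl/8.0"',
  '172.16.0.3 - - [12/Jun/2024:13:45:25 +0000] "GET /missing HTTP/2.0" 404 312 "-" "Mozilla/5.0"',
].join("\n");

const entries = [];
for (const match of logText.matchAll(LOG_PATTERN)) {
  const { ip, method, path, status, bytes, referrer } = match.groups;
  entries.push({
    ip,
    method,
    path,
    status: Number(status),
    bytes: bytes === "-" ? 0 : Number(bytes),      // assumption: "-" means 0
    referrer: referrer === "-" ? null : referrer,  // assumption: "-" means none
  });
}

console.log(entries.length);  // 3
console.log(entries[1].path); // /api/login
```

Lines that do not fit the pattern are skipped silently by matchAll, so comparing entries.length with the number of input lines is a cheap way to notice parsing gaps.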

Filtering for errors

With structured matches, filtering for 4xx and 5xx errors is straightforward: convert the captured status string to a number and keep the entries whose value is 400 or higher.
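One way to sketch the error filter, again with invented sample lines. The upper bound of 600 is an extra guard added here, not something the log format requires.

```javascript
// LOG_PATTERN repeated from Step 5 so this snippet is self-contained.
const LOG_PATTERN = new RegExp(
  "(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +
  "\\s+\\S+\\s+\\S+\\s+" +
  "\\[[^\\]]+\\]\\s+" +
  '"(?<method>[A-Z]+)\\s+' +
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +
  "(?<status>\\d{3})\\s+" +
  '(?<bytes>\\d+|-)\\s+' +
  '"(?<referrer>[^"]*)"',
  "g"
);

// Made-up sample lines: one success, two client errors, one server error.
const logText = [
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "https://example.com/dashboard" "Mozilla/5.0"',
  '10.0.0.7 - - [12/Jun/2024:13:45:23 +0000] "POST /api/login HTTP/1.1" 401 - "-" "curl/8.0"',
  '172.16.0.3 - - [12/Jun/2024:13:45:25 +0000] "GET /missing HTTP/2.0" 404 312 "-" "Mozilla/5.0"',
  '192.168.1.42 - frank [12/Jun/2024:13:46:01 +0000] "GET /api/report HTTP/1.1" 500 0 "-" "Mozilla/5.0"',
].join("\n");

const errors = [...logText.matchAll(LOG_PATTERN)]
  // Copy the groups into a plain object and make status numeric.
  .map((m) => ({ ...m.groups, status: Number(m.groups.status) }))
  // Keep 4xx (client) and 5xx (server) responses only.
  .filter((entry) => entry.status >= 400 && entry.status < 600);

for (const e of errors) {
  console.log(`${e.status} ${e.method} ${e.path} from ${e.ip}`);
}
// 401 POST /api/login from 10.0.0.7
// 404 GET /missing from 172.16.0.3
// 500 GET /api/report from 192.168.1.42
```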

Tips for production log parsing

  • Compile once — if you process many lines in a loop, compile the regex with new RegExp(…) or a literal outside the loop. Re-compilation on every iteration is a hidden performance cost.
  • Validate field widths: \d{3} for a status code is safer than \d+ because it rejects malformed lines early rather than producing a surprising match.
  • Handle the - placeholder — log formats use - for missing optional fields (no referrer, no auth user). Build the |- alternative into fields that can be absent.
  • Test against real samples — copy 10–20 real lines from your actual log into regex101 before deploying a parser. Edge cases (unusual user agents, paths with spaces encoded as %20, IPv6 addresses) can break a pattern that looks complete on toy examples.
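Several of these tips can be combined in a line-by-line parser. The sketch below is one possible shape, not a definitive implementation: LINE_PATTERN (the Step 5 pattern with a ^ anchor and no g flag) and the parseLines helper are names and choices introduced here. Lines that do not match, such as one with an IPv6 client address, are collected instead of dropped so they can be inspected.

```javascript
// Compiled once, outside the loop. Anchoring with ^ and omitting "g" are
// choices made for this sketch; the rest is the Step 5 pattern.
const LINE_PATTERN = new RegExp(
  "^(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})" +
  "\\s+\\S+\\s+\\S+\\s+" +
  "\\[[^\\]]+\\]\\s+" +
  '"(?<method>[A-Z]+)\\s+' +
  '(?<path>[^\\s"]+)\\s+HTTP\\/[\\d.]+"\\s+' +
  "(?<status>\\d{3})\\s+" +
  '(?<bytes>\\d+|-)\\s+' +
  '"(?<referrer>[^"]*)"'
);

// Hypothetical helper: split input into parsed entries and rejected lines.
function parseLines(lines) {
  const parsed = [];
  const rejected = [];
  for (const line of lines) {
    // Without the g flag, exec() always starts at index 0 (no lastIndex state).
    const m = LINE_PATTERN.exec(line);
    if (m) parsed.push({ ...m.groups });
    else rejected.push(line);
  }
  return { parsed, rejected };
}

// Made-up sample lines; the second uses an IPv6 address the pattern does not cover.
const result = parseLines([
  '192.168.1.42 - frank [12/Jun/2024:13:45:22 +0000] "GET /api/users HTTP/1.1" 200 1523 "-" "Mozilla/5.0"',
  '2001:db8::1 - - [12/Jun/2024:13:45:24 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
  '10.0.0.7 - - [12/Jun/2024:13:45:26 +0000] "HEAD /health HTTP/1.1" 204 - "-" "curl/8.0"',
]);

console.log(result.parsed.length);   // 2
console.log(result.rejected.length); // 1
```

Reviewing the rejected list against real traffic is a practical way to find the edge cases the last tip warns about.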

Where to go next

The next lesson, Data extraction pipelines, applies similar techniques to unstructured product data and addresses, showing how to compose patterns into a multi-step extraction workflow.
