Code of the Day

Lab: real-world applications

Four exercises on realistic messy text — log error extraction, URL parsing, price normalisation, and pattern decomposition.

Lab · optional · Regular Expressions · Advanced · 35 min
By the end of this lesson you will be able to:
  • Extract 4xx errors with timestamps and paths from a server log
  • Parse href URLs from an HTML snippet using regex (and explain the parser alternative)
  • Normalise price strings to floats using a pipeline approach
  • Decompose a complex single pattern into two sequential simpler ones

Optional lab. These exercises work with the kind of messy, inconsistent real-world text that automated agents and scripts encounter every day. Each checkpoint includes a solution and explanation. Try to write the solution yourself before reading ahead.

Warm up — anatomy of a log line

Before writing any patterns, explore what the data looks like:
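A minimal sketch of that exploration, assuming the lab's data follows the Apache common log format. The sample line and the names LOG_LINE and describeLogLine are illustrative assumptions, not the lesson's own data or code.

```javascript
// Hypothetical sample line in Apache common log format (an assumption, not the lab's data).
const sampleLine =
  '203.0.113.7 - - [12/Jun/2024:08:00:02 +0000] "POST /login HTTP/1.1" 401 512';

// One named group per field keeps the pattern readable.
const LOG_LINE =
  /^(?<ip>\S+) \S+ \S+ \[(?<timestamp>[^\]]+)\] "(?<method>[A-Z]+) (?<path>\S+) (?<protocol>[^"]+)" (?<status>\d{3}) (?<size>\d+|-)$/;

// Returns a plain object with one key per field, or null if the line does not fit.
function describeLogLine(line) {
  const match = LOG_LINE.exec(line);
  return match ? { ...match.groups } : null;
}

console.log(describeLogLine(sampleLine));
```

Printing the fields one by one makes it easier to see which parts are fixed in position (the bracketed timestamp, the quoted request) and which vary from line to line.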


Checkpoint 1 — extract 4xx errors

Extract all lines with 4xx status codes. For each, return an object with { timestamp, method, path, status }. The timestamp is the text between the square brackets, for example 12/Jun/2024:08:00:02 +0000.

Extract 4xx errors from log · JavaScript

Write extractErrors(logs) that takes an array of Apache log line strings and returns an array of objects { timestamp, method, path, status } for every line with a 4xx status code (400-499).

extractErrors([...]) on a 401 line → [{ timestamp: '12/Jun/2024:08:00:02 +0000', method: 'POST', path: '/login', status: '401' }]
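One possible solution, sketched under the assumption that every line follows the Apache common or combined log format. The pattern name ERROR_LINE is illustrative. Lines that do not fit the format are skipped silently rather than reported.

```javascript
// Anchoring on the bracketed timestamp and the quoted request means the
// status is always the three digits right after the closing quote.
const ERROR_LINE =
  /\[(?<timestamp>[^\]]+)\] "(?<method>[A-Z]+) (?<path>\S+)[^"]*" (?<status>4\d{2})\b/;

function extractErrors(logs) {
  const errors = [];
  for (const line of logs) {
    const match = ERROR_LINE.exec(line);
    if (match) {
      // status stays a string, matching the expected output format
      const { timestamp, method, path, status } = match.groups;
      errors.push({ timestamp, method, path, status });
    }
  }
  return errors;
}
```

Because the status group only matches directly after the closing quote of the request, a response size that happens to be 404 on a 200 or 500 line is not mistaken for an error code.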

Checkpoint 2 — extract href URLs from HTML

Extract all href attribute values from anchor tags in an HTML snippet. Then explain (in a comment) why you would use a real parser in production.

Extract href values from HTML · JavaScript

Write extractHrefs(html) that returns an array of all href attribute values found in anchor tags. Handle both double-quoted and single-quoted values. Return an empty array if none found. (This works for simple inputs — the next step in production is a real HTML parser.)

extractHrefs('<a href="https://example.com">link</a>') → ['https://example.com']
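One possible solution, sketched for simple, well-formed input only. The pattern name HREF is illustrative, and the closing comment is one way to answer the "why a real parser" part of the task.

```javascript
// Matches href="..." or href='...' inside an opening <a ...> tag.
// \s before href avoids matching attributes such as data-href.
const HREF = /<a\b[^>]*?\shref=(?:"([^"]*)"|'([^']*)')/gi;

function extractHrefs(html) {
  // Group 1 holds double-quoted values, group 2 single-quoted ones.
  return Array.from(html.matchAll(HREF), (m) => m[1] ?? m[2]);
}

// In production, prefer a real HTML parser: a regex has no notion of
// comments, unquoted attribute values, or character references, so it can
// both miss valid links and return links that are not part of the document.
```

Array.from over an empty matchAll iterator yields [], which covers the "return an empty array if none found" requirement without a separate branch.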

The exercise above works for simple, well-formed HTML. In production, a regex like this can miss href attributes written with whitespace around the = sign, attributes split across multiple lines, and values containing escaped quotes. It will also wrongly return links that sit inside HTML comments. For any real HTML document, use DOMParser (browser) or cheerio/node-html-parser (Node.js). The regex version is acceptable for controlled, machine-generated output where you own the format.

Checkpoint 3 — price normalisation pipeline

Extract all price-like strings from a product catalogue and normalise them to floats. Prices appear in formats like $12.99, $7.50, $24, 12.99 USD.

Extract and normalise prices · JavaScript

Write extractPrices(text) that returns an array of numbers (as JavaScript floats) for every price-like value in text. Prices are: a dollar sign followed by digits and optional decimal ($12.99, $7, $0.50), OR digits with optional decimal followed by ' USD' (12.99 USD, 7 USD). Return results in order of appearance.

extractPrices('Widget $12.99 each') → [12.99]
extractPrices('$3.99 and 5 USD') → [3.99, 5]
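One possible solution, written as a two-stage pipeline: first locate the raw price strings, then normalise each one. The helper names findPriceStrings and toNumber are illustrative. This sketch covers only the two formats described above, so a value with thousands separators such as $1,299.00 would not be parsed correctly.

```javascript
// Stage 1 pattern: "$" + number, or number + " USD".
const PRICE = /\$\d+(?:\.\d+)?|\d+(?:\.\d+)? USD\b/g;

// Stage 1: find raw price strings in order of appearance.
function findPriceStrings(text) {
  return text.match(PRICE) ?? [];
}

// Stage 2: strip the currency marker and parse what is left.
function toNumber(raw) {
  return parseFloat(raw.replace(/^\$|\s*USD$/g, ''));
}

function extractPrices(text) {
  return findPriceStrings(text).map(toNumber);
}
```

Keeping the stages separate means a new format only needs a change to the pattern and to toNumber, while extractPrices stays the same.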

Checkpoint 4 — decompose a complex pattern

The function below uses one large pattern to extract a user:password pair from a connection string. Rewrite it as two sequential simpler patterns and explain the trade-off in a comment.

Decompose a complex pattern · JavaScript

Write parseCredentials(connStr) that extracts { user, password } from a connection string like 'postgres://alice:s3cr3t@db.host:5432/mydb'. Use two separate patterns: one to extract the user:password section, then another to split it into user and password. Return null if the format doesn't match.

parseCredentials('postgres://alice:s3cr3t@db.host:5432/mydb') → { user: 'alice', password: 's3cr3t' }
parseCredentials('not-a-url') → null
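One possible two-pattern solution. The pattern names CREDENTIAL_BLOCK and USER_PASSWORD are illustrative, and the sketch assumes the credentials themselves contain no @ or / characters.

```javascript
// Step 1: find the credential block between "scheme://" and "@".
const CREDENTIAL_BLOCK = /^[a-z][a-z0-9+.-]*:\/\/([^@\/]+)@/i;

// Step 2: split the block at the first colon only.
const USER_PASSWORD = /^([^:]+):(.*)$/;

// Trade-off: two small patterns are easier to read and test on their own,
// at the cost of a second pass and an intermediate string.
function parseCredentials(connStr) {
  const block = CREDENTIAL_BLOCK.exec(connStr);
  if (!block) return null;
  const parts = USER_PASSWORD.exec(block[1]);
  if (!parts) return null;
  return { user: parts[1], password: parts[2] };
}

console.log(parseCredentials('postgres://bob:pa:ss@host/db'));
// { user: 'bob', password: 'pa:ss' }
```

A connection string without credentials, such as 'postgres://db.host:5432/mydb', fails step 1 and returns null, because the first pattern requires an @ before any /.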

Notice that the two-step approach handles a password containing a : correctly. A single combined pattern would need a greedy-versus-lazy trick or a more complex character class to handle this edge case. The sequential approach makes the intent of each step obvious: "find the credential block" and "split it at the first colon".

Done?

All four green? You have completed the full Regular Expressions advanced tier.

You now have the tools to:

  • Diagnose and fix catastrophic backtracking
  • Use possessive quantifiers and atomic groups (and emulate them in JavaScript)
  • Benchmark patterns and read step counts in regex debuggers
  • Navigate engine differences and choose the right flavour for each environment
  • Build multi-step extraction pipelines from log files and unstructured text
  • Use regex across the developer toolchain — grep, sed, VS Code, git, PostgreSQL
  • Recognise when to stop and reach for a parser instead

The next practice ground is your own work: log files, data migrations, search features, and linter configs. Real text is always messier than examples — but now you know how to read it.
