Lab: performance and pitfalls
Practice identifying catastrophic patterns, comparing step counts, optimising greedy wildcards, and matching features to regex flavours.
- Identify why a given pattern is catastrophic and produce a safe rewrite
- Compare two patterns using a step-count proxy and explain the difference
- Replace a greedy wildcard with a more specific alternative
- Given a feature list, identify which regex flavour supports all of them
Optional lab. These exercises consolidate everything from the Performance and pitfalls module. Work through each checkpoint, read the solution explanation, and try your own variations.
Warm up — see the step count differ
Run both patterns below against the same non-matching input. Notice how the "dangerous" pattern does far more work even though both ultimately fail.
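One way to see the difference outside the interactive editor is to time both patterns on the same input. The sketch below is only a rough proxy: it measures wall-clock time rather than engine steps, timeTest is a hypothetical helper (not part of the lab), and it assumes an environment with a global performance object (modern browsers, Node 16+).

```javascript
// Hypothetical helper: time a single RegExp.test call.
// Wall-clock time is a coarse stand-in for the engine's step count.
function timeTest(pattern, input) {
  const start = performance.now();
  const matched = pattern.test(input);
  return { matched, ms: performance.now() - start };
}

// No trailing '!', so neither pattern can match.
// Keep the length modest: the dangerous pattern's work roughly doubles per extra character.
const input = 'a'.repeat(20);

const dangerous = timeTest(/(\w+\s*)+!/, input); // nested quantifiers
const safe = timeTest(/^[\w\s]+!$/, input);      // single quantifier, anchored

console.log('dangerous:', dangerous);
console.log('safe:     ', safe);
```

Both calls report matched: false. The exact timings depend on the machine and engine, so compare the two numbers with each other rather than against a fixed threshold.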
Checkpoint 1 — identify and rewrite a catastrophic pattern
The function below uses a dangerous pattern for a "verify all fields are non-empty" check. Rewrite it so it has no catastrophic backtracking risk.
The pattern /(\w+\s*)+!/ is catastrophic on non-matching input: because \s* can match nothing, the inner \w+ and the outer + can split a run of word characters between them in exponentially many ways. Rewrite validateInput(s) to return true if s ends with '!' and otherwise contains only word characters and spaces, without nested quantifiers. Use /^[\w\s]+!$/ or similar.
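validateInput('hello world!') → true
validateInput('hello world') → false
The safe rewrite /^[\w\s]+!$/ accepts what the exercise asks for: one or more
word characters or spaces, ending with !. It is not strictly identical to the
original, which is unanchored and needs a word character at the start of each
group, so the two can disagree on edge cases such as ' !'. The key change is
that [\w\s]+ is a single quantifier over a character class, so there is no
inner quantifier to create exponential partitioning. The ^ and $ anchors also
pin the match to the whole string, so a failed attempt is not repeated from
every later starting position.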
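A minimal sketch of the rewritten function, assuming the suggested pattern is used as-is:

```javascript
// Returns true if s is one or more word characters or whitespace followed by a final '!'.
// One quantifier over one character class: no nested quantifiers to backtrack through.
function validateInput(s) {
  return /^[\w\s]+!$/.test(s);
}

console.log(validateInput('hello world!')); // true
console.log(validateInput('hello world'));  // false
```

Note that \s also accepts tabs and newlines. If only literal spaces should be allowed, a class such as [\w ] would be a stricter choice.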
Checkpoint 2 — compare two patterns by step count
Use regex101.com to compare these
two patterns against the test string username-admin@company.org with flavour
set to JavaScript:
- Pattern A: ^([\w.-]+)+@[\w-]+\.[\w.]+$
- Pattern B: ^[\w.-]+@[\w-]+\.[\w.]+$
Note the step count shown below the match information for each. Then answer the exercise below:
Both patterns match valid email-like strings. Write saferEmail(s) that uses Pattern B (the version without the outer grouping quantifier) to return true if s looks like an email (user@domain.tld format). Implement it so the tests pass.
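One possible implementation, offered as a sketch: it uses Pattern B unchanged and checks shape only, so it should not be read as a full email validator.

```javascript
// Pattern B: the same character classes as Pattern A, without the outer (...)+ group.
const EMAIL_LIKE = /^[\w.-]+@[\w-]+\.[\w.]+$/;

// Returns true if s has a user@domain.tld shape.
function saferEmail(s) {
  return EMAIL_LIKE.test(s);
}

console.log(saferEmail('username-admin@company.org')); // true
```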
saferEmail('user@example.com') → true
saferEmail('not-an-email') → false
Checkpoint 3 — replace .* with a specific character class
The pattern below extracts everything between a pair of <tag> and </tag> markers. It uses .*,
which greedily consumes the rest of the line before backtracking to find the closing tag. Replace it with a
more specific alternative that fails faster.
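As an illustration (the input string here is made up for the example, not taken from the lab), the greedy version does more than waste steps: on input with two tag pairs it runs past the first closing tag and has to be pulled back from the end of the line.

```javascript
// Illustrative input with two <b> pairs on one line.
const html = '<b>one</b> and <b>two</b>';

// Greedy wildcard: .* grabs the rest of the line, then backtracks to the LAST </b>.
const greedy = /<b>(.*)<\/b>/.exec(html);

// Negated class: [^<]* stops at the first '<', so it never overshoots.
const specific = /<b>([^<]*)<\/b>/.exec(html);

console.log(greedy[1]);   // prints: one</b> and <b>two
console.log(specific[1]); // prints: one
```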
Write extractBetweenTags(s, tag) that returns the text between <tag> and </tag> in s. Do NOT use .* — use [^<]* instead (anything that isn't a < can appear between tags in this simplified case). Return null if no match.
extractBetweenTags('<b>hello</b>', 'b') → 'hello'
extractBetweenTags('no tags here', 'b') → null
[^<]* is a quantified negated character class. It matches
"any run of characters that are not <", which is exactly the constraint we need
between HTML tags. This is typically one of the highest-impact single
substitutions you can make to a slow pattern: replace .* or .+ with a
negated class that reflects what is actually legal between the delimiters.
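A minimal sketch of extractBetweenTags under the simplified assumption stated in the exercise (no nested tags or attributes). The escapeRegExp helper is a hypothetical addition, included so that a tag name containing regex metacharacters cannot change the pattern.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the text between <tag> and </tag>, or null if there is no match.
function extractBetweenTags(s, tag) {
  const t = escapeRegExp(tag);
  // [^<]* cannot run past the next '<', so a failed match is abandoned quickly.
  const pattern = new RegExp(`<${t}>([^<]*)</${t}>`);
  const match = pattern.exec(s);
  return match ? match[1] : null;
}

console.log(extractBetweenTags('<b>hello</b>', 'b')); // prints: hello
console.log(extractBetweenTags('no tags here', 'b')); // prints: null
```

Because the pattern is built with the RegExp constructor, the / in the closing tag needs no escaping.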
Checkpoint 4 — match a feature list to a flavour
Given a list of regex features needed by a project, identify which flavour supports all of them.
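Write requiresPCRE(features) that returns true if the features array contains any of: 'atomic-groups', 'possessive-quantifiers', or 'recursive-patterns'. For this exercise, treat these as PCRE/PCRE2 features: none of them is available in JavaScript or RE2, although some other backtracking engines (Java, for example) also support atomic groups and possessive quantifiers.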
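One straightforward way to write this is a set lookup. The constant name PCRE_FEATURES is an assumption made for this sketch; the feature strings are the three listed in the exercise.

```javascript
// The three feature names the exercise treats as PCRE/PCRE2 territory.
const PCRE_FEATURES = new Set([
  'atomic-groups',
  'possessive-quantifiers',
  'recursive-patterns',
]);

// Returns true if at least one requested feature is in the PCRE set.
function requiresPCRE(features) {
  return features.some((feature) => PCRE_FEATURES.has(feature));
}

console.log(requiresPCRE(['atomic-groups', 'named-groups'])); // true
console.log(requiresPCRE(['named-groups', 'lookahead']));     // false
```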
requiresPCRE(['atomic-groups', 'named-groups']) → true
requiresPCRE(['named-groups', 'lookahead']) → false
Done?
You have now worked through the full Performance and pitfalls module. The next module, Real-world applications, moves from pathology to practice: parsing log files, extracting structured data, using regex in tooling, and knowing when to reach for a parser instead.