Lab: performance and pitfalls

Practice identifying catastrophic patterns, comparing step counts, optimising greedy wildcards, and matching features to regex flavours.

Optional lab. These exercises consolidate everything from the Performance and pitfalls module. Work through each checkpoint, read the solution explanation, and try your own variations.

Warm up — see the step count differ

Run a dangerous pattern and a safe pattern against the same non-matching input. Notice how the dangerous one does far more work even though both ultimately fail.

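JavaScript's RegExp API does not report backtracking steps, so this minimal sketch uses elapsed time from performance.now() as a rough proxy (assuming Node.js 16+ or a modern browser, where performance is a global). The two patterns are an illustrative pair adapted from Checkpoint 1 and anchored so the comparison is like-for-like. The input length of 20 is an arbitrary choice, kept small so the dangerous pattern still finishes quickly.

```javascript
// Illustrative pair (adapted from Checkpoint 1), both anchored.
// Nested quantifier: (\w+\s*)+ can split a run of word characters
// between its iterations in exponentially many ways.
const dangerous = /^(\w+\s*)+!$/;
// Flat character class: a single quantifier, so backtracking is linear.
const safe = /^[\w\s]+!$/;

// Non-matching input: no trailing "!". Each extra character roughly
// doubles the work the dangerous pattern does before it gives up.
const input = "a".repeat(20);

// Time one test() call. Wall-clock time is only a proxy for step count
// and will vary between machines and runs.
function timeTest(re, text) {
  const start = performance.now();
  const matched = re.test(text);
  return { matched, ms: performance.now() - start };
}

const dangerousResult = timeTest(dangerous, input);
const safeResult = timeTest(safe, input);

console.log(`dangerous: matched=${dangerousResult.matched}, ${dangerousResult.ms.toFixed(2)} ms`);
console.log(`safe:      matched=${safeResult.matched}, ${safeResult.ms.toFixed(2)} ms`);
```

Try raising the repeat count a few characters at a time: the safe pattern's time barely moves, while the dangerous pattern's time climbs steeply.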

Checkpoint 1 — identify and rewrite a catastrophic pattern

The validateInput function in this exercise uses a dangerous pattern for a simple input check. Rewrite it so it has no risk of catastrophic backtracking.

Rewrite a catastrophic pattern (JavaScript)

The pattern /(\w+\s*)+!/ is catastrophic on non-matching input: because \s* can match nothing, the outer (...)+ can split a run of word characters between its iterations in exponentially many ways, and the engine tries every split before giving up. Rewrite validateInput(s) to return true if s ends with '!' and otherwise contains only word characters and spaces, without any nested quantifiers. Use /^[\w\s]+!$/ or similar.

validateInput('hello world!') → true
validateInput('hello world') → false

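The safe rewrite /^[\w\s]+!$/ accepts what the task describes: one or more word or whitespace characters, followed by a final !. It is not an exact equivalent of the original, which is unanchored and so also returns true for a string like 'hello! and more' that does not end with !. The key change is that [\w\s]+ is a single quantifier over a character class, so there is no inner quantifier to create exponential partitioning. The ^ anchor also means an attempt that fails at the start of the string fails the whole match instead of being retried from every later position, and $ ensures the ! is the last character.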
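One possible solution, as a minimal sketch: the regex is compiled once at module level and validateInput delegates to test(). The constant name SAFE_INPUT is an illustrative choice, not part of the exercise.

```javascript
// Single quantifier over a character class, anchored at both ends:
// no nested quantifier, so no exponential partitioning of the input.
const SAFE_INPUT = /^[\w\s]+!$/;

function validateInput(s) {
  // No g flag, so test() keeps no lastIndex state between calls.
  return SAFE_INPUT.test(s);
}

console.log(validateInput("hello world!")); // true
console.log(validateInput("hello world"));  // false

// A long non-matching input still fails quickly (linear backtracking).
console.log(validateInput("a".repeat(50000))); // false
```

Note that \s also matches tabs and line breaks. If only literal spaces should be allowed, [\w ] is the stricter class.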

Checkpoint 2 — compare two patterns by step count

Use regex101.com to compare these two patterns against the test string username-admin@company.org, with the flavour set to JavaScript:

Pattern A: ^([\w.-]+)+@[\w-]+\.[\w.]+$
Pattern B: ^[\w.-]+@[\w-]+\.[\w.]+$

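Note the step count regex101 reports for each pattern (if the JavaScript flavour does not display one, switch to PCRE2, which does). Then append a character that cannot match, such as !, to the end of the test string and compare again: when the match fails, Pattern A's outer quantifier lets the engine retry exponentially many ways of splitting the local part, so its step count climbs far higher than Pattern B's. Finally, answer the exercise below: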

Identify the safer pattern (JavaScript)

Both patterns match valid email-like strings. Write saferEmail(s) that uses Pattern B (the version without the outer grouping quantifier) to return true if s looks like an email (user@domain.tld format). Implement it so the tests pass.

saferEmail('user@example.com') → true
saferEmail('not-an-email') → false
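A minimal sketch of saferEmail using Pattern B. Patterns A and B accept exactly the same strings, because Pattern A's outer (...)+ adds no matching power, only extra ways to split the local part when a match fails; that is why Pattern B is the one to keep. The constant name EMAIL_B is illustrative.

```javascript
// Pattern B: one quantifier on the local part, no outer ( )+ group.
const EMAIL_B = /^[\w.-]+@[\w-]+\.[\w.]+$/;

function saferEmail(s) {
  return EMAIL_B.test(s);
}

console.log(saferEmail("user@example.com"));           // true
console.log(saferEmail("not-an-email"));               // false
console.log(saferEmail("username-admin@company.org")); // true
```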

Checkpoint 3 — replace `.*` with a specific character class

A pattern such as <tag>(.*)</tag> extracts the text between an opening <tag> and its closing </tag>. Its greedy .* first runs to the end of the line, then backtracks one character at a time looking for the closing tag. Replace it with a more specific alternative that fails faster.

Replace .* with a specific class (JavaScript)

Write extractBetweenTags(s, tag) that returns the text between <tag> and </tag> in s. Do NOT use .* — use [^<]* instead (anything that isn't a < can appear between tags in this simplified case). Return null if no match.

extractBetweenTags('<b>hello</b>', 'b') → 'hello'
extractBetweenTags('no tags here', 'b') → null

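[^<] is a negated character class: it matches any single character except <, and the * quantifier repeats it. That is exactly the constraint we need between the tags in this simplified case (it also means nested tags produce no match). Because [^<]* cannot run past the next <, the engine never overshoots to the end of the string and backs up the way it does with .*. Unlike ., a negated class also matches line breaks. This is typically one of the highest-impact single substitutions you can make to a slow pattern: replace .* or .+ with a negated class that reflects what is actually legal between the delimiters.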
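A minimal sketch of extractBetweenTags. It builds the pattern with the RegExp constructor and escapes the tag name first, an assumption beyond the exercise in case a tag name ever contains regex metacharacters; escapeRegExp is a hypothetical helper, not a built-in. The last lines show a second problem with .*: because it is greedy, it can run past the first closing tag.

```javascript
// Hypothetical helper: escape regex metacharacters so the tag name
// is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function extractBetweenTags(s, tag) {
  const t = escapeRegExp(tag);
  // [^<]* stops at the first "<", so it never overshoots the closing tag.
  const re = new RegExp(`<${t}>([^<]*)</${t}>`);
  const m = re.exec(s);
  return m ? m[1] : null;
}

console.log(extractBetweenTags("<b>hello</b>", "b")); // hello
console.log(extractBetweenTags("no tags here", "b")); // null

// Greedy .* runs to the end of the line, then backs up to the LAST "</b>".
const sample = "<b>one</b> and <b>two</b>";
console.log(/<b>(.*)<\/b>/.exec(sample)[1]);  // one</b> and <b>two
console.log(extractBetweenTags(sample, "b")); // one
```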

Checkpoint 4 — match a feature list to a flavour

Given a list of regex features needed by a project, identify which flavour supports all of them.

Identify the required flavour (JavaScript)

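Write requiresPCRE(features) that returns true if the features array contains any of 'atomic-groups', 'possessive-quantifiers' or 'recursive-patterns'. PCRE/PCRE2 supports all three, while neither JavaScript nor RE2 supports any of them. (Some other backtracking flavours, such as Java's, support some of these too; this exercise treats PCRE as the target.)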

requiresPCRE(['atomic-groups', 'named-groups']) → true
requiresPCRE(['named-groups', 'lookahead']) → false
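A minimal sketch of requiresPCRE using a Set for the lookup. The feature strings are the exercise's own labels, not identifiers from any regex library.

```javascript
// The exercise's feature labels that JavaScript and RE2 both lack.
const PCRE_FEATURES = new Set([
  "atomic-groups",
  "possessive-quantifiers",
  "recursive-patterns",
]);

function requiresPCRE(features) {
  // true as soon as any requested feature is in the set
  return features.some((f) => PCRE_FEATURES.has(f));
}

console.log(requiresPCRE(["atomic-groups", "named-groups"])); // true
console.log(requiresPCRE(["named-groups", "lookahead"]));     // false
```

An empty feature list returns false, since some() on an empty array is false.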

Done?

You have now worked through the full Performance and pitfalls module. The next module, Real-world applications, moves from pathology to practice: parsing log files, extracting structured data, using regex in tooling, and knowing when to reach for a parser instead.
