Code of the Day
AdvancedPerformance and pitfalls

Catastrophic backtracking

Understand how NFA regex engines backtrack, why certain patterns cause exponential slowdowns, and how to rewrite them safely.

Regular ExpressionsAdvanced12 min read
Recommended first
By the end of this lesson you will be able to:
  • Explain NFA backtracking and why it can reach exponential time complexity
  • Recognise the structural patterns that cause catastrophic backtracking
  • Describe what ReDoS is and cite the category of real-world impact it has caused
  • Rewrite a dangerous pattern to eliminate the exponential blow-up

Every regex engine you encounter in a browser, interpreter, or server runtime is almost certainly an NFA (non-deterministic finite automaton) engine. NFAs are powerful — they support backreferences, lookarounds, and other features — but they pay for that power with backtracking. Understand backtracking and you can write patterns that stay fast even on adversarial input.

How NFA backtracking works

When an NFA engine tries to match a pattern, it picks a path through the automaton and follows it. If that path fails, it backtracks to the last decision point and tries an alternative. In the average case this is cheap — a few backtracks on a successful match. In the worst case it becomes a complete search over all possible paths, and the number of paths can be exponential in the length of the input.

Consider matching the simple pattern /a?a?a?aaa/ against the string "aa". The a? sections can each match or not. After the optional parts exhaust all combinations without finding the required triple aaa, the engine has tried every combination of three bits — 8 attempts total. For n optional parts and no satisfying match, the engine tries 2^n paths.

In practice, catastrophic patterns involve nested or adjacent quantifiers over overlapping character sets, not just a list of optional characters.

The classic dangerous structure

The most dangerous structure is a nested quantifier like (a+)+:

Pattern:  (a+)+b
Input:    aaaaaaaaac   (long run of a, ending in c — no match)

The outer + tries to divide the a characters among multiple repetitions of the inner group (a+). There are exponentially many ways to partition n characters: {a,a,a,…} can be split as [1+n-1], [2+n-2], [1+1+n-2], and so on. Because the final b never matches c, the engine exhausts all partitions before admitting defeat.

// DO NOT run this in a browser — it will hang the tab
// Shown here for illustration only
const bad = /(a+)+b/;
bad.test("aaaaaaaaaaaaaaaaaaaaac");   // may take seconds or never return

ReDoS — Regex Denial of Service. Patterns with catastrophic backtracking can be triggered intentionally by an attacker who sends carefully crafted input. This has caused real production outages. Cloudflare suffered a global outage in 2019 partly due to a backtracking regex in a WAF rule. The npm registry, Node.js itself, and several popular open-source libraries have each had ReDoS CVEs. Any regex that runs against user-supplied input in a latency-sensitive path is a potential target.

Structural signatures of dangerous patterns

Once you know what to look for, bad patterns are recognisable at a glance:

StructureExampleWhy it is dangerous
Nested quantifier(a+)+Exponential partitioning of input
Alternation inside a quantifier where branches overlap(a|aa)+Engine tries both branches at each position
Repeated alternation with shared prefix(ab|a)+ab fails, backtrack to a, try rest, repeat
Mutual-overlap quantifier paira*a* followed by a required literalTwo greedy quantifiers share the same character class

The common thread: the engine can satisfy the quantified section in many different ways while consuming the same characters. It tries all of them before accepting a non-match.

Recognising the alternation variant

(a|aa)+ is equally dangerous:

Pattern:  (a|aa)+b
Input:    aaaaaaaaac

At each position the engine can match one a or two as, then try again. After all combinations fail to produce b, the engine has tried every binary sequence of 1/2 choices — again exponential.

The tell: branches of the alternation are prefixes of each other, so the engine cannot eliminate one branch early.

How to rewrite dangerous patterns

1. Flatten nested quantifiers

Replace (a+)+ with a+ — if all you need is one or more as anywhere, a single quantifier is sufficient:

// Dangerous
const bad  = /(a+)+b/;

// Safe: flatten to a single quantifier
const good = /a+b/;

2. Remove alternation overlap by being explicit

If the intent of (a|aa)+ is "match any sequence of one or more a characters", it is identical to a+. If the intent is different (groups of exactly 1 or 2), capture that directly:

// Dangerous: overlapping alternation
const bad  = /(a|aa)+/;

// Safe: if you want 1-or-2 a's per group
const good = /(?:a{1,2})+/;
// Safe: if you just want one or more a's
const good2 = /a+/;

3. Use possessive quantifiers or atomic groups (where supported)

The engine backtracks because it allows previously consumed characters to be given back. Possessive quantifiers and atomic groups prevent that. They are covered in the next lesson; for now, know that they exist as a solution when flattening is not possible.

4. Anchor aggressively

Adding ^ and $ to patterns that match whole strings means the engine fails faster when it cannot match:

// Without anchors: engine tries at every position in a long string
const unanchored = /(a+)+b/;

// With anchors: fails at the first character if not 'a'
const anchored = /^(a+)+b$/;

Anchoring alone does not cure the exponential blow-up in the worst case, but it eliminates retries at each starting position and reduces constants significantly.

A concrete before/after

Consider a naive email-like validator:

// Bad: double quantifier — [a-zA-Z0-9.] and the outer + can overlap badly
const naive = /^([a-zA-Z0-9.]+)+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

// Safer: flatten and be specific — no nested repetition
const safer = /^[a-zA-Z0-9.]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

The ([a-zA-Z0-9.]+)+ wraps the repeated character class in an outer +. An attacker could send aaaaaaa@ (many characters, no valid domain) and the engine would churn. Removing the outer group produces identical match semantics and removes the explosion.

JavaScript — editable, runs in your browser

Where to go next

The next lesson covers atomic groups and possessive quantifiers — the mechanisms that prevent backtracking at the engine level, letting you express complex patterns safely even when flattening is not straightforward.

Finished reading? Mark it complete to track your progress.

On this page