Catastrophic backtracking
Understand how NFA regex engines backtrack, why certain patterns cause exponential slowdowns, and how to rewrite them safely.
- Explain NFA backtracking and why it can reach exponential time complexity
- Recognise the structural patterns that cause catastrophic backtracking
- Describe what ReDoS is and cite the category of real-world impact it has caused
- Rewrite a dangerous pattern to eliminate the exponential blow-up
Every regex engine you encounter in a browser, interpreter, or server runtime is almost certainly an NFA (non-deterministic finite automaton) engine. NFAs are powerful — they support backreferences, lookarounds, and other features — but they pay for that power with backtracking. Understand backtracking and you can write patterns that stay fast even on adversarial input.
How NFA backtracking works
When an NFA engine tries to match a pattern, it picks a path through the automaton and follows it. If that path fails, it backtracks to the last decision point and tries an alternative. In the average case this is cheap — a few backtracks on a successful match. In the worst case it becomes a complete search over all possible paths, and the number of paths can be exponential in the length of the input.
Consider matching the simple pattern /a?a?a?aaa/ against the string "aa".
The a? sections can each match or not. After the optional parts exhaust all
combinations without finding the required triple aaa, the engine has tried
every combination of three bits — 8 attempts total. For n optional parts and
no satisfying match, the engine tries 2^n paths.
In practice, catastrophic patterns involve nested or adjacent quantifiers over overlapping character sets, not just a list of optional characters.
The classic dangerous structure
The most dangerous structure is a nested quantifier like (a+)+:
Pattern: (a+)+b
Input: aaaaaaaaac (long run of a, ending in c — no match)The outer + tries to divide the a characters among multiple repetitions of
the inner group (a+). There are exponentially many ways to partition n
characters: {a,a,a,…} can be split as [1+n-1], [2+n-2], [1+1+n-2], and so
on. Because the final b never matches c, the engine exhausts all
partitions before admitting defeat.
// DO NOT run this in a browser — it will hang the tab
// Shown here for illustration only
const bad = /(a+)+b/;
bad.test("aaaaaaaaaaaaaaaaaaaaac"); // may take seconds or never returnReDoS — Regex Denial of Service. Patterns with catastrophic backtracking can be triggered intentionally by an attacker who sends carefully crafted input. This has caused real production outages. Cloudflare suffered a global outage in 2019 partly due to a backtracking regex in a WAF rule. The npm registry, Node.js itself, and several popular open-source libraries have each had ReDoS CVEs. Any regex that runs against user-supplied input in a latency-sensitive path is a potential target.
Structural signatures of dangerous patterns
Once you know what to look for, bad patterns are recognisable at a glance:
| Structure | Example | Why it is dangerous |
|---|---|---|
| Nested quantifier | (a+)+ | Exponential partitioning of input |
| Alternation inside a quantifier where branches overlap | (a|aa)+ | Engine tries both branches at each position |
| Repeated alternation with shared prefix | (ab|a)+ | ab fails, backtrack to a, try rest, repeat |
| Mutual-overlap quantifier pair | a*a* followed by a required literal | Two greedy quantifiers share the same character class |
The common thread: the engine can satisfy the quantified section in many different ways while consuming the same characters. It tries all of them before accepting a non-match.
Recognising the alternation variant
(a|aa)+ is equally dangerous:
Pattern: (a|aa)+b
Input: aaaaaaaaacAt each position the engine can match one a or two as, then try again.
After all combinations fail to produce b, the engine has tried every
binary sequence of 1/2 choices — again exponential.
The tell: branches of the alternation are prefixes of each other, so the engine cannot eliminate one branch early.
How to rewrite dangerous patterns
1. Flatten nested quantifiers
Replace (a+)+ with a+ — if all you need is one or more as anywhere, a
single quantifier is sufficient:
// Dangerous
const bad = /(a+)+b/;
// Safe: flatten to a single quantifier
const good = /a+b/;2. Remove alternation overlap by being explicit
If the intent of (a|aa)+ is "match any sequence of one or more a characters",
it is identical to a+. If the intent is different (groups of exactly 1 or 2),
capture that directly:
// Dangerous: overlapping alternation
const bad = /(a|aa)+/;
// Safe: if you want 1-or-2 a's per group
const good = /(?:a{1,2})+/;
// Safe: if you just want one or more a's
const good2 = /a+/;3. Use possessive quantifiers or atomic groups (where supported)
The engine backtracks because it allows previously consumed characters to be given back. Possessive quantifiers and atomic groups prevent that. They are covered in the next lesson; for now, know that they exist as a solution when flattening is not possible.
4. Anchor aggressively
Adding ^ and $ to patterns that match whole strings means the engine fails
faster when it cannot match:
// Without anchors: engine tries at every position in a long string
const unanchored = /(a+)+b/;
// With anchors: fails at the first character if not 'a'
const anchored = /^(a+)+b$/;Anchoring alone does not cure the exponential blow-up in the worst case, but it eliminates retries at each starting position and reduces constants significantly.
A concrete before/after
Consider a naive email-like validator:
// Bad: double quantifier — [a-zA-Z0-9.] and the outer + can overlap badly
const naive = /^([a-zA-Z0-9.]+)+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;
// Safer: flatten and be specific — no nested repetition
const safer = /^[a-zA-Z0-9.]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;The ([a-zA-Z0-9.]+)+ wraps the repeated character class in an outer +. An
attacker could send aaaaaaa@ (many characters, no valid domain) and the engine
would churn. Removing the outer group produces identical match semantics and
removes the explosion.
Where to go next
The next lesson covers atomic groups and possessive quantifiers — the mechanisms that prevent backtracking at the engine level, letting you express complex patterns safely even when flattening is not straightforward.
Lab: lookarounds
Practice positive and negative lookahead and lookbehind on realistic strings — from config parsing to content filtering.
Atomic groups and possessive quantifiers
Learn how atomic groups and possessive quantifiers prevent backtracking, which engines support them, and how to get the same effect in JavaScript.