Atomic groups and possessive quantifiers
Learn how atomic groups and possessive quantifiers prevent backtracking, which engines support them, and how to get the same effect in JavaScript.
- Explain what an atomic group does and how it differs from an ordinary group
- Write possessive quantifiers and identify which engines support them
- Work around the lack of atomic groups in JavaScript using a lookahead emulation
- Decide when to use these features vs simply flattening a pattern
The previous lesson showed that catastrophic backtracking happens because the engine is allowed to give back characters it has already consumed and try different ways to match them. Atomic groups and possessive quantifiers remove that permission: once matched, those characters are committed, and no amount of subsequent failure will cause the engine to backtrack into them.
What an atomic group does
An atomic group uses the syntax (?>…). Once the engine matches the content
inside, the group is "closed" — the engine will never re-enter it to try
alternatives. If the rest of the pattern fails after the atomic group succeeds,
the entire match fails at that point rather than backtracking inside the group.
Pattern (PCRE): (?>a+)b
Input: aaabThe (?>a+) greedily matches all three a characters and commits. When b
matches the fourth character, the overall match succeeds.
Pattern (PCRE): (?>a+)b
Input: aaa (no b at the end)(?>a+) commits to all three as. There is no b left, so the engine
fails — without trying to give one a back and match again. This is precisely
what prevents the exponential blow-up in patterns like (a+)+b.
Possessive quantifiers — same idea, shorter syntax
Possessive quantifiers are an alternative syntax supported in PCRE, PCRE2,
Java, and some others. Append a + to any quantifier:
| Normal | Possessive | Meaning |
|---|---|---|
a* | a*+ | Zero or more a, never give back |
a+ | a++ | One or more a, never give back |
a? | a?+ | Zero or one a, never give back |
a{3,6} | a{3,6}+ | Three to six a, never give back |
These are exactly equivalent to wrapping the corresponding part in an atomic
group. a++ is the same as (?>a+). Choose whichever reads better in context.
// Java — possessive quantifiers
Pattern fast = Pattern.compile("a++b");
// Equivalent atomic group form
Pattern alsoFast = Pattern.compile("(?>a+)b");Preventing catastrophic backtracking
The problematic pattern from the previous lesson, (a+)+b, becomes safe with
atomicity:
Dangerous: (a+)+b
Safe PCRE: (?>a+)+b or equivalently a++bWith the possessive form a++, once the a+ portion commits to some characters,
those characters cannot be redistributed. The outer + still repeats the group,
but each repetition is itself atomic — there are no exponential partitioning
paths.
Engine support matrix
Not all engines support atomic groups or possessive quantifiers:
| Engine | Atomic groups (?>…) | Possessive ++, *+ |
|---|---|---|
| PCRE / PCRE2 | Yes | Yes |
Java java.util.regex | Yes | Yes |
.NET System.Text.RegularExpressions | Yes (as (?>…)) | No |
Python re | No | No |
Python regex (third-party) | Yes | Yes |
| Ruby | Yes | No |
| JavaScript / ECMAScript | No | No |
Go regexp (RE2) | No | No |
JavaScript's built-in RegExp does not support atomic groups or possessive
quantifiers as of ES2024. The RE2 engine (used in Go and available as a
library elsewhere) guarantees linear-time matching by design — it sacrifices
features like backreferences and lookarounds rather than allowing
backtracking at all.
The JavaScript workaround
Because JavaScript lacks native atomic groups, the standard workaround is to
use a lookahead with a capturing group to "grab" a greedy match, then
reference it with \1:
// Atomic group emulation in JavaScript
// (?>a+) becomes (?=(a+))\1
const emulated = /(?=(a+))\1b/;
// Walk-through:
// (?=(a+)) — lookahead grabs as many 'a's as possible into group 1
// \1 — immediately consumes exactly that many 'a's
// b — must followThe lookahead peeks ahead without consuming, matching as many as as possible
into group 1. Then \1 consumes exactly that many characters in one shot —
no alternatives to explore. If b fails afterward, the engine cannot re-enter
the lookahead with a shorter group 1 because \1 is a fixed-length match.
The lookahead emulation is a useful technique but has limits: it only works when the inner pattern can be captured as a group and referenced literally. For complex atomic groups with alternation inside, the emulation becomes unwieldy. In those cases, restructuring the pattern (flattening quantifiers, adding anchors) is the more maintainable fix.
A practical before/after in PCRE
Consider a pattern that validates a sequence of comma-separated integers:
Dangerous (PCRE): ^(\d+,?)+$If the input is a long string of digits without a valid end (e.g.
12345678901234x), the alternation over whether each ,? matches causes
exponential retries.
Safe (PCRE possessive): ^\d++(,\d++)*+$Each \d++ commits to its digits without ever giving them back. The , is
literal, so there is no ambiguity about whether a character is a digit or a
comma.
When to use these features
Possessive quantifiers and atomic groups are most valuable when:
- You have a pattern that must run against untrusted or unbounded input — any user-supplied string is a candidate.
- You need nested or repeated groups over an overlapping character class and flattening would change the match semantics.
- Profiling or the regex debugger shows excessive backtrack steps (covered in the next lesson).
When the pattern can be simplified by flattening a nested quantifier or removing alternation overlap, prefer that approach — it works in all engines and is easier to read.
Where to go next
The next lesson covers benchmarking regex patterns — how to measure backtrack steps and execution time, and a worked walkthrough of optimising a real pattern from slow to fast.
Catastrophic backtracking
Understand how NFA regex engines backtrack, why certain patterns cause exponential slowdowns, and how to rewrite them safely.
Benchmarking regex patterns
Measure regex performance with console.time, timeit, and the regex101 debugger, then apply practical rules to optimise slow patterns.