Code of the Day
AdvancedPerformance and pitfalls

Atomic groups and possessive quantifiers

Learn how atomic groups and possessive quantifiers prevent backtracking, which engines support them, and how to get the same effect in JavaScript.

Regular ExpressionsAdvanced10 min read
Recommended first
By the end of this lesson you will be able to:
  • Explain what an atomic group does and how it differs from an ordinary group
  • Write possessive quantifiers and identify which engines support them
  • Work around the lack of atomic groups in JavaScript using a lookahead emulation
  • Decide when to use these features vs simply flattening a pattern

The previous lesson showed that catastrophic backtracking happens because the engine is allowed to give back characters it has already consumed and try different ways to match them. Atomic groups and possessive quantifiers remove that permission: once matched, those characters are committed, and no amount of subsequent failure will cause the engine to backtrack into them.

What an atomic group does

An atomic group uses the syntax (?>…). Once the engine matches the content inside, the group is "closed" — the engine will never re-enter it to try alternatives. If the rest of the pattern fails after the atomic group succeeds, the entire match fails at that point rather than backtracking inside the group.

Pattern (PCRE):  (?>a+)b
Input:           aaab

The (?>a+) greedily matches all three a characters and commits. When b matches the fourth character, the overall match succeeds.

Pattern (PCRE):  (?>a+)b
Input:           aaa    (no b at the end)

(?>a+) commits to all three as. There is no b left, so the engine fails — without trying to give one a back and match again. This is precisely what prevents the exponential blow-up in patterns like (a+)+b.

Possessive quantifiers — same idea, shorter syntax

Possessive quantifiers are an alternative syntax supported in PCRE, PCRE2, Java, and some others. Append a + to any quantifier:

NormalPossessiveMeaning
a*a*+Zero or more a, never give back
a+a++One or more a, never give back
a?a?+Zero or one a, never give back
a{3,6}a{3,6}+Three to six a, never give back

These are exactly equivalent to wrapping the corresponding part in an atomic group. a++ is the same as (?>a+). Choose whichever reads better in context.

// Java — possessive quantifiers
Pattern fast = Pattern.compile("a++b");
// Equivalent atomic group form
Pattern alsoFast = Pattern.compile("(?>a+)b");

Preventing catastrophic backtracking

The problematic pattern from the previous lesson, (a+)+b, becomes safe with atomicity:

Dangerous:  (a+)+b
Safe PCRE:  (?>a+)+b   or equivalently   a++b

With the possessive form a++, once the a+ portion commits to some characters, those characters cannot be redistributed. The outer + still repeats the group, but each repetition is itself atomic — there are no exponential partitioning paths.

Engine support matrix

Not all engines support atomic groups or possessive quantifiers:

EngineAtomic groups (?>…)Possessive ++, *+
PCRE / PCRE2YesYes
Java java.util.regexYesYes
.NET System.Text.RegularExpressionsYes (as (?>…))No
Python reNoNo
Python regex (third-party)YesYes
RubyYesNo
JavaScript / ECMAScriptNoNo
Go regexp (RE2)NoNo

JavaScript's built-in RegExp does not support atomic groups or possessive quantifiers as of ES2024. The RE2 engine (used in Go and available as a library elsewhere) guarantees linear-time matching by design — it sacrifices features like backreferences and lookarounds rather than allowing backtracking at all.

The JavaScript workaround

Because JavaScript lacks native atomic groups, the standard workaround is to use a lookahead with a capturing group to "grab" a greedy match, then reference it with \1:

// Atomic group emulation in JavaScript
// (?>a+)  becomes  (?=(a+))\1
const emulated = /(?=(a+))\1b/;

// Walk-through:
// (?=(a+))  — lookahead grabs as many 'a's as possible into group 1
// \1        — immediately consumes exactly that many 'a's
// b         — must follow

The lookahead peeks ahead without consuming, matching as many as as possible into group 1. Then \1 consumes exactly that many characters in one shot — no alternatives to explore. If b fails afterward, the engine cannot re-enter the lookahead with a shorter group 1 because \1 is a fixed-length match.

The lookahead emulation is a useful technique but has limits: it only works when the inner pattern can be captured as a group and referenced literally. For complex atomic groups with alternation inside, the emulation becomes unwieldy. In those cases, restructuring the pattern (flattening quantifiers, adding anchors) is the more maintainable fix.

A practical before/after in PCRE

Consider a pattern that validates a sequence of comma-separated integers:

Dangerous (PCRE): ^(\d+,?)+$

If the input is a long string of digits without a valid end (e.g. 12345678901234x), the alternation over whether each ,? matches causes exponential retries.

Safe (PCRE possessive): ^\d++(,\d++)*+$

Each \d++ commits to its digits without ever giving them back. The , is literal, so there is no ambiguity about whether a character is a digit or a comma.

JavaScript — editable, runs in your browser

When to use these features

Possessive quantifiers and atomic groups are most valuable when:

  1. You have a pattern that must run against untrusted or unbounded input — any user-supplied string is a candidate.
  2. You need nested or repeated groups over an overlapping character class and flattening would change the match semantics.
  3. Profiling or the regex debugger shows excessive backtrack steps (covered in the next lesson).

When the pattern can be simplified by flattening a nested quantifier or removing alternation overlap, prefer that approach — it works in all engines and is easier to read.

Where to go next

The next lesson covers benchmarking regex patterns — how to measure backtrack steps and execution time, and a worked walkthrough of optimising a real pattern from slow to fast.

Finished reading? Mark it complete to track your progress.

On this page