Backreferences

Reference a previously captured group inside the same pattern to match repeated or mirrored text, and use group references in replacement strings.

By the end of this lesson you will be able to:
  • Write a backreference with \1 or \k<name> to match a repeated captured value
  • Use group references in a replacement string with $1 or $<name> (JavaScript), or \1 and \g<name> (Python)
  • Identify practical use cases such as finding duplicate words

A backreference lets you reuse the actual text matched by a capturing group later in the same pattern. Instead of matching a fixed string, it matches whatever group N happened to capture. This enables patterns that are impossible with character classes and quantifiers alone.

Backreferences in patterns

Inside a regex pattern, \1 refers to the text matched by group 1, \2 to group 2, and so on:

// Find repeated characters: "aa", "bb", "cc"…
/(.)\1/.test("aardvark");   // true  ("aa")
/(.)\1/.test("world");      // false (no adjacent repeated char)
/(.)\1/g.exec("bookkeeper")[0]; // "oo"

The (.) captures any single character, and \1 requires the same character to appear again immediately after.
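To see every doubled character in a string rather than just the first, the same pattern can be combined with matchAll. This is a minimal sketch; findDoubledChars is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: list every doubled character in a string.
function findDoubledChars(text) {
  // (.) captures one character; \1 must match that same character again.
  const doubled = /(.)\1/g;
  // matchAll requires the g flag and yields one match object per hit.
  return Array.from(text.matchAll(doubled), (m) => m[0]);
}

console.log(findDoubledChars("bookkeeper")); // ["oo", "kk", "ee"]
console.log(findDoubledChars("world"));      // []
```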

The classic duplicate-word pattern

const dupeWord = /\b(\w+)\s+\1\b/i;
dupeWord.test("the the quick");     // true  — "the the"
dupeWord.test("the quick brown");   // false — no duplicate
dupeWord.test("It is is wrong");    // true  — "is is"

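\b(\w+) captures a whole word, \s+ allows one or more whitespace characters between, then \1 requires the same word again, and the trailing \b stops it from matching a prefix of a longer word ("the theme"). The i flag makes the whole match case-insensitive, backreference included, so "The the" is also caught.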
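The same idea can be turned into a cleanup step. The sketch below assumes you want to collapse any run of a repeated word down to its first occurrence; collapseDuplicateWords is a hypothetical helper, and the (?:...)+ wrapper is an addition to the pattern above so that triple repeats collapse in one pass.

```javascript
// Hypothetical helper: collapse runs of a repeated word into one copy.
function collapseDuplicateWords(text) {
  // (?:\s+\1\b)+ matches one or more further copies of the captured word.
  const dupeRun = /\b(\w+)(?:\s+\1\b)+/gi;
  // "$1" keeps only the first occurrence, with its original casing.
  return text.replace(dupeRun, "$1");
}

console.log(collapseDuplicateWords("It is is wrong"));    // "It is wrong"
console.log(collapseDuplicateWords("the the the quick")); // "the quick"
console.log(collapseDuplicateWords("the theme"));         // "the theme"
```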

Backreferences and HTML-like patterns

A classic use case: match a simple opening and closing tag where the tag names must match:

/<(\w+)>.*?<\/\1>/s.test("<p>Hello</p>");    // true
/<(\w+)>.*?<\/\1>/s.test("<p>Hello</div>");  // false — tag names differ
/<(\w+)>.*?<\/\1>/s.test("<h2>Title</h2>"); // true

Group 1 captures the opening tag name. \1 in the closing tag requires the exact same name.

This trick works for simple tags without attributes. It breaks down as soon as a tag nests inside another tag of the same name, as in <div><div>text</div></div>: the lazy .*? stops at the first closing tag, not the matching one. Real HTML parsing requires a proper parser.
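The limitation is easiest to see when a tag nests inside another tag of the same name. The following sketch reuses the pattern from this section; firstTagMatch is a hypothetical helper that returns the matched text, or null if nothing matches.

```javascript
// Hypothetical helper: return the first "balanced-looking" tag match, or null.
function firstTagMatch(html) {
  const simpleTag = /<(\w+)>.*?<\/\1>/s;
  const match = simpleTag.exec(html);
  return match ? match[0] : null;
}

// Differently named nested tags still match as a whole.
console.log(firstTagMatch("<div><p>text</p></div>"));
// "<div><p>text</p></div>"

// Same-named nesting: the lazy .*? stops at the first </div>.
console.log(firstTagMatch("<div><div>inner</div> tail</div>"));
// "<div><div>inner</div>"

// Attributes are not handled, because <(\w+)> expects ">" right after the name.
console.log(firstTagMatch('<p class="x">Hello</p>')); // null
```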

Named backreferences: \k<name>

When using named groups, reference them with \k<name>:

const dupeNamed = /\b(?<word>\w+)\s+\k<word>\b/i;
dupeNamed.test("the the quick");  // true

Named backreferences are more readable when the pattern is complex and group numbering is hard to track.
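As an illustration of that readability benefit, here is a sketch that accepts a date only when both separators are the same character. The pattern and the name consistentDate are assumptions made for this example.

```javascript
// Accept YYYY-MM-DD, YYYY/MM/DD or YYYY.MM.DD, but not a mix of separators.
// (?<sep>...) captures the first separator; \k<sep> requires the same one again.
const consistentDate = /^\d{4}(?<sep>[-\/.])\d{2}\k<sep>\d{2}$/;

console.log(consistentDate.test("2024-03-15")); // true
console.log(consistentDate.test("2024/03/15")); // true
console.log(consistentDate.test("2024-03/15")); // false
```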

Python uses a different syntax, (?P=name), for a named backreference:

import re
match = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "the the quick", re.I)
print(match.group(0))  # the the

Backreferences in replacement strings

In String.prototype.replace, backreferences in the replacement string let you reorder or repeat captured content:

// Swap first and last name
"Smith, John".replace(/(\w+), (\w+)/, "$2 $1");
// "John Smith"

// Surround matched words with emphasis
"hello world".replace(/(\w+)/g, "**$1**");
// "**hello** **world**"

// Reformat ISO date to US format
"2024-03-15".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
// "03/15/2024"

With named groups, use $<name> in the replacement:

"2024-03-15".replace(
  /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
  "$<m>/$<d>/$<y>"
);
// "03/15/2024"
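When the replacement needs logic that a template string cannot express, replace also accepts a function. The sketch below assumes you want to drop leading zeros from the month and day; isoToShortUs is a hypothetical helper. When the pattern has named groups, the last argument passed to the function is the groups object.

```javascript
// Hypothetical helper: "2024-03-05" -> "3/5/2024" (leading zeros dropped).
function isoToShortUs(text) {
  const isoDate = /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/g;
  return text.replace(isoDate, (...args) => {
    // With named groups present, the final argument is the groups object.
    const { y, m, d } = args[args.length - 1];
    return `${Number(m)}/${Number(d)}/${y}`;
  });
}

console.log(isoToShortUs("2024-03-05"));                // "3/5/2024"
console.log(isoToShortUs("From 2024-03-15 to 2024-11-02")); // "From 3/15/2024 to 11/2/2024"
```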

In Python, replacements use \1 or \g<1> (numbered) and \g<name> (named):

import re
re.sub(r"(\w+), (\w+)", r"\2 \1", "Smith, John")
# "John Smith"

Backreferences and repeated structure

Backreferences can enforce structural symmetry in patterns:

// Match strings surrounded by the same delimiter (' or ")
const quoted = /^(['"]).*\1$/;
quoted.test('"hello"');   // true  — both double quotes
quoted.test("'hello'");   // true  — both single quotes
quoted.test('"hello\'");  // false — mismatched quotes
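The anchored pattern above validates a whole string. To pull quoted spans out of longer text, a variant without anchors and with a lazy quantifier can be used. This is a sketch under that assumption; extractQuoted is a hypothetical helper and it does not handle escaped quotes.

```javascript
// Hypothetical helper: return the contents of every quoted span in the text.
function extractQuoted(text) {
  // Group 1 is the opening quote, group 2 the content (lazy, so it stops
  // at the first matching close), and \1 is the same quote character again.
  const quotedSpan = /(['"])(.*?)\1/g;
  return Array.from(text.matchAll(quotedSpan), (m) => m[2]);
}

console.log(extractQuoted(`say "hi" and 'it is "ok"'`));
// ["hi", 'it is "ok"']
```

Because the closing quote must equal the opening one, the double quotes inside the single-quoted span are treated as ordinary content.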

Where to go next

Next: the Groups lab — apply capturing groups, named groups, non-capturing groups, and backreferences to realistic text extraction and transformation tasks.
