Code of the Day

Capturing groups

Use parentheses to capture matched text for extraction and reuse — the fundamental mechanism behind regex-based data extraction.

Regular Expressions | Intermediate | 10 min read
By the end of this lesson you will be able to:
  • Explain the difference between matching and capturing
  • Access captured group values from a match result
  • Work with multiple groups and understand their numbering
  • Distinguish the full match from captured sub-matches

In the beginner tier you used parentheses to group patterns so that quantifiers could apply to a sequence: /(ab)+/. But grouping has a second effect: the text matched by each (…) pair is captured — saved so you can retrieve it separately from the full match.

Capturing transforms regex from a yes/no tool into an extraction tool.
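As a rough illustration of that difference, the sketch below runs one pattern two ways: test() only answers whether the text matches, while match() also returns the captured pieces. The variable names (datePattern, input, isMatch, captured) are made up for this example.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const input = "Released on 2024-03-15";

// Matching: a yes/no answer, nothing to extract.
const isMatch = datePattern.test(input);

// Capturing: the same pattern also hands back the text inside each (…).
const captured = input.match(datePattern);

console.log(isMatch);      // true
console.log(captured[0]);  // "2024-03-15" (the full match)
console.log(captured[1]);  // "2024"       (first captured group)
```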

The match result object

In JavaScript, String.prototype.match() (without the g flag) returns an array where:

  • Index 0: the full match
  • Index 1, 2, …: the captured groups in left-to-right order

const result = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(result[0]);  // "2024-03-15"  — full match
console.log(result[1]);  // "2024"        — group 1
console.log(result[2]);  // "03"          — group 2
console.log(result[3]);  // "15"          — group 3

You can destructure this directly:

const [, year, month, day] = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(year, month, day); // "2024" "03" "15"

The leading comma skips index 0 (the full match).
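One caveat: match() returns null when the pattern does not match, and destructuring null throws a TypeError. A minimal sketch of a guarded version follows; parseDate is a hypothetical helper name chosen for this example, not part of any library.

```javascript
// parseDate is a hypothetical helper used only for illustration.
function parseDate(text) {
  const m = text.match(/(\d{4})-(\d{2})-(\d{2})/);
  if (m === null) return null;        // no match: nothing to destructure
  const [, year, month, day] = m;     // safe now that m is an array
  return { year, month, day };
}

const parsed = parseDate("2024-03-15");
const missing = parseDate("no date here");

console.log(parsed.year, parsed.month, parsed.day); // "2024" "03" "15"
console.log(missing);                               // null
```

Returning null here simply mirrors what match() itself does; throwing an error or returning a default object would be equally reasonable choices.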

Groups are numbered left to right by opening paren

When groups are nested or sequential, they are numbered by the position of their opening parenthesis:

const m = "abc".match(/(a(b))(c)/);
// Group 1: (a(b)) → "ab"
// Group 2: (b)    → "b"
// Group 3: (c)    → "c"
console.log(m[1], m[2], m[3]); // "ab" "b" "c"

Nesting groups like this is uncommon in practice, but understanding the numbering rule prevents confusion when reading other people's patterns.
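The same rule applied to a slightly more realistic pattern might look like the sketch below, where an outer group wraps the year and month. The pattern and variable names are illustrative assumptions, not taken from a real codebase.

```javascript
const nested = "2024-03-15".match(/((\d{4})-(\d{2}))-(\d{2})/);

// Count opening parentheses from the left:
// Group 1: ((\d{4})-(\d{2})) → "2024-03"
// Group 2: (\d{4})           → "2024"
// Group 3: (\d{2})           → "03"
// Group 4: (\d{2})           → "15"
const [, yearMonth, yearPart, monthPart, dayPart] = nested;

console.log(yearMonth);                    // "2024-03"
console.log(yearPart, monthPart, dayPart); // "2024" "03" "15"
```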

Multiple groups — a practical example

Extract fields from a log line:

const line = "[2024-03-15 08:42] ERROR: disk quota exceeded";
const m = line.match(/\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})\] (\w+): (.+)/);

if (m) {
  const [, date, time, level, message] = m;
  console.log({ date, time, level, message });
  // { date: "2024-03-15", time: "08:42", level: "ERROR",
  //   message: "disk quota exceeded" }
}

A single pattern can split a structured line into its components in one pass.
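Real log files usually contain lines that do not fit the pattern. One way to handle that, sketched here under the assumption that unmatched lines can simply be dropped, is to wrap the match in a small function and filter out the failures. The names LOG_PATTERN, parseLogLine, and sampleLines are hypothetical.

```javascript
const LOG_PATTERN = /\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})\] (\w+): (.+)/;

// parseLogLine is a hypothetical helper; it returns null for unmatched lines.
function parseLogLine(line) {
  const m = line.match(LOG_PATTERN);
  if (!m) return null;
  const [, date, time, level, message] = m;
  return { date, time, level, message };
}

const sampleLines = [
  "[2024-03-15 08:42] ERROR: disk quota exceeded",
  "-- log rotated --", // does not fit the pattern
  "[2024-03-15 08:43] INFO: cleanup started",
];

// Parse every line, then keep only the ones that matched.
const entries = sampleLines.map(parseLogLine).filter((e) => e !== null);

console.log(entries.length);     // 2
console.log(entries[1].level);   // "INFO"
console.log(entries[1].message); // "cleanup started"
```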

The g flag removes group data

With the global g flag, .match() returns a flat array of full matches — group data is discarded:

"2024-03-15 and 2024-03-16".match(/(\d{4})-(\d{2})-(\d{2})/g);
// ["2024-03-15", "2024-03-16"] — full matches only, no groups

To get both all matches and group data, use matchAll:

const re = /(\d{4})-(\d{2})-(\d{2})/g;
for (const m of "2024-03-15 and 2024-03-16".matchAll(re)) {
  console.log(m[1], m[2], m[3]); // year, month, day for each
}

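String.prototype.matchAll() was introduced in ES2020. It returns an iterator of match arrays (each with the full match at index 0 and group data at indices 1, 2, …), which is almost always what you want when processing multiple matches with groups. The regex passed to matchAll must have the g flag; without it, the call throws a TypeError.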
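Because matchAll returns an iterator rather than an array, a common approach is to spread it into an array before mapping. The sketch below does that, and also checks what happens when a regex without the g flag is passed. The variable names are made up for this example.

```javascript
const text = "2024-03-15 and 2024-03-16";
const dateRe = /(\d{4})-(\d{2})-(\d{2})/g;

// Spread the iterator into an array, then turn each match into an object.
const dates = [...text.matchAll(dateRe)].map(
  ([, year, month, day]) => ({ year, month, day })
);

console.log(dates.length); // 2
console.log(dates[1].day); // "16"

// Passing a regex without the g flag makes matchAll throw a TypeError.
let threwTypeError = false;
try {
  text.matchAll(/(\d{4})-(\d{2})-(\d{2})/);
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError); // true
```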

Groups in Python

Python's re.search() and re.match() return a match object with a .group() method, or None when the pattern does not match:

import re
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
m.group(0)   # "2024-03-15" — full match (same as m.group())
m.group(1)   # "2024"
m.group(2)   # "03"
m.group(3)   # "15"

For all matches with groups, use re.finditer:

for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15 and 2024-03-16"):
    print(m.group(1), m.group(2), m.group(3))

Optional groups and undefined

If a group is inside an optional part and does not participate in the match, its slot in the result is undefined (JavaScript) or None (Python):

const m1 = "12345".match(/(\d{5})(-(\d{4}))?/);
console.log(m1[1]);  // "12345"
console.log(m1[2]);  // undefined — optional group didn't match
console.log(m1[3]);  // undefined — inner group didn't match either

const m2 = "12345-6789".match(/(\d{5})(-(\d{4}))?/);
console.log(m2[1]);  // "12345"
console.log(m2[2]);  // "-6789"
console.log(m2[3]);  // "6789"

Always check for undefined (or None in Python) before using an optional group's value.
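One possible way to make that check routine is to normalise a missing group to an explicit value. The sketch below uses the nullish coalescing operator (??, also ES2020) for this; parseZip is a hypothetical helper name, and returning null for the missing part is just one reasonable convention.

```javascript
// parseZip is a hypothetical helper used only for illustration.
function parseZip(text) {
  const m = text.match(/(\d{5})(-(\d{4}))?/);
  if (m === null) return null;   // the whole pattern failed to match
  return {
    zip: m[1],
    plus4: m[3] ?? null,         // undefined when the optional part is absent
  };
}

const shortZip = parseZip("12345");
const longZip = parseZip("12345-6789");

console.log(shortZip.plus4); // null
console.log(longZip.plus4);  // "6789"
```

Note the two distinct failure cases: a null result means nothing matched at all, while an undefined group means the match succeeded but that group took no part in it.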


Where to go next

Next: named groups — giving groups readable names like (?<year>\d{4}) instead of relying on positional indices.
