Capturing groups
Use parentheses to capture matched text for extraction and reuse — the fundamental mechanism behind regex-based data extraction.
- Explain the difference between matching and capturing
- Access captured group values from a match result
- Work with multiple groups and understand their numbering
- Distinguish the full match from captured sub-matches
In the beginner tier you used parentheses to group patterns so that quantifiers
could apply to a sequence: /(ab)+/. But grouping has a second effect: the text
matched by each (…) pair is captured — saved so you can retrieve it
separately from the full match.
Capturing transforms regex from a yes/no tool into an extraction tool.
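To make the distinction concrete, here is a minimal sketch that runs one pattern two ways: RegExp.prototype.test() only reports whether the pattern matches, while match() also exposes the captured text. The sample string and variable names are made up for illustration.

```javascript
const pattern = /(\d+) items/;
const text = "Order contains 42 items";

// Matching: a yes/no answer only.
const isMatch = pattern.test(text);

// Capturing: the same pattern also hands back the digits inside (…).
const result = text.match(pattern);
const count = result ? Number(result[1]) : null;

console.log(isMatch); // true
console.log(count);   // 42
```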
The match result object
In JavaScript, String.prototype.match() (without the g flag) returns an
array where:
- Index 0: the full match
- Index 1, 2, …: the captured groups in left-to-right order
const result = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(result[0]); // "2024-03-15" (full match)
console.log(result[1]); // "2024" (group 1)
console.log(result[2]); // "03" (group 2)
console.log(result[3]); // "15" (group 3)
You can destructure this directly:
const [, year, month, day] = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(year, month, day); // "2024" "03" "15"
The leading comma skips index 0 (the full match).
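One caveat: match() returns null when the pattern does not match, and destructuring null throws a TypeError. A minimal sketch of a guard, assuming you want null back for unparseable input (parseDate is a hypothetical helper name, not a built-in):

```javascript
// parseDate is a hypothetical helper, shown only to illustrate the null check.
function parseDate(text) {
  const m = text.match(/(\d{4})-(\d{2})-(\d{2})/);
  if (m === null) return null; // no match: nothing to destructure
  const [, year, month, day] = m; // leading comma skips the full match
  return { year, month, day };
}

console.log(parseDate("2024-03-15"));   // { year: "2024", month: "03", day: "15" }
console.log(parseDate("no date here")); // null
```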
Groups are numbered left to right by opening paren
When groups are nested or sequential, they are numbered by the position of their opening parenthesis:
const m = "abc".match(/(a(b))(c)/);
// Group 1: (a(b)) → "ab"
// Group 2: (b) → "b"
// Group 3: (c) → "c"
console.log(m[1], m[2], m[3]); // "ab" "b" "c"
Nesting groups like this is uncommon in practice, but understanding the numbering rule prevents confusion when reading other people's patterns.
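The same rule applies to a slightly more realistic pattern. The sketch below uses an invented time string and counts opening parentheses from left to right to predict each group's index:

```javascript
const timeMatch = "09:30am".match(/((\d{2}):(\d{2}))(am|pm)/);
// Opening parentheses, left to right:
//   1st "(" opens ((\d{2}):(\d{2}))  → group 1
//   2nd "(" opens (\d{2})   (hours)  → group 2
//   3rd "(" opens (\d{2})   (minutes) → group 3
//   4th "(" opens (am|pm)            → group 4
const [, clock, hours, minutes, period] = timeMatch;

console.log(clock);   // "09:30"
console.log(hours);   // "09"
console.log(minutes); // "30"
console.log(period);  // "am"
```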
Multiple groups — a practical example
Extract fields from a log line:
const line = "[2024-03-15 08:42] ERROR: disk quota exceeded";
const m = line.match(/\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})\] (\w+): (.+)/);
if (m) {
  const [, date, time, level, message] = m;
  console.log({ date, time, level, message });
  // { date: "2024-03-15", time: "08:42", level: "ERROR",
  //   message: "disk quota exceeded" }
}
A single pattern can split a structured line into its components in one pass.
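The same pattern can be reused across many lines. The following is a rough sketch, assuming that lines which do not fit the format should simply be skipped; the extra sample lines are invented for illustration.

```javascript
const logPattern = /\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})\] (\w+): (.+)/;
const lines = [
  "[2024-03-15 08:42] ERROR: disk quota exceeded",
  "not a log line",
  "[2024-03-15 08:43] INFO: retry scheduled",
];

const entries = [];
for (const line of lines) {
  const m = line.match(logPattern);
  if (!m) continue; // skip lines that do not fit the format
  const [, date, time, level, message] = m;
  entries.push({ date, time, level, message });
}

console.log(entries.length);   // 2
console.log(entries[1].level); // "INFO"
```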
The g flag removes group data
With the global g flag, .match() returns a flat array of full matches — group
data is discarded:
"2024-03-15 and 2024-03-16".match(/(\d{4})-(\d{2})-(\d{2})/g);
// ["2024-03-15", "2024-03-16"] (full matches only, no groups)
To get both all matches and group data, use matchAll:
const re = /(\d{4})-(\d{2})-(\d{2})/g;
for (const m of "2024-03-15 and 2024-03-16".matchAll(re)) {
  console.log(m[1], m[2], m[3]); // year, month, day for each
}
String.prototype.matchAll() was introduced in ES2020 and requires a regex with
the g flag; without it, the call throws a TypeError. It returns an iterator
of match arrays (each with group data at indices 1, 2, …), which is
almost always what you want when processing multiple matches with groups.
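Because matchAll returns an iterator rather than an array, a common follow-up is to spread it into an array and map each match to a plain object. A minimal sketch, with illustrative variable names:

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/g;
const input = "2024-03-15 and 2024-03-16";

// Spread the iterator into an array, then destructure each match array.
const dates = [...input.matchAll(datePattern)].map(
  ([, year, month, day]) => ({ year, month, day })
);

console.log(dates.length); // 2
console.log(dates[0].day); // "15"
console.log(dates[1].day); // "16"
```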
Groups in Python
Python's re.search() and re.match() return a match object with a .group()
method, or None if the pattern does not match:
import re
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
m.group(0) # "2024-03-15" — full match (same as m.group())
m.group(1) # "2024"
m.group(2) # "03"
m.group(3) # "15"
For all matches with groups, use re.finditer:
for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15 and 2024-03-16"):
    print(m.group(1), m.group(2), m.group(3))
Optional groups and undefined
If a group is inside an optional part and does not participate in the match, its
slot in the result is undefined (JavaScript) or None (Python):
const m1 = "12345".match(/(\d{5})(-(\d{4}))?/);
console.log(m1[1]); // "12345"
console.log(m1[2]); // undefined — optional group didn't match
console.log(m1[3]); // undefined — inner group didn't match either
const m2 = "12345-6789".match(/(\d{5})(-(\d{4}))?/);
console.log(m2[1]); // "12345"
console.log(m2[2]); // "-6789"
console.log(m2[3]); // "6789"
Always check for undefined (JavaScript) or None (Python) before using an optional group's value.
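One way to do that check is the nullish coalescing operator (??), which substitutes a fallback when a group is undefined. This is only a sketch: parseZip is a hypothetical helper, and using null as the fallback is an assumption, not a requirement.

```javascript
// parseZip is a hypothetical helper; it reuses the pattern from the example above.
function parseZip(text) {
  const m = text.match(/(\d{5})(-(\d{4}))?/);
  if (!m) return null; // the pattern did not match at all
  // m[3] is undefined when the optional part is absent, so fall back to null.
  return { zip: m[1], plus4: m[3] ?? null };
}

console.log(parseZip("12345"));      // { zip: "12345", plus4: null }
console.log(parseZip("12345-6789")); // { zip: "12345", plus4: "6789" }
```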
Where to go next
Next: named groups — giving groups readable names like (?<year>\d{4})
instead of relying on positional indices.