What is a regular expression?
Understand what regex is, when to reach for it, and the three building blocks every pattern is made of.
- Explain in plain terms what a regular expression is
- Decide when regex is the right tool versus plain string methods
- Name the three building blocks every pattern is composed of
A regular expression (regex for short) is a pattern that describes a set of strings. You write the pattern; the engine tests whether a given piece of text matches it, and optionally hands you back the matched portions.
In JavaScript that looks like this:
const result = "Order #4821 received".match(/\d+/);
// result[0] === "4821"
The pattern /\d+/ means "one or more digit characters." The engine scans the
string and returns the first run of digits it finds: "4821". No loops, no
index arithmetic, no branching.
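To see what the pattern saves you, here is a rough hand-written equivalent for comparison. The helper name firstDigitRun is hypothetical and not part of any library. It checks only the ASCII digits 0 to 9, which is also what \d matches in JavaScript.

```javascript
// Hypothetical helper: finds the first run of ASCII digits without regex.
function firstDigitRun(text) {
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const isDigit = text[i] >= "0" && text[i] <= "9";
    if (isDigit && start === -1) {
      start = i; // a run of digits begins here
    } else if (!isDigit && start !== -1) {
      return text.slice(start, i); // the run just ended
    }
  }
  // Either the run reaches the end of the string, or there was none.
  return start === -1 ? null : text.slice(start);
}

const manual = firstDigitRun("Order #4821 received");
const viaRegex = "Order #4821 received".match(/\d+/);

console.log(manual);      // "4821"
console.log(viaRegex[0]); // "4821"
```

Both versions give null when the text contains no digits, so real code should check for that before reading index 0 of the match result.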
When to use regex — and when not to
Regex is the right tool when the structure of the text is regular (meaning: it follows a predictable pattern) and you need to find, validate, or extract pieces of it. Common good fits:
- Validation: does this string look like a phone number / ZIP code / email?
- Extraction: pull every date out of a log file.
- Transformation: replace all occurrences of one pattern with something else.
- Parsing simple formats: split a CSV line on commas that are not inside quotes.
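A minimal sketch of the first three fits in JavaScript. The sample strings and the five-digit ZIP pattern are assumptions chosen for illustration, not production-grade validators.

```javascript
// Validation: does the whole string look like a five-digit ZIP code?
const zipPattern = /^\d{5}$/;
const zipOk = zipPattern.test("90210");  // true
const zipBad = zipPattern.test("9021a"); // false

// Extraction: pull every YYYY-MM-DD date out of a log line.
const log = "2024-01-15 start; 2024-01-16 retry; done";
const dates = log.match(/\d{4}-\d{2}-\d{2}/g); // ["2024-01-15", "2024-01-16"]

// Transformation: collapse every run of whitespace into a single space.
const tidy = "too    many   spaces".replace(/\s+/g, " "); // "too many spaces"

console.log(zipOk, zipBad, dates, tidy);
```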
Regex is the wrong tool when the text has real nesting or context that a finite pattern cannot capture. HTML and JSON are the textbook examples: a tag can contain other tags, and a regex cannot track that depth. For those, use a dedicated parser.
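To make the nesting problem concrete, here is a small sketch in which two plausible-looking patterns both pick the wrong closing tag. The sample markup is invented for illustration.

```javascript
// One outer <div> containing one inner <div>.
const html = "<div>outer <div>inner</div> tail</div>";

// Lazy attempt: stops at the FIRST closing tag, cutting the outer element short.
const lazy = html.match(/<div>.*?<\/div>/)[0];
// "<div>outer <div>inner</div>"

// Greedy attempt: runs to the LAST closing tag, which over-matches
// as soon as two sibling elements sit next to each other.
const siblings = "<div>a</div><div>b</div>";
const greedy = siblings.match(/<div>.*<\/div>/)[0];
// "<div>a</div><div>b</div>" (two elements returned as one match)

console.log(lazy);
console.log(greedy);
```

Neither variant can count how many tags are currently open, which is why a parser is the safer choice for this kind of input.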
A simple rule: if a plain string method (includes, startsWith, split,
indexOf) reads clearly and does the job, use it. Reach for regex when the
pattern is variable or structural — when you need "a digit" not "the digit 5".
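The rule above might look like this in practice. The sample line is an assumption; the string methods are the ones named in the rule.

```javascript
const line = "Invoice 2024-0042 paid";

// Fixed text: plain string methods read clearly and need no pattern.
const mentionsInvoice = line.includes("Invoice");     // true
const startsWithInvoice = line.startsWith("Invoice"); // true
const words = line.split(" "); // ["Invoice", "2024-0042", "paid"]

// Variable text: "a digit" has no single spelling, so a pattern fits better.
const hasDigit = /\d/.test(line);   // true
const hasFive = line.includes("5"); // false: this only checks "the digit 5"

console.log(mentionsInvoice, startsWithInvoice, words, hasDigit, hasFive);
```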
The three building blocks
Every regex pattern, no matter how long, is assembled from three kinds of things:
1. Literals — characters that match themselves exactly.
/cat/ matches the characters c, a, t in that order. Literal matching is
case-sensitive by default, so /cat/ does not match "Cat".
2. Metacharacters — characters with special meaning.
The twelve metacharacters are: . ^ $ * + ? { [ \ | ( )
They don't match themselves; they control the engine. For example, . means
"any single character except a newline," and * means "zero or more of the
preceding thing." The closing ] and } are not counted among the twelve because
they are special only after their opening partner. You'll learn each of them
in the coming lessons.
3. Quantifiers — expressions that say how many times something should repeat.
\d+ means \d (one digit) repeated one or more times (+). Quantifiers are
spelled with metacharacters (* + ? and { }), but they do a distinct job: they
let a single small pattern cover variable-length input.
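A short sketch of the three building blocks side by side. The test strings are assumptions, and the backslash escape in the second group is included only for contrast with the unescaped dot.

```javascript
// 1. Literals match themselves, and case matters by default.
const lower = /cat/.test("concatenate"); // true: "cat" appears inside the word
const upper = /cat/.test("Cat");         // false: "C" is not "c"

// 2. Metacharacters control the engine instead of matching themselves.
const dotAny = /c.t/.test("cut");         // true: "." stands in for "u"
const dotLoose = /3.14/.test("3514");     // true: "." accepts the "5"
const dotLiteral = /3\.14/.test("3514");  // false: "\." only matches a real dot

// 3. Quantifiers set how many repetitions are allowed.
const digits = "Room 7, floor 12".match(/\d+/g); // ["7", "12"]

console.log(lower, upper, dotAny, dotLoose, dotLiteral, digits);
```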
Regex lives everywhere
The same pattern notation, with tiny dialect variations, works in:
- JavaScript: /pattern/flags literal syntax, or new RegExp("pattern")
- Python: the re module, for example re.search(r"pattern", text)
- Go, Rust, Java, C#, PHP, Ruby: each has a regex library; the core syntax is nearly identical
- SQL: many engines support REGEXP or SIMILAR TO
- Command line: grep -E, sed, awk all speak regex
- Text editors: VS Code, Vim, Emacs all have regex find-and-replace
Learning regex once pays dividends across every environment you work in.
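For the JavaScript entry, the two ways of writing a pattern can be compared directly. This is a minimal sketch; the unit variable is a hypothetical runtime value.

```javascript
// Literal syntax: the pattern is written between slashes.
const literal = /\d+/g;

// Constructor syntax: the pattern is a string, so each backslash is doubled.
const built = new RegExp("\\d+", "g");

console.log(literal.source === built.source); // true
console.log(literal.flags === built.flags);   // true

// The constructor is handy when part of the pattern is only known at run time.
const unit = "kg"; // hypothetical runtime value
const weight = new RegExp("\\d+" + unit);
console.log(weight.test("Ships at 12kg")); // true
```

If the runtime value could contain metacharacters, it would need escaping before being spliced into the pattern.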
A first live example
Try editing the pattern below. Change /\d+/ to /[a-z]+/ (lowercase letters)
and see what changes:
Notice the g flag on the second pattern. Without it, .match() returns only
the first result. With g ("global"), it returns every match. Flags are covered
in depth in the Practical matching module.
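As a static version of that comparison, here is a minimal sketch of .match() with and without the g flag. The sample text is an assumption.

```javascript
const text = "3 apples, 12 pears, 40 plums"; // assumed sample text

// Without g: an array describing only the first match.
const first = text.match(/\d+/);
console.log(first[0]);    // "3"
console.log(first.index); // 0

// With g: a plain array of every match.
const allDigits = text.match(/\d+/g);   // ["3", "12", "40"]
const allWords = text.match(/[a-z]+/g); // ["apples", "pears", "plums"]

console.log(allDigits, allWords);
```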
Where to go next
Next up: literals and metacharacters — the twelve special characters and exactly what each one does.