A Practical Guide to Regular Expressions

Regular expressions look like line noise until the moment they save you an afternoon of fragile string-slicing code. The good news is that the syntax is small and mostly composable, so once you understand the handful of building blocks you can read and write almost any pattern. This guide builds regex up piece by piece using the JavaScript flavor, then assembles a few patterns you'd actually use.

What a regex actually is

A regular expression is a small program that describes a set of strings. You hand it some input text, and it answers questions: does this match? where? what did it capture? In JavaScript you write one with slashes:

```js
const re = /cat/;
re.test("the cat sat"); // true
re.test("dog");         // false
```

The pattern `/cat/` matches the three literal characters `c`, `a`, and `t` appearing in sequence anywhere in the string. That's the simplest possible regex, and it's worth internalizing: by default a regex looks for its pattern somewhere inside the input, not for the whole string to equal it.
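The same "somewhere inside" rule applies to the other standard lookup methods. This short sketch uses the built-in `String.prototype.search` and `String.prototype.match`, which answer the "where?" question:

```javascript
const input = "the cat sat";

// test() answers "does it match?"
const found = /cat/.test(input); // true

// search() answers "where?": the index of the first match, or -1
const position = input.search(/cat/); // 4
const missing = input.search(/dog/);  // -1

// match() without the g flag returns the matched text and its index
const m = input.match(/cat/); // m[0] is "cat", m.index is 4

// The pattern only has to appear somewhere inside the input
const inside = /cat/.test("concatenate"); // true
```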

You can also build one from a string with the `RegExp` constructor, which matters when the pattern is dynamic:

```js
const word = "cat";
const re = new RegExp(word); // same as /cat/
```

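With the constructor you have to double-escape backslashes (write `"\\d"` where a literal would have `\d`), because the string literal consumes one level of escaping before `RegExp` ever sees the pattern. Prefer the literal `/.../` form whenever the pattern is known at write time.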
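One quick way to see why the doubling matters is to inspect the `source` of the resulting regex. This sketch uses only the `RegExp` constructor and the built-in `String.raw` template tag, which keeps backslashes exactly as typed:

```javascript
// In a string literal, "\d" is just "d": the string consumes the backslash
// before RegExp ever sees it.
const wrong = new RegExp("\d+");
const right = new RegExp("\\d+");

// String.raw keeps backslashes as typed, so no doubling is needed.
const alsoRight = new RegExp(String.raw`\d+`);

const wrongSource = wrong.source; // "d+"
const rightSource = right.source; // "\d+" (one real backslash)
const results = [wrong.test("42"), right.test("42"), alsoRight.test("42")];
// [false, true, true]
```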

Literals and metacharacters

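Most characters in a pattern match themselves. The exceptions are the metacharacters, which have special meaning: `.` `^` `$` `*` `+` `?` `(` `)` `[` `]` `{` `}` `|` `\`. When you want a literal version of one of these, escape it with a backslash: a literal dot is `\.` and a literal plus is `\+`. Inside a `/.../` literal the forward slash also needs escaping (`\/`), since an unescaped one would end the pattern.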

Forgetting the escape is the single most common beginner bug. The pattern `/3.14/` does not match only `3.14`. The `.` matches any character, so it also matches `3x14` and `3 14`. To match a literal period you need `/3\.14/`.
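Escaping matters most when the pattern comes from user input or other dynamic data. This is a minimal sketch built on a hand-written helper, here called `escapeRegExp`. The helper is a common idiom, not a built-in method. It backslash-escapes every metacharacter before the string reaches the `RegExp` constructor:

```javascript
// Hypothetical helper (a common idiom, not a built-in): backslash-escape every
// regex metacharacter so the string is matched literally outside a character class.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const sloppyMatch = /3.14/.test("3x14");   // true: the bare dot matched "x"
const escapedMatch = /3\.14/.test("3x14"); // false

const escaped = escapeRegExp("3.14"); // "3\.14"
const literal = new RegExp(escaped);
const hits = [literal.test("pi is 3.14"), literal.test("3x14")]; // [true, false]
```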

Character classes

A character class, written with square brackets, matches exactly one character from a set:

```js
/[aeiou]/   // any single vowel
/[0-9]/     // any single digit (range)
/[a-fA-F]/  // a hex letter, either case
```

Ranges use a hyphen. You can combine ranges and individual characters in one class: `[a-z0-9_]` matches one lowercase letter, digit, or underscore.

A caret as the first character inside the brackets negates the class:

```js
/[^0-9]/  // any single character that is NOT a digit
```
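Here are two small uses of classes in practice. A negated class strips unwanted characters, and a range-based class validates input. The validator borrows the `^` and `$` anchors, covered later, so that the class must account for every character. `isHex` is just an illustrative name:

```javascript
// A negated class: remove everything that is not a digit.
const digitsOnly = "(555) 123-4567".replace(/[^0-9]/g, ""); // "5551234567"

// Ranges in one class, anchored so every character must be a hex digit.
const isHex = (s) => /^[0-9a-fA-F]+$/.test(s);
const hexChecks = [isHex("c0ffee"), isHex("coffee")]; // [true, false]
```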

Because some classes are so common, regex provides shorthands:

- `\d`: a digit, same as `[0-9]`
- `\w`: a "word" character, same as `[A-Za-z0-9_]`
- `\s`: whitespace (space, tab, newline, and more)
- `\D`, `\W`, `\S`: the negated versions of each

And `.` is the broadest of all: any character except a line terminator (newline, carriage return, or the Unicode line and paragraph separators), unless you turn on the `s` flag, covered below.
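Here are a few of the shorthands in action, along with the default behavior of `.` around a newline:

```javascript
// \w+ grabs runs of letters, digits, and underscores; "-" is not a word character.
const words = "x_1 y-2".match(/\w+/g); // ["x_1", "y", "2"]

// \D is the negation of \d, so replacing it keeps only the digits.
const digits = "a1b2c3".replace(/\D/g, ""); // "123"

// \s covers tabs and newlines as well as spaces.
const parts = "one\ttwo\nthree".split(/\s+/); // ["one", "two", "three"]

// Without the s flag, "." does not match a newline.
const dotCrossesNewline = /a.b/.test("a\nb"); // false
```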

A subtle point: inside a character class, most metacharacters lose their power. `[.+*]` matches a literal dot, plus, or asterisk, with no escaping needed. The characters you still need to be careful with inside a class are `]`, `\`, `^` (at the start), and `-` (between two characters).
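The inside-a-class rules are easiest to trust once you've seen them run. This sketch matches literal operators without escaping them. It also places a hyphen at the start of a class, where it is literal, and escapes the brackets themselves:

```javascript
// Inside a class, . + * are literal: no escaping needed.
const ops = "1+2*3.0".match(/[.+*]/g); // ["+", "*", "."]

// A hyphen at the start of a class is literal, so it joins the letters here.
const dashed = "a-b c".match(/[-a-z]+/g); // ["a-b", "c"]

// Brackets must be escaped to appear inside a class.
const brackets = "[x]".match(/[\[\]]/g); // ["[", "]"]
```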

Quantifiers

Quantifiers say how many of the preceding element to match:

- `*`: zero or more
- `+`: one or more
- `?`: zero or one (i.e. optional)
- `{n}`: exactly n
- `{n,}`: n or more
- `{n,m}`: between n and m, inclusive

Examples:

```js
/colou?r/        // matches "color" and "colour"
/\d{3}-\d{4}/    // 123-4567
/a{2,4}/         // "aa", "aaa", or "aaaa"
/\w+/            // one or more word characters
```

Quantifiers attach to whatever immediately precedes them: a single character, a character class, or a group.
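Because a quantifier binds only to the element right before it, `/ab+/` repeats the `b`, not the pair. This short sketch shows that, plus the `{n}` and `{n,m}` forms on realistic text:

```javascript
// + applies only to the "b" right before it, not to "ab".
const longRun = "abbb".match(/ab+/)[0];     // "abbb"
const firstPair = "ababab".match(/ab+/)[0]; // "ab"

// {n} on \d: find ISO-style dates in free text (g returns every match).
const dates = "due 2026-06-02, paid 2025-12-31".match(/\d{4}-\d{2}-\d{2}/g);
// ["2026-06-02", "2025-12-31"]

// {2,4} takes as many repetitions as the range allows.
const run = "aaaaa".match(/a{2,4}/)[0]; // "aaaa"
```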

Greedy vs. lazy

By default quantifiers are greedy — they grab as much as they can while still allowing the overall pattern to match. This trips people up constantly. Consider extracting the contents of an HTML tag:

```js
"<b>one</b><b>two</b>".match(/<b>(.*)<\/b>/)[1];
// "one</b><b>two"
```

The `.*` ate everything up to the last `</b>`. A greedy `.*` first runs to the end of the string, then backs off only as far as it must to find a `</b>`. Add a `?` after a quantifier to make it lazy, matching as little as possible:

```js
"<b>one</b><b>two</b>".match(/<b>(.*?)<\/b>/)[1];
// "one"
```

(For real HTML, use a DOM parser — but lazy quantifiers are exactly the right tool for many small text-extraction jobs.)
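To pull out every tag body rather than just the first, pair the lazy quantifier with `matchAll` (described under Flags below). A negated class such as `[^<]*` is a common alternative, because it can't run past the next `<` at all. This is a sketch for small, well-formed snippets, not a parser:

```javascript
const html = "<b>one</b><b>two</b>";

// Lazy quantifier plus matchAll (which requires the g flag) collects every tag body.
const lazy = [...html.matchAll(/<b>(.*?)<\/b>/g)].map((m) => m[1]);
// ["one", "two"]

// Negated-class alternative: [^<]* cannot cross into the next tag.
const negated = [...html.matchAll(/<b>([^<]*)<\/b>/g)].map((m) => m[1]);
// ["one", "two"]
```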

Anchors and boundaries

Anchors don't match characters; they match positions.

- `^`: start of the string (or start of a line, with the `m` flag)
- `$`: end of the string (or end of a line, with `m`)
- `\b`: a word boundary, the position between a `\w` character and a non-`\w` character (the start or end of the string also counts when it sits next to a `\w`)

Anchors are how you require a pattern to span the whole input rather than just appearing within it:

```js
/^\d+$/.test("42");    // true: the whole string is digits
/^\d+$/.test("42px");  // false
```
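The `m` flag is what turns `^` and `$` into per-line anchors. This sketch compares the same pattern with and without it:

```javascript
const lines = "a\nb\nc";

// Without m, ^ and $ are only the start and end of the whole string.
const whole = lines.match(/^\w$/g); // null

// With m, they also match just after and just before each line break.
const perLine = lines.match(/^\w$/gm); // ["a", "b", "c"]
```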

Word boundaries let you match whole words. `/\bcat\b/` matches `cat` in "the cat sat" but not in "category" or "scatter":

```js
/\bcat\b/.test("category"); // false
/\bcat\b/.test("the cat");  // true
```
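Word boundaries are most useful in replacements, where matching inside a longer word would corrupt the text. This sketch contrasts the bounded and unbounded versions:

```javascript
const sentence = "cat category scatter cat";

// Only standalone "cat" is replaced; longer words are left alone.
const replaced = sentence.replace(/\bcat\b/g, "dog");
// "dog category scatter dog"

// Without boundaries, the substrings inside "category" and "scatter" change too.
const naive = sentence.replace(/cat/g, "dog");
// "dog dogegory sdogter dog"
```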

Groups and capturing

Parentheses do two jobs at once: they group a sub-pattern so a quantifier can apply to the whole thing, and they capture the matched text for later use.

```js
/(ab)+/         // one or more repetitions of "ab"
```

When a regex with capturing groups matches, you get the captured substrings back:

```js
const m = "2026-06-02".match(/(\d{4})-(\d{2})-(\d{2})/);
m[1]; // "2026"  (first group)
m[2]; // "06"
m[3]; // "02"
```
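Captured groups can also be referenced in a replacement string as `$1`, `$2`, and so on. Note that when a quantifier repeats a group, only its last repetition is kept:

```javascript
// $1, $2, $3 in a replacement string refer to the numbered groups.
const dmy = "2026-06-02".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
// "02/06/2026"

// A group repeated by a quantifier keeps only its last repetition.
const rep = "ababab".match(/(ab)+/);
const [whole, lastGroup] = rep; // "ababab", "ab"
```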

Named groups

Numbered groups get unreadable fast. Name them with `(?<name>...)` and read them off the `groups` object:

```js
const m = "2026-06-02".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
m.groups.year;  // "2026"
m.groups.month; // "06"
```
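The `groups` object destructures cleanly. Named groups can also be referenced in replacement strings with `$<name>`. This sketch reuses the date pattern:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Destructure the groups object directly.
const { year, month, day } = "2026-06-02".match(datePattern).groups;
// year is "2026", month is "06", day is "02"

// Reference named groups in a replacement with $<name>.
const european = "2026-06-02".replace(datePattern, "$<day>.$<month>.$<year>");
// "02.06.2026"
```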

Non-capturing groups

If you only need grouping for a quantifier and don't care about capturing, use `(?:...)`. Its main benefit is keeping your capture numbering clean. Skipping the capture may also save the engine a little work:

```js
/(?:https?:\/\/)?example\.com/  // the protocol is optional, but not captured
```
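To see the numbering effect, compare the same URL pattern with a capturing and a non-capturing protocol group. Here `example` is wrapped in its own group purely for illustration:

```javascript
const url = "https://example.com";

// A capturing protocol group pushes "example" to index 2.
const capturing = url.match(/(https?:\/\/)?(example)\.com/);
// ["https://example.com", "https://", "example"]

// A non-capturing protocol group leaves "example" at index 1.
const nonCapturing = url.match(/(?:https?:\/\/)?(example)\.com/);
// ["https://example.com", "example"]
```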

Alternation

The pipe `|` means "or" and has very low precedence, so it splits the entire pattern unless you contain it in a group. This distinction matters:

```js
/^cat|dog$/    // ^cat OR dog$: probably not what you meant
/^(cat|dog)$/  // exactly "cat" or exactly "dog"
```

Almost always you want alternation wrapped in a group so the surrounding anchors and quantifiers apply to both branches.
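Running both patterns over a few words makes the precedence difference concrete:

```javascript
const words = ["cat", "dog", "catdog", "hotdog", "category"];

// Ungrouped: "starts with cat" OR "ends with dog", so every word passes.
const loose = words.filter((w) => /^cat|dog$/.test(w));
// ["cat", "dog", "catdog", "hotdog", "category"]

// Grouped: both anchors apply to both branches.
const exact = words.filter((w) => /^(cat|dog)$/.test(w));
// ["cat", "dog"]
```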

Flags

Flags go after the closing slash and change how the whole pattern behaves:

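- `g` (global): find all matches, not just the first
- `i` (ignore case): case-insensitive matching
- `m` (multiline): `^` and `$` match at line breaks, not just string ends
- `s` (dotAll): `.` also matches newlines
- `u` (unicode): treat the pattern and input as Unicode code points, so characters outside the Basic Multilingual Plane (such as most emoji) count as one character
- `y` (sticky): match only starting at the position stored in `lastIndex`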
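Here is a sketch of the `i`, `s`, and `u` flags side by side. The emoji case shows why `u` matters. Without it, `.` consumes a single UTF-16 code unit, which is only half of an emoji such as 😀:

```javascript
const caseless = /hello/i.test("HELLO"); // true
const dotAll = /a.b/s.test("a\nb");      // true: with s, "." matches the newline

// 😀 is one code point stored as two UTF-16 code units.
const emoji = "😀";
const units = emoji.length;         // 2
const withoutU = /^.$/.test(emoji); // false: "." matched only half of the pair
const withU = /^.$/u.test(emoji);   // true
```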

The `g` flag changes which methods are useful. To collect every match with its groups, `matchAll` is the clean modern option. It requires the `g` flag and throws a `TypeError` without it:

```js
const text = "a1 b2 c3";
for (const m of text.matchAll(/([a-z])(\d)/g)) {
  console.log(m[1], m[2]); // a 1, then b 2, then c 3
}
```
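For contrast, `match` with the `g` flag returns only the whole matches and drops the groups. `matchAll`, on the other hand, rejects a regex without `g` outright:

```javascript
const text = "a1 b2 c3";

// match() with g returns only the whole matches; the groups are dropped.
const whole = text.match(/([a-z])(\d)/g); // ["a1", "b2", "c3"]

// matchAll() refuses a regex without the g flag.
let threw = false;
try {
  text.matchAll(/([a-z])(\d)/);
} catch (err) {
  threw = err instanceof TypeError;
}
// threw is now true
```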

One gotcha worth knowing: a `RegExp` object with the `g` (or `y`) flag is stateful. It remembers a `lastIndex` between calls to `.test()` and `.exec()`, so reusing the same global regex across calls can give surprising alternating true/false results. If you don't need that statefulness, don't add `g`. Otherwise, create a fresh regex each time or reset `lastIndex` to `0` before reuse.
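The alternating behavior is easy to reproduce, and just as easy to avoid:

```javascript
const re = /a/g;

// lastIndex carries over: after the first hit it is 1, so the next search
// starts past the end of "a", fails, and resets lastIndex to 0.
const results = [re.test("a"), re.test("a"), re.test("a")]; // [true, false, true]

// Fix 1: reset lastIndex before reusing the regex.
re.lastIndex = 0;
const afterReset = re.test("a"); // true

// Fix 2: drop the g flag when you only need a yes/no answer.
const plain = /a/;
const stable = [plain.test("a"), plain.test("a")]; // [true, true]
```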

Putting it together

With the pieces in hand, here are a few patterns built from what we've covered.

A hex color, which is a `#` followed by exactly three or six hex digits:

```js
/^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/
```
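Here is a quick check of the hex-color pattern against inputs that should and should not match. `isHexColor` is an illustrative name:

```javascript
// Hypothetical helper name; the pattern is the one above.
const isHexColor = (s) => /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/.test(s);

const checks = ["#fff", "#FFAA00", "#ffff", "fff", "#ggg"].map(isHexColor);
// [true, true, false, false, false]
```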

A simple time in 24-hour form, with hours `00` to `23` and minutes `00` to `59`. This is where regex's character-by-character thinking shows: you constrain each digit position rather than parsing a number:

```js
/^([01]\d|2[0-3]):[0-5]\d$/
```
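Once the regex has confirmed the shape, converting the pieces to numbers is safe. This sketch assumes a hypothetical helper, `minutesSinceMidnight`, and uses named versions of the same two groups:

```javascript
// Hypothetical helper: minutes since midnight, or null for an invalid time.
function minutesSinceMidnight(s) {
  const m = s.match(/^(?<h>[01]\d|2[0-3]):(?<min>[0-5]\d)$/);
  if (m === null) return null;
  return Number(m.groups.h) * 60 + Number(m.groups.min);
}

const samples = ["09:30", "23:59", "24:00", "9:30"].map(minutesSinceMidnight);
// [570, 1439, null, null]: "24:00" is out of range, "9:30" lacks a leading zero
```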

Pulling key-value pairs from a query string fragment:

```js
const q = "name=ada&lang=js&year=2026";
const pairs = [...q.matchAll(/(?<key>\w+)=(?<val>\w+)/g)]
  .map(m => [m.groups.key, m.groups.val]);
// [["name","ada"],["lang","js"],["year","2026"]]
```
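That pattern works for the tidy sample above. However, `\w+` stops at any non-word character, so percent-encoded or punctuated values get truncated or skipped. For real query strings, the standard `URLSearchParams` class (available in browsers and Node.js) decodes values properly. This sketch compares the two:

```javascript
const raw = "q=hello%20world&tag=c%2B%2B";

// \w+ stops at "%", so each value is cut short.
const viaRegex = [...raw.matchAll(/(?<key>\w+)=(?<val>\w+)/g)]
  .map((m) => [m.groups.key, m.groups.val]);
// [["q", "hello"], ["tag", "c"]]

// URLSearchParams decodes percent-escapes and yields [key, value] pairs.
const viaParams = [...new URLSearchParams(raw)];
// [["q", "hello world"], ["tag", "c++"]]
```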

A word of honesty about email: a regex that tries to follow the full address grammar is enormous, and even elaborate patterns still reject some valid addresses. For real validation, check for a single `@` with something on each side and confirm by sending a verification message. A pragmatic sanity check is fine. Just don't pretend a regex proves an address exists:

```js
/^[^\s@]+@[^\s@]+\.[^\s@]+$/
```
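Here's what that sanity check accepts and rejects. `looksLikeEmail` is an illustrative name, and a `true` result still says nothing about whether the mailbox exists:

```javascript
const looksLikeEmail = (s) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s);

const verdicts = [
  "ada@example.com",  // true
  "ada@localhost",    // false: no dot after the @
  "ada @example.com", // false: contains whitespace
  "@example.com",     // false: nothing before the @
].map(looksLikeEmail);
```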

Habits that keep regex maintainable

Regex rewards a few disciplines. Anchor with `^` and `$` whenever you mean "the entire string," or you'll match unexpected substrings. Reach for lazy quantifiers when you're extracting between delimiters. Name your groups once a pattern has more than one. And when a pattern grows past a line or two, that's usually a signal to split the work between a simpler regex and ordinary code. Readable beats clever.
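Here is an example of that split. A regex is good at checking that a date looks like `YYYY-MM-DD`, but clumsy at knowing that February has no 30th. This sketch assumes a hypothetical `parseIsoDate` helper. It lets the regex check the shape and ordinary code check the calendar:

```javascript
// Hypothetical helper: regex for the shape, Date arithmetic for the calendar.
function parseIsoDate(s) {
  const m = s.match(/^(?<y>\d{4})-(?<mo>\d{2})-(?<d>\d{2})$/);
  if (m === null) return null;
  const y = Number(m.groups.y);
  const mo = Number(m.groups.mo);
  const d = Number(m.groups.d);
  // Date.UTC rolls impossible dates forward (Feb 30 becomes Mar 2),
  // so compare the parts back to detect them.
  const date = new Date(Date.UTC(y, mo - 1, d));
  const valid =
    date.getUTCFullYear() === y &&
    date.getUTCMonth() === mo - 1 &&
    date.getUTCDate() === d;
  return valid ? date : null;
}

const ok = parseIsoDate("2026-06-02");         // a Date for June 2, 2026 (UTC)
const impossible = parseIsoDate("2026-02-30"); // null: right shape, no such day
const misshapen = parseIsoDate("2026-6-2");    // null: wrong shape
```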

Finally, never ship a pattern you haven't run against real inputs, including the messy edge cases and the strings that should not match. Building the pattern incrementally and testing as you go beats staring at a wall of metacharacters. You can prototype against your own sample text in the free Cosmovex regex tester, which highlights matches and capture groups live so you can see exactly what each part of the pattern is doing before it reaches production. Start with the literal you know matches, add one construct at a time, and confirm each step does what you expect.
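One lightweight way to keep those edge cases around is a small table of strings that should and should not match, checked in code. This sketch assumes a hypothetical `checkPattern` helper. Pass it a regex without the `g` flag, since a global regex's `lastIndex` would make repeated `test()` calls unreliable:

```javascript
// Hypothetical helper: returns the inputs that behaved unexpectedly.
// Pass a regex without the g flag; a global regex's lastIndex would skew test().
function checkPattern(re, { shouldMatch, shouldNotMatch }) {
  const missed = shouldMatch.filter((s) => !re.test(s));
  const falsePositives = shouldNotMatch.filter((s) => re.test(s));
  return { missed, falsePositives };
}

const report = checkPattern(/^([01]\d|2[0-3]):[0-5]\d$/, {
  shouldMatch: ["00:00", "09:30", "23:59"],
  shouldNotMatch: ["24:00", "9:30", "12:60", " 12:00"],
});
// { missed: [], falsePositives: [] }

// A looser pattern shows what a failing report looks like.
const loose = checkPattern(/^\d{2}:\d{2}$/, {
  shouldMatch: ["09:30"],
  shouldNotMatch: ["24:00", "12:60"],
});
// { missed: [], falsePositives: ["24:00", "12:60"] }
```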
