Regular Expressions Explained: From Basics to Advanced

Development 15 min read Last updated: June 19, 2026

What Are Regular Expressions?

Regular expressions (regex or regexp) are sequences of characters that define search patterns. They're one of the most powerful tools in a developer's toolkit for text processing, validation, and data extraction. While they can appear cryptic at first, understanding regex fundamentals opens up efficient solutions to complex text manipulation problems.

Regular expressions originated in theoretical computer science and formal language theory in the 1950s. Today, they're implemented in virtually every programming language and text editor, making them an essential skill for developers.

Basic Syntax

Literal Characters

Most characters match themselves literally. The pattern `cat` matches the character sequence "cat" wherever it appears, including inside "concatenate".

Metacharacters

Special characters with specific meanings in regex:

Character	Meaning	Example
`.`	Any single character (except newline)	`c.t` matches "cat", "cot", "cut"
`^`	Start of string/line	`^Hello` matches "Hello" at start
`$`	End of string/line	`world$` matches "world" at end
`*`	Zero or more of preceding	`ab*c` matches "ac", "abc", "abbc"
`+`	One or more of preceding	`ab+c` matches "abc", "abbc" (not "ac")
`?`	Zero or one of preceding	`colou?r` matches "color", "colour"
`|`	Alternation (OR)	`cat|dog` matches "cat" or "dog"
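The short sketch below runs each example from the table through Python's standard `re` module. The sample strings are illustrative choices. Note that `ab+c` finds nothing in "ac", and `cat|dog` finds "dog" inside "hotdog".

```python
import re

# (pattern, text) pairs taken from the metacharacter table above.
examples = [
    (r"c.t", "cut"),
    (r"^Hello", "Hello, world"),
    (r"world$", "Hello, world"),
    (r"ab*c", "ac"),
    (r"ab+c", "ac"),
    (r"colou?r", "color"),
    (r"cat|dog", "hotdog"),
]

results = {}
for pattern, text in examples:
    # re.search scans the whole text and returns the first match (or None).
    match = re.search(pattern, text)
    results[pattern] = match.group() if match else None
    print(f"{pattern:10} on {text!r:16} -> {results[pattern]!r}")
```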

Character Classes

Character classes match any single character from a set:

[abc]     - matches 'a', 'b', or 'c'
[a-z]     - matches any lowercase letter
[A-Z]     - matches any uppercase letter
[0-9]     - matches any digit
[a-zA-Z]  - matches any letter
[^abc]    - matches anything EXCEPT 'a', 'b', or 'c'
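A class always consumes exactly one character. Adding a quantifier such as `+` after it matches a run of characters. The sketch below uses an arbitrary sample sentence.

```python
import re

text = "Agent 007 met Q at 9:45 in room B12."

digits = re.findall(r"[0-9]", text)           # each class matches ONE character
uppercase = re.findall(r"[A-Z]", text)
letter_runs = re.findall(r"[a-zA-Z]+", text)  # + repeats the class
other = re.findall(r"[^a-zA-Z0-9 ]", text)    # negated: not a letter, digit, or space

print(digits)       # ['0', '0', '7', '9', '4', '5', '1', '2']
print(uppercase)    # ['A', 'Q', 'B']
print(letter_runs)  # ['Agent', 'met', 'Q', 'at', 'in', 'room', 'B']
print(other)        # [':', '.']
```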

Shorthand Character Classes

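\d  - digit [0-9]
\D  - non-digit [^0-9]
\w  - word character [a-zA-Z0-9_]
\W  - non-word character [^a-zA-Z0-9_]
\s  - whitespace (space, tab, newline, carriage return, etc.)
\S  - non-whitespace

The bracketed equivalents describe ASCII behavior. Some engines, including Python 3's re module for str patterns, are Unicode-aware by default, so \d and \w also match non-ASCII digits and letters.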
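The bracketed ASCII definitions are not universal. Python 3's `re` module treats `\w` and `\d` as Unicode-aware for str patterns unless you pass `re.ASCII`. The sketch below shows the difference on a sample string chosen for illustration.

```python
import re

text = "Café total: 42 €\tdone"

# Python 3 str patterns are Unicode-aware by default, so \w matches "é".
unicode_words = re.findall(r"\w+", text)
# re.ASCII restricts \w, \d and \s to their ASCII definitions.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
numbers = re.findall(r"\d+", text)
whitespace = re.findall(r"\s", text)  # each space and the tab

print(unicode_words)  # ['Café', 'total', '42', 'done']
print(ascii_words)    # ['Caf', 'total', '42', 'done']
print(numbers)        # ['42']
print(whitespace)     # [' ', ' ', ' ', '\t']
```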

Quantifiers

Quantifiers specify how many times a pattern should match:

{n}    - exactly n times
{n,}   - n or more times
{n,m}  - between n and m times
*      - 0 or more (same as {0,})
+      - 1 or more (same as {1,})
?      - 0 or 1 (same as {0,1})
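The bounded forms are easiest to see side by side. This sketch uses `re.fullmatch`, which requires the entire string to match, so it acts like wrapping the pattern in `^...$`. The inputs are illustrative.

```python
import re

# {n}: exactly n; fullmatch requires the whole string to match.
five_digits_ok = re.fullmatch(r"\d{5}", "90210") is not None   # True
four_digits_ok = re.fullmatch(r"\d{5}", "9021") is not None    # False

# {n,m}: between n and m; \b keeps matches on word boundaries.
short_words = re.findall(r"\b\w{2,3}\b", "a an ant ants")     # ['an', 'ant']

# {n,}: n or more.
long_o_runs = re.findall(r"o{2,}", "to too tooo")             # ['oo', 'ooo']

print(five_digits_ok, four_digits_ok, short_words, long_o_runs)
```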

Greedy vs Lazy Matching

By default, quantifiers are greedy: they match as much as possible. Adding ? after a quantifier makes it lazy (matching as little as possible):

// Given: <div>content</div>

<.*>    - greedy: matches "<div>content</div>"
<.*?>   - lazy: matches "<div>" (the first match; a global search also finds "</div>")
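The same comparison in Python looks like this. `re.search` returns only the first match, and `re.findall` plays the role of a global search.

```python
import re

html = "<div>content</div>"

greedy = re.search(r"<.*>", html).group()    # .* runs to the LAST ">"
lazy = re.search(r"<.*?>", html).group()     # .*? stops at the FIRST ">"
all_lazy = re.findall(r"<.*?>", html)        # every lazy match

print(greedy)    # <div>content</div>
print(lazy)      # <div>
print(all_lazy)  # ['<div>', '</div>']
```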

Groups and Capturing

Parentheses create groups for applying quantifiers and capturing matches:

// Grouping for quantifiers
(ab)+          - matches "ab", "abab", "ababab"

// Capturing groups for extraction
(\d{3})-(\d{4})  - captures the 3-digit prefix and 4-digit line number separately (e.g., "555-1234")

// Non-capturing groups (grouping without capturing)
(?:ab)+        - groups but doesn't capture

// Named groups (in supported languages; Python's re spells them (?P<year>...))
(?<year>\d{4})-(?<month>\d{2})
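Here is how the three kinds of group behave in Python. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`. The sample strings are made up for illustration.

```python
import re

# Capturing groups: group(1) and group(2) hold the two parts.
m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
print(m.group(1), m.group(2))  # 555 1234

# Non-capturing group: (?:ab)+ repeats "ab" but creates no group.
m2 = re.fullmatch(r"(?:ab)+", "ababab")
print(m2.group(), m2.groups())  # ababab ()

# Named groups, using Python's (?P<name>...) syntax.
m3 = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-03")
print(m3.group("year"), m3["month"])  # 2024 03
print(m3.groupdict())                 # {'year': '2024', 'month': '03'}
```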

Backreferences

Reference captured groups later in the pattern:

// Find repeated words
\b(\w+)\s+\1\b   - matches "the the", "is is"

// \1 refers to whatever was captured by the first group
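Backreferences work both in the pattern and in the replacement string. The sketch below finds doubled words and then collapses each pair to a single copy. The sample sentence is illustrative.

```python
import re

text = "This is is a test of the the duplicate finder."
doubled = re.compile(r"\b(\w+)\s+\1\b")

doubled_words = [m.group() for m in doubled.finditer(text)]
# In the replacement, \1 inserts whatever group 1 captured.
deduped = doubled.sub(r"\1", text)

print(doubled_words)  # ['is is', 'the the']
print(deduped)        # This is a test of the duplicate finder.
```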

Lookahead and Lookbehind

These assertions match a position without consuming characters:

// Lookahead
foo(?=bar)     - matches "foo" only if followed by "bar"
foo(?!bar)     - matches "foo" only if NOT followed by "bar"

// Lookbehind (some engines, such as Python's re, require a fixed-width pattern inside)
(?<=foo)bar    - matches "bar" only if preceded by "foo"
(?<!foo)bar    - matches "bar" only if NOT preceded by "foo"
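Because lookarounds only test a position, the text they check is not part of the match. This sketch uses sample strings chosen for illustration. It also shows that Python's `re` rejects a variable-width lookbehind at compile time.

```python
import re

text = "foobar foobaz barfoo"

# Lookahead: the match is just "foo"; "bar" is checked but not consumed.
ahead = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
not_ahead = [m.start() for m in re.finditer(r"foo(?!bar)", text)]
print(ahead, not_ahead)  # [0] [7, 17]

# Lookbehind: extract amounts without including the "$" sign.
prices = re.findall(r"(?<=\$)\d+", "Was $45, now $30")
print(prices)  # ['45', '30']

# Python's re only allows fixed-width lookbehind patterns.
lookbehind_error = None
try:
    re.compile(r"(?<=a+)b")
except re.error as exc:
    lookbehind_error = exc
    print("rejected:", exc)
```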

Practical Example: Password Validation

// Password must contain at least one digit, one lowercase,
// one uppercase, and be 8+ characters
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$
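A minimal sketch of using this pattern in Python follows. `is_valid_password` is a hypothetical helper name. It calls `fullmatch` because in Python, `$` also matches just before a trailing newline, so `match` alone would accept "Secret123\n".

```python
import re

PASSWORD_RE = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$")

def is_valid_password(candidate: str) -> bool:
    """Hypothetical helper: True if candidate meets the rules above."""
    # fullmatch (not match) so a trailing "\n" cannot slip past $.
    return PASSWORD_RE.fullmatch(candidate) is not None

checks = {pw: is_valid_password(pw)
          for pw in ["Secret123", "secret123", "SECRET123", "Secretpass", "Sec1", "Secret123\n"]}
for pw, ok in checks.items():
    print(repr(pw), ok)

# With plain match(), $ matches before the final newline, so this is True.
loose = PASSWORD_RE.match("Secret123\n") is not None
```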

Common Regex Flags

i - Case Insensitive

/hello/i matches "Hello", "HELLO", "hello"

g - Global

Find all matches, not just the first one

m - Multiline

^ and $ match at the start and end of each line, not just at the start and end of the whole string

s - Dotall

. matches newline characters too
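The flags above use JavaScript's `/pattern/flags` notation. In Python they are passed as `re` constants and can be combined with `|` (e.g. `re.IGNORECASE | re.MULTILINE`). The sketch below shows each one on a sample three-line string.

```python
import re

text = "Hello world\nhello regex\nHELLO again"

# i -> re.IGNORECASE
case_insensitive = re.findall(r"hello", text, flags=re.IGNORECASE)
# g -> no flag needed: findall() and finditer() already return every match.
# m -> re.MULTILINE: ^ matches at the start of every line, not just the string.
first_word_default = re.findall(r"^\w+", text)
first_word_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
# s -> re.DOTALL: . also matches the newline character.
without_dotall = re.search(r"world.hello", text)
with_dotall = re.search(r"world.hello", text, flags=re.DOTALL)

print(case_insensitive)           # ['Hello', 'hello', 'HELLO']
print(first_word_default)         # ['Hello']
print(first_word_multiline)       # ['Hello', 'hello', 'HELLO']
print(without_dotall)             # None
print(repr(with_dotall.group()))  # 'world\nhello'
```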

Common Patterns

Email Validation (Basic)

^[\w.-]+@[\w.-]+\.\w{2,}$
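To see where the "basic" label matters, the sketch below runs the pattern against a few hand-picked addresses. Note that it accepts consecutive dots, which real address syntax does not allow.

```python
import re

EMAIL_RE = re.compile(r"^[\w.-]+@[\w.-]+\.\w{2,}$")

checks = {
    addr: EMAIL_RE.fullmatch(addr) is not None
    for addr in [
        "jane.doe@example.com",    # accepted
        "jane@localhost",          # rejected: no dot after the @
        "jane@example.c",          # rejected: TLD shorter than 2 characters
        "jane..doe@example..com",  # accepted, although it is not a valid address
    ]
}
for addr, ok in checks.items():
    print(f"{addr:24} {ok}")
```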

URL Matching

https?:\/\/[\w.-]+(?:\/[\w.-]*)*\/?
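Used for extraction, this pattern has two quirks worth knowing: it stops at a query string, and because `[\w.-]` includes ".", sentence punctuation can stick to the end of a URL. The sketch below uses made-up URLs and shows one simple cleanup step.

```python
import re

URL_RE = re.compile(r"https?:\/\/[\w.-]+(?:\/[\w.-]*)*\/?")

text = ("See https://example.com/docs/intro.html, "
        "https://example.com/search?q=regex and http://test.org.")

urls = URL_RE.findall(text)
print(urls)
# ['https://example.com/docs/intro.html', 'https://example.com/search', 'http://test.org.']

# Strip trailing periods picked up from the surrounding sentence.
cleaned = [u.rstrip(".") for u in urls]
print(cleaned[-1])  # http://test.org
```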

Phone Number (US)

^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
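The three capturing groups make it easy to normalize different input styles into one format. `normalize_us_phone` is a hypothetical helper, and the XXX-XXX-XXXX output format is an illustrative choice.

```python
import re

PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_us_phone(raw):
    """Hypothetical helper: return XXX-XXX-XXXX, or None if raw doesn't match."""
    m = PHONE_RE.fullmatch(raw.strip())
    if m is None:
        return None
    return "-".join(m.groups())  # area code, prefix, line number

for raw in ["(555) 123-4567", "555.123.4567", "5551234567", "555-1234"]:
    print(raw, "->", normalize_us_phone(raw))
```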

Date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

This checks the format only: it still accepts impossible dates such as 2023-02-30.
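One common approach is to let the regex check the shape and let a date library check the calendar. The sketch below assumes Python's `datetime.date` for that second step. `parse_iso_date` is a hypothetical helper name.

```python
import re
from datetime import date

DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def parse_iso_date(text):
    """Hypothetical helper: regex checks the shape, datetime checks the calendar."""
    if not DATE_RE.fullmatch(text):
        return None
    year, month, day = map(int, text.split("-"))
    try:
        return date(year, month, day)
    except ValueError:  # e.g. February 31, or Feb 29 in a non-leap year
        return None

for s in ["2024-02-29", "2023-02-29", "2023-02-31", "2023-13-01"]:
    print(s, DATE_RE.fullmatch(s) is not None, parse_iso_date(s))
```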

IPv4 Address

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
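The octet alternation is what limits each part to 0 to 255. In Python, the standard `ipaddress` module is an alternative to the regex. The sketch below compares the two on a few sample inputs. One caveat: the regex also accepts leading zeros such as "01", and parsers differ on whether to allow those.

```python
import ipaddress
import re

IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

results = {}
for candidate in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    regex_ok = IPV4_RE.fullmatch(candidate) is not None
    try:
        ipaddress.IPv4Address(candidate)  # raises a ValueError subclass if invalid
        stdlib_ok = True
    except ValueError:
        stdlib_ok = False
    results[candidate] = (regex_ok, stdlib_ok)
    print(f"{candidate:16} regex={regex_ok} ipaddress={stdlib_ok}")
```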

Performance Considerations

Poorly written regex can cause severe performance issues:

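Catastrophic backtracking: Nested quantifiers like (a+)+ can take exponential time when a match fails, because the engine tries every way of splitting the input among the nested repetitions.
Be specific: Use a precise pattern such as [0-9]+ or \d{4} instead of .* when you know the format.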
Anchor patterns: Use ^ and $ when matching whole strings.
Compile once: In performance-critical code, compile regex patterns once and reuse them.
Test with edge cases: Test patterns against long strings and worst-case inputs.
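The sketch below illustrates the first and fourth points in Python. It compiles both patterns once at module level. It then times the nested `(a+)+` against the equivalent flat `a+` on inputs that are forced to fail. Absolute timings depend on the machine and Python version, so none are asserted here.

```python
import re
import time

# Compile once at module level and reuse.
NESTED = re.compile(r"^(a+)+$")  # nested quantifiers: prone to catastrophic backtracking
FLAT = re.compile(r"^a+$")       # matches exactly the same strings, without nesting

def timed_match(pattern, text):
    """Return (matched?, seconds taken) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

agreements = []
for n in (8, 12, 16):
    text = "a" * n + "b"  # the trailing "b" makes every attempt fail
    nested_ok, nested_secs = timed_match(NESTED, text)
    flat_ok, flat_secs = timed_match(FLAT, text)
    agreements.append(nested_ok == flat_ok)
    # Expect the nested time to grow rapidly with n while the flat time stays tiny.
    print(f"n={n:2}: nested {nested_secs:.5f}s  flat {flat_secs:.5f}s")
```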

Regex in Different Languages

JavaScript

const pattern = /\d+/g;
const matches = "abc123def456".match(pattern); // ["123", "456"]
const result = "hello".replace(/l/g, "L"); // "heLLo"

Python

import re
pattern = re.compile(r'\d+')
matches = pattern.findall('abc123def456')  # ['123', '456']
result = re.sub(r'l', 'L', 'hello')  # 'heLLo'

PHP

preg_match_all('/\d+/', 'abc123def456', $matches);
// $matches[0] = ['123', '456']
$result = preg_replace('/l/', 'L', 'hello'); // 'heLLo'

Tips for Learning Regex

Start simple: Master basic patterns before tackling complex ones.
Use a tester: Visual regex testers help you understand how patterns match.
Read patterns aloud: Describing \d+\s as "one or more digits followed by a whitespace character" helps comprehension.
Build incrementally: Test each part of a complex pattern separately.
Comment complex patterns: Use verbose mode or comments to explain complex regex.
Know when not to use regex: Sometimes simple string methods are clearer and faster.
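To illustrate the last two tips, the sketch below rewrites the US phone pattern from the Common Patterns section with Python's `re.VERBOSE` flag. Under that flag, whitespace outside character classes is ignored and `#` starts a comment. The sketch ends with a case where a plain string method is simpler than a regex.

```python
import re

PHONE_VERBOSE = re.compile(
    r"""
    ^
    \(?(\d{3})\)?   # area code, optionally in parentheses
    [-.\s]?         # optional separator
    (\d{3})         # prefix
    [-.\s]?         # optional separator
    (\d{4})         # line number
    $
    """,
    re.VERBOSE,
)

parts = PHONE_VERBOSE.fullmatch("(555) 123-4567").groups()
print(parts)  # ('555', '123', '4567')

# Sometimes a string method is clearer than a regex.
filename = "report.csv"
print(filename.endswith(".csv"))              # True, no regex needed
print(bool(re.search(r"\.csv$", filename)))   # same answer, harder to read
```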

Test Your Regex Patterns

Use our regex tester to experiment with patterns and see matches in real-time.

