BytePane

Regex Cheat Sheet: Complete Reference Guide (2026)

Text Processing · 16 min read

Key Takeaways

  • Core regex syntax is largely shared across JavaScript, Python, PHP, Go, and Java; the differences lie mostly in flags, APIs, and engine features (Go's RE2-based engine, for example, has no lookarounds or backreferences).
  • According to OWASP research, approximately 20% of regex patterns in production applications are vulnerable to ReDoS — avoid nested quantifiers like (a+)+.
  • An empirical study of 356 regex bugs found that 46.3% stem from incorrect semantics — test your patterns against edge cases, not just happy paths.
  • Named capturing groups are supported in all major engines and make patterns self-documenting: (?<name>...) in JavaScript and Java, (?P<name>...) in Python and Go, and both forms in PHP.
  • Compile patterns once outside of loops — regex compilation is expensive; reuse the compiled object for 10–100x better performance on repeated matching.

What Are Regular Expressions?

Regular expressions (regex or regexp) are sequences of characters that define search patterns. They are used in virtually every programming language for string searching, matching, validation, and text extraction. A 2019 empirical study published at ESEC/FSE found that 94% of developers re-use regex patterns, with 50% reusing them at least half the time — yet they remain one of the most frequently misunderstood tools in the craft.

This cheat sheet is a complete reference you can bookmark and return to. It covers metacharacters, quantifiers, groups, lookarounds, flags, and production-ready validation patterns. To test any pattern interactively, use our Regex Tester tool.

Basic Metacharacters

Metacharacters are the building blocks of regular expressions. Each has a special meaning beyond its literal character value.

| Pattern | Description | Example | Matches |
|---|---|---|---|
| `.` | Any character except newline | `h.t` | hat, hot, hit |
| `^` | Start of string/line | `^Hello` | "Hello world" |
| `$` | End of string/line | `world$` | "Hello world" |
| `*` | Zero or more of previous | `ab*c` | ac, abc, abbc |
| `+` | One or more of previous | `ab+c` | abc, abbc (not ac) |
| `?` | Zero or one of previous | `colou?r` | color, colour |
| `\` | Escape special character | `\.` | Literal dot |
| `\|` | Alternation (OR) | `cat\|dog` | cat or dog |
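Since each metacharacter above carries a special meaning, a string passed to new RegExp() can match more than intended; a "." in a file name, for example, matches any character. A common remedy is to escape metacharacters before building the pattern. The sketch below uses a hypothetical helper, escapeRegExp, and assumes the escaped text is used outside a character class.

```javascript
// Hypothetical helper: escape regex metacharacters so a string matches literally.
// "$&" in the replacement re-inserts the matched character after a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const fileName = "report.v2.json";
const unsafe = new RegExp("^" + fileName + "$");             // "." matches any character
const safe = new RegExp("^" + escapeRegExp(fileName) + "$"); // "." matches only a dot

console.log(unsafe.test("report-v2-json")); // true (probably not intended)
console.log(safe.test("report-v2-json"));   // false
console.log(safe.test("report.v2.json"));   // true
```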

Character Classes

Character classes match any one character from a specific set. They are defined using square brackets and support ranges, negation, and shorthand notations.

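| Pattern | Description | Equivalent |
|---|---|---|
| `[abc]` | Any of a, b, or c | |
| `[^abc]` | Any character NOT a, b, or c | |
| `[a-z]` | Any lowercase letter | |
| `[A-Z]` | Any uppercase letter | |
| `[0-9]` | Any digit | `\d` |
| `\d` | Any digit | `[0-9]` (Python 3 also matches other Unicode decimal digits) |
| `\D` | Any non-digit | `[^0-9]` |
| `\w` | Word character | `[a-zA-Z0-9_]` (ASCII-only in JavaScript; Unicode-aware in Python 3) |
| `\W` | Non-word character | `[^a-zA-Z0-9_]` |
| `\s` | Whitespace | `[ \t\n\r\f\v]`, plus Unicode spaces such as U+00A0 in JavaScript and Python 3 |
| `\S` | Non-whitespace | `[^ \t\n\r\f\v]` |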
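Shorthand classes are where engines diverge most on non-ASCII text. In JavaScript, \w and \d are ASCII-only, so "café" is split apart; with the u flag, Unicode property escapes such as \p{L} (any letter) and \p{N} (any number) cover accented and non-Latin scripts. A small JavaScript comparison, assuming a runtime recent enough to support \p{...} (ES2018):

```javascript
// "caf\u00e9" is "café" with a precomposed é (U+00E9); the escapes keep the
// example independent of how the source file is encoded.
const sample = "caf\u00e9 na\u00efve 42";

// \w is [A-Za-z0-9_] in JavaScript, so accented letters break words apart.
console.log(sample.match(/\w+/g));     // ["caf", "na", "ve", "42"]

// \p{L} matches any Unicode letter but requires the u flag.
console.log(sample.match(/\p{L}+/gu)); // ["café", "naïve"]

// Negated class: anything that is not a letter, a number, or whitespace.
console.log("a-b_c!".match(/[^\p{L}\p{N}\s]/gu)); // ["-", "_", "!"]
```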

Quantifiers: Greedy vs. Lazy

Quantifiers specify how many times a preceding element must occur. The most critical distinction — one that bites developers constantly — is greedy vs. lazy matching.

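| Quantifier | Mode | Description | Example |
|---|---|---|---|
| `*` | Greedy | 0 or more, as many as possible | `a*` |
| `+` | Greedy | 1 or more, as many as possible | `a+` |
| `?` | Greedy | 0 or 1 (optional) | `a?` |
| `{n}` | Exact | Exactly n times | `a{3}` |
| `{n,}` | Greedy | n or more times | `a{2,}` |
| `{n,m}` | Greedy | Between n and m times | `a{2,4}` |
| `*?` | Lazy | 0 or more, as few as possible | `a*?` |
| `+?` | Lazy | 1 or more, as few as possible | `a+?` |
| `{n,m}?` | Lazy | Between n and m, as few as possible | `a{2,4}?` |
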
const input = "<a>text</a><b>more</b>";

// Greedy: matches the ENTIRE string from the first < to the last >
input.match(/<.*>/g);   // → ["<a>text</a><b>more</b>"]

// Lazy: matches each tag individually
input.match(/<.*?>/g);  // → ["<a>", "</a>", "<b>", "</b>"]

The greedy-vs-lazy distinction matters most when parsing delimited content like HTML tags, quoted strings, or anything with a repeated open/close pattern.
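For quoted strings there is a third option besides greedy and lazy quantifiers: a negated character class that can never run past the closing delimiter. A minimal JavaScript comparison on a single line of input (escaped quotes inside strings are not handled in this sketch):

```javascript
const quotedLine = 'set "name" to "Ada" and "role" to "admin"';

// Greedy: runs from the first quote to the last quote on the line.
console.log(quotedLine.match(/".*"/g));    // ['"name" to "Ada" and "role" to "admin"']

// Lazy: stops at the first closing quote after each opening quote.
console.log(quotedLine.match(/".*?"/g));   // ['"name"', '"Ada"', '"role"', '"admin"']

// Negated class: [^"]* can never consume a quote, so no backtracking is needed.
console.log(quotedLine.match(/"[^"]*"/g)); // ['"name"', '"Ada"', '"role"', '"admin"']
```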

Groups and Backreferences

Groups let you treat multiple characters as a single unit, apply quantifiers to complex patterns, and capture matched text for extraction or replacement. Named groups — supported in all modern engines — make patterns substantially more readable.

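| Pattern | Description | Example |
|---|---|---|
| `(abc)` | Capturing group | `(ha)+` matches "haha" |
| `(?:abc)` | Non-capturing group | `(?:ha)+` groups without capture |
| `(?<name>abc)` | Named capturing group (Python: `(?P<name>abc)`) | `(?<year>\d{4})` |
| `\1` | Backreference to group 1 | `(a)\1` matches "aa" |
| `\k<name>` | Named backreference (Python: `(?P=name)`) | `\k<year>` re-matches the captured year |
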
// Named groups make complex patterns self-documenting
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-03-14".match(dateRegex);
console.log(match.groups.year);  // "2026"
console.log(match.groups.month); // "03"
console.log(match.groups.day);   // "14"

// Backreference: detect duplicate consecutive words
const dupeWords = /\b(\w+)\s+\1\b/gi;
"the the quick brown fox".match(dupeWords); // ["the the"]

// Non-capturing group for grouping without storing
/(?:https?|ftp):\/\//.test("https://example.com"); // true
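Groups can also be referenced from the replacement side. In JavaScript, $<name> and $1 work inside replacement strings, and a replacer function receives the named-groups object as its last argument. A short sketch (the variable names are illustrative):

```javascript
const isoDates = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

// $<name> in a replacement string refers to a named group.
const reformatted = "Due 2026-03-14, paid 2026-03-20".replace(isoDates, "$<day>/$<month>/$<year>");
console.log(reformatted); // "Due 14/03/2026, paid 20/03/2026"

// A replacer function gets the groups object as its final argument.
const swapped = "2026-03-14".replace(isoDates, (...args) => {
  const { year, month, day } = args[args.length - 1];
  return `${month}/${day}/${year}`;
});
console.log(swapped); // "03/14/2026"

// $1 refers to the first numbered group: collapse a doubled word.
console.log("the the quick fox".replace(/\b(\w+)\s+\1\b/g, "$1")); // "the quick fox"
```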

Anchors and Boundaries

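| Anchor | Description | Example |
|---|---|---|
| `^` | Start of string (or line with `m` flag) | `^Error` |
| `$` | End of string (or line with `m` flag) | `\.json$` |
| `\b` | Word boundary | `\bcat\b` |
| `\B` | Non-word boundary | `\Bcat\B` |
| `\A` | Absolute start of string, even in multiline mode (Python, PHP, Java; not JavaScript) | `\Astart` |
| `\Z` | End of string (Python). In PHP and Java, `\Z` also matches before a final newline; use `\z` there for the absolute end. Not supported in JavaScript | `end\Z` |
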
// Word boundary: match "cat" but not "concatenate"
/\bcat\b/.test("the cat sat");     // true
/\bcat\b/.test("concatenate");     // false

// Multiline anchors: ^ and $ match each line
const multiline = /^Error: .+$/gm;
const log = "OK: all good\nError: disk full\nOK: recovered";
log.match(multiline); // ["Error: disk full"]
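The effect of the m flag is easiest to see by running the same anchor with and without it against a multi-line string like the log above:

```javascript
const serverLog = "OK: all good\nError: disk full\nOK: recovered";

// Without m, ^ and $ refer only to the start and end of the whole string.
console.log(/^Error/.test(serverLog));  // false
console.log(/^Error/m.test(serverLog)); // true

// The same applies to $: "good" ends a line, but not the string.
console.log(/good$/.test(serverLog));   // false
console.log(/good$/m.test(serverLog));  // true
```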

Lookaheads and Lookbehinds

Lookarounds are zero-width assertions — they check context without consuming characters. They are essential for matching patterns that depend on surrounding context without including that context in the result.

| Pattern | Type | Description |
|---|---|---|
| `(?=abc)` | Positive lookahead | Match if followed by abc |
| `(?!abc)` | Negative lookahead | Match if NOT followed by abc |
| `(?<=abc)` | Positive lookbehind | Match if preceded by abc |
| `(?<!abc)` | Negative lookbehind | Match if NOT preceded by abc |

// Extract number before "px" (lookahead)
/\d+(?=px)/.exec("font-size: 16px"); // ["16"]

// Password strength: at least 1 uppercase, 1 digit, 8+ chars
/^(?=.*[A-Z])(?=.*\d).{8,}$/.test("Secret1!");  // true
/^(?=.*[A-Z])(?=.*\d).{8,}$/.test("password1"); // false

// Extract price after "$" (lookbehind)
/(?<=\$)[\d.]+/.exec("Total: $99.99"); // ["99.99"]

// Match "foo" NOT followed by "bar"
/foo(?!bar)/.test("foobar");  // false
/foo(?!bar)/.test("foobaz");  // true
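Because lookarounds are zero-width, a pattern built from them can match a position rather than text. One familiar JavaScript use is inserting thousands separators; addThousandsSeparators is an illustrative name, and the sketch assumes a plain string of digits with no sign or decimal part (a fractional part would also receive commas).

```javascript
// \B       : a position inside a run of digits (not at a word boundary)
// (?=...)  : followed by one or more groups of exactly three digits...
// (?!\d)   : ...that end exactly where the digits end
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(addThousandsSeparators("1234567")); // "1,234,567"
console.log(addThousandsSeparators("1000"));    // "1,000"
console.log(addThousandsSeparators("999"));     // "999"
```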

Regex Flags Quick Reference

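| Flag (JS) | Name | Effect | Python equivalent |
|---|---|---|---|
| `g` | Global | Find all matches, not just the first | No flag; use `re.findall()` or `re.finditer()` |
| `i` | Case-insensitive | Ignore letter case | `re.IGNORECASE` |
| `m` | Multiline | `^` and `$` match line boundaries | `re.MULTILINE` |
| `s` | Dotall | `.` matches newline characters too | `re.DOTALL` |
| `u` | Unicode | Enable full Unicode matching | Default for `str` patterns |
| `y` | Sticky | Match only at `lastIndex` position | No flag; closest is `pattern.match(text, pos)` |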

Gotcha with the g flag: when you use .test() or .exec() with the g (or y) flag, the regex object is stateful, storing the position where the next search starts in lastIndex. Calling .test() repeatedly on the same regex object and the same input string can therefore alternate between true and false. Reset lastIndex = 0 between calls, or drop the g flag when you only need a yes/no answer.
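A short JavaScript demonstration of the lastIndex behavior and two ways to avoid it:

```javascript
const globalCat = /cat/g;

// Each call starts searching at globalCat.lastIndex and then updates it:
// 1st call matches and sets lastIndex = 3; 2nd finds nothing and resets it to 0.
const results = [];
for (let i = 0; i < 3; i++) {
  results.push(globalCat.test("cat"));
}
console.log(results); // [true, false, true]

// Fix 1: reset lastIndex before each independent check.
globalCat.lastIndex = 0;
console.log(globalCat.test("cat")); // true

// Fix 2: drop the g flag for yes/no checks; the regex is then stateless.
const plainCat = /cat/;
console.log([plainCat.test("cat"), plainCat.test("cat")]); // [true, true]
```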

Production-Ready Validation Patterns

These are battle-tested patterns for common validation tasks. The empirical study on regex bugs (Mining Software Repositories 2020, 356 bugs across 195 GitHub repositories) found that 46.3% of regex bugs are caused by incorrect semantics, so test each pattern against edge cases, including empty strings, Unicode, and malformed inputs.

Email Validation

// Simplified, NOT fully RFC 5322-compliant (a regex for the full RFC 5322 grammar is ~6KB)
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Test cases:
emailRegex.test("user@example.com");         // ✓
emailRegex.test("first.last@example.co.uk"); // ✓
emailRegex.test("user@.com");                // ✗
emailRegex.test("@example.com");             // ✗

// Note: Always send a verification email for definitive validation.
// No regex reliably separates valid from invalid addresses under RFC 5321/5322.
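One way to act on the edge-case advice above is a table-driven check: list each edge case with the answer you intend and report every input where the pattern disagrees. findMismatches is a hypothetical helper, and the expected values encode this sketch's own assumptions (consecutive dots in the local part and a domain label starting with a hyphen should both be rejected). The simplified email pattern turns out to accept both.

```javascript
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: returns the inputs whose result differs from the intent.
// Pass a regex WITHOUT the g flag, or lastIndex will skew repeated .test() calls.
function findMismatches(regex, cases) {
  return cases
    .filter(({ input, expected }) => regex.test(input) !== expected)
    .map(({ input }) => input);
}

const cases = [
  { input: "user@example.com", expected: true },
  { input: "", expected: false },                   // empty string
  { input: "user@example.com\n", expected: false }, // trailing newline
  { input: "a..b@example.com", expected: false },   // consecutive dots
  { input: "user@-example.com", expected: false },  // label starts with "-"
];

console.log(findMismatches(emailRegex, cases));
// ["a..b@example.com", "user@-example.com"]
```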

URL Validation

// Strict HTTPS URL
const httpsUrl = /^https:\/\/([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(\/[^\s]*)?$/;

// Permissive URL (optional http/https scheme, optional path)
const anyUrl = /^(https?:\/\/)?([\w-]+\.)+[\w-]+(\/[\w\-./?%&=]*)?$/;

httpsUrl.test("https://bytepane.com/regex-tester/"); // ✓
httpsUrl.test("http://example.com");                  // ✗ (requires https)
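When the goal is to validate rather than to search, a real URL parser is often sturdier than a pattern. The sketch below swaps in the WHATWG URL class (available globally in browsers and Node.js) instead of a regex; isHttpsUrl is a hypothetical name. It applies different rules than httpsUrl, for example accepting hosts without a dot, such as localhost.

```javascript
// Sketch: let the URL parser do the parsing, then check the parts you care about.
function isHttpsUrl(text) {
  try {
    return new URL(text).protocol === "https:";
  } catch {
    return false; // new URL() throws a TypeError for strings it cannot parse
  }
}

console.log(isHttpsUrl("https://bytepane.com/regex-tester/")); // true
console.log(isHttpsUrl("http://example.com"));                 // false
console.log(isHttpsUrl("not a url"));                          // false
console.log(isHttpsUrl("https://localhost:8443/"));            // true (httpsUrl rejects it)
```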

US Phone Number

const usPhone = /^(\+1[\s-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$/;

// All match:
usPhone.test("+1-555-123-4567"); // ✓
usPhone.test("(555) 123-4567");  // ✓
usPhone.test("5551234567");      // ✓
usPhone.test("555.123.4567");    // ✓
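Adding capture groups to the phone pattern lets you normalize the number as well as validate it. The sketch below assumes an E.164-style output ("+1" followed by ten digits); usPhoneParts and normalizeUsPhone are hypothetical names. Like the original pattern, it does not check that parentheses are balanced, so "(555 123-4567" is also accepted.

```javascript
// Same shape as usPhone, with capture groups around the three digit blocks.
// (?:...) keeps the optional +1 prefix out of the numbered groups.
const usPhoneParts = /^(?:\+1[\s-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})$/;

// Hypothetical helper: "+1" plus 10 digits, or null if the input does not match.
function normalizeUsPhone(text) {
  const m = usPhoneParts.exec(text);
  return m === null ? null : `+1${m[1]}${m[2]}${m[3]}`;
}

console.log(normalizeUsPhone("(555) 123-4567"));  // "+15551234567"
console.log(normalizeUsPhone("+1-555-123-4567")); // "+15551234567"
console.log(normalizeUsPhone("555-1234"));        // null
```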

Strong Password

// Min 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special char
const strongPwd = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

strongPwd.test("Secret1!");  // ✓
strongPwd.test("password1"); // ✗ (no uppercase, no special)
strongPwd.test("P@ss1");     // ✗ (too short)
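One semantic trap in strongPwd: the final character class doubles as an allow-list, so a password containing any other symbol fails even when every lookahead passes. The looser variant below is an assumption for illustration, not a standard policy: it treats any non-alphanumeric character as special and allows any character in the body.

```javascript
const strongPwd = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

// The lookaheads pass ("!" satisfies the special-character check), but "#" is
// not in the allowed body class, so the final {8,} cannot reach the end.
console.log(strongPwd.test("Secret1!#")); // false

// Looser variant (an assumption): any non-alphanumeric counts as special,
// and any character is allowed in the body.
const looserPwd = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z\d]).{8,}$/;
console.log(looserPwd.test("Secret1!#")); // true
console.log(looserPwd.test("password1")); // false
```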

IPv4 Address

// Each octet limited to 0-255 (leading zeros such as "010" are still accepted)
const ipv4 = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

ipv4.test("192.168.1.1");     // ✓
ipv4.test("255.255.255.255"); // ✓
ipv4.test("256.1.1.1");       // ✗ (256 > 255)
ipv4.test("192.168.1");       // ✗ (only 3 octets)
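The ipv4 pattern accepts leading zeros, which some parsers (such as the C inet_aton function) read as octal. A stricter sketch builds the octet once as a string and composes the full pattern with new RegExp(); the octet alternatives and variable names are this example's own.

```javascript
const ipv4 = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
console.log(ipv4.test("010.0.0.1")); // true: the leading zero slips through

// Stricter octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
const octet = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
const strictIpv4 = new RegExp(`^${octet}(?:\\.${octet}){3}$`);

console.log(strictIpv4.test("192.168.1.1"));     // true
console.log(strictIpv4.test("010.0.0.1"));       // false
console.log(strictIpv4.test("255.255.255.255")); // true
console.log(strictIpv4.test("256.1.1.1"));       // false
```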

ISO Date (YYYY-MM-DD)

// Validates format and plausible ranges (not calendar validity)
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

isoDate.test("2026-03-14"); // ✓
isoDate.test("2026-13-01"); // ✗ (month 13 invalid)
isoDate.test("2026-00-15"); // ✗ (month 00 invalid)
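// Note: does not catch Feb 31; check calendar validity separately (Date.parse() alone
// is not reliable, since some engines roll an impossible date over into the next month)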
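Calendar validity is easier to check after the regex has confirmed the shape. This sketch, built around a hypothetical helper named isValidIsoDate, creates a UTC date from the captured parts and accepts the input only if the date round-trips unchanged. JavaScript dates roll impossible days forward instead of rejecting them, which is what makes the comparison work.

```javascript
const isoDateParts = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: regex for the shape, Date round-trip for the calendar.
function isValidIsoDate(text) {
  const m = isoDateParts.exec(text);
  if (m === null) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);

  // setUTCFullYear takes the year literally (no 0-99 => 1900s mapping) and
  // rolls overflowing days forward, e.g. 2026-02-31 becomes 2026-03-03.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);

  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isValidIsoDate("2026-03-14")); // true
console.log(isValidIsoDate("2026-02-31")); // false
console.log(isValidIsoDate("2024-02-29")); // true (leap year)
console.log(isValidIsoDate("2026-02-29")); // false
```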

Hex Color Code

// 3, 4, 6, or 8 digit hex (with optional alpha channel)
const hexColor = /^#([0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$/;

hexColor.test("#fff");       // ✓ (3-digit)
hexColor.test("#FF5733");    // ✓ (6-digit)
hexColor.test("#FF573380");  // ✓ (8-digit with alpha)
hexColor.test("#GGGGGG");    // ✗ (G is not hex)
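Since the hex pattern already captures the digits, converting them to channel values takes only a few lines. hexToChannels is a hypothetical helper that returns 0-255 integers, including the alpha channel when present, and expands the 3- and 4-digit short forms first.

```javascript
const hexColor = /^#([0-9A-Fa-f]{3,4}|[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})$/;

// Hypothetical helper: "#RGB", "#RGBA", "#RRGGBB" or "#RRGGBBAA" -> array of
// 0-255 channel values ([r, g, b] or [r, g, b, a]); null if the input is not hex.
function hexToChannels(text) {
  const m = hexColor.exec(text);
  if (m === null) return null;
  let digits = m[1];
  // Short forms repeat each digit: "f0a" -> "ff00aa".
  if (digits.length <= 4) {
    digits = [...digits].map((d) => d + d).join("");
  }
  const channels = [];
  for (let i = 0; i < digits.length; i += 2) {
    channels.push(parseInt(digits.slice(i, i + 2), 16));
  }
  return channels;
}

console.log(hexToChannels("#FF5733"));   // [255, 87, 51]
console.log(hexToChannels("#fff"));      // [255, 255, 255]
console.log(hexToChannels("#FF573380")); // [255, 87, 51, 128]
console.log(hexToChannels("#GGGGGG"));   // null
```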

Convert hex colors to RGB or HSL directly in our Color Converter tool.

Regex by Language: API Quick Reference

JavaScript

// Literal and constructor syntax
const re1 = /pattern/gi;
const re2 = new RegExp("pattern", "gi");  // for dynamic patterns

// Key methods
"hello world".match(/\w+/g);             // ["hello", "world"]
"hello".replace(/l/g, "r");              // "herro"
"hello".replaceAll("l", "r");            // ES2021 string method
/^test/.test("test string");              // true
"a1b2c3".matchAll(/[a-z](\d)/g);        // ES2020 iterator

// Destructuring named groups (ES2018+)
const { groups: { year, month } } = "2026-03".match(
  /(?<year>\d{4})-(?<month>\d{2})/
);
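Two details of the JavaScript API are worth seeing in action: matchAll yields full match objects (captures plus index), and it refuses a regex without the g flag. A brief sketch:

```javascript
const codes = "a1b2c3";

// matchAll yields one match object per match, with captures and index.
for (const m of codes.matchAll(/([a-z])(\d)/g)) {
  console.log(m[1], m[2], m.index); // a 1 0, then b 2 2, then c 3 4
}

// Collect only the digits captured by group 2.
const digits = Array.from(codes.matchAll(/([a-z])(\d)/g), (m) => m[2]);
console.log(digits); // ["1", "2", "3"]

// Without the g flag, matchAll throws a TypeError instead of returning an iterator.
try {
  codes.matchAll(/([a-z])(\d)/);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```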

Python

import re

# Compile once and reuse the Pattern object (re also caches recently used patterns)
pattern = re.compile(r"[A-Z][a-z]+", re.MULTILINE)
pattern.findall("Alice met Bob")         # ["Alice", "Bob"]

re.search(r"\d+", "abc 123")            # Match object
re.findall(r"\d+", "a1 b2 c3")          # ["1", "2", "3"]
re.findall(r"(\w+)=(\w+)", "k=v")      # [("k", "v")]
re.sub(r"\d", "X", "a1b2c3")           # "aXbXcX"
re.split(r"[,;\s]+", "a, b; c d")      # ["a", "b", "c", "d"]

# Named groups
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-03")
m.group("year")   # "2026"
m.group("month")  # "03"

Go

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Go uses RE2 syntax (no lookarounds, no backreferences).
	// Matching time is linear in the input length, so it is immune to ReDoS.
	// Raw string literals (backticks) avoid double-escaping backslashes.
	date := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)
	digit := regexp.MustCompile(`\d`)

	fmt.Println(date.MatchString("2026-03-14"))      // true
	fmt.Println(date.FindString("Date: 2026-03-14")) // 2026-03-14
	fmt.Println(digit.FindAllString("a1 b2 c3", -1)) // [1 2 3]
	fmt.Println(digit.ReplaceAllString("a1b2", "X")) // aXbX

	// Named groups
	ym := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})`)
	match := ym.FindStringSubmatch("2026-03")
	fmt.Println(match[ym.SubexpIndex("year")]) // 2026
}

Go's regexp package implements RE2 syntax and semantics, which guarantee execution time linear in the input size on all inputs, making it immune to ReDoS by design. The trade-off: no lookarounds or backreferences. For log parsing and data pipelines, Go's regex is often the right choice precisely because of this safety guarantee.

Performance & Security: Avoiding ReDoS

ReDoS (Regular Expression Denial of Service) is a real threat. OWASP research shows that approximately 20% of regex patterns in production applications are vulnerable, and a single malicious input can pin a thread at 100% CPU for seconds or minutes. The Rust Leipzig benchmark project (which compares PCRE, RE2, Rust regex, Hyperscan, and others) found that Hyperscan is the fastest engine, with the Rust regex crate and PCRE2-JIT tied for second. Speed and safety are separate questions, though: Hyperscan and the Rust regex crate use automata-based matching that cannot backtrack catastrophically, whereas PCRE2 (with or without JIT) is a backtracking engine and still needs the precautions below.

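| Pattern | Risk | Fix |
|---|---|---|
| `(a+)+` | Exponential backtracking | `a+` |
| `([a-z]+)*` | Nested quantifier | `[a-z]*` |
| `(.*)*` | Catastrophic backtracking | `.*` |
| `(\w\|\d)+` | Ambiguous alternation (`\d` is a subset of `\w`) | `\w+` |
| `(a\|aa)+` | Overlapping alternatives | `a+` |
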
  1. Avoid nested quantifiers — patterns like (a+)+ cause exponential backtracking on pathological inputs.
  2. Be specific with character classes — use [a-z] instead of . when you know the expected character set.
  3. Anchor your patterns: ^ keeps the engine from retrying the match at every offset in the string, and ^...$ together make validation patterns reject partial matches.
  4. Compile once, reuse often — compile regex outside of loops. In Python, use re.compile(); in Go, regexp.MustCompile() at package level.
  5. Set execution timeouts — in server-side code that accepts user-supplied regex or text, wrap regex execution with a timeout.
  6. Use RE2-based engines for user input — Go's regexp and Google's RE2 library guarantee linear-time execution with no backtracking.
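The cost of a nested quantifier is easy to measure. This sketch times the vulnerable (a+)+ pattern against its flattened fix on inputs designed to fail. Exact numbers depend on the engine and hardware, so no outputs are listed, but on a backtracking engine such as V8's the nested timings should grow roughly exponentially with n while the flattened ones stay near zero. Input sizes are deliberately small; timeMatch is a hypothetical helper.

```javascript
// Both patterns accept exactly the same strings; only the cost of failing differs.
const vulnerable = /^(a+)+$/;
const flattened = /^a+$/;

// Hypothetical helper: run one .test() and report the result and elapsed time.
function timeMatch(re, input) {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - start };
}

for (const n of [16, 18, 20, 22]) {
  // The trailing "!" guarantees failure, so a backtracking engine tries every
  // way of splitting the run of "a"s between the inner and outer +.
  const input = "a".repeat(n) + "!";
  const slow = timeMatch(vulnerable, input);
  const fast = timeMatch(flattened, input);
  console.log(`n=${n}: nested ${slow.ms.toFixed(1)} ms, flattened ${fast.ms.toFixed(3)} ms`);
}
```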

For regex validation and debugging on the fly, use our Regex Tester. For string analysis after processing, the Word Counter gives you character and word statistics instantly.

Test Your Regex Patterns

Stop guessing whether your regex works. Paste your pattern and test string into our Regex Tester for instant matching with highlighted results, group extraction, and full flag support — no installation required.


Frequently Asked Questions

What is the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, ?, {n,m}) match as many characters as possible and give characters back (backtrack) only when the rest of the pattern fails. Lazy quantifiers (*?, +?, ??, {n,m}?) match as few characters as possible. Given "<a><b>", greedy <.*> matches the whole string, while lazy <.*?> matches just <a> (the first match).

What is ReDoS and how do I prevent it?

ReDoS (Regular Expression Denial of Service) occurs when a maliciously crafted input triggers exponential backtracking in a regex engine. Per OWASP research, roughly 20% of production regex patterns are vulnerable. Prevent it by avoiding nested quantifiers like (a+)+, anchoring patterns, and using RE2-based engines (Go's regexp package) for user-supplied input.

What does the 'g' flag do and when does it cause bugs?

The g (global) flag finds all matches instead of stopping after the first. The bug: in JavaScript, a regex object with the g flag is stateful because it tracks lastIndex. Calling .test() repeatedly on the same regex object with the same input can alternate between true and false. Reset lastIndex = 0 between calls, use new RegExp() to create a fresh instance, or drop the g flag when you only need a yes/no answer.

When should I use a capturing group vs a non-capturing group?

Use a capturing group (abc) when you need to extract or reference the matched text — for backreferences or programmatic access. Use a non-capturing group (?:abc) when you only need grouping for structure (alternation or quantifiers) but not the value. Non-capturing groups are slightly faster since the engine skips recording the match position.
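The difference shows up directly in the match array. A small JavaScript example:

```javascript
// Capturing groups: both pieces are recorded in the match array.
const captured = "2026-03".match(/(\d{4})-(\d{2})/);
console.log(captured.length, captured[1], captured[2]); // 3 2026 03

// Non-capturing group: (?:...) groups but records nothing, so only (\d{2}) remains.
const grouped = "2026-03".match(/(?:\d{4})-(\d{2})/);
console.log(grouped.length, grouped[1]); // 2 03

// Typical use: repeat a multi-character unit without creating a capture.
const laugh = "hahaha!".match(/(?:ha)+/);
console.log(laugh.length, laugh[0]); // 1 hahaha
```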

Can I use regex to parse HTML?

No — and this is one of the most common mistakes in web development. HTML is not a regular language; it has nested, recursive structure that regex cannot reliably handle. Use a proper parser: DOMParser in the browser, BeautifulSoup in Python, or golang.org/x/net/html in Go. Regex is appropriate for extracting simple text patterns, not for parsing document trees.

How do lookaheads differ from lookbehinds?

A lookahead (?=...) checks what comes after the current match position without consuming characters. A lookbehind (?<=...) checks what came before. Both have positive and negative variants. Example: \d+(?=px) matches a number only if followed by "px" — the "px" itself is not part of the match result.

Which regex engine is fastest?

The Rust Leipzig benchmark project (comparing PCRE, RE2, Hyperscan, Rust regex, and others on real-world data) found Hyperscan fastest overall, with the Rust regex crate and PCRE2-JIT tied for second. For safety (ReDoS immunity) and good performance, the Rust regex crate and Go's regexp package use linear-time, automata-based algorithms that cannot backtrack catastrophically.
