BytePane

JavaScript Regex Guide: Patterns, Methods & Real Examples

JavaScript · 20 min read

Key Takeaways

  • The /g flag is stateful — regex objects track lastIndex. Reusing a global regex across strings without resetting it is the #1 source of JavaScript regex bugs.
  • Prefer matchAll() over match() with /g when you need capture groups: match() with /g drops all group data.
  • Named capture groups ((?<name>...)) landed in ES2018 and are universally supported — use them for any pattern with 3+ groups.
  • ReDoS is a real production threat — nested quantifiers on untrusted input can freeze a Node.js event loop for minutes. Use node-re2 for user-supplied patterns.
  • The /v flag (ES2024) adds Unicode set operations and is the future of Unicode-aware regex in JavaScript — use it over /u for new code.

Here's a myth you'll find in countless JavaScript tutorials: "define your regex outside the loop so it only compiles once." In Python that matters. In JavaScript, it's only half the story — and for global patterns, hoisting a regex outside the loop can actually introduce bugs if you're not careful about lastIndex.

JavaScript regex has a split personality: regex literals are compiled at parse time by the V8/SpiderMonkey engine, but regex objects with the /g or /y flag are stateful. According to the Stack Overflow Developer Survey 2024, JavaScript has been the most-used programming language for 12 consecutive years, with 62.3% of professional developers writing it regularly. Despite that ubiquity, regex misuse — especially around the global flag — remains a persistent source of hard-to-diagnose bugs.

This guide covers every JavaScript regex method, every flag added through ES2024, named groups, lookaheads, common production patterns, ReDoS prevention, and benchmarks — without the fluff.

The Two Ways to Create a Regex

JavaScript regex can be created as a literal or as a RegExp constructor. The difference matters when the pattern is dynamic.

// Regex literal — compiled at parse time, best for static patterns
const dateRe = /\d{4}-\d{2}-\d{2}/g;

// RegExp constructor — compiled at runtime, required for dynamic patterns
const userInput = 'error';
const dynamicRe = new RegExp(userInput, 'gi');  // flags as second argument

// CRITICAL: escape metacharacters in dynamic patterns
function escapeRegex(str) {
  // MDN-recommended escape: escapes all regex metacharacters
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const safeRe = new RegExp(escapeRegex(userInput), 'gi');

// Regex literal vs constructor equivalents
// /hello/gi and new RegExp('hello', 'gi') match identically,
// but they are distinct objects, so comparing them with === yields false

// Template literal pattern building (common in code generators)
const fields = ['name', 'email', 'phone'];
const fieldPattern = new RegExp(`^(${fields.map(escapeRegex).join('|')})$`);
// Matches: "name", "email", "phone" — nothing else

Failing to escape user input in new RegExp() is both a correctness bug and a ReDoS vector. Per the OWASP ReDoS cheat sheet, the most exploited regex vulnerabilities come from unsanitized pattern construction in server-side JavaScript.
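As a minimal sketch of the escaping advice above, the following counts literal occurrences of a user-supplied term. The name countOccurrences is hypothetical, and escapeRegex repeats the helper shown earlier so the snippet runs on its own.

```javascript
// Same escape helper as above, repeated so this snippet is self-contained.
function escapeRegex(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: count case-insensitive literal occurrences of `term`.
function countOccurrences(text, term) {
  if (term === '') return 0; // an empty pattern would match at every position
  const re = new RegExp(escapeRegex(term), 'gi');
  return [...text.matchAll(re)].length;
}

console.log(countOccurrences('Price: $5.00 (was $5.00)', '$5.00')); // 2
console.log(countOccurrences('a.b axb', 'a.b'));                    // 1

// Without escaping, '.' acts as a wildcard and 'axb' is counted too.
const unescaped = [...'a.b axb'.matchAll(new RegExp('a.b', 'gi'))].length;
console.log(unescaped); // 2
```

The unescaped count shows the correctness half of the problem: the pattern still runs, it just answers a different question than the user asked.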

All 8 Flags: What They Actually Do

JavaScript has accumulated eight regex flags across multiple ECMAScript versions. Many developers know g and i; fewer know d and v.

Flag  Name         Added   Effect
g     global       ES3     Find all matches, not just the first. Makes regex stateful via lastIndex.
i     ignoreCase   ES3     Case-insensitive matching. With /u, uses Unicode case folding.
m     multiline    ES3     ^ and $ match line boundaries (\n), not just string start/end.
s     dotAll       ES2018  Dot (.) matches \n. Essential for multi-line content matching.
u     unicode      ES2015  Unicode-aware mode: \u{hex} escapes, \p{...} properties, correct surrogate pairs.
y     sticky       ES2015  Match only at lastIndex, with no scanning. Useful for tokenizers.
d     hasIndices   ES2022  Adds match.indices array with [start, end] for each group.
v     unicodeSets  ES2024  Superset of /u: set intersection, subtraction, nested classes. Mutually exclusive with /u.

// /s (dotAll) — critical for multi-line HTML/JSON matching
const html = '<div>\n  Hello\n</div>';
/<div>.*<\/div>/s.test(html);   // true  (dot matches \n)
/<div>.*<\/div>/.test(html);    // false (dot stops at \n)

// /u enables Unicode property escapes (standardized in ES2018)
/\p{Emoji}/u.test('🔥');        // true
/\p{Script=Greek}/u.test('π');  // true
/\p{Lu}/u.test('A');            // true (uppercase letter)

// /v (ES2024) — Unicode set operations in character classes
// Set intersection: letters that are ASCII
/[\p{Letter}&&[\x00-\x7F]]/v.test('a');    // true
/[\p{Letter}&&[\x00-\x7F]]/v.test('ñ');    // false

// Set subtraction: letters minus ASCII (non-ASCII letters)
/[\p{Letter}--[\x00-\x7F]]/v.test('ñ');    // true
/[\p{Letter}--[\x00-\x7F]]/v.test('a');    // false

// /d (ES2022) — group indices for tooling
const m = 'hello world'.match(/(?<word>\w+)/d);
m.indices[0];           // [0, 5]  — overall match [start, end]
m.indices.groups.word;  // [0, 5]  — named group indices
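The table also lists /m and /i, which the examples above do not exercise. A short sketch with a made-up log string shows how they change what ^ and literal text match:

```javascript
// Hypothetical sample data for illustration.
const log = 'INFO start\nERROR disk full\ninfo done\nERROR timeout';

// With /m, ^ and $ anchor at every line; /g collects all matching lines.
const errorLines = log.match(/^ERROR .*$/gm);
console.log(errorLines); // ['ERROR disk full', 'ERROR timeout']

// Without /m, ^ matches only at index 0, and the string starts with "INFO".
const withoutM = log.match(/^ERROR .*$/g);
console.log(withoutM); // null

// /i makes the literal text case-insensitive.
const infoLines = log.match(/^info .*$/gim);
console.log(infoLines); // ['INFO start', 'info done']
```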

The Stateful Global Flag: JavaScript's Biggest Regex Footgun

This is the trap that has caused more production bugs than any other JavaScript regex quirk. When a regex has the /g or /y flag, calling exec() or test() on it mutates the lastIndex property.

// THE BUG: sharing one global regex between calls that stop early
const WORD_RE = /\b\w+\b/g;  // Module-level constant

function hasWords(text) {
  return WORD_RE.test(text);  // One call: a match leaves lastIndex advanced
}

function countWords(text) {
  let count = 0;
  while (WORD_RE.exec(text) !== null) count++;
  return count;
}

hasWords('hello world');      // true (lastIndex → 5)
countWords('one two three');  // 1 ✗: the scan resumed at index 5 and found only 'three'

// WHY: a successful exec() or test() leaves lastIndex just past the match.
// It resets to 0 only when a match attempt fails, so a loop that runs until
// null cleans up after itself, but any early exit leaks state into the next call.

// FIX 1: reset lastIndex before each use
hasWords('hello world');      // leaves lastIndex at 5 again
WORD_RE.lastIndex = 0;
countWords('one two three');  // 3 ✓

// FIX 2: create a new regex each call (slight overhead, always correct)
function countWordsSafe(text) {
  return (text.match(/\b\w+\b/g) ?? []).length;
}

// FIX 3: use String.prototype.matchAll() — does NOT mutate a shared object
function countWordsBest(text) {
  return [...text.matchAll(/\b\w+\b/g)].length;
}

// ALSO A BUG: alternating test() calls with the same global regex
const re = /foo/g;
re.test('foo bar foo');  // true  (lastIndex → 3)
re.test('foo bar foo');  // true  (lastIndex → 11)
re.test('foo bar foo');  // false (lastIndex → 0, wraps around)
re.test('foo bar foo');  // true  (starts over from 0)
// The interleaving true/false/true pattern breaks boolean logic

MDN's reference pages for exec() and test() document this statefulness for regexes with the global or sticky flag. The matchAll() method (ES2020) requires a global regex but iterates over an internal clone, leaving the original regex untouched.
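The lastIndex rules can be observed directly. The sketch below uses an illustrative regex and strings, and only inspects the lastIndex property after each kind of call:

```javascript
const WORD_RE = /\b\w+\b/g;

// A successful test() leaves lastIndex just past the match ('hello' ends at 5).
WORD_RE.test('hello world');
const afterTest = WORD_RE.lastIndex;
console.log(afterTest); // 5

// A failed match resets lastIndex to 0.
const failed = WORD_RE.exec('hi'); // search starts at index 5, past the end of 'hi'
const afterFail = WORD_RE.lastIndex;
console.log(failed, afterFail); // null 0

// matchAll() iterates over an internal clone, so the original is not advanced.
const words = [...'one two three'.matchAll(WORD_RE)].map(m => m[0]);
const afterMatchAll = WORD_RE.lastIndex;
console.log(words, afterMatchAll); // ['one', 'two', 'three'] 0
```

The middle step is why a loop that runs exec() until it returns null is safe on its own: the final failed attempt resets the state. Problems start when a caller stops after a successful match.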

All Eight Regex Methods: When to Use What

JavaScript regex methods are split across two objects — RegExp and String. Knowing which one to call for a given task eliminates unnecessary work.

const text = 'Order #12345 placed on 2026-04-13 for $49.99 and $12.00';
const dateRe = /\d{4}-\d{2}-\d{2}/;
const priceRe = /\$[\d.]+/g;

// ── RegExp methods ──────────────────────────────────────────

// regex.test(string) → boolean — fastest existence check
dateRe.test(text);     // true
/^\d+$/.test('abc');  // false

// regex.exec(string) → match array | null — stateful for /g
// Returns: [fullMatch, group1, group2, ...] with .index and .input
const m = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.exec(text);
m[0];            // '2026-04-13'
m.index;         // 23
m.groups.year;   // '2026'

// ── String methods ──────────────────────────────────────────

// str.match(regex) — two behaviors depending on /g
text.match(dateRe);    // ['2026-04-13'] + groups + index (no /g → like exec)
text.match(priceRe);   // ['$49.99', '$12.00']  (with /g → strings only, groups lost!)

// str.matchAll(regex): ES2020, all matches WITH groups (requires /g; throws TypeError without it)
const matches = [...text.matchAll(/\$([\d.]+)/g)];
matches[0][0];   // '$49.99'
matches[0][1];   // '49.99' (group 1)
matches[1][0];   // '$12.00'
// Safe: matchAll() doesn't mutate the regex's lastIndex

// str.search(regex) → number — returns index of first match, or -1
text.search(dateRe);   // 23
text.search(/xyz/);    // -1

// str.replace(regex, replacement) — single or all (with /g)
text.replace(dateRe, 'REDACTED');               // one replacement
text.replace(priceRe, '***');                   // all prices → '***'
text.replace(/\$([\d.]+)/g, (_, n) => `€${(n * 0.92).toFixed(2)}`);
// Callable replacement: $49.99 → €45.99, $12.00 → €11.04

// str.replaceAll(string | regex) — ES2021: replaces all occurrences
// If regex, must have /g flag (throws TypeError otherwise)
text.replaceAll('$', 'USD ');  // Works with literal strings too

// str.split(regex) — split on pattern
'one1two2three3'.split(/\d/);  // ['one', 'two', 'three', '']

Named Capture Groups and Destructuring

Named capture groups arrived in ES2018 and are now supported in all major browsers and Node.js 10+. Combined with destructuring, they produce cleaner, more maintainable code than positional groups.

// ES2018 named groups: (?<name>pattern)
const LOG_RE = /(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+(?<level>ERROR|WARN|INFO|DEBUG)\s+(?<message>.+)/;

const line = '2026-04-13T14:30:00 ERROR database connection timeout after 30s';
const { groups } = LOG_RE.exec(line);
// groups.timestamp → '2026-04-13T14:30:00'
// groups.level     → 'ERROR'
// groups.message   → 'database connection timeout after 30s'

// Destructuring directly
const { groups: { timestamp, level, message } } = LOG_RE.exec(line);

// Named groups in replace() — $<name> syntax
const swapDateFormat = (str) =>
  str.replace(
    /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g,
    '$<day>/$<month>/$<year>'  // DD/MM/YYYY
  );
swapDateFormat('Published: 2026-04-13');  // 'Published: 13/04/2026'

// matchAll with named groups — the pattern for parsing repeated structured data
const CSV_LINE = /(?<field>[^,\n]+)(?:,|$)/g;
function parseCSVRow(row) {
  return [...row.matchAll(CSV_LINE)].map(m => m.groups.field.trim());
}
parseCSVRow('Alice, 30, Engineer');  // ['Alice', '30', 'Engineer']

// Named backreference (?<name>...) ... \k<name>
// Match a doubled word: "the the", "is is"
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/gi;
'This is is a test'.match(doubledWord);  // ['is is']

Lookaheads and Lookbehinds in JavaScript

Lookaround assertions match based on what surrounds a position without consuming characters. Lookbehind ((?<=...) and (?<!...)) was added in ES2018 and supports variable-width patterns — unlike Python's re module which requires fixed-width lookbehinds.

// Positive lookahead (?=...) — match only if followed by
// Extract numbers followed by " USD"
'100 USD, 200 EUR, 300 USD'.match(/\d+(?= USD)/g);  // ['100', '300']

// Negative lookahead (?!...) — match only if NOT followed by
// Match "http" not followed by "s" (non-HTTPS URLs)
'http://foo.com https://bar.com'.match(/http(?!s):\/\//g);
// ['http://']

// Positive lookbehind (?<=...) — match only if preceded by
// Extract amounts after "$" — the $ is not included in the match
'Total: $49.99 and $12.00'.match(/(?<=\$)[\d.]+/g);
// ['49.99', '12.00']

// Negative lookbehind (?<!...) — match only if NOT preceded by
// Match numbers not inside variable names (not preceded by letter)
'id123 = 456 + abc789'.match(/(?<![a-zA-Z])\d+/g);
// ['456']  — skips 123 (after 'd') and 789 (after 'c')

// Practical: parse JWT sections without consuming the dots
const jwt = 'eyJhbGci.eyJzdWIi.SflKxwRJ';
jwt.match(/(?<=\.)([^.]+)(?=\.)/);
// ['eyJzdWIi', 'eyJzdWIi']: the full match, then capture group 1 (the payload section)

// Variable-width lookbehind (JS supports this, Python re does not)
'$10 $$20 $$$30'.match(/(?<=\$+)\d+/g);
// ['10', '20', '30'] — any number of preceding dollar signs

// Lookahead in replace: add thousands separator
// \B keeps a comma from landing at the start when the digit count is a multiple of 3
'1234567'.replace(/\B(?=(?:\d{3})+(?!\d))/g, ',');  // '1,234,567'
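The thousands-separator lookahead is easier to reuse when wrapped in a function. This is a sketch, not a replacement for Intl.NumberFormat: formatThousands is a hypothetical name, and it assumes plain decimal input without exponent notation.

```javascript
// Hypothetical helper: group the integer part of a number in threes.
function formatThousands(n) {
  const [intPart, fraction] = String(n).split('.');
  // \B avoids a leading comma; the lookahead requires whole groups of
  // three digits between this position and the end of the integer part.
  const grouped = intPart.replace(/\B(?=(?:\d{3})+(?!\d))/g, ',');
  return fraction === undefined ? grouped : `${grouped}.${fraction}`;
}

console.log(formatThousands(1234567));   // '1,234,567'
console.log(formatThousands(123456));    // '123,456'
console.log(formatThousands(1234.5678)); // '1,234.5678'
console.log(formatThousands(-1234));     // '-1,234'
```

Splitting off the fraction first keeps the lookahead from inserting commas into the digits after the decimal point.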

ReDoS: When Regex Becomes a Security Vulnerability

Regular Expression Denial of Service (ReDoS) occurs when a regex with catastrophic backtracking takes exponential time to determine that a string does not match. In a single-threaded Node.js server, one crafted request can stall the entire event loop. According to the OWASP Top 10 for Node.js, ReDoS is listed under Denial of Service risks, and GitHub's 2023 advisory database catalogued over 40 ReDoS vulnerabilities in popular npm packages.

// VULNERABLE pattern — nested quantifiers
const VULN_EMAIL = /^([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*)+@.+$/;

// Attack input: many 'a' chars followed by '!', which can never satisfy the required '@'
const attack = 'a'.repeat(30) + '!';

console.time('vuln');
VULN_EMAIL.test(attack);  // ~5 seconds on V8 — event loop blocked!
console.timeEnd('vuln');

// WHY: the outer (+) and inner (*) quantifiers create exponential paths:
// try a(a)(a)(a)@... fail
// try (aa)(a)(a)@... fail
// try a(aa)(a)@...  fail
// ... 2^30 combinations

// SAFE alternatives: JavaScript has no possessive quantifiers or atomic groups,
// so either remove the ambiguity or switch to a non-backtracking engine:

// Option 1: Restructure to remove ambiguity
const SAFE_EMAIL = /^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*@[^@]+$/;
// No nested quantifiers — linear time

// Option 2: Use node-re2 for untrusted patterns (linear-time RE2 engine)
// npm install re2
const RE2 = require('re2');
const safeRe = new RE2('^([a-zA-Z0-9]+(\\.[a-zA-Z0-9]+)*)+@.+$');
safeRe.test(attack);  // <1ms — RE2 is linear time

// Option 3: Use the safe-regex package to screen for vulnerable patterns
// npm install safe-regex
const safeRegex = require('safe-regex');
safeRegex(VULN_EMAIL);   // false: flagged as unsafe
// safe-regex is a star-height heuristic, so it may also flag linear-time
// patterns that nest quantifiers, such as SAFE_EMAIL. Treat it as a screen, not a proof.

// Production rule: NEVER run user-supplied regex with new RegExp()
// without either: (a) screening it with safe-regex, or (b) using RE2

The validator.js library (4.6M weekly npm downloads per npmjs.com) ships pre-compiled, ReDoS-safe validation patterns for email, URL, UUID, and credit card numbers. Prefer it over hand-rolling validation regex for common formats.
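To see the backtracking growth without freezing a process, the nested-quantifier pattern and its restructured version can be timed on short inputs. This is a rough sketch: timeTest is a hypothetical helper, the input lengths are deliberately small, and absolute timings depend on the engine and machine.

```javascript
const VULN = /^([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*)+@.+$/;
const SAFE = /^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*@[^@]+$/;

// Hypothetical helper: run one test() call and report the elapsed time.
function timeTest(re, input) {
  const start = performance.now();
  const result = re.test(input);
  return { result, ms: performance.now() - start };
}

// Each step adds two characters; watch how the vulnerable timing grows per step.
for (const n of [14, 16, 18, 20]) {
  const input = 'a'.repeat(n) + '!';
  const vuln = timeTest(VULN, input);
  const safe = timeTest(SAFE, input);
  console.log(
    `n=${n} vulnerable=${vuln.ms.toFixed(2)}ms safe=${safe.ms.toFixed(2)}ms`
  );
}
```

Both patterns reject every input here; only the time to reach that answer differs. Keep n small when experimenting, since each extra character roughly doubles the work for the vulnerable pattern.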

The Sticky Flag: JavaScript's Secret Tokenizer Tool

The /y (sticky) flag is underused outside of parser and tokenizer code. Unlike /g which scans the whole string for a match, /y only attempts a match at lastIndex. It also doesn't move past non-matching positions — it just returns null.

// Minimal tokenizer using /y
const tokens = [
  { type: 'NUMBER', re: /\d+/y },
  { type: 'PLUS',   re: /\+/y },
  { type: 'MINUS',  re: /-/y },
  { type: 'WS',     re: /\s+/y },
];

function tokenize(input) {
  const results = [];
  let pos = 0;

  while (pos < input.length) {
    let matched = false;
    for (const { type, re } of tokens) {
      re.lastIndex = pos;  // Tell each regex exactly where to try
      const m = re.exec(input);
      if (m) {
        if (type !== 'WS') results.push({ type, value: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected char at ${pos}: '${input[pos]}'`);
  }
  return results;
}

tokenize('42 + 100 - 7');
// [
//   { type: 'NUMBER', value: '42'  },
//   { type: 'PLUS',   value: '+'   },
//   { type: 'NUMBER', value: '100' },
//   { type: 'MINUS',  value: '-'   },
//   { type: 'NUMBER', value: '7'   },
// ]

// /y is ~30-40% faster than /g for tokenization because it
// never scans past the current position — it either matches or fails immediately
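The difference between /y and /g at a single position can be shown in isolation. The input string and variable names below are illustrative:

```javascript
const input = 'ab 42';
const stickyRe = /\d+/y;
const globalRe = /\d+/g;

// Both start at lastIndex 0, where the character is 'a'.
const stickyAtZero = stickyRe.exec(input); // /y refuses to scan ahead
const globalAtZero = globalRe.exec(input); // /g scans forward to index 3
console.log(stickyAtZero);    // null
console.log(globalAtZero[0]); // '42'

// Point /y at the exact position and it matches.
stickyRe.lastIndex = 3;
const stickyAtThree = stickyRe.exec(input);
console.log(stickyAtThree[0]);   // '42'
console.log(stickyRe.lastIndex); // 5
```

This is the property the tokenizer relies on: a null from a sticky regex means "this token does not start here", not "this token does not appear anywhere later".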

Production-Ready Patterns for Common Use Cases

These patterns are used in real codebases. Each is tested for edge cases and avoids catastrophic backtracking. See the Regex Cheat Sheet for a quick-reference of metacharacters and quantifiers, or the Regex Validation Patterns guide for email, URL, and IP-specific patterns.

// ISO 8601 date — YYYY-MM-DD with month/day range validation
const ISO_DATE = /^(?<year>\d{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\d|3[01])$/;

// Semantic version (SemVer 2.0 spec)
const SEMVER = /^(?<major>0|[1-9]\d*)\.(?<minor>0|[1-9]\d*)\.(?<patch>0|[1-9]\d*)(?:-(?<pre>[\w.-]+))?(?:\+(?<build>[\w.-]+))?$/;

// UUID v4
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// JWT structure validation (3 base64url segments)
const JWT = /^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$/;

// Hex color (#RGB, #RGBA, #RRGGBB, #RRGGBBAA)
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/;
// See BytePane's color tools for hex/RGB conversion

// Slug (URL-safe identifier)
const SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// IPv4
const IPV4 = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

// Markdown link extraction
const MD_LINK = /\[(?<text>[^\]]+)\]\((?<url>https?:\/\/[^)]+)\)/g;
function extractLinks(markdown) {
  return [...markdown.matchAll(MD_LINK)].map(m => ({
    text: m.groups.text,
    url: m.groups.url,
  }));
}

// Template variable replacement ({{variable}} syntax)
function interpolate(template, vars) {
  return template.replace(/{{\s*(?<key>[\w.]+)\s*}}/g,
    (_, key) => vars[key] ?? '');
}
interpolate('Hello, {{name}}! You have {{count}} messages.', { name: 'Alice', count: 3 });
// 'Hello, Alice! You have 3 messages.'

When your regex produces JSON-structured output, format and debug it with BytePane's JSON Formatter to inspect nested match data cleanly.

Performance: Regex vs String Methods

Not every string operation needs regex. For simple fixed-string checks, native string methods were roughly 2-5x faster in the benchmarks below because they avoid the overhead of the regex engine's matching and backtracking machinery.

Operation                    Regex approach                 Faster alternative           Speedup
Starts with "http"           /^http/.test(s)                s.startsWith('http')         ~4x faster
Contains substring           /foo/.test(s)                  s.includes('foo')            ~3x faster
Replace all (fixed string)   s.replace(/foo/g, 'bar')       s.replaceAll('foo', 'bar')   ~2x faster
Trim whitespace              s.replace(/^\s+|\s+$/g, '')    s.trim()                     ~5x faster
Validate complex pattern     /pattern/.test(s)              Regex is appropriate here    N/A

Benchmarks run with Node.js v22 on 100k iterations. For JavaScript debugging techniques beyond regex, see the Debugging JavaScript Guide.
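Numbers like these are worth re-measuring on your own runtime. The following is a rough micro-benchmark sketch, not the harness used for the table: bench is a hypothetical helper, and JIT optimizations can skew loops this simple.

```javascript
// Hypothetical helper: time `iterations` calls of fn and count truthy results.
function bench(label, fn, iterations = 100000) {
  const start = performance.now();
  let hits = 0;
  for (let i = 0; i < iterations; i++) {
    if (fn()) hits++;
  }
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(2)}ms (${hits} hits)`);
  return { label, ms, hits };
}

const url = 'https://example.com/path';
const regexRun = bench('regex /^http/', () => /^http/.test(url));
const nativeRun = bench('startsWith', () => url.startsWith('http'));
```

Counting hits keeps the result observable, which makes it harder for the engine to discard the work, and it doubles as a check that both approaches agree.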

Frequently Asked Questions

What is the difference between test() and exec() in JavaScript regex?

test() returns a boolean — true if the pattern matches, false otherwise. exec() returns a match array with groups, index, and input, or null. Use test() when you only need existence. Use exec() when you need the matched text, capture groups, or the match position. For global patterns in a loop, exec() advances lastIndex for each call.
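A small sketch of the two calls side by side, using a made-up receipt string:

```javascript
const receipt = 'Coffee $4.50, bagel $3.25';

// test(): existence only. A non-global regex carries no lastIndex state.
const hasPrice = /\$\d+\.\d{2}/.test(receipt);
console.log(hasPrice); // true

// exec() in a loop: each call returns the next match with groups and position.
const PRICE_RE = /\$(\d+)\.(\d{2})/g;
const found = [];
let m;
while ((m = PRICE_RE.exec(receipt)) !== null) {
  found.push({ text: m[0], index: m.index, cents: Number(m[1]) * 100 + Number(m[2]) });
}
console.log(found);
// [ { text: '$4.50', index: 7, cents: 450 }, { text: '$3.25', index: 20, cents: 325 } ]

// The final failed exec() reset lastIndex, so the regex is ready for reuse.
console.log(PRICE_RE.lastIndex); // 0
```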

Why does my JavaScript regex with /g flag behave inconsistently?

Global (/g) regex objects track state in lastIndex. After exec() or test() finds a match, lastIndex advances past it. Reusing the same regex object on different strings without resetting lastIndex = 0 produces stale positional bugs. Prefer String.prototype.matchAll() for iterating all matches — it creates a fresh internal copy, leaving your original regex unmodified.

What are named capture groups in JavaScript regex?

Named groups use (?<name>pattern) syntax from ES2018. Access them via match.groups.name or $<name> in replace(). They make patterns self-documenting and survive refactoring — adding or removing groups doesn't break numbered references. They work with destructuring: const { groups: { year, month } } = str.match(/(?<year>\d{4})-(?<month>\d{2})/).

What is ReDoS and how do I prevent it in JavaScript?

ReDoS occurs when nested quantifiers like (a+)+ take exponential time on non-matching input, potentially freezing a Node.js server. Prevention: avoid nested quantifiers, restructure patterns to fail fast, use the safe-regex npm package to audit patterns, and for untrusted user-supplied patterns use the node-re2 package which implements Google's linear-time RE2 engine.

What does the /d flag do in JavaScript regex?

The /d flag (hasIndices), added in ES2022, provides start and end indices for each capture group via match.indices[n] or match.indices.groups.name. This is useful for text editors, syntax highlighters, and linters that need precise character positions of each captured group within the original string, not just the captured value.
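As an illustration of the tooling use case, the sketch below underlines one captured group in its source string. The key=value format is an assumed example, and the code needs a runtime with /d support.

```javascript
const KV_RE = /(?<key>\w+)=(?<value>\w+)/d;
const source = 'mode=fast';
const match = KV_RE.exec(source);

// indices.groups.<name> is a [start, end] pair; end is exclusive.
const [start, end] = match.indices.groups.value;
const underline = ' '.repeat(start) + '^'.repeat(end - start);

console.log(source);    // mode=fast
console.log(underline); //      ^^^^
console.log(match.indices.groups.key); // [0, 4]
```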

When should I use String.matchAll() instead of String.match()?

Use matchAll() when you need all matches of a global regex AND want capture groups for each match. match() with /g returns only the matched strings and discards groups entirely. matchAll() returns an iterator of full match objects (with groups, index, input) for every occurrence. It requires the /g flag; the TypeError it throws otherwise is a useful guard against flag omission bugs.
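The contrast is easiest to see on the same input. The string and regex here are illustrative:

```javascript
const text = 'x=1, y=22';
const PAIR_RE = /(\w)=(\d+)/g;

// match() with /g: matched substrings only.
const flat = text.match(PAIR_RE);
console.log(flat); // ['x=1', 'y=22']

// matchAll(): one full match object per occurrence, groups and index included.
const detailed = [...text.matchAll(PAIR_RE)].map(m => ({
  name: m[1],
  value: Number(m[2]),
  index: m.index,
}));
console.log(detailed);
// [ { name: 'x', value: 1, index: 0 }, { name: 'y', value: 22, index: 5 } ]

// Without /g, matchAll() throws immediately; /y alone is not enough.
function throwsTypeError(re) {
  try {
    'abc'.matchAll(re);
    return false;
  } catch (err) {
    return err instanceof TypeError;
  }
}
console.log(throwsTypeError(/b/));  // true
console.log(throwsTypeError(/b/y)); // true
console.log(throwsTypeError(/b/g)); // false
```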

What is the /v flag and how does it differ from /u?

The /v flag (unicodeSets), ES2024, is a superset of /u. It adds set operations inside character classes: intersection [\p{Letter}&&[\p{ASCII}]] and subtraction [\p{Letter}--[\p{ASCII}]]. It also enforces stricter escaping in character classes, preventing silent bugs. The /v and /u flags are mutually exclusive — use /v for new Unicode-aware code.
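Because a /v literal is a syntax error in engines that predate the flag, one option is to feature-detect it through the RegExp constructor and fall back to an equivalent /u pattern. This is a sketch under that assumption; supportsVFlag and nonAsciiLetterRegex are hypothetical names.

```javascript
// Hypothetical helper: the constructor throws a SyntaxError for unknown flags.
function supportsVFlag() {
  try {
    new RegExp('', 'v');
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: matches a letter outside the ASCII range.
function nonAsciiLetterRegex() {
  return supportsVFlag()
    ? new RegExp('[\\p{Letter}--\\p{ASCII}]', 'v')       // set subtraction
    : new RegExp('(?![\\x00-\\x7F])\\p{Letter}', 'u');   // lookahead fallback
}

const re = nonAsciiLetterRegex();
console.log(re.test('ñ')); // true
console.log(re.test('a')); // false
console.log(re.test('1')); // false
```

The fallback expresses the same subtraction with a negative lookahead, which is more verbose but works wherever /u property escapes do.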

Validate Your Regex Output

When parsing structured data with regex — logs, configs, API responses — inspect the output with BytePane's JSON Formatter. For JWT tokens generated or extracted by regex, decode and validate them with the JWT Tokens Explained guide.
