Why does my regex match too much? How do I fix greedy matching?

By default, quantifiers like * and + are greedy: they match as many characters as possible. Add ? to make them lazy: *? and +? match as few characters as possible. Example: given "<a><b>", the pattern <.*> matches the whole string, while <.*?> matches only "<a>". Anchors (^ and $) and negated character classes ([^<]*) are also effective ways to prevent over-matching.

What is ReDoS and which regex patterns cause it?

ReDoS (Regular Expression Denial of Service) occurs when nested quantifiers cause exponential backtracking on adversarial input. According to OWASP, ~20% of production regex patterns contain this vulnerability. Classic dangerous patterns: (a+)+, (a|aa)+, (.*a){x} with large x. Prevention: avoid nested quantifiers, use atomic groups when available, set execution timeouts, and use a tool like safe-regex to audit patterns.

Regex Examples: 30 Common Patterns for Everyday Use

Q: What is the best regex for validating email addresses?

The RFC 5321-compliant email regex is notoriously complex (6,000+ characters). For production use, /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/ catches 99%+ of real-world addresses. For strict validation, send a verification email instead — regex cannot confirm a mailbox exists. Per W3C HTML spec, the email input type uses a similar simplified pattern.

Q: How do I make a regex case-insensitive?

Add the i flag. In JavaScript: /pattern/i or new RegExp("pattern", "i"). In Python: pass re.IGNORECASE (or its alias re.I) as the flags argument, as in re.compile("pattern", re.IGNORECASE). In Go: use (?i) at the start of the pattern, like (?i)hello. The i flag makes every character class and literal match both upper and lowercase without listing both explicitly.

Q: What is the difference between test() and match() in JavaScript regex?

test() returns a boolean (true/false); use it when you only need to know if the pattern matches. match() returns an array of matches (or null); use it when you need to extract the matched text. With the global flag, match() returns only the full matches and drops capture groups, so use matchAll() or an exec() loop when you need groups (including named captures) for every match. For simple validation, test() is the most performant choice.

Q: How do I match a literal dot (.) in regex?

Escape it with a backslash: \.. In a JavaScript regex literal: /\./ matches a literal dot. Without the backslash, . is a metacharacter that matches any character except newline. This is a frequent source of bugs in email and IP address patterns — for example, 192\.168\.1\.1 matches only the literal IP, while 192.168.1.1 would also match 192X168Y1Z1.

Q: Can I use regex to parse HTML or JSON?

No, and this is a strong rule. HTML and JSON are recursive, context-free languages — regex (a regular language formalism) cannot reliably parse them. Use a DOM parser for HTML (document.querySelector in browsers, cheerio in Node) and JSON.parse() for JSON. Regex on HTML breaks on nested tags, attributes with special characters, and CDATA sections. The famous Stack Overflow answer on this topic has over 6,000 upvotes agreeing.

Each pattern below includes the regex, language-specific usage in JavaScript and Python, matching examples, and a "gotcha" note — the edge case that catches developers off guard. Patterns are organized by category. Start with the one you need, read the gotcha, then adapt.

The patterns use standard PCRE/ECMAScript syntax unless noted. Test any pattern against your actual data with our Regex Tester before shipping to production. According to empirical research published at ASE'19 by Davis et al., 94% of developers re-use regex patterns, which means bugs in common patterns propagate widely.

// Quick usage reference:

// JavaScript — test (boolean):
/pattern/flags.test(string)

// JavaScript — extract matches:
string.match(/pattern/g)           // array of all matches (no capture groups), or null
string.matchAll(/pattern/g)        // iterator of all matches, with capture groups

// JavaScript — replace:
string.replace(/pattern/g, 'replacement')

# Python - test (boolean):
import re
bool(re.search(r'pattern', string))

# Python - extract first match (re.search returns None when nothing matches):
m = re.search(r'pattern', string)
first = m.group() if m else None

# Python - extract all matches:
re.findall(r'pattern', string)

1. Validation Patterns

Email Address

// Pattern:
/^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/

// JavaScript:
const emailRegex = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/
emailRegex.test('user@example.com')                  // true
emailRegex.test('first.last+tag@mail.example.co.uk') // true
emailRegex.test('not-an-email')                      // false

# Python:
import re
pattern = r'^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$'
bool(re.match(pattern, 'user@example.com'))  # True

Gotcha: This passes any well-formed address, including ones whose mailbox or domain does not exist. For auth, always send a verification email. The W3C HTML spec deliberately uses a simplified pattern for the same reason: full RFC 5321 compliance is impractical.

URL (http/https)

// Pattern:
/^https?:\/\/[^\s/$.?#].[^\s]*$/

// JavaScript:
const urlRegex = /^https?:\/\/[^\s/$.?#].[^\s]*$/
urlRegex.test('https://bytepane.com/regex-tester/')  // true
urlRegex.test('http://localhost:3000')                // true
urlRegex.test('ftp://example.com')                   // false (no ftp)

// For a permissive check, use the URL constructor instead:
function isValidUrl(str) {
  try {
    const url = new URL(str)
    return url.protocol === 'http:' || url.protocol === 'https:'
  } catch {
    return false
  }
}

Gotcha: For URL validation in JS, the new URL() constructor is more reliable than regex — it handles edge cases like IDNs and IPv6 addresses.

Phone Number (E.164 International)

// E.164 format (+15551234567):
/^\+[1-9]\d{7,14}$/

// US format only (accepts multiple formats):
/^(\+1)?[\s.-]?\(?[2-9]\d{2}\)?[\s.-]?\d{3}[\s.-]?\d{4}$/

// Examples that match US pattern:
// +1 (555) 123-4567
// 555.123.4567
// 5551234567
// (555) 123-4567

Gotcha: Phone number formats differ by country. E.164 is the safest universal format. For user-facing inputs, normalize to E.164 server-side using a library like libphonenumber-js rather than validating arbitrary formats with regex.
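
To illustrate the "normalize, then validate" idea without a library, here is a minimal sketch for US numbers only. The helper name normalizeUsToE164 is hypothetical, and the code assumes country code 1 and the same area-code rule as the US pattern above. A production system would more likely rely on a phone-number library.

```javascript
// Hypothetical helper: normalize common US phone formats to E.164.
// Assumption: US numbers only (country code 1); returns null otherwise.
function normalizeUsToE164(input) {
  const digits = String(input).replace(/\D/g, '')   // keep digits only
  // Accept 10 digits, or 11 digits with a leading country code 1
  const national = digits.length === 11 && digits.startsWith('1')
    ? digits.slice(1)
    : digits
  // Area code must start with 2-9, as in the US pattern above
  if (!/^[2-9]\d{9}$/.test(national)) return null
  return '+1' + national
}

normalizeUsToE164('(555) 123-4567')   // "+15551234567"
normalizeUsToE164('+1 555.123.4567')  // "+15551234567"
normalizeUsToE164('123-4567')         // null
```

Stripping formatting first means the stored value has a single shape, so the strict E.164 pattern is enough for everything downstream.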

IPv4 Address

// Pattern (validates 0-255 per octet):
/^((25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)\.){3}(25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|\d)$/

// Matches:
// 192.168.1.1    ✓
// 0.0.0.0        ✓
// 255.255.255.0  ✓
// 999.0.0.1      ✗ (octet > 255)
// 192.168.1      ✗ (incomplete)

Gotcha: Simple patterns like /(\d{1,3}\.){3}\d{1,3}/ accept 999.999.999.999. Always validate the 0 to 255 range per octet.
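
If the alternation-heavy pattern is hard to review, the same rule can be sketched as "split, then range-check". The function name isIPv4 is an illustrative choice, and like the pattern above it rejects octets with leading zeros.

```javascript
// Sketch: validate IPv4 by splitting on dots and checking each octet.
function isIPv4(str) {
  const parts = String(str).split('.')
  if (parts.length !== 4) return false
  return parts.every((part) =>
    // 1-3 digits, no leading zeros, numeric value at most 255
    /^(0|[1-9]\d{0,2})$/.test(part) && Number(part) <= 255
  )
}

isIPv4('192.168.1.1')  // true
isIPv4('999.0.0.1')    // false (octet > 255)
isIPv4('192.168.1')    // false (incomplete)
```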

IPv6 Address

// Full and common compressed IPv6 forms (not exhaustive; e.g. fe80::1:2:3 is rejected):
/^(([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|(([0-9a-fA-F]{1,4}:){1,5}|:)(:[0-9a-fA-F]{1,4}){1,2}|::1|::)$/

// Matches:
// 2001:db8::1              ✓
// ::1                      ✓ (loopback)
// fe80::1%eth0             ✗ (zone IDs not covered)

// Practical note: use your platform's built-in validation:
// Python: import ipaddress; ipaddress.ip_address(s)
// Node.js: require('net').isIPv6(s)

Gotcha: IPv6 regex is notoriously complex. Use net.isIPv6() in Node.js or Python's ipaddress module — they handle all RFC 4291 forms correctly.

Credit Card Number (Luhn-format)

// Format check only (16 digits in groups of 4, optional spaces/dashes):
/^[0-9]{4}([\s-]?[0-9]{4}){3}$/

// Visa (starts with 4, 13 or 16 digits):
/^4[0-9]{12}(?:[0-9]{3})?$/

// Mastercard (starts with 51-55 or 2221-2720, 16 digits):
/^(?:5[1-5][0-9]{14}|(?:222[1-9]|22[3-9]\d|2[3-6]\d{2}|27[01]\d|2720)[0-9]{12})$/

// Amex (starts with 34 or 37, 15 digits):
/^3[47][0-9]{13}$/

Gotcha: Regex only checks format. Always run a Luhn checksum to verify the number is structurally valid. For live cards, only a payment processor can confirm the account exists.
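
The Luhn checksum mentioned above is short enough to sketch inline. The function name passesLuhn is illustrative, and the sample numbers are widely published test card numbers, not real accounts.

```javascript
// Sketch: Luhn checksum for a card number given as a string.
function passesLuhn(cardNumber) {
  const digits = String(cardNumber).replace(/[\s-]/g, '')  // drop spaces/dashes
  if (!/^\d{13,19}$/.test(digits)) return false
  let sum = 0
  // Walk right to left, doubling every second digit
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i])
    if (i % 2 === 1) {
      d *= 2
      if (d > 9) d -= 9   // same as adding the two digits of the product
    }
    sum += d
  }
  return sum % 10 === 0
}

passesLuhn('4111 1111 1111 1111')  // true
passesLuhn('4111111111111112')     // false (last digit changed)
```

Running the format regex first and the checksum second keeps each check simple: the regex rejects the wrong shape, and the checksum catches most single-digit typos.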

Strong Password

// Requires: 8+ chars, uppercase, lowercase, digit, special char
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&\-_#])[A-Za-z\d@$!%*?&\-_#]{8,}$/

// How lookaheads work here:
// (?=.*[a-z])      — must contain at least one lowercase
// (?=.*[A-Z])      — must contain at least one uppercase
// (?=.*\d)         — must contain at least one digit
// (?=.*[...])      — must contain at least one special character
// [A-Za-z...]{8,}  — total length 8+, only allowed chars

// Matches:
// MyP@ssw0rd!  ✓
// weakpass     ✗ (no uppercase/digit/special)

Gotcha: NIST SP 800-63B (2025 update) recommends checking passwords against breach databases (HIBP API) over enforcing composition rules. Complexity rules drive users to predictable patterns like Password1!.

UUID v4

// UUID v4 (random):
/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i

// Note the version bit: 4[0-9a-f]{3} (3rd group starts with 4)
// And variant bit: [89ab] (4th group starts with 8, 9, a, or b)

// Matches:
// 550e8400-e29b-41d4-a716-446655440000  ✓
// 550e8400-e29b-41d4-c716-446655440000  ✗ (invalid variant bit)
// 550e8400e29b41d4a716446655440000      ✗ (no hyphens)

Gotcha: If you only need to check "is this a UUID-shaped string," the simpler /^[0-9a-f-]{36}$/i works, although it accepts any 36 hex digits and hyphens without checking where the hyphens fall. The full pattern above validates UUID v4 specifically.

2. Data Extraction Patterns

Hex Color Code

// 3 or 6 digit hex with optional alpha:
/^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}|[A-Fa-f0-9]{8}|[A-Fa-f0-9]{4})$/

// Matches:
// #fff       ✓ (3-digit)
// #FF5733    ✓ (6-digit)
// #FF573380  ✓ (8-digit with alpha)
// #xyz       ✗

// Extract all hex colors from a CSS file:
const css = 'color: #ff5733; background: #333;'
const colors = css.match(/#[A-Fa-f0-9]{3,8}/g)  // ["#ff5733", "#333"]

Gotcha: CSS also accepts rgb(), hsl(), and named colors. This pattern only catches hex notation. For full CSS color extraction, consider a CSS parser. Convert between formats with our Hex to RGB converter.

Date (YYYY-MM-DD / ISO 8601)

// ISO 8601 date:
/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/

// US format MM/DD/YYYY:
/^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$/

// Extract dates from a string:
const text = 'Created 2026-04-22, updated 2026-04-24'
const dates = text.match(/\d{4}-\d{2}-\d{2}/g)  // ["2026-04-22", "2026-04-24"]

// Full ISO 8601 datetime:
/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$/

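Gotcha: Regex cannot validate calendar logic: 2026-02-31 passes the pattern. Always parse the value as well. Python's datetime.strptime() raises ValueError for impossible dates. In JavaScript, new Date(str) may silently roll over to early March in some engines instead of returning Invalid Date, so also check that the parsed year, month, and day match the input.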
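
One way to combine the pattern with a calendar check is a round-trip comparison, sketched below. The function name isRealIsoDate is hypothetical. The sketch builds the date from numeric parts in UTC rather than parsing the string, so it does not depend on how an engine treats out-of-range days.

```javascript
// Sketch: accept YYYY-MM-DD only if it names a real calendar day.
// Limitation: years 0000-0099 are rejected, because Date.UTC maps them to 19xx.
function isRealIsoDate(str) {
  const m = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/.exec(str)
  if (!m) return false
  const [year, month, day] = m.slice(1).map(Number)
  const d = new Date(Date.UTC(year, month - 1, day))
  // An impossible day rolls over into the next month, so the parts no longer match
  return d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
}

isRealIsoDate('2024-02-29')  // true (leap year)
isRealIsoDate('2026-02-31')  // false
```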

Time (HH:MM and HH:MM:SS)

// 24-hour time HH:MM:
/^([01]\d|2[0-3]):[0-5]\d$/

// 24-hour HH:MM:SS:
/^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$/

// 12-hour with AM/PM:
/^(0?[1-9]|1[0-2]):[0-5]\d\s?(AM|PM|am|pm)$/

// Matches:
// 23:59       ✓
// 00:00:00    ✓
// 25:00       ✗
// 12:30 PM    ✓

Gotcha: Timezone offsets (+05:30, Z) need additional handling. For parsing, new Date('1970-01-01T' + timeStr) is safer in JavaScript.

HTML Tags (Extract or Strip)

// Strip all HTML tags (use with extreme caution):
/(<([^>]+)>)/gi

// Extract src attributes from img tags:
/<img[^>]+src=["']([^"']+)["']/gi

// Extract href from anchor tags:
/<a[^>]+href=["']([^"']+)["']/gi

// JavaScript — strip HTML tags:
function stripHtml(html) {
  return html.replace(/(<([^>]+)>)/gi, '')
}

Gotcha: Regex cannot parse HTML — it breaks on nested tags, attributes with > in values, and malformed markup. For HTML parsing in Node.js use cheerio; in browsers use DOMParser. Use the above patterns only for simple, controlled HTML strings.

Extract Numbers from String

// All integers:
/-?\d+/g

// All floats (including negatives):
/-?\d+\.?\d*/g

// Currency amounts ($1,234.56):
/\$[\d,]+\.?\d*/g

// JavaScript example:
const text = 'Order of 3 items for $42.99, shipped in 2 days'
const numbers = text.match(/-?\d+\.?\d*/g)  // ["3", "42.99", "2"]
const currency = text.match(/\$[\d,]+\.?\d*/g)  // ["$42.99"]

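Gotcha: These patterns match numbers inside larger strings: version2.1 yields 2.1. A leading word boundary (\b) stops a match from starting in the middle of a word, but the 1 after the dot in version2.1 still sits on a boundary, so use lookarounds when you need strictly standalone numbers.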
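
A possible way to restrict matches to standalone numbers is a pair of lookarounds, sketched below. The variable names and the sample sentence are made up for illustration, and the pattern assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Sketch: match numbers that are not glued to a word or to a dotted token.
// (?<![\w.])  not preceded by a word character or a dot
// (?!\w)      not followed by a word character
const standaloneNumber = /(?<![\w.])-?\d+(?:\.\d+)?(?!\w)/g

const sample = 'version2.1 has 3 fixes and costs 4.50'
const naive = sample.match(/-?\d+\.?\d*/g)      // ["2.1", "3", "4.50"]
const strict = sample.match(standaloneNumber)   // ["3", "4.50"]
```

The trade-off is readability: the lookaround version is stricter but harder to scan, so a comment like the one above is worth keeping next to it.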

Hashtags

// Extract hashtags from social media text:
/#[\w\u0080-\uFFFF]+/g

// JavaScript:
const post = 'Building cool tools #webdev #regex #bytepane'
const tags = post.match(/#[\w]+/g)  // ["#webdev", "#regex", "#bytepane"]

# Python:
import re
tags = re.findall(r'#\w+', post)  # ['#webdev', '#regex', '#bytepane']

Gotcha: The \u0080-\uFFFF range enables matching Unicode hashtags (#日本語). Twitter's actual hashtag algorithm is more complex — it excludes purely numeric tags and has length limits by language.
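
As an alternative to the raw code-point range, Unicode property escapes express "any letter or digit" directly. This is a sketch that assumes an engine supporting the u flag with \p{...} (ES2018 or later); the sample text is made up.

```javascript
// Sketch: Unicode-aware hashtag extraction with property escapes.
// \p{L} = any letter, \p{N} = any number; the u flag is required.
const unicodeHashtag = /#[\p{L}\p{N}_]+/gu

const sampleText = 'Learning #日本語 with #regex101'
const unicodeTags = sampleText.match(unicodeHashtag)  // ["#日本語", "#regex101"]
```

Unlike the \u0080-\uFFFF range, this does not treat punctuation and symbols above U+0080 as hashtag characters.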

3. String Transformation Patterns

URL Slug (sanitize for SEO)

// Validate a URL slug (lowercase, letters, digits, hyphens only):
/^[a-z0-9]+(?:-[a-z0-9]+)*$/

// Generate a slug from a title:
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, '')     // remove non-word chars except spaces/hyphens
    .replace(/[\s_]+/g, '-')      // spaces and underscores → hyphens
    .replace(/^-+|-+$/g, '')      // trim leading/trailing hyphens
    .replace(/-{2,}/g, '-')       // collapse multiple hyphens
}

slugify('Hello, World! 2026')  // "hello-world-2026"

Gotcha: This strips accented characters like é to nothing. For proper Unicode slugification, normalize with str.normalize('NFKD') first to decompose accented characters, then strip combining marks.
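
The normalization step described above can be sketched as a variant of slugify. The name slugifyUnicode is hypothetical. Letters that have no decomposition (ß or ø, for example) are still dropped, so treat this as a starting point rather than a full transliteration.

```javascript
// Sketch: decompose accented characters before slugifying.
function slugifyUnicode(title) {
  return title
    .normalize('NFKD')                 // é becomes e + combining accent
    .replace(/[\u0300-\u036f]/g, '')   // drop combining diacritical marks
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, '')          // remove remaining non-word chars
    .replace(/[\s_]+/g, '-')           // spaces and underscores to hyphens
    .replace(/-{2,}/g, '-')            // collapse multiple hyphens
    .replace(/^-+|-+$/g, '')           // trim leading/trailing hyphens
}

slugifyUnicode('Cr\u00e8me Br\u00fbl\u00e9e')  // "creme-brulee"
```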

camelCase to snake_case

// JavaScript:
function camelToSnake(str) {
  return str
    .replace(/([A-Z])/g, '_$1')         // insert _ before uppercase
    .replace(/^_/, '')                   // remove leading underscore
    .toLowerCase()
}

camelToSnake('myVariableName')  // "my_variable_name"
camelToSnake('XMLParser')       // "x_m_l_parser" ← gotcha with acronyms

// Better version that handles consecutive capitals (acronyms):
function camelToSnakeSmart(str) {
  return str
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1_$2')  // XMLParser → XML_Parser
    .replace(/([a-z])([A-Z])/g, '$1_$2')          // camelCase → camel_Case
    .toLowerCase()
}

camelToSnakeSmart('XMLParser')         // "xml_parser"
camelToSnakeSmart('myVariableName')    // "my_variable_name"

Trim Excess Whitespace

// Collapse multiple spaces to one:
str.replace(/\s{2,}/g, ' ').trim()

// Remove leading/trailing whitespace on each line:
str.replace(/^\s+|\s+$/gm, '')

// Normalize all whitespace (newlines, tabs → single space):
str.replace(/\s+/g, ' ').trim()

// Remove blank lines:
str.replace(/^\s*[\r\n]/gm, '')

Gotcha: \s matches all Unicode whitespace including non-breaking spaces (\u00A0), which can be useful — or surprising depending on context.

Escape HTML Special Characters

function escapeHtml(str) {
  return str.replace(/[&<>"']/g, (char) => ({
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  }[char]))
}

escapeHtml('<script>alert("xss")</script>')
// → "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"

// This is the minimal XSS prevention pattern for injecting
// user content into HTML — React does this automatically

Gotcha: This only prevents HTML injection in text content. For attributes, URLs, JavaScript context, and CSS context, different escaping rules apply. Never build a full security policy on this one function alone — use a library like DOMPurify for untrusted HTML.

4. Log Parsing & Developer Patterns

Apache / Nginx Access Log Line

// Apache Combined Log Format:
/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/

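# Named groups (Python):
import re
pattern = re.compile(
  r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
  r'"(?P<method>\S+) (?P<path>\S+) \S+" '
  r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)
# Sample line for illustration only
log_line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
match = pattern.match(log_line)
if match:
    print(match.group('ip'), match.group('status'))  # 127.0.0.1 200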
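
The same named-group approach can be sketched in JavaScript, which writes groups as (?<name>...) instead of (?P<name>...). The sample line and variable names below are illustrative only.

```javascript
// Sketch: named groups for the first fields of an access log line.
const accessLogLine = /^(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<status>\d{3}) (?<bytes>\d+|-)/

// Made-up sample line for illustration
const sampleLine = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

const parsed = sampleLine.match(accessLogLine)
// Copy the groups into a plain object, or use null when the line does not match
const entry = parsed ? { ...parsed.groups } : null
// entry.ip === "127.0.0.1", entry.status === "200", entry.path === "/apache_pb.gif"
```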

Semantic Version (semver)

// Full semver per semver.org spec:
/^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/

// Simpler version for common cases:
/^\d+\.\d+\.\d+(-[a-zA-Z0-9.]+)?(\+[a-zA-Z0-9.]+)?$/

// Matches:
// 1.2.3              ✓
// 2.0.0-beta.1       ✓
// 1.0.0+build.123    ✓
// 1.2                ✗ (missing patch)

Git Commit Hash (Short and Full)

// Full SHA-1 (40 hex chars):
/^[0-9a-f]{40}$/i

// Short hash (7-12 chars as shown by git log --abbrev-commit):
/^[0-9a-f]{7,12}$/i

// Extract commit hashes from git log output:
const gitLog = 'abc1234 Fix auth bug\n9f3e012 Add regex examples'
const hashes = gitLog.match(/^[0-9a-f]{7}/gm)  // ["abc1234", "9f3e012"]

JSON Key-Value Pair (Simple)

// Extract string key-value pairs:
/"([^"]+)":\s*"([^"]+)"/g

// JavaScript:
const json = '{"name":"Alice","role":"admin","email":"alice@example.com"}'
const pairs = [...json.matchAll(/"([^"]+)":\s*"([^"]+)"/g)]
// [["...","name","Alice"], ["...","role","admin"], ...]

// ⚠️ Use JSON.parse() for real JSON parsing.
// This pattern is only for quick extraction from known-format strings.

Gotcha: This pattern breaks on escaped quotes inside values, numbers, arrays, and nested objects. Always use JSON.parse() for JSON. Our JSON Formatter is useful for inspecting complex JSON.

Environment Variable Line (.env format)

// Parse KEY=VALUE lines (with optional quotes):
/^([A-Z_][A-Z0-9_]*)=["']?([^"'\n]*)["']?$/gm

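# Python example: parse a .env file
import re
# Triple-quoted raw string, because the pattern contains both quote characters
env_pattern = re.compile(r'''^([A-Z_][A-Z0-9_]*)=["']?([^"'\n]*)["']?$''', re.MULTILINE)
env_content = 'API_KEY="abc123"\nDEBUG=true\n'  # sample input for illustration
env_vars = dict(env_pattern.findall(env_content))  # {'API_KEY': 'abc123', 'DEBUG': 'true'}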

Gotcha: This doesn't handle multiline values, comments, or shell variable expansion. Use dotenv (Node) or python-dotenv for production .env parsing.

Quick Reference: All 30 Patterns

#	Pattern Name	Category	Use Case
1	Email address	Validation	Auth forms, newsletter signup
2	URL (http/https)	Validation	Link validation, web scraping
3	Phone (E.164)	Validation	International phone input
4	IPv4 address	Validation	Network config, server logs
5	IPv6 address	Validation	Modern network validation
6	Credit card	Validation	Payment form pre-validation
7	Strong password	Validation	Registration password check
8	UUID v4	Validation	ID format validation
9	Hex color	Extraction	CSS parsing, design tools
10	Date ISO 8601	Extraction	Log parsing, data ETL
11	Time HH:MM	Extraction	Schedule parsing
12	HTML tags	Extraction	Content scraping, stripping
13	Numbers from string	Extraction	Data extraction, ETL
14	Hashtags	Extraction	Social media processing
15	URLs in text	Extraction	Link detection in content
16	URL slug	Transformation	SEO URL validation
17	camelCase→snake_case	Transformation	Code generation, API normalization
18	Trim whitespace	Transformation	Input sanitization
19	Escape HTML	Transformation	XSS prevention
20	Markdown links	Extraction	Docs processing
21	Apache log line	Log Parsing	Server analytics
22	Semver	Dev	Package version validation
23	Git commit hash	Dev	CI/CD pipeline scripts
24	JSON key-value	Dev	Quick config extraction
25	.env variables	Dev	Config file parsing
26	Markdown code fence	Extraction	Docs tooling
27	SQL SELECT query	Extraction	Query analysis, logging
28	CSS class names	Extraction	Static analysis, tooling
29	Base64 string	Validation	Token/data detection
30	ANSI escape codes	Transformation	Terminal output stripping

5. Additional Patterns (21–30)

Markdown Links (Extract)

// Extract [text](url) links from Markdown:
/\[([^\]]+)\]\(([^)]+)\)/g

// JavaScript:
const md = 'Check out [BytePane](https://bytepane.com) for dev tools'
const links = [...md.matchAll(/\[([^\]]+)\]\(([^)]+)\)/g)]
// links[0][1] = "BytePane", links[0][2] = "https://bytepane.com"

CSS Class Names (Extract from HTML)

// Extract class attribute values from HTML:
/class=["']([^"']+)["']/gi

// Extract individual Tailwind/CSS class names:
const classes = 'class="bg-dark-surface border border-purple rounded-xl"'
const allClasses = classes
  .match(/class=["']([^"']+)["']/i)[1]
  .split(/\s+/)
// ["bg-dark-surface", "border", "border-purple", "rounded-xl"]

Base64 String Detection

// Detect a valid Base64-encoded string:
/^[A-Za-z0-9+/]+={0,2}$/

// Detect Base64Url (JWT-safe: - and _ replace + and /; padding is optional and usually omitted):
/^[A-Za-z0-9_-]+=*$/

// Length check (padded Base64 is always a multiple of 4 characters long):
function isBase64(str) {
  return /^[A-Za-z0-9+/]+={0,2}$/.test(str) && str.length % 4 === 0
}

ANSI Escape Codes (Strip Terminal Colors)

// Strip ANSI escape sequences from terminal output:
/\x1B\[[0-9;]*m/g

// JavaScript:
const colored = '\x1b[32mSuccess\x1b[0m: Tests passed'
const plain = colored.replace(/\x1B\[[0-9;]*m/g, '')
// "Success: Tests passed"

// Extended version covering a wider range of escape sequences:
/[\u001B\u009B][[\]()#;?]*(?:(?:(?:[a-zA-Z\d]*(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007)|(?:(?:\d{1,4}(?:;\d{0,4})*)?[\dA-PR-TZcf-ntqry=><~]))/g

Gotcha: Terminal output can include non-color escape sequences (cursor movement, clearing). The extended pattern above covers more cases but is more expensive. The strip-ansi npm package is the battle-tested solution for Node.js.

Performance: Regex vs String Methods

Regex is not always the right tool. For simple cases, string methods are faster, more readable, and less error-prone:

Task	Regex	String Method	Prefer
Check prefix	/^https/.test(s)	s.startsWith('https')	String
Check suffix	/\.pdf$/.test(s)	s.endsWith('.pdf')	String
Simple contains	/error/.test(s)	s.includes('error')	String
Split by char	s.split(/,/)	s.split(',')	String
Complex validation	/^[\w.]+@[\w]+\.\w+$/	No equivalent	Regex
Extract all matches	s.matchAll(/\d+/g)	No equivalent	Regex

Frequently Asked Questions

What is the best regex for validating email addresses?

The simplified pattern /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/ catches 99%+ of real-world addresses. Full RFC 5321 compliance requires a 6,000+ character pattern that is impractical. For auth flows, always confirm with a verification email regardless of what regex passes.

How do I make a regex case-insensitive?

Add the i flag. JavaScript: /pattern/i. Python: re.compile("pattern", re.IGNORECASE). Go: start the pattern with (?i). The i flag makes every literal character and character class match upper and lowercase without listing both explicitly.

What is the difference between test() and match() in JavaScript?

test() returns a boolean — use it for validation checks. match() returns an array of matched strings (or null) — use it when you need the matched text. For multiple matches with capture groups, use matchAll(). For performance-critical code, test() is slightly faster than match().
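
A side-by-side sketch of the three methods on one made-up input string:

```javascript
// Sketch: test() vs match() vs matchAll() on the same input.
const input = 'v1.2 and v3.4'

const hasVersion = /v\d+\.\d+/.test(input)        // true (boolean only)
const allVersions = input.match(/v\d+\.\d+/g)     // ["v1.2", "v3.4"] (no groups)
const versionParts = [...input.matchAll(/v(\d+)\.(\d+)/g)]
  .map((m) => ({ major: Number(m[1]), minor: Number(m[2]) }))
// [{ major: 1, minor: 2 }, { major: 3, minor: 4 }]
```

matchAll() returns an iterator, which is why the sketch spreads it into an array before mapping.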

Why does my regex match too much? How do I fix greedy quantifiers?

Add ? to make quantifiers lazy. *? and +? match as few characters as possible. Given "<a><b>", <.*> matches the entire string (greedy), while <.*?> matches only "<a>" (lazy). Anchors and character class negation ([^<]*) are often a cleaner solution than lazy quantifiers.
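
The three options can be compared directly on the example string. This is a small sketch and the variable names are illustrative.

```javascript
// Sketch: greedy vs lazy vs negated character class on "<a><b>".
const html = '<a><b>'

const greedy = html.match(/<.*>/)[0]      // "<a><b>" (runs to the last >)
const lazy = html.match(/<.*?>/)[0]       // "<a>"    (stops at the first >)
const negated = html.match(/<[^>]*>/g)    // ["<a>", "<b>"]
```

The negated class never crosses a > in the first place, so it avoids the backtracking a lazy quantifier can still do on long inputs.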

What is ReDoS and how do I prevent it?

ReDoS causes exponential backtracking through patterns like (a+)+. An adversarial input like "aaaaaaaaX" forces the engine to try every combination. Prevention: avoid nested quantifiers, use atomic groups when available, set execution timeouts in user-facing contexts, and audit patterns with the safe-regex npm package or RXXR2 tool.
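
For the (a+)+ case, the nested quantifier adds nothing, so it can be removed outright. The sketch below checks, on a few short and deliberately harmless inputs, that the rewritten pattern accepts the same strings. Not every risky pattern has such a direct rewrite, so treat this as an illustration of the idea rather than a general recipe.

```javascript
// Sketch: a nested quantifier and its flat equivalent.
const risky = /^(a+)+$/   // many ways to split a run of a's, so failures backtrack heavily
const safe = /^a+$/       // one way to match the same run

// Inputs kept short on purpose so the risky pattern finishes quickly
const samples = ['a', 'aaaa', '', 'aaab', 'aaaaaaaaX']
const agree = samples.every((s) => risky.test(s) === safe.test(s))  // true
```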

Can I use regex to parse HTML or JSON?

No for structured parsing. Regex cannot parse recursive or context-sensitive grammars reliably. Use DOMParser or cheerio for HTML; JSON.parse() for JSON. Regex is fine for simple extractions from controlled, known-format strings — but never as a general HTML/JSON parser.

How do I match a literal dot in regex?

Escape it: \. matches a literal dot. Without the backslash, . matches any character except newline. This is a common bug in IP address and email patterns — 192.168.1.1 as a pattern also matches 192X168Y1Z1. Use 192\.168\.1\.1 for an exact IP address match.

Key Takeaways