BytePane

Regex Email Validation: The Right Pattern for Email Matching


Key Takeaways

  • The popular /^\w+@\w+\.\w+$/ regex rejects valid addresses: plus-addressing, dotted local parts, subdomains, and hyphenated domains all fail it.
  • Full RFC 5322 compliance is a trap — the spec-compliant regex is 6,000+ characters and permits constructs no real system needs to accept.
  • Regex catches ~5% of email problems — format validation alone cannot verify deliverability; only a verification email can.
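  • ReDoS is a real risk: many published email regex patterns have catastrophic backtracking. Before deploying, test with a short input such as 'a'.repeat(20) + '!' and lengthen it gradually; a vulnerable pattern can hang the process at 50 characters.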
  • Layered validation wins — regex for format, MX lookup for domain, verification email for mailbox existence.

Here is the email validation regex you will find in the top ten Stack Overflow answers, countless tutorials, and probably already in your codebase:

/^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$/

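It looks reasonable. It is not. This pattern rejects plus-addressed mail of the form user+tag@example.com (plus-addressing is used by Gmail, Fastmail, and tens of millions of users) because + is missing from the local-part character class. It has the opposite problem too: it accepts malformed input such as consecutive dots or a domain label that starts with a hyphen. Long TLDs like .international and hyphenated domains do pass, since {2,} has no upper bound and the domain class includes the hyphen; those failures belong to stricter variants that use \w alone or cap the TLD at {2,4}.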
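
A quick run makes the pattern's behaviour concrete. The sketch below is only an illustration: the sample addresses are made up for this example and are not taken from any dataset.

```javascript
// The commonly copied pattern shown above
const COMMON_EMAIL = /^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$/;

// Illustrative inputs (hypothetical addresses)
const samples = [
  'user@example.com',            // plain address
  'user+tag@example.com',        // plus-addressing
  'user@example.international',  // long TLD
  'user@my-domain.com',          // hyphenated domain
  '..@-.com',                    // malformed: dots-only local part, hyphen-only label
];

// Pair each sample with the pattern's verdict
const results = samples.map((address) => [address, COMMON_EMAIL.test(address)]);
for (const [address, accepted] of results) {
  console.log(accepted ? 'accept' : 'reject', address);
}
```

Only the plus-addressed input is rejected. The malformed last input is accepted, which is the less obvious half of the problem.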

According to a 2026 benchmark by email validation researcher Jonas Neubert, the most widely cited "simple" email patterns reject between 2% and 8% of syntactically valid real-world email addresses. That is a meaningful false rejection rate — one that silently blocks users from completing signups.

Meanwhile, the theoretically correct solution — the full RFC 5322-compliant regex — is a 6,000-character monster that accepts "john doe"@[IPv6:2001:db8::1] as valid. Which it is, per the spec. But your application almost certainly should not.

This guide cuts through both failure modes: it gives you the right practical pattern, explains what the spec actually says, covers ReDoS risks, and outlines the layered validation strategy that works in production.

What RFC 5321 and RFC 5322 Actually Permit

There are two relevant RFCs and developers frequently conflate them. RFC 5321 (SMTP, October 2008) defines the wire format for email in transit — what an SMTP server accepts. RFC 5322 (Internet Message Format, October 2008, updated by RFC 6854) defines the format of email message headers. Both allow things that will surprise you.

An email address has two parts separated by @: the local part (before @) and the domain (after @).

Legal local part forms (RFC 5321)

# All of these are valid per RFC 5321:

user@example.com          # standard
user+tag@example.com      # plus addressing, extremely common
user.name@example.com     # dots
user-name@example.com     # hyphens
user_name@example.com     # underscores
"user name"@example.com   # quoted string with space (!)
"user@name"@example.com   # quoted string with @ symbol (!)
user!#$%&'*+/=?^_{|}~@x  # special chars allowed in unquoted local
admin@[192.168.1.1]       # IP address domain literal
admin@[IPv6:2001:db8::1]  # IPv6 domain literal

# Local part rules (unquoted):
# - Allowed: a-z A-Z 0-9 . ! # $ % & ' * + / = ? ^ _ ` { | } ~
# - NOT allowed: ( ) , : ; < > @ [ \ ]
# - Cannot start or end with a dot
# - Max 64 characters
# - Consecutive dots (..) are forbidden

# Domain rules:
# - Labels separated by dots
# - Each label: 1-63 chars, letters/digits/hyphens
# - Cannot start or end with a hyphen
# - Max 255 characters total
# - TLD cannot be all-numeric
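
The rules above can also be checked step by step without one large regex. The following is a minimal sketch under stated assumptions: checkAddressStructure is a hypothetical helper name, it covers only the unquoted form (quoted strings and IP literals are rejected on purpose), and it returns a short reason string rather than throwing.

```javascript
// Characters allowed in an unquoted local part (per the list above)
const LOCAL_CHARS = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+$/;
// A domain label: letters/digits/hyphens, no hyphen at either edge
const LABEL = /^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$/;

// Hypothetical helper: returns null when every rule passes,
// otherwise a string naming the first rule that failed
function checkAddressStructure(address) {
  const at = address.lastIndexOf('@');
  if (at < 1) return 'missing local part or @';
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);

  if (local.length > 64) return 'local part longer than 64 characters';
  if (!LOCAL_CHARS.test(local)) return 'character not allowed in unquoted local part';
  if (local.startsWith('.') || local.endsWith('.')) return 'local part starts or ends with a dot';
  if (local.includes('..')) return 'consecutive dots in local part';

  if (domain.length === 0 || domain.length > 255) return 'domain empty or longer than 255 characters';
  const labels = domain.split('.');
  for (const label of labels) {
    if (label.length < 1 || label.length > 63) return 'label must be 1-63 characters';
    if (!LABEL.test(label)) return 'label has a bad character or edge hyphen';
  }
  if (/^[0-9]+$/.test(labels[labels.length - 1])) return 'all-numeric TLD';
  return null;
}

console.log(checkAddressStructure('user+tag@example.com'));  // null
console.log(checkAddressStructure('a..b@example.com'));      // consecutive dots in local part
console.log(checkAddressStructure('user@-example.com'));     // label has a bad character or edge hyphen
```

Returning a reason instead of a boolean is a design choice for this sketch: it lets a form show the user which rule failed.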

The key insight: most apps should NOT support the full spec

Quoted strings, IP literals, and special-character local parts are technically valid. They are also accepted by fewer than 0.01% of real email providers, a source of injection risk if your system does not escape them properly, and almost never used by real users. The regular-expressions.info guide (a long-standing reference for practical regex, maintained since 2003) makes the same point: a more practical implementation of RFC 5322 that omits IP addresses and the quoted and bracketed syntax will still match 99.99% of all email addresses in actual use today.

The Practical Email Validation Spectrum

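There is no single "correct" email regex: the right pattern depends on your tolerance for false positives versus false negatives. Here are four tiers. The first three run from loosest to strictest; the fourth is the full-spec pattern, listed for completeness because it rejects almost nothing.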

Tier | Pattern | Rejects | Use When
Sanity check | /^[^\s@]+@[^\s@]+\.[^\s@]+$/ | Strings without @ or dot | You rely on verification email
Practical | /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/ | Quoted strings, IP literals | Most web apps; covers 99.9%
Strict practical | See below | Leading/trailing dots, double dots | High-trust systems
RFC 5322 full | 6,000+ chars | Almost nothing | Theoretical only; don't ship it

// TIER 1: Sanity check. Accepts nearly everything with an @ and a dot
// Use when: you send a verification email and trust users to correct typos
const SANITY_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

SANITY_EMAIL.test('user@example.com');      // true
SANITY_EMAIL.test('user+tag@example.com');  // true
SANITY_EMAIL.test('not-an-email');          // false
SANITY_EMAIL.test('missing@tld');           // false (no dot after @)

// TIER 2: Practical — the pattern to use in most applications
// Covers 99.9% of real email addresses used by real people
const PRACTICAL_EMAIL = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;

PRACTICAL_EMAIL.test('user@example.com');        // true
PRACTICAL_EMAIL.test('user+tag@example.com');    // true  (plus addressing ✓)
PRACTICAL_EMAIL.test('user@mail.example.com');   // true  (subdomains ✓)
PRACTICAL_EMAIL.test('user@my-domain.com');      // true  (hyphenated domain ✓)
PRACTICAL_EMAIL.test('user@xn--mnchen-3ya.de');  // true  (punycode IDN ✓)
PRACTICAL_EMAIL.test('"user name"@domain.com'); // false (quoted strings not needed)
PRACTICAL_EMAIL.test('user @example.com');      // false (space ✓)
PRACTICAL_EMAIL.test('@example.com');           // false (empty local ✓)

// TIER 3: Strict practical — adds structural constraints
// Prevents .., leading/trailing dots in local part
const STRICT_EMAIL = /^(?!.*\.{2})[a-zA-Z0-9]([a-zA-Z0-9._%+\-]{0,62}[a-zA-Z0-9])?@[a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$/;

STRICT_EMAIL.test('.user@example.com');     // false (leading dot ✓)
STRICT_EMAIL.test('user.@example.com');     // false (trailing dot ✓)
STRICT_EMAIL.test('us..er@example.com');    // false (consecutive dots ✓)
STRICT_EMAIL.test('user@-example.com');     // false (domain starts with hyphen ✓)

ReDoS: The Email Regex That Will Crash Your Server

Before deploying any email validation regex on a server, test it for catastrophic backtracking. Many of the most-copied email patterns are vulnerable to ReDoS. As OWASP's write-up on ReDoS explains, string inputs with repeated near-matching sequences can cause exponential regex execution time, allowing a single HTTP request to stall a Node.js event loop for seconds.

// DANGEROUS: many popular email patterns have this structure
// The nested quantifier creates O(2^n) backtracking
const VULNERABLE = /^([a-zA-Z0-9]+\.?)+@([a-zA-Z0-9]+\.?)+\.[a-zA-Z]{2,}$/;

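// Attack input: non-matching string with many repeated characters.
// Each extra 'a' roughly doubles the work, so keep the demo input short:
// at 40+ characters a single test can run for many minutes or longer.
const attack = 'a'.repeat(28) + '!';
console.time('attack');
VULNERABLE.test(attack);  // typically blocks the event loop for seconds
console.timeEnd('attack');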

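// TEST your regex before deploying:
// Grow the input gradually so a vulnerable pattern is flagged at a short
// length instead of hanging the process on one long input
function testForReDoS(regex, maxLength = 40) {
  for (let length = 10; length <= maxLength; length += 2) {
    const attack = 'a'.repeat(length) + '!';
    const start = Date.now();
    regex.test(attack);
    const elapsed = Date.now() - start;
    if (elapsed > 50) {
      console.warn(`Potential ReDoS: ${elapsed}ms for length ${length}`);
      return false;
    }
  }
  return true;
}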

testForReDoS(VULNERABLE);       // false — dangerous
testForReDoS(PRACTICAL_EMAIL);  // true  — safe

// THE FIX: restructure to eliminate quantifier nesting
// PRACTICAL_EMAIL avoids nesting: [a-zA-Z0-9._%+\-]+ is flat, not (group)+
// Run the same attack on PRACTICAL_EMAIL: <1ms

// For untrusted user-supplied patterns, use node-re2 (Google RE2 engine):
// npm install re2
// const RE2 = require('re2');
// const safeRe = new RE2(pattern);  // Linear time, no ReDoS possible
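
To make the restructuring concrete, here is one possible flattened rewrite of the nested pattern. It is a sketch, not a drop-in replacement: FLAT is a name chosen for this example, and it is slightly stricter than the nested original (for instance, it rejects a local part that ends in a dot).

```javascript
// Nested form: the inner + and the outer + can split a run of letters
// in exponentially many ways before the match finally fails
const NESTED = /^([a-zA-Z0-9]+\.?)+@([a-zA-Z0-9]+\.?)+\.[a-zA-Z]{2,}$/;

// Flattened sketch: every repeated group must begin with a literal dot,
// so there is only one way to divide a run of letters
const FLAT = /^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*@[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*\.[a-zA-Z]{2,}$/;

// Both accept an ordinary address
console.log(NESTED.test('user.name@mail.example.com'));  // true
console.log(FLAT.test('user.name@mail.example.com'));    // true

// Only FLAT is run against a long non-matching input here;
// running NESTED on it would not finish in any reasonable time
const longInput = 'a'.repeat(5000) + '!';
const started = Date.now();
const matched = FLAT.test(longInput);
console.log(matched, `${Date.now() - started}ms`);
```

The timing printed on the last line depends on the machine, so no figure is given here; the point is that the flattened pattern fails fast on input that is far longer than anything the nested one can handle.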

The validator.js npm package (4.8M+ weekly downloads per npm registry, May 2026) ships a ReDoS-safe isEmail() function that has been audited against pathological inputs. If you are not writing your own validation logic for a specific reason, it is the safer default for server-side Node.js code.

Implementing Email Validation in JavaScript, Python, and Go

JavaScript / TypeScript

// Option 1: Inline regex (practical tier)
const EMAIL_RE = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;

function isValidEmailFormat(email: string): boolean {
  if (typeof email !== 'string') return false;
  if (email.length > 254) return false;  // RFC 5321 max total length
  const [local] = email.split('@');
  if (local && local.length > 64) return false;  // RFC 5321 local part max
  return EMAIL_RE.test(email);
}

// Option 2: validator.js (recommended for server-side)
// npm install validator
import validator from 'validator';
validator.isEmail('user@example.com');              // true
validator.isEmail('user@example.com', {
  allow_utf8_local_part: false,  // reject Unicode local parts
  require_tld: true,
  allow_ip_domain: false,
});

// Option 3: HTML native validation (client-side)
// The browser's type="email" uses the WHATWG algorithm — very practical
// <input type="email" required />
// Access programmatically:
const input = document.querySelector('input[type="email"]') as HTMLInputElement;
input.checkValidity();  // uses WHATWG's built-in algorithm

// Option 4: Zod (TypeScript schema validation)
import { z } from 'zod';
const EmailSchema = z.string().email();  // uses Zod's own built-in regex, not validator.js
EmailSchema.safeParse('user@example.com');  // { success: true, data: 'user@example.com' }
EmailSchema.safeParse('not-email');         // { success: false, error: ZodError }

// Complete validation with length guards:
function validateEmail(raw: unknown): { valid: boolean; normalized?: string; error?: string } {
  if (typeof raw !== 'string') return { valid: false, error: 'Must be a string' };
  const email = raw.trim().toLowerCase();
  if (email.length === 0) return { valid: false, error: 'Required' };
  if (email.length > 254) return { valid: false, error: 'Too long' };
  if (!EMAIL_RE.test(email)) return { valid: false, error: 'Invalid format' };
  return { valid: true, normalized: email };
}

Python

import re

# Compiled at module level — re.compile() is memoized in CPython,
# but explicit compile makes the intent clear in library code
EMAIL_RE = re.compile(
    r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'  # \. so the dot before the TLD is literal
)

def is_valid_email_format(email: str) -> bool:
    if not isinstance(email, str):
        return False
    email = email.strip()
    if len(email) > 254:
        return False
    parts = email.split('@')
    if len(parts) != 2 or len(parts[0]) > 64:
        return False
    return bool(EMAIL_RE.match(email))

# For production use, email-validator is the de facto standard:
# pip install email-validator
from email_validator import validate_email, EmailNotValidError

try:
    emailinfo = validate_email("user@example.com", check_deliverability=False)
    normalized = emailinfo.normalized  # 'user@example.com'
except EmailNotValidError as e:
    print(str(e))

# With MX record check (requires DNS lookup, adds latency):
try:
    emailinfo = validate_email("user@example.com", check_deliverability=True)
    # Raises EmailUndeliverableError (a subclass of EmailNotValidError)
    # if DNS shows that the domain cannot receive mail
except EmailNotValidError as e:
    print(f"Undeliverable: {e}")

Go

package emailvalidation

import (
    "net"
    "regexp"
    "strings"
)

var emailRE = regexp.MustCompile(
    `^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$`,
)

// IsValidEmailFormat performs format-only validation.
// For deliverability, use CheckMX after this.
func IsValidEmailFormat(email string) bool {
    email = strings.TrimSpace(strings.ToLower(email))
    if len(email) > 254 {
        return false
    }
    parts := strings.SplitN(email, "@", 2)
    if len(parts) != 2 || len(parts[0]) > 64 {
        return false
    }
    return emailRE.MatchString(email)
}

// CheckMX validates that the domain has MX records.
// Adds network latency — use async or cache results per domain.
func CheckMX(email string) (bool, error) {
    parts := strings.SplitN(email, "@", 2)
    if len(parts) != 2 {
        return false, nil
    }
    mx, err := net.LookupMX(parts[1])
    if err != nil || len(mx) == 0 {
        return false, err
    }
    return true, nil
}

// For the go-playground/validator library (most popular Go validation lib):
// import "github.com/go-playground/validator/v10"
// validate := validator.New()
// err := validate.Var("user@example.com", "required,email")

What type="email" Actually Does

The WHATWG HTML Living Standard defines a specific algorithm for <input type="email"> that deliberately diverges from RFC 5322. The spec describes this explicitly: "This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict... and too lax."

The WHATWG algorithm (simplified):

// What browsers actually use for type="email" (simplified from WHATWG spec):
// https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address

const WHATWG_EMAIL = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Key differences from RFC 5322:
// ✓ Allows special chars in local part (!#$%&'*+/=?^_`{|}~)
// ✗ Rejects quoted strings ("john doe"@example.com)
// ✗ Rejects IP domain literals ([192.168.1.1])
// ✓ No max-length enforcement (that's a separate attribute)
// ✓ Does NOT require a TLD with 2+ chars — "user@localhost" is valid!

// The "no TLD required" part surprises developers.
// If you need to reject "user@localhost" in a web form,
// use pattern="[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}"
// alongside type="email"
// <input type="email" pattern="..." required />
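
The "no TLD required" behaviour is easy to confirm in plain JavaScript. This sketch uses the regex published in the WHATWG HTML Standard; HAS_ALPHA_TLD and isWebFormEmail are hypothetical names introduced here to show one way of layering a TLD rule on top while keeping the wider WHATWG local part.

```javascript
// Regex published in the WHATWG HTML Standard for a valid e-mail address
const WHATWG_EMAIL = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical extra rule: the value must end in a dot plus 2+ letters
const HAS_ALPHA_TLD = /\.[a-zA-Z]{2,}$/;

// Hypothetical helper combining both checks
function isWebFormEmail(value) {
  return WHATWG_EMAIL.test(value) && HAS_ALPHA_TLD.test(value);
}

console.log(WHATWG_EMAIL.test('user@localhost'));          // true
console.log(isWebFormEmail('user@localhost'));             // false
console.log(isWebFormEmail('user!tag@example.com'));       // true
console.log(WHATWG_EMAIL.test('"john doe"@example.com'));  // false
```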

The Layered Validation Strategy

Regex catches format errors. It cannot catch: mistyped domains (gmial.com), disposable email services, non-existent mailboxes, or full inboxes. An effective email validation pipeline has three layers.

Layer | Method | Catches | Latency | Cost
1. Format | Regex | Syntax errors, missing @, bad TLD | <1ms | Free
2. Domain | MX record lookup | Non-existent domains, typos that land on a domain with no mail server | 50-200ms | Free (DNS)
3. Mailbox | Verification email | Non-existent mailboxes, addresses the user cannot read | Minutes | Email cost
Optional | SMTP probe API | Disposable, catch-all vs real mailbox | 1-5s | $$/10k

For most applications, layers 1 and 3 are sufficient: format check on input, verification email before granting access. Layer 2 (MX lookup) is a useful middle ground for high-value forms where you want to catch domain typos before sending emails that bounce. It only helps when the mistyped domain has no mail server; a typo such as gmial.com may itself be a registered domain that accepts mail, and only the verification email exposes that. A 2026 analysis by Prospeo (an email verification service) found that format validation alone catches approximately 5% of invalid addresses submitted in web forms; domain and mailbox checks catch the remaining 95%.
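
The first two layers can be wired together in a few lines of Node.js. This is a sketch under stated assumptions: preCheckEmail and its return shape are hypothetical, the resolver is injectable so the function can be exercised without real DNS, and a resolver failure other than "no such domain" is treated as inconclusive rather than as a rejection.

```javascript
const dns = require('dns').promises;

const EMAIL_RE = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: runs layer 1 (format) and layer 2 (MX lookup) and
// reports whether the address is ready for layer 3 (the verification email)
async function preCheckEmail(raw, resolveMx = (domain) => dns.resolveMx(domain)) {
  const email = String(raw).trim();
  if (email.length > 254 || !EMAIL_RE.test(email)) {
    return { ok: false, layer: 'format' };
  }
  const domain = email.slice(email.lastIndexOf('@') + 1);
  try {
    const records = await resolveMx(domain);
    if (records.length === 0) return { ok: false, layer: 'domain' };
  } catch (err) {
    // ENOTFOUND / ENODATA: the domain does not exist or publishes no MX
    if (err.code === 'ENOTFOUND' || err.code === 'ENODATA') {
      return { ok: false, layer: 'domain' };
    }
    // Timeouts and resolver failures say nothing about the address itself
    return { ok: true, layer: 'domain-unknown' };
  }
  return { ok: true, layer: 'send-verification' };
}

// Usage with a stubbed resolver (no network access needed)
preCheckEmail('user@example.com', async () => [{ exchange: 'mx.example.com', priority: 10 }])
  .then((result) => console.log(result));  // { ok: true, layer: 'send-verification' }
```

Called without the second argument, the function falls back to Node's dns.promises.resolveMx and performs a real lookup, so cache results per domain if the form sees heavy traffic.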

Use BytePane's Regex Validation Patterns guide for production-ready patterns for URLs, phone numbers, and IP addresses alongside email — all tested for ReDoS safety.

International Email Addresses (IDN and Unicode)

Since RFC 6530 (2012) and RFC 6531 (Internationalized Email), email addresses can include Unicode characters in both the local part and domain. 用户@例子.广告 is a valid Chinese email address. How you handle this depends on your system.

// Unicode email addresses (RFC 6531 — Internationalized Email)
// Most mail providers accept ACE-encoded domains (Punycode) at SMTP layer
// but may support Unicode local parts via SMTPUTF8 extension

// Valid international addresses:
// 用户@例子.广告   (Chinese)
// пользователь@example.com  (Cyrillic local part)
// user@münchen.de           (IDN domain — stored as xn--mnchen-3ya.de)

// Option 1: Accept punycode domains, reject Unicode local parts
// (safe for systems without SMTPUTF8 SMTP support)
const ASCII_LOCAL_UNICODE_DOMAIN = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;
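// Works because IDN domains are stored/transmitted as xn-- punycode.
// Caveat: the [a-zA-Z]{2,} TLD rejects internationalized TLDs, whose
// punycode form contains digits and hyphens (for example .xn--p1ai)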

// Option 2: Accept Unicode local parts too (requires SMTPUTF8-aware SMTP)
// JavaScript with /u flag for Unicode property escapes
const UNICODE_EMAIL = /^[\p{L}\p{N}._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/u;
UNICODE_EMAIL.test('пользователь@example.com');  // true
UNICODE_EMAIL.test('用户@example.com');           // true

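// Normalize before storing: use .toLowerCase() for ASCII,
// but Unicode emails also need Unicode normalization. RFC 6532 recommends
// NFC; NFKC additionally folds compatibility characters, which can merge
// addresses that a mail server treats as distinct
const normalized = email.normalize('NFC').toLowerCase();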

// Reality check: as of 2026, fewer than 1% of email addresses
// in active use have Unicode local parts (per ICANN statistics).
// Most apps can safely accept ASCII-only local parts.
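
One way to apply Option 1 to input typed with a Unicode domain is to convert the domain to its ASCII form first. The sketch below uses Node's url.domainToASCII; toAsciiDomainAddress and ASCII_EMAIL are hypothetical names, and the widened TLD alternative (letters or an xn-- label) is an assumption of this example rather than part of the patterns above.

```javascript
const { domainToASCII } = require('url');

// Same local part as the practical pattern; the TLD may be letters or
// an ACE label such as xn--p1ai (an assumption of this sketch)
const ASCII_EMAIL = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.(?:[a-zA-Z]{2,}|xn--[a-zA-Z0-9\-]+)$/;

// Hypothetical helper: converts the domain to its ASCII (Punycode) form
// and leaves the local part untouched. Returns null if conversion fails.
function toAsciiDomainAddress(address) {
  const at = address.lastIndexOf('@');
  if (at < 1) return null;
  const asciiDomain = domainToASCII(address.slice(at + 1));
  if (asciiDomain === '') return null;  // domainToASCII returns '' for invalid domains
  return address.slice(0, at) + '@' + asciiDomain;
}

const converted = toAsciiDomainAddress('user@münchen.de');
console.log(converted);                                  // user@xn--mnchen-3ya.de
console.log(ASCII_EMAIL.test(converted));                // true
console.log(ASCII_EMAIL.test('user@example.xn--p1ai'));  // true
```

Converting before validating means the stored address is already in the form that is sent over SMTP, so the same ASCII pattern can be reused on the server.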

Common Mistakes and How to Avoid Them

// MISTAKE 1: Using \w instead of explicit character classes
// \w = [a-zA-Z0-9_], so it misses + - . and rejects user+tag@example.com
const WRONG1 = /^\w+@\w+\.\w+$/;
WRONG1.test('user+tag@example.com');  // false: valid address rejected!

// MISTAKE 2: Forgetting the ^ and $ anchors
const WRONG2 = /[\w.]+@[\w.]+/;
WRONG2.test('!!! a@b !!!');  // true: the partial match on "a@b" succeeds!

// MISTAKE 3: TLD length restriction too short
const WRONG3 = /^[\w.]+@[\w.]+\.[a-z]{2,4}$/i;
WRONG3.test('user@example.museum');  // false: valid TLD rejected!
// Solution: use {2,} not {2,4}

// MISTAKE 4: Forgetting to normalize case before storage
function saveEmail_WRONG(email) {
  // 'User@Example.com' and 'user@example.com' are the same address
  // (per RFC 5321 the domain is case-insensitive; the local part is
  // formally case-sensitive and left to the receiving host to interpret)
  // In practice ALL major providers treat local parts as case-insensitive
  db.save({ email });  // Wrong — storing un-normalized case
}
function saveEmail_CORRECT(email) {
  db.save({ email: email.trim().toLowerCase() });  // Normalize first
}

// MISTAKE 5: Not trimming whitespace
// Users copy-paste emails with trailing spaces constantly
// Always: email.trim() before validation and storage

// MISTAKE 6: Rejecting subaddresses (plus addressing)
// user+tag@example.com and user@example.com both deliver
// to the same inbox. Never strip or reject plus addressing —
// it's used extensively for filtering and disposable addresses.

// MISTAKE 7: Validating at one layer only
// Client-side validation only = easily bypassed
// Server-side only = poor UX
// Always validate at BOTH layers with the SAME pattern
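
Keeping the client and server on the same pattern is easier when both are built from one string. The sketch below shows one way to do that; EMAIL_PATTERN_SOURCE, SERVER_EMAIL_RE, and renderEmailInput are hypothetical names, and fieldName is assumed to be a trusted, developer-supplied identifier rather than user input.

```javascript
// Single source of truth, written without ^ and $: the HTML pattern
// attribute must match the whole value, while RegExp needs explicit anchors
const EMAIL_PATTERN_SOURCE = '[a-zA-Z0-9._%+\\-]+@[a-zA-Z0-9.\\-]+\\.[a-zA-Z]{2,}';

// Server side: wrap the shared source in anchors
const SERVER_EMAIL_RE = new RegExp(`^(?:${EMAIL_PATTERN_SOURCE})$`);

// Client side: hypothetical helper that renders the input element
function renderEmailInput(fieldName) {
  return `<input type="email" name="${fieldName}" pattern="${EMAIL_PATTERN_SOURCE}" required />`;
}

console.log(SERVER_EMAIL_RE.test('user+tag@example.com'));  // true
console.log(SERVER_EMAIL_RE.test('user@localhost'));        // false
console.log(renderEmailInput('email'));
```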

Frequently Asked Questions

What is the best regex for email validation?

For most applications: /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/ covers 99.9% of real-world email addresses. It accepts plus-addressing, subdomains, and long TLDs while rejecting quoted strings and IP literals that no real user will have. Pair it with a length check (max 254 characters total, 64 for local part) for full compliance.

Why is full RFC 5322 email validation impractical with regex?

The RFC 5322 compliant regex is over 6,000 characters and permits constructs like quoted strings ("john doe"@example.com), IP domains ([192.168.1.1]), and obsolete folding whitespace. It's a theoretical exercise — almost no production system should accept all valid RFC 5322 forms, and doing so increases attack surface with zero user benefit.

Does email validation regex check if an email address exists?

No. Regex only validates format — it cannot check domain MX records, mailbox existence, or inbox capacity. Per a 2026 analysis by Prospeo, format validation catches about 5% of invalid emails. The remaining 95% require MX record lookup (for typo detection) or sending a verification email (for mailbox confirmation).

Should I validate email on the frontend or backend?

Both — for different reasons. Frontend (HTML type="email" or light regex) gives immediate UX feedback. Backend validation is mandatory because frontend checks are trivially bypassed with curl or DevTools. Use the same pattern on both sides, and add MX record lookup on the backend for higher-confidence validation on important forms.

What valid email addresses does the common regex reject?

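The pattern /^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$/ rejects plus-addressing (the user+tag form that many Gmail users rely on for filtering) and valid special characters like ! and % in local parts, because its local-part class omits them. It does accept hyphens and long TLDs; those failures come from stricter variants such as /^\w+@\w+\.\w+$/, which allows no dot or hyphen at all, and patterns that cap the TLD at {2,4}. The + character is explicitly permitted in RFC 5321.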

Is there a ReDoS risk with email validation regex?

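Yes. Many popular email regex patterns have catastrophic backtracking caused by nested quantifiers like ([a-zA-Z0-9]+\.?)+. Test any email regex with a short non-matching input such as 'a'.repeat(20) + '!' and increase the length in small steps before deploying. If the time roughly doubles with each added character, the pattern is vulnerable; a safe pattern stays well under a millisecond even at 50 characters. The validator.js npm package (4.8M+ weekly downloads) ships audited ReDoS-safe email validation.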

How does the HTML input type="email" validate email addresses?

Per the WHATWG HTML Living Standard, type="email" uses its own restricted grammar (published with an equivalent regex) rather than RFC 5322: one @ symbol, a non-empty local part made of letters, digits, and a fixed set of punctuation, and a domain of dot-separated labels. Notably, it does NOT require a TLD, so "user@localhost" passes. The spec deliberately diverges from RFC 5322 to be practical for web forms. Add a pattern attribute if you need TLD enforcement.

Test and Debug Your Regex Patterns

Use BytePane's Regex Cheat Sheet to quickly reference metacharacters and quantifiers while building your validation patterns. For broader input validation beyond email — URLs, IPs, phone numbers — see the Regex Validation Patterns guide.
