MD5 Hash Generator: Create & Verify MD5 Checksums

The Myth: “MD5 Is Dead, Never Use It”

MD5 has been “cryptographically broken” since 2004 — that's twenty-two years of the security community repeating the same warning. And the warning is entirely correct for passwords, certificates, and digital signatures. But the message has been garbled in transmission: many developers now believe MD5 is worthless for any use case, which is wrong.

MD5 remains appropriate for non-adversarial integrity checks: verifying that a file downloaded correctly, generating deterministic cache keys, HTTP ETags, content-addressable storage, partitioning data in distributed systems. In these contexts, collision resistance is irrelevant — no attacker is crafting inputs to produce matching hashes. What matters is detecting accidental corruption, and MD5 does that perfectly well.

The security issue is specific: MD5 should not be trusted when an adversary might control the input. This article explains exactly what's broken, what isn't, and gives you production code for generating and verifying MD5 hashes across languages. To generate an MD5 hash immediately, try BytePane's hash generator tool — supports MD5, SHA-1, SHA-256, and SHA-512.

Key Takeaways

  • MD5 produces a 128-bit (32 hex character) hash. Designed by Ronald Rivest (MIT) in 1991, standardized as RFC 1321.
  • Broken for security: collision attacks are practical in seconds. NIST deprecated MD5 for digital signatures in 2008. Never use for passwords, certificates, or signatures.
  • Still valid: file download integrity checks, HTTP ETag generation, cache keys, content deduplication — any use where no adversary controls the input.
  • For passwords: use bcrypt, scrypt, or Argon2id. Never any raw hash algorithm — even SHA-256 is dangerously fast for brute-force.
  • Per npm download stats (April 2026), the md5 package still gets ~2.5M weekly downloads — most for legitimate non-security uses.

What MD5 Is: The Algorithm

MD5 (Message Digest Algorithm 5) is a cryptographic hash function designed by Ronald Rivest at MIT and published as RFC 1321 in 1992. It takes arbitrary-length input and produces a fixed 128-bit (16-byte) digest, typically represented as 32 lowercase hexadecimal characters.

The algorithm works in four stages: padding the input to a multiple of 512 bits (with the original message length appended), initializing four 32-bit registers (A, B, C, D) to specific constants, processing each 512-bit block through four rounds of 16 bitwise operations using constants derived from the sine function, and emitting the final register state as the 128-bit digest. The core is a Merkle–Damgård construction, the same structural approach used by SHA-1 and SHA-2.
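To make those stages concrete, here is a minimal pure-Python sketch that follows the structure described in RFC 1321. It is for illustration only: the names `md5_pad` and `md5_sketch` are hypothetical, the code is far slower than a native implementation, and real projects should call a library such as `hashlib`.

```python
import math
import struct

# Per-step left-rotation amounts: four rounds of 16 steps each.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Sine-derived constants: K[i] = floor(2^32 * |sin(i + 1)|).
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md5_pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit little-endian bit length."""
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", bit_len)


def md5_sketch(message: bytes) -> str:
    # Initial values of the four 32-bit registers A, B, C, D.
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    padded = md5_pad(message)
    for offset in range(0, len(padded), 64):
        m = struct.unpack("<16I", padded[offset:offset + 64])  # sixteen 32-bit words
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:      # round 1
                f, g = (b & c) | (~b & d), i
            elif i < 32:    # round 2
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:    # round 3
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:           # round 4
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & 0xFFFFFFFF
            a, d, c = d, c, b
            b = (b + _rotl(f, S[i])) & 0xFFFFFFFF
        # Merkle–Damgård chaining: add this block's result into the running state.
        a0 = (a0 + a) & 0xFFFFFFFF
        b0 = (b0 + b) & 0xFFFFFFFF
        c0 = (c0 + c) & 0xFFFFFFFF
        d0 = (d0 + d) & 0xFFFFFFFF
    return struct.pack("<4I", a0, b0, c0, d0).hex()


print(md5_sketch(b"hello"))  # 5d41402abc4b2a76b9719d911017c592
```

The chaining step at the end of each block is what makes the construction Merkle–Damgård: the digest of a long message is just the register state after the last block.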

Properties of MD5

| Property | MD5 | SHA-256 | SHA-3-256 |
| --- | --- | --- | --- |
| Output size | 128 bits (32 hex chars) | 256 bits (64 hex chars) | 256 bits (64 hex chars) |
| Collision resistance | ❌ Broken (2004) | ✅ No known attacks | ✅ No known attacks |
| Preimage resistance | ✅ Intact (practically) | ✅ Intact | ✅ Intact |
| Password storage | ❌ Never (use bcrypt) | ❌ Too fast (use bcrypt) | ❌ Too fast (use bcrypt) |
| File checksums | ✅ Appropriate | ✅ Appropriate | ✅ Appropriate |
| Digital signatures | ❌ Deprecated (NIST 2008) | ✅ Safe | ✅ Safe |
| HTTP ETags | ✅ Standard use | ✅ Standard use | Rare (longer string) |
| Throughput (software) | ~600–800 MB/s | ~300–400 MB/s | ~200–300 MB/s |

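Throughput figures are approximate software benchmarks from OpenSSL 3.x speed tests on modern x86-64 hardware; CPUs with dedicated SHA instructions can run SHA-256 considerably faster than shown. In plain software, MD5 is genuinely faster because each of its 64 steps is simpler than a SHA-256 round and it has no message-schedule expansion, which is why it is still used for checksums where security is not a concern but speed is.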
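Because these numbers depend heavily on the CPU and the library build, it is worth measuring on your own machine. The following is a rough sketch using Python's `hashlib`; the function name `throughput_mb_s` and the buffer sizes are arbitrary choices for illustration, and a single short run like this is noisy.

```python
import hashlib
import time


def throughput_mb_s(algorithm: str, total_mb: int = 64, chunk_kb: int = 64) -> float:
    """Hash total_mb of zero bytes in chunk_kb pieces and return MB/s."""
    chunk = bytes(chunk_kb * 1024)
    rounds = (total_mb * 1024) // chunk_kb
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(rounds):
        h.update(chunk)
    h.digest()
    elapsed = time.perf_counter() - start
    return total_mb / elapsed


if __name__ == "__main__":
    for name in ("md5", "sha256", "sha3_256"):
        print(f"{name:>9}: {throughput_mb_s(name):8.0f} MB/s")
```

Expect the ranking, not the absolute figures, to resemble the table; on hardware with SHA acceleration the MD5 and SHA-256 rows may even swap.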

The Collision Attack: What Exactly Is Broken

A hash collision means two different inputs produce the same hash output. For a 128-bit hash, the birthday bound puts the expected cost of finding one by brute force at about 2⁶⁴ random trials. That's roughly 18 quintillion hash computations, far out of reach when MD5 was designed.
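The birthday bound is easy to observe on a shortened digest. The sketch below keeps only the first 24 bits of each MD5 output and searches for two inputs that agree on them; with 24 bits, a match is expected after a few thousand trials (on the order of 2¹²) rather than 2²⁴. The function name and the counter-based inputs are illustrative assumptions, not part of any standard tool.

```python
import hashlib


def find_truncated_collision(bits: int = 24):
    """Find two distinct inputs whose MD5 digests share the first `bits` bits."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        digest = hashlib.md5(message).digest()
        prefix = int.from_bytes(digest, "big") >> (128 - bits)
        if prefix in seen:
            # Return both inputs and the number of hashes computed.
            return seen[prefix], message, counter + 1
        seen[prefix] = message
        counter += 1


first, second, trials = find_truncated_collision(24)
print(first, second, trials)
```

Each extra bit of digest multiplies the search space by two but the expected work by only about 1.41, which is where the 2⁶⁴ figure for the full 128 bits comes from.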

In 2004, Xiaoyun Wang and Hongbo Yu (Shandong University) published a cryptanalysis paper demonstrating that MD5's internal structure allows differential collisions to be found far faster than the birthday bound. Their attack found collisions in roughly 2³⁹ operations, reported as about an hour on an IBM p690 cluster at the time; refined attacks now run in seconds on ordinary hardware using tools like HashClash.

The practical consequences were severe:

  • 2008: Researchers created a rogue Certificate Authority certificate by exploiting MD5 collisions in SSL certificate signing — proving they could impersonate any HTTPS site.
  • 2012: The Flame malware (attributed to nation-state actors) used an MD5 collision to fake a Microsoft Windows Update digital signature, allowing it to spread as an apparently legitimate update.

NIST formally deprecated MD5 for digital signatures in Special Publication 800-131A (2011), and the CA/Browser Forum prohibited MD5 in SSL certificates. SHA-1 has since fallen to collision attacks as well (the SHAttered attack in 2017), which is why Git has begun transitioning from SHA-1 to SHA-256 for its object hashes.

What “Broken” Means Practically

# Two different inputs with the same MD5: a real collision example
# A widely published identical-prefix collision pair (not made up).
# Only the first 64 bytes of each 128-byte message are shown.

File 1:
d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b

File 2:
d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbdf280373c5b

# The complete messages both produce MD5: 79054025255fb1a26e4bc422aef54eb4

# Their SHA-256 digests differ, so a stronger hash still tells the two inputs apart.

Where MD5 Is Still Appropriate in 2026

Collision resistance matters only when an adversary can craft inputs. In these categories, that threat model does not apply:

File Download Integrity Verification

When you download a package from a trusted server and verify the MD5 matches the published checksum, the goal is detecting accidental corruption in transit — bit flips, truncated downloads, CDN caching bugs. No adversary is crafting a corrupted file designed to match the checksum. MD5 is completely sufficient here, and it is faster than SHA-256. Many Linux distributions and open-source projects still publish MD5 checksums alongside SHA-256 for legacy tool compatibility.

# Linux (md5sum is part of GNU coreutils)
md5sum ubuntu-22.04-server.iso
# 84aea60b36e2f80b2c9e02ff7a7a1e5f  ubuntu-22.04-server.iso

# Verify against published checksum
echo "84aea60b36e2f80b2c9e02ff7a7a1e5f  ubuntu-22.04-server.iso" | md5sum -c
# ubuntu-22.04-server.iso: OK

# macOS
md5 ubuntu-22.04-server.iso

# Windows PowerShell
Get-FileHash ubuntu-22.04-server.iso -Algorithm MD5

HTTP ETags and Cache Keys

ETags identify a specific version of a resource for HTTP caching. RFC 9110, which replaced RFC 7232, defines ETags as opaque strings, so any format works. Using an MD5 of the file content is a common implementation because it's deterministic (same content = same ETag across server restarts), compact (32 chars), and fast to compute. Apache's default ETag is built from mtime+size (versions before 2.4 also included the inode); Nginx uses mtime+size; many application servers use content hashes.

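// Express.js: MD5-based ETag generation
import express from 'express'
import { createHash } from 'crypto'
import { existsSync, readFileSync } from 'fs'
import { basename, join, resolve } from 'path'

const app = express()
const STATIC_DIR = resolve('./static')

function generateETag(filePath: string): string {
  const content = readFileSync(filePath)
  // ETag values are quoted strings
  return '"' + createHash('md5').update(content).digest('hex') + '"'
}

// Route handler that supports conditional GET
app.get('/assets/:file', (req, res) => {
  // basename() drops directory components so a request cannot escape STATIC_DIR
  const filePath = join(STATIC_DIR, basename(req.params.file))
  if (!existsSync(filePath)) {
    return res.status(404).end()
  }
  const etag = generateETag(filePath)

  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end()  // Not Modified
  }

  res.setHeader('ETag', etag)
  res.sendFile(filePath)  // sendFile requires an absolute path
})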
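The handler above compares `If-None-Match` with a single exact string match. Clients may also send a comma-separated list of ETags, weak validators prefixed with `W/`, or a bare `*`. The following is a minimal sketch of a more tolerant comparison, written in Python for brevity; `etag_matches` is a hypothetical helper, and the naive comma split assumes hex-style ETags that never contain commas themselves.

```python
def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Weakly compare an If-None-Match header value against the current ETag."""

    def opaque(tag: str) -> str:
        # Weak comparison ignores the W/ prefix and compares the quoted part only.
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    if if_none_match.strip() == "*":
        return True  # "*" matches any existing representation
    candidates = [opaque(t) for t in if_none_match.split(",")]
    return opaque(current_etag) in candidates


etag = '"5d41402abc4b2a76b9719d911017c592"'
print(etag_matches('W/"abc", ' + etag, etag))  # True
print(etag_matches('"abc"', etag))             # False
```

When the function returns true for a GET request, the server can answer 304 Not Modified without sending the body.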

Content Deduplication and Sharding

Many blob storage systems use content hashes for deduplication. Git uses SHA-1 (transitioning to SHA-256) for this purpose. Cassandra's RandomPartitioner, the default before version 1.2, uses MD5 to partition data evenly across nodes. S3's ETag for single-part uploads is the MD5 of the object content. In all these cases, the properties needed are determinism and the avalanche effect (small input changes → completely different hash), not collision resistance. MD5 has both.

// Deterministic sharding with MD5
// Maps a user_id to one of N shards, consistently
import { createHash } from 'crypto'

function getShardIndex(userId: string, shardCount: number): number {
  const hash = createHash('md5').update(userId).digest('hex')
  // Take the first 8 hex chars as a 32-bit int
  const n = parseInt(hash.slice(0, 8), 16)
  return n % shardCount
}

getShardIndex('user_12345', 16)   // always same shard for same userId
getShardIndex('user_12346', 16)   // likely different shard
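The avalanche effect that makes this sharding even can be checked directly. This short Python sketch counts how many of the 128 output bits differ between the digests of two nearly identical inputs; `differing_bits` is an illustrative helper name.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the MD5 digests of a and b differ."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


# The inputs differ in one character; the digests should differ in
# roughly half of their 128 bits.
print(differing_bits(b"user_12345", b"user_12346"))
```

A result near 64 means the two digests are statistically unrelated, which is the property a partitioner needs.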

Generating MD5 Hashes: Code in Every Language

JavaScript / Node.js

Node.js's built-in crypto module has native MD5 support. The browser Web Crypto API does not support MD5 (it was intentionally omitted for security reasons), so for browser-side MD5 you need a library.

// Node.js — built-in crypto (no dependencies)
import { createHash } from 'crypto'

// String → MD5
function md5(input: string): string {
  return createHash('md5').update(input, 'utf8').digest('hex')
}

md5('hello')         // "5d41402abc4b2a76b9719d911017c592"
md5('Hello')         // "8b1a9953c4611296a827abf8c47804d7"  ← different!
md5('')              // "d41d8cd98f00b204e9800998ecf8427e"  ← empty string hash

// File → MD5 (streaming, handles large files efficiently)
import { createReadStream } from 'fs'

async function md5File(filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('md5')
    const stream = createReadStream(filePath)
    stream.on('data', chunk => hash.update(chunk))
    stream.on('end', () => resolve(hash.digest('hex')))
    stream.on('error', reject)
  })
}

await md5File('./package.json')  // hex hash of the file

// Buffer → MD5 (binary data)
const buf = Buffer.from([0x00, 0xFF, 0xDE, 0xAD])
createHash('md5').update(buf).digest('hex')

// Browser — using the 'md5' npm package (pure JS implementation)
// npm install md5
import md5 from 'md5'
md5('hello')  // "5d41402abc4b2a76b9719d911017c592"

// Browser: SubtleCrypto offers SHA-1, SHA-256, SHA-384, and SHA-512, but no MD5
// const hash = await crypto.subtle.digest('MD5', data)  ← NOT supported

Python

Python's hashlib is the standard library module for hash algorithms. Since Python 3.9, its constructors accept a usedforsecurity parameter. Pass False when using MD5 for non-security purposes so the call still works on FIPS-restricted builds, where MD5 is otherwise blocked.

import hashlib

# String → MD5
def md5_string(s: str) -> str:
    return hashlib.md5(s.encode('utf-8'), usedforsecurity=False).hexdigest()

md5_string('hello')  # '5d41402abc4b2a76b9719d911017c592'

# File → MD5 (streaming for large files)
def md5_file(path: str) -> str:
    h = hashlib.md5(usedforsecurity=False)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

md5_file('/path/to/ubuntu.iso')  # file hash

# Bytes → MD5
hashlib.md5(b'\x00\xff\xde\xad', usedforsecurity=False).hexdigest()

# Compare two files for equality (without reading both into memory)
def files_identical(path1: str, path2: str) -> bool:
    return md5_file(path1) == md5_file(path2)

Go

package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// MD5String returns the hex-encoded MD5 digest of a string.
func MD5String(s string) string {
    h := md5.Sum([]byte(s))
    return hex.EncodeToString(h[:])
}

// MD5File streams a file through MD5 without loading it into memory.
func MD5File(path string) (string, error) {
    f, err := os.Open(path)
    if err != nil {
        return "", err
    }
    defer f.Close()

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
    fmt.Println(MD5String("hello"))
    // 5d41402abc4b2a76b9719d911017c592

    hash, err := MD5File("./go.sum")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(hash) // checksum of file
}

Bash (Linux / macOS)

# String → MD5
echo -n "hello" | md5sum
# 5d41402abc4b2a76b9719d911017c592  -

# File → MD5
md5sum /path/to/file.tar.gz
# 84aea60b36e2f80b2c9e02ff7a7a1e5f  /path/to/file.tar.gz

# macOS (md5 instead of md5sum, different output format)
echo -n "hello" | md5
# 5d41402abc4b2a76b9719d911017c592

md5 file.tar.gz
# MD5 (file.tar.gz) = 84aea60b36e2f80b2c9e02ff7a7a1e5f

# Verify a checksums file
md5sum -c checksums.md5

# Generate checksums for multiple files
md5sum *.iso > checksums.md5
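On systems without GNU coreutils, the same check can be approximated in a few lines of Python. This is a sketch, not a drop-in replacement: `verify_checksums` is a hypothetical helper, it assumes the plain `<hex digest>  <filename>` line format that `md5sum` writes, and it resolves filenames relative to the checksum file rather than the current directory.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Stream a file through MD5 in 64 KiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksums(checksum_file: str) -> dict[str, bool]:
    """Return {filename: True/False} for every entry in the checksum file."""
    base = Path(checksum_file).parent
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = base / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling `verify_checksums("checksums.md5")` returns a mapping you can scan for `False` entries, which correspond to the `FAILED` lines `md5sum -c` would print.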

Passwords: Why a Fast Hash Is Never Appropriate

Storing passwords as MD5 hashes is catastrophic, but not because of the collision vulnerability, which does not help an attacker recover a password. Speed itself is the problem. MD5 can compute over 100 billion hashes per second on modern GPU hardware (per published Hashcat benchmarks). That makes brute-force and dictionary attacks trivially fast.
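A quick back-of-the-envelope calculation shows what that rate means in practice. The sketch below assumes the 100 billion hashes per second quoted above and an attacker who simply tries every candidate; the password classes are illustrative examples.

```python
HASHES_PER_SECOND = 100e9  # assumed rate: the GPU figure quoted above


def seconds_to_exhaust(alphabet_size: int, length: int,
                       rate: float = HASHES_PER_SECOND) -> float:
    """Time to try every password of the given length over the alphabet."""
    return alphabet_size ** length / rate


for label, alphabet, length in [
    ("8 lowercase letters", 26, 8),
    ("8 lowercase letters + digits", 36, 8),
    ("8 printable ASCII characters", 95, 8),
]:
    print(f"{label}: {seconds_to_exhaust(alphabet, length):,.0f} s")
```

At that rate the first two classes fall in well under a minute, and even the full printable-ASCII space of length 8 is exhausted in less than a day on a single GPU.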

On CPUs with SHA extensions (SHA-NI), SHA-256 can even outrun MD5, and it is equally unsuitable for passwords. Any fast hash algorithm is wrong for passwords. The right tools are purpose-built slow algorithms:

| Algorithm | Recommended? | Memory-Hard? | Notes |
| --- | --- | --- | --- |
| Argon2id | ✅ Best choice | Yes | PHC winner (2015). First choice in the OWASP Password Storage Cheat Sheet. Configurable memory + time + parallelism. |
| bcrypt | ✅ Good | No | Industry standard since 1999. Max 72-byte input. Cost factor 10+ for web apps. Supported everywhere. |
| scrypt | ✅ Good | Yes | Memory-hard. Use when you want stronger GPU resistance than bcrypt. |
| PBKDF2-SHA256 | ⚠️ Acceptable | No | FIPS-compliant. Use 600,000+ iterations (OWASP 2023 recommendation). Not memory-hard. |
| SHA-256 (raw) | ❌ Never | No | Too fast. Billions of attempts/second on GPU. |
| MD5 (raw) | ❌ Never | No | Too fast AND collisions. The LinkedIn breach (2012) involved unsalted SHA-1, another fast hash: 6.5M passwords cracked in hours. |

// Node.js: Argon2id (argon2 package)
import * as argon2 from 'argon2'

// Hash a password
const hash = await argon2.hash('user_password', {
  type: argon2.argon2id,
  memoryCost: 65536,   // 64 MB
  timeCost: 3,         // 3 iterations
  parallelism: 4,
})
// "$argon2id$v=19$m=65536,t=3,p=4$..."  ← includes salt

// Verify
const isValid = await argon2.verify(hash, 'user_password')  // true

// Node.js — bcrypt (bcryptjs, pure JS)
import bcrypt from 'bcryptjs'

const hashed = await bcrypt.hash('user_password', 12)  // cost factor 12
const matches = await bcrypt.compare('user_password', hashed)  // true
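For Python readers, the scrypt row of the table can be sketched with the standard library alone. This assumes a Python build backed by OpenSSL 1.1 or later (required for `hashlib.scrypt`); the helper names and the cost parameters `n=2**14, r=8, p=1` are illustrative and should be tuned against current guidance and your own latency budget.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (about 16 MiB of memory per hash); tune for production.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = os.urandom(16)  # unique random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected)


salt, key = hash_password("user_password")
print(verify_password("user_password", salt, key))   # True
print(verify_password("wrong_password", salt, key))  # False
```

Unlike the Argon2 and bcrypt libraries above, this sketch does not encode the salt and parameters into a single string, so the application has to store them itself.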

MD5 vs. SHA Family: When to Upgrade

If you're using MD5 for checksums and want to upgrade, the cost is minimal: SHA-256 is roughly 1.5–2× slower in software, and produces a 64-character hex string instead of 32. The code change is a one-line substitution in most languages. Whether the upgrade is worth it depends on your threat model:

  • File download verification: MD5 is fine unless an attacker could compromise the download server (in which case the checksum itself is compromised regardless). SHA-256 costs you 64-char strings and a tiny performance hit.
  • ETags: MD5 is fine. RFC 9110 (which replaced RFC 7232) treats ETags as opaque. SHA-256 would make ETags longer, a minor HTTP overhead.
  • Content deduplication in internal systems: MD5 is fine unless attackers can craft inputs (unlikely in most internal contexts). If in doubt, use SHA-256.
  • Any public-facing or security-sensitive context: Always SHA-256 or better.
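One way to keep that upgrade cheap is to make the algorithm a parameter from the start. A minimal Python sketch, where `file_digest_hex` is a hypothetical helper and SHA-256 is chosen as the default purely for illustration:

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256") -> str:
    """Stream a file through the named hash algorithm and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

Callers that still need the legacy value pass `algorithm="md5"`; everything else picks up the stronger default, and storage only has to allow for 64 hex characters instead of 32.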

For a broader look at hash functions and their security properties, see BytePane's hash functions explained article — covers MD5, SHA-1, SHA-2, SHA-3, and BLAKE3.

Frequently Asked Questions

What is an MD5 hash?

MD5 (Message Digest Algorithm 5) produces a 128-bit hash from any input, represented as 32 hex characters. For example, MD5 of "hello" is 5d41402abc4b2a76b9719d911017c592. Designed by Ronald Rivest (MIT) in 1991, standardized in RFC 1321. It is cryptographically broken for security use but still used for checksums.

Is MD5 safe to use in 2026?

MD5 is NOT safe for passwords, digital signatures, or certificates — cryptographically broken since 2004, and deprecated by NIST in 2008. But it is still appropriate for non-adversarial checksums: file download verification, ETags, cache keys, and data deduplication where no attacker controls the input.

What should I use instead of MD5 for passwords?

Never use any hash algorithm directly for passwords; even SHA-256 is too fast (billions of attempts/second on GPUs). Use Argon2id (the first choice in the OWASP Password Storage Cheat Sheet), bcrypt (cost factor 12+), or scrypt. These are intentionally slow and salted, which defeats rainbow tables and makes brute-force attacks far more expensive.

How do I verify an MD5 checksum on Linux/macOS?

Linux: md5sum file.tar.gz prints the hash, which you then compare against the published value. To verify a whole checksums file: md5sum -c checksums.md5. macOS: md5 file.tar.gz (different output format). Python: hashlib.md5(open("file","rb").read(), usedforsecurity=False).hexdigest().

What is an MD5 collision attack?

A collision finds two different inputs with the same hash. In 2004, Xiaoyun Wang and Hongbo Yu demonstrated practical MD5 collisions. By 2008, researchers forged SSL certificates using MD5 collisions. In 2012, Flame malware faked a Microsoft digital signature via MD5 collision. Collisions take seconds on consumer hardware using tools like HashClash.

What is the difference between MD5 and SHA-256?

MD5 produces 128 bits (32 hex chars); SHA-256 produces 256 bits (64 hex chars). MD5 is cryptographically broken — collisions found in seconds. SHA-256 has no known practical collisions and is NIST-recommended for all new security applications. MD5 is ~2× faster in software, which makes it suitable for non-security checksums.
