Text Diff Tool: Compare Two Texts & Highlight Differences
52 years ago, in 1974, the first diff utility shipped as part of Unix 5th Edition — written by Douglas McIlroy and James Hunt at Bell Labs. The core problem it solved has not changed: given two versions of a text, what is the minimum set of changes that transforms one into the other? Modern text diff tools (GitHub code review, VS Code's diff editor, Gerrit, GitLab) all trace their lineage to the same mathematical problem, with progressively better algorithms and UX layered on top.
Key Takeaways
- The Myers O(ND) algorithm (1986) is used by GNU diff and Git; it finds the minimum edit script in O(ND) time, where N = total lines and D = differences
- Patience and histogram diff are opt-in in Git (via --diff-algorithm) and produce more human-readable output by anchoring on unique lines before recursing; Git's default is still Myers
- Hunt-McIlroy (the original 1974 algorithm) was a 5,000× improvement over naive dynamic programming for 10,000-line files
- Unified diff format was created by Wayne Davison in August 1990 and standardized in GNU diff 1.15 (January 1991)
- For JSON/YAML comparison, normalize formatting first; raw text diff on pretty-printed JSON produces noise from indentation changes
The Mathematics Behind Text Comparison
Every text diff tool reduces to one problem: finding the Longest Common Subsequence (LCS) of two texts. The LCS is the longest sequence of lines (or characters) that appear in the same order in both texts, though not necessarily contiguously. Lines in the LCS are "unchanged"; everything else is either an insertion or a deletion.
The naive dynamic programming approach to LCS has O(N²) time and space complexity — comparing two 10,000-line files requires 100 million operations and 100 million cells of memory. For the Unix systems of the early 1970s, this was impractical.
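To make the quadratic cost concrete, here is a minimal sketch of the classic dynamic-programming LCS in JavaScript. The names lcsTable and minimumEdits are illustrative and not taken from any library; the inputs can be arrays of lines or of characters.

```javascript
// Classic O(N*M) dynamic-programming LCS (illustrative sketch, not a library API).
function lcsTable(a, b) {
  // table[i][j] holds the LCS length of a[i..] and b[j..].
  const table = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      table[i][j] =
        a[i] === b[j]
          ? table[i + 1][j + 1] + 1
          : Math.max(table[i + 1][j], table[i][j + 1]);
    }
  }
  return table;
}

// Everything outside the LCS is either a deletion (from a) or an insertion (from b).
function minimumEdits(a, b) {
  const lcs = lcsTable(a, b)[0][0];
  return { lcs, deletions: a.length - lcs, insertions: b.length - lcs };
}

const result = minimumEdits([...'ABCABBA'], [...'CBABAC']);
console.log(result); // { lcs: 4, deletions: 3, insertions: 2 }
```

The table has (N+1) × (M+1) cells, which is exactly the memory cost the paragraph above describes: fine for short inputs, prohibitive for two 10,000-line files on 1970s hardware.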
// What "edit distance" means:
// Transform "ABCABBA" → "CBABAC"
//
// One possible edit script (not minimum, 7 operations):
// Delete A, Delete B, keep C, Add B, keep A, keep B, Delete B, Delete A, Add A, Add C
//
// Minimum edit script (LCS has length 4, e.g. "CABA" or "CBBA"):
// - 3 deletions + 2 insertions = 5 operations
// Delete A, Delete B, keep C, Add B, keep A, keep B, Delete B, keep A, Add C
//
// A diff tool finds this minimum edit script and presents it as:
// Lines prefixed with - (deleted from original)
// Lines prefixed with + (inserted into new)
// Lines with no prefix (unchanged, in LCS)
A Timeline of Diff Algorithms
1974: Hunt-McIlroy — The Original
James Hunt and Douglas McIlroy's 1974 algorithm (documented in their 1976 Bell Labs technical report "An Algorithm for Differential File Comparison") introduced the concept of "k-candidates": matching pairs that could extend the LCS. By focusing only on positions where the two files share common elements, it avoided the full N×M comparison matrix.
The practical impact was dramatic. Per their paper, comparing two 10,000-line files requires approximately 20,000 operations with Hunt-McIlroy versus 100 million with classic dynamic programming — a 5,000× improvement. This made diff practical on the PDP-11 hardware of the era.
1986: Myers Algorithm — The Current Standard
Eugene Myers published "An O(ND) Difference Algorithm and Its Variations" in 1986, which became the foundation for GNU diff, Git, and virtually every modern diff tool. The algorithm's key insight is to explore the edit graph diagonally rather than by rows.
Myers runs in O(ND) time where N is the total number of lines in both files and D is the number of differences (edit distance). When files are similar (D is small), this is extremely efficient: O(N) in the best case of identical files. The basic algorithm needs O(N + D²) space to recover the edit script; the linear-space refinement described in the same paper reduces this to O(N).
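The greedy core of the algorithm is short enough to sketch. The version below only returns D, the length of the shortest edit script, and does not recover the script itself. The function name myersEditDistance is illustrative; it follows the diagonal-by-diagonal search described above and omits the linear-space refinement.

```javascript
// Greedy Myers search: returns D, the size of the shortest edit script.
// Sketch only; real tools also record the path so they can print the script.
function myersEditDistance(a, b) {
  const n = a.length;
  const m = b.length;
  const max = n + m;
  const offset = max; // diagonals k run from -max..max, shift them to >= 0
  // v[offset + k] = furthest x reached so far on diagonal k (k = x - y)
  const v = new Array(2 * max + 2).fill(0);

  for (let d = 0; d <= max; d++) {
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])) {
        x = v[offset + k + 1]; // vertical move: insert a line from b
      } else {
        x = v[offset + k - 1] + 1; // horizontal move: delete a line from a
      }
      let y = x - k;
      // Follow the diagonal for free while lines match.
      while (x < n && y < m && a[x] === b[y]) {
        x++;
        y++;
      }
      v[offset + k] = x;
      if (x >= n && y >= m) return d;
    }
  }
  return max;
}

console.log(myersEditDistance(['c', 'a', 't'], ['c', 'u', 't'])); // 2
console.log(myersEditDistance([...'ABCABBA'], [...'CBABAC'])); // 5
```

The outer loop runs D + 1 times and each pass touches at most D + 1 diagonals, so near-identical files finish after very few passes regardless of their length.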
// Myers algorithm conceptually:
// Represents the problem as a graph where:
// - Horizontal move = delete a line from file A
// - Vertical move = insert a line from file B
// - Diagonal move (free) = lines match, no edit needed
//
// Find the shortest path from (0,0) to (len_A, len_B)
// = the minimum edit script
//
// Example: diff "cat" against "cut"
// File A: ["c", "a", "t"]
// File B: ["c", "u", "t"]
//
// Optimal path: keep "c" (diagonal), delete "a", insert "u", keep "t"
// Edit script: -a +u (2 operations, minimum possible)
//
// Output:
// c
// - a
// + u
// t
2005: Patience Diff — Readable over Optimal
Patience diff was invented by Bram Cohen (creator of BitTorrent) and is available in Git as an opt-in algorithm aimed at human-readable output. It uses a fundamentally different strategy: instead of finding the globally shortest edit script, it first finds lines that appear exactly once in each file (unique lines) and uses those as anchors.
The algorithm: (1) find all lines that occur exactly once in each file and appear in both, (2) find the longest common subsequence of those lines, (3) recursively diff the sections between the matched anchors. This tends to align logically significant code boundaries rather than accidentally matching braces or blank lines.
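Steps (1) and (2) can be sketched in a few lines. The helper names uniqueCommonLines and patienceAnchors are hypothetical, and the recursive step (3) is left out; the longest increasing subsequence is found with the patience-sorting technique that gives the algorithm its name.

```javascript
// Step 1: lines that occur exactly once in a AND exactly once in b.
function uniqueCommonLines(a, b) {
  const index = (lines) => {
    const seen = new Map();
    lines.forEach((line, i) => {
      const entry = seen.get(line);
      if (entry) entry.count++;
      else seen.set(line, { count: 1, index: i });
    });
    return seen;
  };
  const inA = index(a);
  const inB = index(b);
  const pairs = [];
  // Map iteration follows insertion order, so pairs come out sorted by aIndex.
  for (const [line, ea] of inA) {
    const eb = inB.get(line);
    if (ea.count === 1 && eb && eb.count === 1) {
      pairs.push({ line, aIndex: ea.index, bIndex: eb.index });
    }
  }
  return pairs;
}

// Step 2: longest subsequence of pairs whose bIndex also increases.
function patienceAnchors(a, b) {
  const pairs = uniqueCommonLines(a, b);
  const pileTops = []; // pileTops[p] = index (into pairs) of the top of pile p
  const prev = new Array(pairs.length).fill(-1);
  pairs.forEach((pair, i) => {
    let lo = 0;
    let hi = pileTops.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (pairs[pileTops[mid]].bIndex < pair.bIndex) lo = mid + 1;
      else hi = mid;
    }
    if (lo > 0) prev[i] = pileTops[lo - 1];
    pileTops[lo] = i;
  });
  const anchors = [];
  let i = pileTops.length ? pileTops[pileTops.length - 1] : -1;
  for (; i !== -1; i = prev[i]) anchors.unshift(pairs[i]);
  return anchors;
}

const before = ['import x', '{', 'alpha()', '}', '{', 'beta()', '}'];
const after = ['import x', '{', 'alpha()', '}', '{', 'gamma()', '}', '{', 'beta()', '}'];
console.log(patienceAnchors(before, after).map((p) => p.line));
// [ 'import x', 'alpha()', 'beta()' ]
```

Note that the repeated brace lines never become anchors, which is why the inserted gamma() block is reported as one clean insertion rather than being interleaved with neighboring braces.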
| Aspect | Myers | Patience | Histogram |
|---|---|---|---|
| Goal | Minimum edit distance | Human-readable output | Stable, fewer false matches |
| Strategy | Shortest edit script (graph traversal) | Unique-line anchoring | Patience + line frequency weighting |
| Time complexity | O(ND) | O(N log N) for unique-line phase | O(N log N) |
| Git usage | Git default (--diff-algorithm=myers) | Available (--diff-algorithm=patience) | Available (--diff-algorithm=histogram) |
| Best for | Machine-processing patches | Code review with moved blocks | General purpose code review |
| Used by | GNU diff, Git (default), many libraries | Bazaar, Git (opt-in) | JGit (where it originated), Git (opt-in) |
# Switch Git's diff algorithm:
git config --global diff.algorithm histogram # often the most readable; opt-in (Git's default is myers)
git config --global diff.algorithm patience # good for refactored code
git config --global diff.algorithm myers # minimum diff size
# Per-command override:
git diff --diff-algorithm=patience HEAD~1 HEAD
# Word-level diff (highlight changes within a line):
git diff --word-diff HEAD~1 HEAD
# Character-level diff (most granular):
git diff --word-diff=color --word-diff-regex=. HEAD~1 HEAD
Reading Unified Diff Format
Unified diff format was created by Wayne Davison in August 1990 and added to GNU diff by Richard Stallman one month later, shipping in GNU diff 1.15 in January 1991. It has been the standard patch format for 35 years — every code review tool, patch submission system, and Git output uses it.
--- a/src/auth/middleware.ts ← Original file path
+++ b/src/auth/middleware.ts ← New file path
@@ -12,8 +12,12 @@ ← Hunk header (see below)
 import { Request, Response } from 'express';
 import jwt from 'jsonwebtoken';
-export function authMiddleware(req: Request, res: Response, next: any) { ← Lines starting with - were removed
+export function authMiddleware( ← Lines starting with + were added
+ req: AuthenticatedRequest, ← (new version)
+ res: Response,
+ next: NextFunction
+) {
 const authHeader = req.headers.authorization;
- if (!authHeader) {
+ if (!authHeader?.startsWith('Bearer ')) {
 return res.status(401).json({ error: 'Missing token' });
 }
 }
// Reading the hunk header: @@ -12,8 +12,12 @@
// -12,8 → in original file: starts at line 12, spans 8 lines (6 context + 2 removed)
// +12,12 → in new file: starts at line 12, spans 12 lines (6 context + 6 added)
// (difference: +4 lines net, because we added more than we removed)
//
// Line prefixes:
// (space) — unchanged context line (shown by default: 3 lines each side)
// - — removed from original
// + — added in new version
// \ No newline at end of file — missing final newline (affects patches)
Implementing Text Diff in JavaScript
For browser-based tools or Node.js utilities, diff (the npm package by kpdecker, 2.9 million weekly downloads per npm registry, consistent since 2014) implements the Myers algorithm in pure JavaScript with multiple output modes.
// npm install diff
import * as Diff from 'diff';
const oldText = `function greet(name) {
console.log('Hello, ' + name);
return name;
}`;
// Inner backticks and ${ are escaped so the sample stays a single template literal
const newText = `function greet(name: string): string {
const greeting = \`Hello, \${name}!\`;
console.log(greeting);
return greeting;
}`;
// Line-by-line diff:
const lineDiff = Diff.diffLines(oldText, newText);
lineDiff.forEach(part => {
if (part.added) console.log('[+]', part.value);
else if (part.removed) console.log('[-]', part.value);
else console.log('[ ]', part.value);
});
// Word-level diff (highlights changed words within lines):
const wordDiff = Diff.diffWords(oldText, newText);
// Character-level diff (most granular):
const charDiff = Diff.diffChars(oldText, newText);
// Generate a unified diff patch string:
const patch = Diff.createTwoFilesPatch(
'original.ts',
'modified.ts',
oldText,
newText,
'Original version',
'Modified version',
{ context: 3 } // lines of context around each hunk
);
console.log(patch);
// --- original.ts
// +++ modified.ts
// @@ -1,4 +1,5 @@
// -function greet(name) {
// +function greet(name: string): string {
// ...
// Apply a patch:
const patched = Diff.applyPatch(oldText, patch);
Building a Visual Diff Component (React)
// Lightweight visual diff renderer without external UI deps
import * as Diff from 'diff';
interface DiffViewProps {
oldText: string;
newText: string;
context?: number;
}
export function InlineDiffView({ oldText, newText }: DiffViewProps) {
const parts = Diff.diffLines(oldText, newText, { newlineIsToken: false });
return (
<pre className="font-mono text-sm leading-5 overflow-x-auto">
{parts.map((part, i) => {
if (part.added) {
return (
<span key={i} className="bg-green-900/40 text-green-300 block">
{part.value.split('\n').filter(Boolean).map((line, j) => (
<span key={j} className="block">{'+ '}{line}</span>
))}
</span>
);
}
if (part.removed) {
return (
<span key={i} className="bg-red-900/40 text-red-300 block">
{part.value.split('\n').filter(Boolean).map((line, j) => (
<span key={j} className="block">{'- '}{line}</span>
))}
</span>
);
}
return (
<span key={i} className="text-gray-400 block">
{part.value.split('\n').filter(Boolean).map((line, j) => (
<span key={j} className="block">{' '}{line}</span>
))}
</span>
);
})}
</pre>
);
}
Text Diff Tools: A Practical Comparison
The right diff tool depends entirely on your context. Here is an honest comparison across the options developers actually use.
| Tool | Best For | Algorithm | Limitation |
|---|---|---|---|
| BytePane Text Diff | Quick online text comparison, no install | Myers (diff.js) | Browser-only, no large file support |
| VS Code built-in diff | Side-by-side code review, local files | Myers | Requires VS Code, not shareable |
| git diff | Code changes in version control | Myers (default); minimal, patience, histogram opt-in | Terminal only; requires Git |
| GitHub / GitLab PR diff | Team code review | Git-generated diff; algorithm not selectable by the reviewer | Requires remote push; not for arbitrary text |
| Meld | 3-way merge, directory comparison | Myers | Desktop app, Linux-first |
| diffoscope | Binary reproducibility, package diff | Myers + format-specific parsers | Heavy dependency, CI/security use only |
For most development tasks, git diff is the right tool — it is always available, understands binary files (by refusing to diff them), and integrates with the version history you already have. Online text diff tools fill the gap for non-versioned comparisons: pasting two config files, comparing API responses, reviewing a colleague's text snippet, or any scenario where you have raw text without a Git repository.
Practical Use Cases with Real Examples
Comparing API Responses Before and After a Deploy
# Capture API response before the change
curl -s https://api.example.com/v1/products/42 | jq . > before.json
# Deploy the change, then capture again
curl -s https://api.example.com/v1/products/42 | jq . > after.json
# Unified diff (jq . normalizes indentation; use jq -S . to also sort keys):
diff -u before.json after.json
# Or: use git diff for color output without a repo
git diff --no-index before.json after.json
# Expected output (only changed fields):
# @@ -8,7 +8,7 @@
# "price": 29.99,
# - "stock": 142,
# + "stock": 141,
# "available": true
Config File Audit (nginx, Kubernetes, Terraform)
# Compare production vs staging nginx config:
diff /etc/nginx/nginx.conf.prod /etc/nginx/nginx.conf.staging
# Compare Kubernetes manifests (ignore comments):
diff <(grep -v '^#' k8s-prod.yaml) <(grep -v '^#' k8s-staging.yaml)
# Terraform plan equivalent — diff state files:
terraform show -json terraform.tfstate | jq . > before.tf.json
# ... apply change ...
terraform show -json terraform.tfstate | jq . > after.tf.json
diff before.tf.json after.tf.json
Validating Environment Variable Drift
# Compare key names only, sorted (values and key order may differ):
diff <(cut -d= -f1 .env.example | sort) <(sort .env.production.keys_only)
# Output:
# < DATABASE_POOL_SIZE ← in .env.example, missing from production
# > NEW_RELIC_LICENSE_KEY ← in production, not documented in example
# This catches:
# 1. Keys in .env.example that production hasn't set (missing env vars)
# 2. Keys in production that aren't documented (undocumented vars)
For structured data comparisons like JSON, you can use our JSON Formatter to normalize both JSON payloads first (consistent key ordering, indentation), then paste them into the diff tool. This eliminates false differences from formatting changes and focuses the diff on actual data changes.
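If you would rather normalize in code than in a formatter, the same idea takes a few lines of plain JavaScript. This is a minimal sketch using only JSON.parse and JSON.stringify; sortKeysDeep and canonicalJson are illustrative names, and two-space indentation is an arbitrary choice.

```javascript
// Recursively rebuild objects with keys in sorted order; arrays keep their order.
function sortKeysDeep(value) {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key]);
    }
    return sorted;
  }
  return value;
}

// One key per line, stable key order, trailing newline: friendly to line-based diff.
function canonicalJson(text) {
  return JSON.stringify(sortKeysDeep(JSON.parse(text)), null, 2) + '\n';
}

const compact = '{"b":1,"a":{"d":2,"c":3}}';
const pretty = '{ "a": { "c": 3, "d": 2 },\n  "b": 1 }';
console.log(canonicalJson(compact) === canonicalJson(pretty)); // true
```

Array order is deliberately preserved, since reordering a list is usually a real data change rather than formatting noise.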
Performance: When Diff Gets Slow
Myers O(ND) is fast when files are similar (small D). The worst case is O(N²) when files are completely different — no common lines. In practice, this manifests when diffing generated files (minified JavaScript, binary-encoded data, build artifacts) that look entirely changed.
# Symptoms of slow diff:
# - git diff hangs on a JavaScript bundle change
# - diff reports "Binary files differ" for a text file (usually stray NUL bytes; try diff --text)
# Solutions:
# 1. Tell Git to skip textual diffs for generated files (.gitattributes):
*.min.js -diff
*.bundle.js -diff
# 2. Skip rename detection if it dominates (diff.renameLimit only caps rename
# detection; it is not a timeout):
git diff --no-renames HEAD~1 HEAD
# 3. Use --stat instead of full diff for quick summary:
git diff --stat HEAD~1 HEAD
# 4. For very large text files (logs, generated files):
# Split both versions into chunks and diff matching chunks. Chunks drift out of
# alignment after the first inserted or deleted line, so this suits append-only logs.
split -l 1000 old.log old_chunk_
split -l 1000 new.log new_chunk_
diff old_chunk_aa new_chunk_aa
The Python difflib library (stdlib, no install required) includes a SequenceMatcher with an autojunk heuristic: when the second sequence is at least 200 items long, any item that accounts for more than 1% of it is treated as junk and ignored when finding matches. For log files and generated content, this dramatically reduces diff time, at the cost of less precise matching around repeated content.
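The idea behind that heuristic is easy to reproduce in JavaScript. The sketch below is not difflib's code: popularLines is a hypothetical helper, and the 200-line minimum and 1% cutoff are taken from the description above, with the exact rounding being an assumption.

```javascript
// Flag lines so frequent that matching on them is expensive and rarely meaningful.
// Assumed rule: only for inputs of 200+ lines, flag lines that occur more than
// floor(n / 100) + 1 times. The exact cutoff arithmetic is an assumption.
function popularLines(lines) {
  const popular = new Set();
  if (lines.length < 200) return popular;

  const counts = new Map();
  for (const line of lines) counts.set(line, (counts.get(line) || 0) + 1);

  const threshold = Math.floor(lines.length / 100) + 1;
  for (const [line, count] of counts) {
    if (count > threshold) popular.add(line);
  }
  return popular;
}

// 250 distinct log lines plus 50 blank lines: only the blank line is "popular".
const log = [];
for (let i = 0; i < 250; i++) log.push(`request ${i} ok`);
for (let i = 0; i < 50; i++) log.push('');
console.log(popularLines(log)); // Set(1) { '' }
```

A diff implementation would then skip these lines when looking for match candidates, which is where the speedup comes from and also why edits inside heavily repeated content can be aligned less precisely.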
Frequently Asked Questions
What algorithm does the Unix diff command use?
GNU diff uses the Myers O(ND) algorithm by default, published by Eugene Myers in 1986. It finds the shortest edit script in O(ND) time where N = total lines and D = differences. For large files with few differences, Myers is extremely efficient. Git also defaults to Myers; histogram diff (a variant of patience) is available via --diff-algorithm=histogram for more human-readable output.
What is unified diff format and how do I read it?
Lines starting with - were removed; lines starting with + were added; space-prefixed lines are unchanged context. The @@ -3,7 +3,6 @@ header means: in the original, this hunk starts at line 3 spanning 7 lines; in the new file, line 3 spanning 6 lines. Unified format was created by Wayne Davison in August 1990, shipped in GNU diff 1.15 in January 1991.
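A hunk header is regular enough to parse with one regular expression. The sketch below uses a hypothetical parseHunkHeader helper; it relies on the unified-format convention that an omitted count means 1.

```javascript
// Parse "@@ -oldStart,oldCount +newStart,newCount @@" into numbers.
// Returns null if the line is not a hunk header.
function parseHunkHeader(header) {
  const match = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(header);
  if (!match) return null;
  const [, oldStart, oldCount, newStart, newCount] = match;
  return {
    oldStart: Number(oldStart),
    oldCount: oldCount === undefined ? 1 : Number(oldCount),
    newStart: Number(newStart),
    newCount: newCount === undefined ? 1 : Number(newCount),
  };
}

// Count the lines a hunk body claims for each side, to cross-check the header.
function countHunkLines(bodyLines) {
  let oldCount = 0;
  let newCount = 0;
  for (const line of bodyLines) {
    if (line.startsWith('\\')) continue; // "\ No newline at end of file"
    if (line.startsWith('-')) oldCount++;
    else if (line.startsWith('+')) newCount++;
    else {
      oldCount++; // context line: present on both sides
      newCount++;
    }
  }
  return { oldCount, newCount };
}

console.log(parseHunkHeader('@@ -3,7 +3,6 @@'));
// { oldStart: 3, oldCount: 7, newStart: 3, newCount: 6 }
```

Comparing countHunkLines against the parsed header is a quick sanity check for hand-edited patches, which typically fail to apply when the counts no longer match the body.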
What is the difference between Myers diff and patience diff?
Myers finds the mathematically shortest edit script, that is, the minimum total lines changed. Patience diff first matches unique lines appearing exactly once in both files as anchors, then recursively diffs sections between matches. Patience produces more human-readable output: it avoids matching generic braces or blank lines and aligns logically related code sections. Git defaults to Myers and offers patience and histogram (patience + frequency weighting) as opt-in alternatives.
How do I compare two JSON files and see differences?
Normalize both files first with jq -S . to ensure consistent key ordering and indentation (plain jq . reformats but keeps the original key order), then run diff or git diff --no-index. Raw text diff on JSON is brittle: reformatting produces hundreds of false differences. For deep structural comparison ignoring ordering entirely, use a dedicated JSON diff library (json-diff in Node.js, deepdiff in Python) rather than text comparison.
What does "context lines" mean in diff output?
Context lines are unchanged lines shown around each changed region. GNU diff defaults to 3 context lines (diff -u). More context (diff -U 10) helps you locate changes but produces larger output. Zero context (diff -U 0) shows only changed lines — useful for machine-readable patches but hard to read manually. GitHub and GitLab show 3 context lines by default.
Can I use a text diff tool to compare API responses?
Yes. Capture the response with curl -s url | jq -S . > before.json, make your change, capture again into after.json, then diff the two files. The jq -S . step normalizes formatting and key order so that only real differences show up. This catches unexpected field additions, type changes, and value regressions that integration tests miss if they only assert on a subset of fields.
What is the difference between character-level and line-level diff?
Line-level diff (standard) treats each line as atomic — any change marks the whole line changed. Character-level diff highlights specific characters or words that changed within a line. Git provides word diff with git diff --word-diff. Character-level is more informative for small changes in long lines but noisier for structural changes like indentation shifts.
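For a single changed line, a lightweight way to get intra-line highlighting is to trim the common prefix and suffix and mark what is left. This is a simplification, not a full character-level diff (it reports one changed span per line), and changedSpan is an illustrative name.

```javascript
// Split a changed line pair into: shared prefix, removed text, added text, shared suffix.
function changedSpan(oldLine, newLine) {
  const shortest = Math.min(oldLine.length, newLine.length);

  let prefix = 0;
  while (prefix < shortest && oldLine[prefix] === newLine[prefix]) prefix++;

  // The suffix may not overlap the prefix, hence the (shortest - prefix) bound.
  let suffix = 0;
  while (
    suffix < shortest - prefix &&
    oldLine[oldLine.length - 1 - suffix] === newLine[newLine.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    prefix: oldLine.slice(0, prefix),
    removed: oldLine.slice(prefix, oldLine.length - suffix),
    added: newLine.slice(prefix, newLine.length - suffix),
    suffix: oldLine.slice(oldLine.length - suffix),
  };
}

console.log(changedSpan('  "stock": 142,', '  "stock": 141,'));
// { prefix: '  "stock": 14', removed: '2', added: '1', suffix: ',' }
```

When a line has two separate edits, this approach highlights everything between them as changed; that is the point at which a real word-level or character-level diff becomes worth the extra cost.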
Compare Two Texts Instantly
Paste any two texts into BytePane's diff tool for instant side-by-side comparison with additions and deletions highlighted. No installation, no account — works entirely in your browser.