Text Diff Tool: Compare Two Texts & Highlight Differences

Q: What algorithm does the Unix diff command use?

GNU diff uses a Myers-style O(ND) diff algorithm by default. Git documentation lists default/myers as the default algorithm, while patience and histogram are optional algorithms you can select when you want more human-readable patches for moved or repeated code blocks.

Q: What is unified diff format and how do I read it?

Unified diff format (created by Wayne Davison in 1990, adopted by GNU diff in January 1991) shows both old and new text in a single block. Lines starting with - were removed; lines starting with + were added; lines starting with space are unchanged context. The @@ -3,7 +3,6 @@ header means: in the original file, this hunk starts at line 3 and spans 7 lines; in the new file, it starts at line 3 and spans 6 lines.

Q: What is the difference between Myers diff and patience diff?

Myers diff finds a short edit script by exploring the edit graph. Patience diff first matches unique lines that appear exactly once in both files, then recursively diffs the sections between matches. Git supports patience with --diff-algorithm=patience and histogram with --diff-algorithm=histogram; both are alternatives to the default/myers mode.

Q: How do I compare two JSON files and see differences?

For structural JSON comparison (ignoring key ordering and whitespace), use a dedicated JSON diff tool rather than a plain text diff. Text diff on JSON is brittle — reformatting a file produces hundreds of spurious differences. For JavaScript/Node.js: deep-equal or JSON.stringify with sorted keys. For CLI: jq can normalize then diff. For quick visual comparison of two JSON snippets, a browser-based text diff tool with whitespace-normalization handles most cases.

Q: Can I use a text diff tool to compare API responses?

Yes, and it is a valuable debugging technique. Capture two API responses as text files (curl -s https://api.example.com/v1/data > before.json), make a change, capture again, then diff them. For structured comparison, pipe through jq . first to normalize formatting. This approach catches unexpected field additions, type changes, and value regressions that integration tests may miss if they only check a subset of fields.

Q: What is the difference between character-level and line-level diff?

Line-level diff (standard diff behavior) treats each line as atomic — a change anywhere in a line marks the whole line as changed. Character-level diff (also called word diff or inline diff) highlights the specific characters or words that changed within a line. Git provides this with git diff --word-diff. Character-level diff is more informative for reviewing small changes in long lines, but noisier for structural code changes like indentation shifts.

52 years ago, in 1974, the first diffutility shipped as part of Unix 5th Edition — written by Douglas McIlroy and James Hunt at Bell Labs. The core problem it solved has not changed: given two versions of a text, what is the minimum set of changes that transforms one into the other? Modern text diff tools (GitHub code review, VS Code's diff editor, Gerrit, GitLab) all trace their lineage to the same mathematical problem, with progressively better algorithms and UX layered on top.

Key Takeaways

▸The Myers O(ND) algorithm (1986) is the default family documented by Git and GNU diff for standard patch generation
▸Patience and histogram diff are Git options for more readable output when code blocks move or common lines repeat
▸Hunt-McIlroy (the original 1974 algorithm) was a 5,000× improvement over naive dynamic programming for 10,000-line files
▸Unified diff format was created by Wayne Davison in August 1990 and standardized in GNU diff 1.15 (January 1991)
▸For JSON/YAML comparison, normalize formatting first — raw text diff on pretty-printed JSON produces noise from indentation changes

Source Checkpoints

This guide is kept aligned with primary references instead of copied tool folklore: Git's own documentation currently lists default/myers as the default diff algorithm and exposes patience/histogram as selectable alternatives.

Git diff algorithm options — default, myers, minimal, patience, and histogram modes.
GNU Diffutils manual — unified output and context-line behavior.
Need a quick visual comparison? Open BytePane's Diff Checker.

The Mathematics Behind Text Comparison

Every text diff tool reduces to one problem: finding the Longest Common Subsequence (LCS)of two texts. The LCS is the longest sequence of lines (or characters) that appear in the same order in both texts, though not necessarily contiguously. Lines in the LCS are "unchanged"; everything else is either an insertion or a deletion.

The naive dynamic programming approach to LCS has O(N²) time and space complexity — comparing two 10,000-line files requires 100 million operations and 100 million cells of memory. For the Unix systems of the early 1970s, this was impractical.

// What "edit distance" means:
// Transform "ABCABBA" → "CBABAC"
//
// One possible edit script (not minimum):
// Delete A, Delete B, keep C, keep B, Delete A, keep B, Delete B, Add A, keep C
//
// Minimum edit script (LCS = "CABA" or "CBAB"):
// - 4 deletions + 2 insertions = 6 operations
//
// A diff tool finds this minimum edit script and presents it as:
// Lines prefixed with - (deleted from original)
// Lines prefixed with + (inserted into new)
// Lines with no prefix (unchanged, in LCS)

A Timeline of Diff Algorithms

1974: Hunt-McIlroy — The Original

James Hunt and Douglas McIlroy's 1974 algorithm (documented in their 1976 paper "An Algorithm for Differential File Comparison" from Dartmouth) introduced the concept of "k-candidates" — matching pairs that could extend the LCS. By focusing only on positions where the two files share common elements, it avoided the full N×M comparison matrix.

The practical impact was dramatic. Per their paper, comparing two 10,000-line files requires approximately 20,000 operations with Hunt-McIlroy versus 100 million with classic dynamic programming — a 5,000× improvement. This made diff practical on the PDP-11 hardware of the era.

1986: Myers Algorithm — The Current Standard

Eugene Myers published "An O(ND) Difference Algorithm and Its Variations" in 1986, which became the foundation for GNU diff, Git, and virtually every modern diff tool. The algorithm's key insight is to explore the edit graph diagonally rather than by rows.

Myers runs in O(ND) time where N is the total number of lines in both files and D is the number of differences (edit distance). When files are similar (D is small), this is extremely efficient — O(N) in the best case of identical files. The space complexity is O(N + D²), which Myers later improved to O(N) with the linear space refinement.

// Myers algorithm conceptually:
// Represents the problem as a graph where:
//   - Horizontal move = delete a line from file A
//   - Vertical move = insert a line from file B
//   - Diagonal move (free) = lines match, no edit needed
//
// Find the shortest path from (0,0) to (len_A, len_B)
// = the minimum edit script
//
// Example: diff "cat" against "cut"
// File A: ["c", "a", "t"]
// File B: ["c", "u", "t"]
//
// Optimal path: keep "c" (diagonal), delete "a", insert "u", keep "t"
// Edit script: -a +u  (2 operations, minimum possible)
//
// Output:
//   c
// - a
// + u
//   t

2005: Patience Diff — Readable over Optimal

Patience diff was invented by Bram Cohen (creator of BitTorrent) and became a popular Git option for human-readable output. It uses a fundamentally different strategy: instead of finding the globally shortest edit script, it first finds lines that appear exactly once in both files (unique lines) and uses those as anchors.

The algorithm: (1) find all lines unique to both files, (2) find their longest common subsequence, (3) recursively diff the sections between these matched unique lines. This tends to align logically significant code boundaries rather than accidentally matching braces or blank lines.

Aspect	Myers	Patience	Histogram
Goal	Minimum edit distance	Human-readable output	Stable, fewer false matches
Strategy	Shortest edit script (graph traversal)	Unique-line anchoring	Patience + line frequency weighting
Time complexity	O(ND)	O(N log N) for unique-line phase	O(N log N)
Git usage	Available (`--diff-algorithm=myers`)	Available (`--diff-algorithm=patience`)	Available (`--diff-algorithm=histogram`)
Best for	Machine-processing patches	Code review with moved blocks	General purpose code review
Used by	GNU diff, many libraries	Mercurial, some Git configs	Git optional mode, code review tools

# Switch Git's diff algorithm:
	git config --global diff.algorithm histogram    # readable for repeated or moved code
	git config --global diff.algorithm patience     # good for refactored code
	git config --global diff.algorithm myers        # Git's documented default family

# Per-command override:
git diff --diff-algorithm=patience HEAD~1 HEAD

# Word-level diff (highlight changes within a line):
git diff --word-diff HEAD~1 HEAD

# Character-level diff (most granular):
git diff --word-diff=color --word-diff-regex=. HEAD~1 HEAD

Reading Unified Diff Format

Unified diff format was created by Wayne Davison in August 1990 and added to GNU diff by Richard Stallman one month later, shipping in GNU diff 1.15 in January 1991. It has been the standard patch format for 35 years — every code review tool, patch submission system, and Git output uses it.

--- a/src/auth/middleware.ts          ← Original file path
+++ b/src/auth/middleware.ts          ← New file path
@@ -12,9 +12,11 @@                   ← Hunk header (see below)
 import { Request, Response } from 'express';
 import jwt from 'jsonwebtoken';

-export function authMiddleware(req: Request, res: Response, next: any) {
+export function authMiddleware(       ← Lines starting with + were added
+  req: AuthenticatedRequest,          ← (new version)
+  res: Response,
+  next: NextFunction
+) {
   const authHeader = req.headers.authorization;
-  if (!authHeader) {
+  if (!authHeader?.startsWith('Bearer ')) {
     return res.status(401).json({ error: 'Missing token' });
   }
 }

// Reading the hunk header:  @@ -12,9 +12,11 @@
//   -12,9   → in original file: starts at line 12, spans 9 lines
//   +12,11  → in new file: starts at line 12, spans 11 lines
//   (difference: +2 lines net — we added more than we removed)
//
// Line prefixes:
//   (space) — unchanged context line (shown by default: 3 lines each side)
//   -        — removed from original
//   +        — added in new version
//   \ No newline at end of file — missing final newline (affects patches)

Implementing Text Diff in JavaScript

For browser-based tools or Node.js utilities, diff (the npm package by kpdecker, 2.9 million weekly downloads per npm registry, consistent since 2014) implements the Myers algorithm in pure JavaScript with multiple output modes.

// npm install diff
import * as Diff from 'diff';

const oldText = `function greet(name) {
  console.log('Hello, ' + name);
  return name;
}`;

const newText = `function greet(name: string): string {
  const greeting = `Hello, ${name}!`;
  console.log(greeting);
  return greeting;
}`;

// Line-by-line diff:
const lineDiff = Diff.diffLines(oldText, newText);
lineDiff.forEach(part => {
  if (part.added) console.log('[+]', part.value);
  else if (part.removed) console.log('[-]', part.value);
  else console.log('[ ]', part.value);
});

// Word-level diff (highlights changed words within lines):
const wordDiff = Diff.diffWords(oldText, newText);

// Character-level diff (most granular):
const charDiff = Diff.diffChars(oldText, newText);

// Generate a unified diff patch string:
const patch = Diff.createTwoFilesPatch(
  'original.ts',
  'modified.ts',
  oldText,
  newText,
  'Original version',
  'Modified version',
  { context: 3 }   // lines of context around each hunk
);
console.log(patch);
// --- original.ts
// +++ modified.ts
// @@ -1,4 +1,5 @@
// -function greet(name) {
// +function greet(name: string): string {
// ...

// Apply a patch:
const patched = Diff.applyPatch(oldText, patch);

Building a Visual Diff Component (React)

// Lightweight visual diff renderer without external UI deps
import * as Diff from 'diff';

interface DiffViewProps {
  oldText: string;
  newText: string;
  context?: number;
}

export function InlineDiffView({ oldText, newText }: DiffViewProps) {
  const parts = Diff.diffLines(oldText, newText, { newlineIsToken: false });

  return (
    <pre className="font-mono text-sm leading-5 overflow-x-auto">
      {parts.map((part, i) => {
        if (part.added) {
          return (
            <span key={i} className="bg-green-900/40 text-green-300 block">
              {part.value.split('
').filter(Boolean).map((line, j) => (
                <span key={j} className="block">{'+  '}{line}</span>
              ))}
            </span>
          );
        }
        if (part.removed) {
          return (
            <span key={i} className="bg-red-900/40 text-red-300 block">
              {part.value.split('
').filter(Boolean).map((line, j) => (
                <span key={j} className="block">{'-  '}{line}</span>
              ))}
            </span>
          );
        }
        return (
          <span key={i} className="text-gray-400 block">
            {part.value.split('
').filter(Boolean).map((line, j) => (
              <span key={j} className="block">{'   '}{line}</span>
            ))}
          </span>
        );
      })}
    </pre>
  );
}

Text Diff Tools: A Practical Comparison

The right diff tool depends entirely on your context. Here is an honest comparison across the options developers actually use.

Tool	Best For	Algorithm	Limitation
BytePane Text Diff	Quick online text comparison, no install	Myers (diff.js)	Browser-only, no large file support
VS Code built-in diff	Side-by-side code review, local files	Myers	Requires VS Code, not shareable
git diff	Code changes in version control	Histogram (default), patience, Myers	Terminal only; requires Git
GitHub / GitLab PR diff	Team code review	Histogram	Requires remote push; not for arbitrary text
Meld	3-way merge, directory comparison	Myers	Desktop app, Linux-first
diffoscope	Binary reproducibility, package diff	Myers + format-specific parsers	Heavy dependency, CI/security use only

For most development tasks, git diff is the right tool— it is always available, understands binary files (by refusing to diff them), and integrates with the version history you already have. Online text diff tools fill the gap for non-versioned comparisons: pasting two config files, comparing API responses, reviewing a colleague's text snippet, or any scenario where you have raw text without a Git repository.

Practical Use Cases with Real Examples

Comparing API Responses Before and After a Deploy

# Capture API response before the change
curl -s https://api.example.com/v1/products/42 | jq . > before.json

# Deploy the change, then capture again
curl -s https://api.example.com/v1/products/42 | jq . > after.json

# Diff (jq normalizes formatting and key order):
diff before.json after.json

# Or: use git diff for color output without a repo
git diff --no-index before.json after.json

# Expected output — only changed fields:
# @@ -8,7 +8,7 @@
#    "price": 29.99,
# -  "stock": 142,
# +  "stock": 141,
#    "available": true

Config File Audit (nginx, Kubernetes, Terraform)

# Compare production vs staging nginx config:
diff /etc/nginx/nginx.conf.prod /etc/nginx/nginx.conf.staging

# Compare Kubernetes manifests (ignore comments):
diff <(grep -v '^#' k8s-prod.yaml) <(grep -v '^#' k8s-staging.yaml)

# Terraform plan equivalent — diff state files:
terraform show -json terraform.tfstate | jq . > before.tf.json
# ... apply change ...
terraform show -json terraform.tfstate | jq . > after.tf.json
diff before.tf.json after.tf.json

Validating Environment Variable Drift

# Sort both .env files before diffing (key order may differ):
diff <(sort .env.example) <(sort .env.production.keys_only)

# Output:
# < DATABASE_POOL_SIZE=10        ← in .env.example, missing from production
# > NEW_RELIC_LICENSE_KEY=...    ← in production, not documented in example

# This catches:
# 1. Keys in .env.example that production hasn't set (missing env vars)
# 2. Keys in production that aren't documented (undocumented vars)

For structured data comparisons like JSON, you can use our JSON Formatter to normalize both JSON payloads first (consistent key ordering, indentation), then paste them into the diff tool. This eliminates false differences from formatting changes and focuses the diff on actual data changes.

Performance: When Diff Gets Slow

Myers O(ND) is fast when files are similar (small D). The worst case is O(N²) when files are completely different — no common lines. In practice, this manifests when diffing generated files (minified JavaScript, binary-encoded data, build artifacts) that look entirely changed.

# Symptoms of slow diff:
# - git diff hangs on a JavaScript bundle change
# - diff reports "Binary files differ" incorrectly for text files

# Solutions:

# 1. Tell git to use patience for specific file types (.gitattributes):
*.min.js diff=nodiff
*.bundle.js diff=nodiff

# 2. Increase diff timeout for large files:
git config diff.renameLimit 999999

# 3. Use --stat instead of full diff for quick summary:
git diff --stat HEAD~1 HEAD

# 4. For very large text files (logs, generated files):
# Split into manageable chunks before diffing
split -l 1000 large.log chunk_
diff chunk_aa.orig chunk_aa.new

The Python difflib library (stdlib, no install required) includes a SequenceMatcher with an autojunk heuristic that skips lines appearing in more than 1% of both files. For log files and generated content, this dramatically reduces diff time at the cost of potentially missing changes in repeated content.

Frequently Asked Questions

What algorithm does the Unix diff command use?

GNU diff uses a Myers-style O(ND) algorithm by default. Git documentation lists default/myers as the default algorithm, and lets you opt into patience or histogram with --diff-algorithm when readability matters more than the smallest possible patch.

What is unified diff format and how do I read it?

Lines starting with - were removed; lines starting with + were added; space-prefixed lines are unchanged context. The @@ -3,7 +3,6 @@ header means: in the original, this hunk starts at line 3 spanning 7 lines; in the new file, line 3 spanning 6 lines. Unified format was created by Wayne Davison in August 1990, shipped in GNU diff 1.15 in January 1991.

What is the difference between Myers diff and patience diff?

Myers explores the edit graph to produce a compact patch. Patience diff first matches unique lines appearing exactly once in both files as anchors, then recursively diffs sections between matches. Histogram extends the idea by considering low-occurrence common elements. Git supports all three modes through --diff-algorithm.

How do I compare two JSON files and see differences?

Normalize both files first with jq . to ensure consistent key ordering and indentation, then run diff or git diff --no-index. Raw text diff on JSON is brittle — reformatting produces hundreds of false differences. For deep structural comparison ignoring ordering entirely, use a dedicated JSON diff library (json-diff in Node.js, deepdiff in Python) rather than text comparison.

What does "context lines" mean in diff output?

Context lines are unchanged lines shown around each changed region. GNU diff defaults to 3 context lines (diff -u). More context (diff -U 10) helps you locate changes but produces larger output. Zero context (diff -U 0) shows only changed lines — useful for machine-readable patches but hard to read manually. GitHub and GitLab show 3 context lines by default.

Can I use a text diff tool to compare API responses?

Yes. Capture two API responses with curl -s url | jq . > before.json, make changes, capture again, then diff. Pipe through jq . first to normalize JSON formatting. This catches unexpected field additions, type changes, and value regressions that integration tests miss if they only assert on a subset of fields.

What is the difference between character-level and line-level diff?

Line-level diff (standard) treats each line as atomic — any change marks the whole line changed. Character-level diff highlights specific characters or words that changed within a line. Git provides word diff with git diff --word-diff. Character-level is more informative for small changes in long lines but noisier for structural changes like indentation shifts.

Compare Two Texts Instantly

Paste any two texts into BytePane's diff tool for instant side-by-side comparison with additions and deletions highlighted. No installation, no account — works entirely in your browser.

Open Text Diff Tool