BytePane

Text to Binary Converter: Encode & Decode Binary Online

Encoding · 16 min read

From 5-Hole Paper Tape to 1.1 Million Unicode Code Points: The 160-Year History of Text in Binary

In 1870, Émile Baudot developed a 5-bit character code for telegraph machines, later punched as hole patterns into 5-hole paper tape. Each pattern represented one character: 5 binary positions, 32 possible combinations, enough for the uppercase Latin letters plus a few control codes (a shift code swapped in figures and punctuation). It was the first standardized text-to-binary encoding system.

In 1963, the American Standards Association ratified ASCII (American Standard Code for Information Interchange) — a 7-bit encoding that expanded the character set to 128 code points: uppercase and lowercase Latin letters (lowercase arrived in the 1967 revision), digits 0–9, punctuation, and 33 non-printable control characters (including carriage return, line feed, tab, and the now-infamous bell character). The decision to use 7 bits rather than 8 kept transmission and storage costs down and left the eighth bit free — commonly used for parity — and that spare bit fueled 60 years of encoding fragmentation as different vendors repurposed it for incompatible "extended ASCII" variants.

In 1991, the Unicode Consortium published Unicode 1.0, aiming to unify every character system in the world into a single code point space. Today, Unicode 15.1 defines 149,813 characters across 161 scripts — from ancient Cuneiform to emoji, from mathematical symbols to playing card suits.

In 1992, Ken Thompson and Rob Pike designed UTF-8 (first presented publicly in January 1993) — a variable-width encoding that represents Unicode code points as 1 to 4 bytes, with the critical property of backward compatibility with ASCII. Any ASCII text file is a valid UTF-8 file. UTF-8 is now the dominant encoding on the web: per the W3Techs Web Technology Survey (April 2026), 98.3% of all websites use UTF-8 as their character encoding.

Understanding how each character you type maps to binary is not esoteric knowledge — it is the foundation of network protocols, cryptographic hashing, database storage, file formats, and every debugging session that involves inspecting raw bytes.

Key Takeaways

  • Text-to-binary conversion is a two-step process: character → code point (via Unicode), then code point → bytes (via UTF-8); each byte is then written as 8 binary digits.
  • ASCII characters (0–127) are always 1 byte in UTF-8, identical to their ASCII representation. The letter 'A' is always 01000001.
  • UTF-8 uses 1–4 bytes per character. Emoji and CJK characters require 3–4 bytes — they are not 1 byte each.
  • Per W3Techs (April 2026), 98.3% of websites use UTF-8. It is the only encoding you should use for new applications.
  • In JavaScript, TextEncoder / TextDecoder are the standard APIs. In Python, str.encode('utf-8').

The Two-Step Process: Code Points Then Bytes

Text-to-binary conversion requires understanding two distinct concepts that are often conflated: the coded character set (which number each character is assigned — Unicode) and the encoding scheme (how those numbers are serialized as bytes — UTF-8).

Step 1: Character → Code Point

Every character has a Unicode code point — a unique number in the range U+0000 to U+10FFFF:

'H'  →  U+0048  (decimal: 72)
'e'  →  U+0065  (decimal: 101)
'l'  →  U+006C  (decimal: 108)
'!'  →  U+0021  (decimal: 33)
'é'  →  U+00E9  (decimal: 233)
'中' →  U+4E2D  (decimal: 20013)
'😀' →  U+1F600 (decimal: 128512)
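
Python's built-in ord() returns a character's code point directly (chr() is the inverse), which makes it easy to verify the mappings above — a quick sketch:

for ch in ['H', 'e', 'l', '!', 'é', '中', '😀']:
    cp = ord(ch)  # code point as an integer
    print(f"{ch!r}  →  U+{cp:04X}  (decimal: {cp})")

print(chr(128512))  # '😀' — chr() maps a code point back to its character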

Step 2: Code Point → UTF-8 Bytes → Binary

UTF-8 serializes code points to 1–4 bytes, then each byte is a number 0–255 expressed in 8-bit binary:

'H'  →  code point 72  →  1 byte: [0x48]        →  01001000
'é'  →  code point 233 →  2 bytes: [0xC3, 0xA9]  →  11000011 10101001
'中' →  code point 20013 → 3 bytes: [0xE4,0xB8,0xAD] → 11100100 10111000 10101101
'😀' →  code point 128512 → 4 bytes: [0xF0,0x9F,0x98,0x80] → 11110000 10011111 10011000 10000000

The critical insight: "text to binary" is not a single operation. The binary representation of a character depends entirely on the encoding used. A naïve tool that treats each character as a single byte will silently produce wrong output for any non-ASCII character. A correct tool applies UTF-8 encoding before converting to binary.
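
A short Python sketch of the difference (the function names are ours, for illustration): the naïve version formats the code point as if it were a byte, while the correct version encodes to UTF-8 first.

# Naïve (wrong for non-ASCII): formats the code point, not UTF-8 bytes
def naive_text_to_binary(text):
    return ' '.join(format(ord(ch), '08b') for ch in text)

# Correct: serialize to UTF-8 bytes first, then format each byte
def utf8_text_to_binary(text):
    return ' '.join(format(b, '08b') for b in text.encode('utf-8'))

print(naive_text_to_binary('é'))  # 11101001 — code point 233, not valid UTF-8
print(utf8_text_to_binary('é'))   # 11000011 10101001 — the real 2-byte encoding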

ASCII Binary Reference: The 128 Characters You Use Every Day

ASCII's 7-bit design means printable characters occupy code points 32–126. Control characters (0–31 and 127) are non-printable. Here is the complete reference for the printable ASCII range — the characters whose binary representations you will encounter most frequently in debugging and protocol analysis:

Char     Dec  Hex  Binary     |   Char  Dec  Hex  Binary
A        65   41   01000001   |   a     97   61   01100001
B        66   42   01000010   |   b     98   62   01100010
C        67   43   01000011   |   c     99   63   01100011
D        68   44   01000100   |   d     100  64   01100100
E        69   45   01000101   |   e     101  65   01100101
F        70   46   01000110   |   f     102  66   01100110
G        71   47   01000111   |   g     103  67   01100111
H        72   48   01001000   |   h     104  68   01101000
I        73   49   01001001   |   i     105  69   01101001
J        74   4A   01001010   |   j     106  6A   01101010
K        75   4B   01001011   |   k     107  6B   01101011
L        76   4C   01001100   |   l     108  6C   01101100
M        77   4D   01001101   |   m     109  6D   01101101
N        78   4E   01001110   |   n     110  6E   01101110
O        79   4F   01001111   |   o     111  6F   01101111
P        80   50   01010000   |   p     112  70   01110000
Q        81   51   01010001   |   q     113  71   01110001
R        82   52   01010010   |   r     114  72   01110010
S        83   53   01010011   |   s     115  73   01110011
T        84   54   01010100   |   t     116  74   01110100
U        85   55   01010101   |   u     117  75   01110101
V        86   56   01010110   |   v     118  76   01110110
W        87   57   01010111   |   w     119  77   01110111
X        88   58   01011000   |   x     120  78   01111000
Y        89   59   01011001   |   y     121  79   01111001
Z        90   5A   01011010   |   z     122  7A   01111010
0        48   30   00110000   |   5     53   35   00110101
1        49   31   00110001   |   6     54   36   00110110
2        50   32   00110010   |   7     55   37   00110111
3        51   33   00110011   |   8     56   38   00111000
4        52   34   00110100   |   9     57   39   00111001
(space)  32   20   00100000   |   .     46   2E   00101110
!        33   21   00100001   |   /     47   2F   00101111
"        34   22   00100010   |   :     58   3A   00111010
#        35   23   00100011   |   ;     59   3B   00111011
$        36   24   00100100   |   =     61   3D   00111101
%        37   25   00100101   |   ?     63   3F   00111111
&        38   26   00100110   |   @     64   40   01000000
'        39   27   00100111   |   [     91   5B   01011011
(        40   28   00101000   |   \     92   5C   01011100
)        41   29   00101001   |   ]     93   5D   01011101
*        42   2A   00101010   |   ^     94   5E   01011110
+        43   2B   00101011   |   _     95   5F   01011111
,        44   2C   00101100   |   `     96   60   01100000
-        45   2D   00101101   |   {     123  7B   01111011

The Case Bit Pattern — One of ASCII's Clever Design Decisions

Uppercase A = 01000001 (65). Lowercase a = 01100001 (97). The only difference is bit 5 (the bit with value 32, 0x20). Toggling bit 5 switches between uppercase and lowercase for all 26 letters — which is why, for ASCII letters, case-insensitive comparison reduces to clearing bit 5 on both sides (bitwise AND with 0xDF) and case conversion is a bitwise XOR with 0x20.
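
A minimal Python illustration of both tricks — valid only for the ASCII letters A–Z and a–z, so production code must check the range first:

print(chr(ord('A') ^ 0x20))  # 'a' — XOR with 0x20 toggles bit 5 (case flip)
print(chr(ord('a') ^ 0x20))  # 'A'

# AND with 0xDF clears bit 5, forcing uppercase — normalize both sides
# before comparing for a case-insensitive match
print(chr(ord('q') & 0xDF) == chr(ord('Q') & 0xDF))  # True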

UTF-8 Multi-Byte Encoding: How It Actually Works

UTF-8's variable-width design uses a clever self-synchronizing byte structure. You can identify the role of any byte by its leading bits — no context needed:

Code Point Range     Bytes  Byte Pattern                          Data Bits  Examples
U+0000 – U+007F      1      0xxxxxxx                              7 bits     ASCII: A, B, 0–9
U+0080 – U+07FF      2      110xxxxx 10xxxxxx                     11 bits    é, ü, ñ, £, Arabic
U+0800 – U+FFFF      3      1110xxxx 10xxxxxx 10xxxxxx            16 bits    CJK, Hebrew, Korean
U+10000 – U+10FFFF   4      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   21 bits    Emoji, historic scripts

The 10xxxxxx continuation-byte pattern is why UTF-8 is self-synchronizing: if you drop into a byte stream mid-sequence, you can identify continuation bytes (they start with 10) and skip backward to the byte that begins the sequence. Neither UTF-16 (variable-width: 2 or 4 bytes per character) nor UTF-32 (fixed-width: 4 bytes) has this byte-level property, and neither is ASCII-compatible — one of the reasons UTF-8 dominated.
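
One practical consequence: you can count the characters in a UTF-8 byte stream without decoding it, because every character contributes exactly one non-continuation byte. A sketch in Python (the helper name is ours):

def count_utf8_chars(data: bytes) -> int:
    # Continuation bytes match 10xxxxxx, i.e. (byte & 0xC0) == 0x80;
    # every other byte starts a new character
    return sum(1 for b in data if (b & 0xC0) != 0x80)

print(count_utf8_chars('Hé中😀'.encode('utf-8')))  # 4 characters in 1+2+3+4 = 10 bytes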

Working Through 'é' (U+00E9) Step by Step

Code point: U+00E9 = 233 decimal = 0xE9

Step 1: Identify byte count.
  233 is in range U+0080–U+07FF → 2 bytes
  Pattern: 110xxxxx 10xxxxxx (11 data bits available)

Step 2: Convert 233 to binary:
  233 = 11101001 (8 bits)

Step 3: Fit the bits into the pattern (11 data bits, right-aligned):
  11101001 → 00011101001 (pad to 11 bits)
  Split: 00011 | 101001
  First byte:  110 + 00011 = 11000011 = 0xC3
  Second byte: 10  + 101001 = 10101001 = 0xA9

Result: 0xC3 0xA9 → binary: 11000011 10101001
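
Python's encode() performs exactly this bit-fitting, so the result is easy to verify:

b = 'é'.encode('utf-8')
print(b.hex())  # c3a9
print(' '.join(format(byte, '08b') for byte in b))  # 11000011 10101001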

Text to Binary: "Hello, World!" Character by Character

All characters in "Hello, World!" are ASCII (U+0000–U+007F), so each is exactly 1 byte in UTF-8:

Char     Unicode  Decimal  Hex   Binary (8-bit)
H        U+0048   72       0x48  01001000
e        U+0065   101      0x65  01100101
l        U+006C   108      0x6C  01101100
l        U+006C   108      0x6C  01101100
o        U+006F   111      0x6F  01101111
,        U+002C   44       0x2C  00101100
(space)  U+0020   32       0x20  00100000
W        U+0057   87       0x57  01010111
o        U+006F   111      0x6F  01101111
r        U+0072   114      0x72  01110010
l        U+006C   108      0x6C  01101100
d        U+0064   100      0x64  01100100
!        U+0021   33       0x21  00100001

Full binary output (space-separated by character):

01001000 01100101 01101100 01101100 01101111 00101100 00100000 01010111 01101111 01110010 01101100 01100100 00100001

Text to Binary in Code: JavaScript, Python, Go

JavaScript — Browser and Node.js

Browser-compatible using TextEncoder (WHATWG standard)
// Text to binary (UTF-8 aware):
function textToBinary(text) {
  const encoder = new TextEncoder() // Always UTF-8
  const bytes = encoder.encode(text)
  return [...bytes]
    .map(byte => byte.toString(2).padStart(8, '0'))
    .join(' ')
}

console.log(textToBinary('Hello'))
// => "01001000 01100101 01101100 01101100 01101111"

console.log(textToBinary('é'))
// => "11000011 10101001"  (2 bytes for U+00E9)

console.log(textToBinary('😀'))
// => "11110000 10011111 10011000 10000000"  (4 bytes)

// Binary back to text:
function binaryToText(binaryString) {
  const bytes = binaryString.trim().split(/\s+/)
    .map(b => parseInt(b, 2))
  const decoder = new TextDecoder('utf-8')
  return decoder.decode(new Uint8Array(bytes))
}

console.log(binaryToText('01001000 01100101 01101100 01101100 01101111'))
// => "Hello"

Python

Python 3 — UTF-8 encoding built-in
def text_to_binary(text: str, encoding: str = 'utf-8') -> str:
    """Convert text to space-separated binary string."""
    return ' '.join(format(byte, '08b') for byte in text.encode(encoding))

def binary_to_text(binary_string: str, encoding: str = 'utf-8') -> str:
    """Convert space-separated binary string back to text."""
    byte_strings = binary_string.strip().split()
    byte_values = [int(b, 2) for b in byte_strings]
    return bytes(byte_values).decode(encoding)

# Examples:
print(text_to_binary('Hello'))
# 01001000 01100101 01101100 01101100 01101111

print(text_to_binary('é'))
# 11000011 10101001

print(text_to_binary('中'))
# 11100100 10111000 10101101

# Round-trip verification:
original = 'Hello, World! 🌍'
binary = text_to_binary(original)
recovered = binary_to_text(binary)
assert original == recovered, "Round-trip failed"
print(f"Round-trip verified: {len(binary.split())} bytes")

Go

Go — strings are UTF-8 natively
package main

import (
    "fmt"
    "strconv"
    "strings"
)

func textToBinary(s string) string {
    // Go strings are UTF-8 byte slices — iterate bytes directly
    parts := make([]string, 0, len(s))
    for _, b := range []byte(s) {
        parts = append(parts, fmt.Sprintf("%08b", b))
    }
    return strings.Join(parts, " ")
}

func binaryToText(binary string) (string, error) {
    parts := strings.Fields(binary)
    bytes := make([]byte, len(parts))
    for i, part := range parts {
        val, err := strconv.ParseUint(part, 2, 8)
        if err != nil {
            return "", fmt.Errorf("invalid binary at position %d: %w", i, err)
        }
        bytes[i] = byte(val)
    }
    return string(bytes), nil
}

func main() {
    fmt.Println(textToBinary("Hello"))
    // 01001000 01100101 01101100 01101100 01101111

    text, _ := binaryToText("01001000 01100101 01101100 01101100 01101111")
    fmt.Println(text) // Hello
}

Four Common Mistakes in Text-to-Binary Conversion

1. Treating Characters as Single Bytes

The most common error: using charCodeAt() in JavaScript and treating the result as a complete UTF-8 byte. 'é'.charCodeAt(0) returns 233 — the Unicode code point — not the UTF-8 encoding. For ASCII characters the code point and the UTF-8 byte are identical by design, so the bug hides until a non-ASCII character appears. Always use TextEncoder or equivalent.

2. Omitting the Leading Zero Pad

The space character (decimal 32) in binary is 00100000. Without the leading zeros, it becomes 100000 — only 6 bits. This makes it impossible to reconstruct the original byte stream unambiguously. Always pad to 8 bits: .padStart(8, '0') in JS, format(byte, '08b') in Python.

3. Confusing UTF-16 Surrogate Pairs

JavaScript strings are UTF-16 internally. Characters above U+FFFF (like emoji) are stored as a "surrogate pair" — two 16-bit code units. '😀'.length returns 2 in JavaScript, not 1, because the engine counts UTF-16 code units. TextEncoder handles this correctly — it converts the surrogate pair into the proper 4-byte UTF-8 sequence. Calling charCodeAt() on each character position produces the raw surrogates, not valid UTF-8.

4. Incorrect Separator Assumptions During Decode

Binary strings are often space-separated (one 8-bit group per character) but sometimes presented as continuous strings with no separators. When decoding, you must know the format: 01001000... (continuous) vs 01001000 01100101... (space-separated). Continuous binary should be split into 8-character chunks: binaryStr.match(/.{1,8}/g).
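
A small Python decoder that accepts both formats — a sketch (the helper name is ours) assuming continuous input is a clean multiple of 8 bits:

import re

def decode_binary(binary_string: str) -> str:
    s = binary_string.strip()
    # Space-separated groups, or split a continuous string into 8-bit chunks
    groups = s.split() if ' ' in s else re.findall('.{8}', s)
    return bytes(int(g, 2) for g in groups).decode('utf-8')

print(decode_binary('01001000 01101001'))  # Hi
print(decode_binary('0100100001101001'))   # Hi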

Real-World Applications of Binary Text Representation

Understanding text-as-binary is not an academic exercise. These are the actual software contexts where you will encounter and need to manipulate binary text:

Network Protocol Debugging

TCP/IP packets carry bytes, not characters. Protocol analyzers like Wireshark display packet payloads in hexadecimal — reading HTTP headers, WebSocket frames, or DNS records at the byte level requires knowing which byte sequences correspond to which ASCII characters. A GET request starts with bytes 0x47 0x45 0x54 (71, 69, 84 — 'G', 'E', 'T' in ASCII).

Cryptographic Hash Verification

Cryptographic functions like SHA-256 operate on byte arrays, not strings. The hash of "Hello" and the hash of "Hello\n" (the same text with a trailing newline) differ completely — which is why encoding matters. A security audit of hash-based verification code requires tracing exactly what bytes were fed to the hash function, which means tracing the UTF-8 encoding of the input string. See BytePane's hash functions explained guide for the full picture.
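
A quick demonstration with Python's standard hashlib — the digest is computed over UTF-8 bytes, so a single extra byte (the newline, 0x0A) changes it completely:

import hashlib

a = hashlib.sha256('Hello'.encode('utf-8')).hexdigest()
b = hashlib.sha256('Hello\n'.encode('utf-8')).hexdigest()
print(a == b)  # False — the two digests share nothing recognizable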

File Format Parsing

Binary file formats begin with "magic bytes" — fixed byte sequences that identify the file type. PDF files start with 25 50 44 46 hex (%PDF in ASCII). PNG files start with 89 50 4E 47 (‰PNG). Validating file uploads at the byte level rather than trusting file extensions is a security best practice — and it requires knowing the binary representation of ASCII characters.
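
A minimal sketch of that practice in Python — the magic-byte table below is an illustrative subset covering only the two formats just mentioned:

# Illustrative map of magic-byte prefixes to file types
MAGIC = {
    b'%PDF': 'pdf',     # 25 50 44 46
    b'\x89PNG': 'png',  # 89 50 4E 47
}

def sniff_file_type(path: str) -> str:
    with open(path, 'rb') as f:
        header = f.read(4)  # read the first four bytes only
    for magic, kind in MAGIC.items():
        if header.startswith(magic):
            return kind
    return 'unknown'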

Base64 Encoding

Base64 operates on the byte-level representation of text: it takes the UTF-8 bytes of a string and re-encodes them using only 64 ASCII-safe characters. "Hello" encodes to "SGVsbG8=" in Base64 — directly from the bytes 72, 101, 108, 108, 111. Understanding the UTF-8 byte layer clarifies why the same string in different encodings produces different Base64 output. For Base64 encoding/decoding tools and mechanics, see BytePane's Base64 encoding guide.
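
The byte layer is explicit in Python's standard base64 module, which accepts bytes, not strings:

import base64

data = 'Hello'.encode('utf-8')  # b'Hello' — bytes 72 101 108 108 111
print(base64.b64encode(data))   # b'SGVsbG8='
print(base64.b64decode('SGVsbG8=').decode('utf-8'))  # Hello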

Embedded Systems and Hardware Registers

Microcontroller communication protocols (UART, SPI, I²C) transmit bytes. Sending a string over a serial port means writing each character's byte value to a hardware register. Embedded developers work in binary and hexadecimal constantly — knowing that ASCII 'A' is 0x41 is as fundamental as knowing that 'A' is the first letter.

Binary vs Hexadecimal vs Decimal: Three Views of the Same Byte

Binary, hexadecimal, and decimal are three notations for the same underlying byte values. Each has its use case:

Notation     Base  Value of 'A'  Best For
Binary       2     01000001      Bit manipulation, protocol flags, embedded systems
Hexadecimal  16    0x41          Memory dumps, network packets, color codes, byte inspection
Decimal      10    65            Human-readable code point reference, ASCII tables
Octal        8     0101          Unix file permissions, legacy systems (now mostly hex)

The conversion between binary and hexadecimal is exact: every 4 binary bits correspond to one hex digit. 01000001 splits as 0100 (= 4 = hex 4) + 0001 (= 1 = hex 1) = 0x41. This grouping is why developers reach for hex over binary for most debugging — it is 4× more compact while preserving the exact bit pattern. If you need to convert between hexadecimal and decimal or binary programmatically, BytePane's hex to decimal converter guide covers the full conversion math.
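
Python's format() can render the same byte in all four notations:

b = ord('A')             # 65
print(format(b, '08b'))  # 01000001 (binary)
print(format(b, '#04x')) # 0x41 (hexadecimal)
print(b)                 # 65 (decimal)
print(format(b, '#o'))   # 0o101 (octal)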

Frequently Asked Questions

How does text to binary conversion work?

Text-to-binary conversion is a two-step process. First, each character maps to a numeric code point using Unicode — ASCII uses 7-bit code points (0–127), while Unicode defines over 1.1 million code points. Second, those code points serialize to bytes via UTF-8 encoding, where each byte is expressed as an 8-bit binary sequence. The letter 'A' → code point 65 → binary 01000001.

What is the binary for the letter 'A'?

The uppercase letter A has ASCII code point 65 (decimal). In binary, 65 = 01000001. This is also its UTF-8 representation since ASCII characters use the single-byte range of UTF-8. Lowercase 'a' is code point 97 = 01100001. The difference between uppercase and lowercase ASCII letters is always exactly bit 5 (the bit with value 32) — toggling it switches case.

What is the difference between ASCII and UTF-8 binary?

ASCII defines 128 characters (code points 0–127), each stored as a 7-bit value. UTF-8 is backward-compatible with ASCII for these 128 characters — their binary representations are identical. For code points 128 and above (accented characters, emoji, CJK), UTF-8 uses 2–4 bytes with leading continuation bits. Continuation bytes always start with the bit pattern 10xxxxxx.

How do I convert text to binary in JavaScript?

Use TextEncoder to get UTF-8 bytes, then convert each byte to binary: const encoder = new TextEncoder(); const bytes = encoder.encode('Hello'); const binary = [...bytes].map(b => b.toString(2).padStart(8, '0')).join(' '). The padStart(8,'0') is critical — it ensures each byte is represented as exactly 8 bits, including leading zeros for small values like space (00100000).

What does binary text look like for an emoji?

Emoji require 4 bytes in UTF-8. The 😀 emoji (U+1F600) encodes to bytes [0xF0, 0x9F, 0x98, 0x80]. In binary: 11110000 10011111 10011000 10000000. The first byte starts with 11110 (four 1s then a 0) indicating a 4-byte sequence. The remaining 3 bytes start with 10, marking them as UTF-8 continuation bytes.

Is binary the same as hexadecimal?

No, but they represent the same underlying byte values in different bases. Binary is base 2 (0s and 1s), hexadecimal is base 16 (0–9 and A–F). Each hex digit represents exactly 4 binary bits: hex F = binary 1111, hex 4 = 0100. An 8-bit binary byte is 2 hex digits: 01000001 = 0x41 = 65 decimal = ASCII 'A'. Hexadecimal is 4× more compact than binary.

What is binary text used for in real software?

Binary representation of text is fundamental to: network protocol debugging (inspecting raw TCP/IP packets in Wireshark), cryptography (verifying exact byte inputs to hash functions), file format parsing (identifying magic bytes at file headers), Base64 encoding (which operates on UTF-8 byte sequences), and embedded systems programming (UART/SPI communication transmits bytes, not characters).

How do I decode binary back to text?

Decoding reverses encoding: split the binary string into 8-bit groups, parse each group with parseInt(group, 2) to get byte values, then decode the bytes as UTF-8 using TextDecoder in JavaScript or bytes().decode('utf-8') in Python. For multi-byte UTF-8 sequences, TextDecoder handles grouping automatically — you must not attempt to decode each byte as a separate character.

Convert Text to Binary Instantly

Need to quickly see the binary representation of any text? BytePane's Base64 encoder shows the underlying byte encoding — or use our binary converter tool for direct text-to-binary output. Everything runs in your browser.

Open Base64 & Encoding Tool →