BytePane

Text to Binary Converter: Encode & Decode Binary Online

Encoding · 16 min read

From 5-Hole Paper Tape to 1.1 Million Unicode Code Points: The 160-Year History of Text in Binary

In 1870, Émile Baudot developed a 5-bit character code for telegraph machines, later punched as hole patterns into 5-hole paper tape. Each pattern represented one character: 5 binary positions, 32 possible combinations, enough for the uppercase Latin letters plus a few control codes (a shift code swapped in figures and punctuation). It was the first standardized text-to-binary encoding system.

In 1963, the American Standards Association ratified ASCII (American Standard Code for Information Interchange) — a 7-bit encoding that expanded the character set to 128 code points: uppercase and lowercase Latin letters (lowercase arrived in the 1967 revision), digits 0–9, punctuation, and 33 non-printable control characters (including carriage return, line feed, tab, and the now-infamous bell character). The decision to use 7 bits rather than 8 kept transmission and storage costs down and left the eighth bit free — commonly used for parity — and that spare bit fueled 60 years of encoding fragmentation as different vendors repurposed it for incompatible "extended ASCII" variants.

In 1991, the Unicode Consortium published Unicode 1.0, aiming to unify every character system in the world into a single code point space. Today, Unicode 15.1 defines 149,813 characters across 161 scripts — from ancient Cuneiform to emoji, from mathematical symbols to playing card suits.

In 1992, Ken Thompson and Rob Pike designed UTF-8 (first presented publicly in January 1993) — a variable-width encoding that represents Unicode code points as 1 to 4 bytes, with the critical property of backward compatibility with ASCII. Any ASCII text file is a valid UTF-8 file. UTF-8 is now the dominant encoding on the web: per the W3Techs Web Technology Survey (April 2026), 98.3% of all websites use UTF-8 as their character encoding.

Understanding how each character you type maps to binary is not esoteric knowledge — it is the foundation of network protocols, cryptographic hashing, database storage, file formats, and every debugging session that involves inspecting raw bytes.

Key Takeaways

  • Text-to-binary conversion is a two-step process: character → code point (via Unicode), then code point → bytes (via UTF-8); each byte is then written as 8 binary digits.
  • ASCII characters (0–127) are always 1 byte in UTF-8, identical to their ASCII representation. The letter 'A' is always 01000001.
  • UTF-8 uses 1–4 bytes per character. Emoji and CJK characters require 3–4 bytes — they are not 1 byte each.
  • Per W3Techs (April 2026), 98.3% of websites use UTF-8. It is the only encoding you should use for new applications.
  • In JavaScript, TextEncoder / TextDecoder are the standard APIs. In Python, str.encode('utf-8').

The Two-Step Process: Code Points Then Bytes

Text-to-binary conversion requires understanding two distinct concepts that are often conflated: the coded character set (which number each character is assigned — Unicode) and the encoding scheme (how those numbers are serialized as bytes — UTF-8).

Step 1: Character → Code Point

Every character has a Unicode code point — a unique number in the range U+0000 to U+10FFFF:

'H'  →  U+0048  (decimal: 72)
'e'  →  U+0065  (decimal: 101)
'l'  →  U+006C  (decimal: 108)
'!'  →  U+0021  (decimal: 33)
'é'  →  U+00E9  (decimal: 233)
'中' →  U+4E2D  (decimal: 20013)
'😀' →  U+1F600 (decimal: 128512)
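
Python's built-in ord() returns a character's code point directly (chr() is the inverse), which makes it easy to verify the mappings above — a quick sketch:

for ch in ['H', 'e', 'l', '!', 'é', '中', '😀']:
    cp = ord(ch)  # code point as an integer
    print(f"{ch!r}  →  U+{cp:04X}  (decimal: {cp})")

print(chr(128512))  # '😀' — chr() maps a code point back to its character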

Step 2: Code Point → UTF-8 Bytes → Binary

UTF-8 serializes code points to 1–4 bytes, then each byte is a number 0–255 expressed in 8-bit binary:

'H'  →  code point 72  →  1 byte: [0x48]        →  01001000
'é'  →  code point 233 →  2 bytes: [0xC3, 0xA9]  →  11000011 10101001
'中' →  code point 20013 → 3 bytes: [0xE4,0xB8,0xAD] → 11100100 10111000 10101101
'😀' →  code point 128512 → 4 bytes: [0xF0,0x9F,0x98,0x80] → 11110000 10011111 10011000 10000000

The critical insight: "text to binary" is not a single operation. The binary representation of a character depends entirely on the encoding used. A naïve tool that treats each character as a single byte will silently produce wrong output for any non-ASCII character. A correct tool applies UTF-8 encoding before converting to binary.
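
A short Python sketch of the difference (the function names are ours, for illustration): the naïve version formats the code point as if it were a byte, while the correct version encodes to UTF-8 first.

# Naïve (wrong for non-ASCII): formats the code point, not UTF-8 bytes
def naive_text_to_binary(text):
    return ' '.join(format(ord(ch), '08b') for ch in text)

# Correct: serialize to UTF-8 bytes first, then format each byte
def utf8_text_to_binary(text):
    return ' '.join(format(b, '08b') for b in text.encode('utf-8'))

print(naive_text_to_binary('é'))  # 11101001 — code point 233, not valid UTF-8
print(utf8_text_to_binary('é'))   # 11000011 10101001 — the real 2-byte encoding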

ASCII Binary Reference: The 128 Characters You Use Every Day

ASCII's 7-bit design means printable characters occupy code points 32–126. Control characters (0–31 and 127) are non-printable. Here is the complete reference for the printable ASCII range — the characters whose binary representations you will encounter most frequently in debugging and protocol analysis:

Char     Dec  Hex  Binary     |   Char  Dec  Hex  Binary
A        65   41   01000001   |   a     97   61   01100001
B        66   42   01000010   |   b     98   62   01100010
C        67   43   01000011   |   c     99   63   01100011
D        68   44   01000100   |   d     100  64   01100100
E        69   45   01000101   |   e     101  65   01100101
F        70   46   01000110   |   f     102  66   01100110
G        71   47   01000111   |   g     103  67   01100111
H        72   48   01001000   |   h     104  68   01101000
I        73   49   01001001   |   i     105  69   01101001
J        74   4A   01001010   |   j     106  6A   01101010
K        75   4B   01001011   |   k     107  6B   01101011
L        76   4C   01001100   |   l     108  6C   01101100
M        77   4D   01001101   |   m     109  6D   01101101
N        78   4E   01001110   |   n     110  6E   01101110
O        79   4F   01001111   |   o     111  6F   01101111
P        80   50   01010000   |   p     112  70   01110000
Q        81   51   01010001   |   q     113  71   01110001
R        82   52   01010010   |   r     114  72   01110010
S        83   53   01010011   |   s     115  73   01110011
T        84   54   01010100   |   t     116  74   01110100
U        85   55   01010101   |   u     117  75   01110101
V        86   56   01010110   |   v     118  76   01110110
W        87   57   01010111   |   w     119  77   01110111
X        88   58   01011000   |   x     120  78   01111000
Y        89   59   01011001   |   y     121  79   01111001
Z        90   5A   01011010   |   z     122  7A   01111010
0        48   30   00110000   |   5     53   35   00110101
1        49   31   00110001   |   6     54   36   00110110
2        50   32   00110010   |   7     55   37   00110111
3        51   33   00110011   |   8     56   38   00111000
4        52   34   00110100   |   9     57   39   00111001
(space)  32   20   00100000   |   .     46   2E   00101110
!        33   21   00100001   |   /     47   2F   00101111
"        34   22   00100010   |   :     58   3A   00111010
#        35   23   00100011   |   ;     59   3B   00111011
$        36   24   00100100   |   =     61   3D   00111101
%        37   25   00100101   |   ?     63   3F   00111111
&        38   26   00100110   |   @     64   40   01000000
'        39   27   00100111   |   [     91   5B   01011011
(        40   28   00101000   |   \     92   5C   01011100
)        41   29   00101001   |   ]     93   5D   01011101
*        42   2A   00101010   |   ^     94   5E   01011110
+        43   2B   00101011   |   _     95   5F   01011111
,        44   2C   00101100   |   `     96   60   01100000
-        45   2D   00101101   |   {     123  7B   01111011

The Case Bit Pattern — One of ASCII's Clever Design Decisions

Uppercase A = 01000001 (65). Lowercase a = 01100001 (97). The only difference is bit 5 (the bit with value 32, 0x20). Toggling bit 5 switches between uppercase and lowercase for all 26 letters — which is why, for ASCII letters, case-insensitive comparison reduces to clearing bit 5 on both sides (bitwise AND with 0xDF) and case conversion is a bitwise XOR with 0x20.
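
A minimal Python illustration of both tricks — valid only for the ASCII letters A–Z and a–z, so production code must check the range first:

print(chr(ord('A') ^ 0x20))  # 'a' — XOR with 0x20 toggles bit 5 (case flip)
print(chr(ord('a') ^ 0x20))  # 'A'

# AND with 0xDF clears bit 5, forcing uppercase — normalize both sides
# before comparing for a case-insensitive match
print(chr(ord('q') & 0xDF) == chr(ord('Q') & 0xDF))  # True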

UTF-8 Multi-Byte Encoding: How It Actually Works

UTF-8's variable-width design uses a clever self-synchronizing byte structure. You can identify the role of any byte by its leading bits — no context needed:

Code Point Range     Bytes  Byte Pattern                          Data Bits  Examples
U+0000 – U+007F      1      0xxxxxxx                              7 bits     ASCII: A, B, 0–9
U+0080 – U+07FF      2      110xxxxx 10xxxxxx                     11 bits    é, ü, ñ, £, Arabic
U+0800 – U+FFFF      3      1110xxxx 10xxxxxx 10xxxxxx            16 bits    CJK, Hebrew, Korean
U+10000 – U+10FFFF   4      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   21 bits    Emoji, historic scripts

The 10xxxxxx continuation-byte pattern is why UTF-8 is self-synchronizing: if you drop into a byte stream mid-sequence, you can identify continuation bytes (they start with 10) and skip backward to the byte that begins the sequence. Neither UTF-16 (variable-width: 2 or 4 bytes per character) nor UTF-32 (fixed-width: 4 bytes) has this byte-level property, and neither is ASCII-compatible — one of the reasons UTF-8 dominated.
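
One practical consequence: you can count the characters in a UTF-8 byte stream without decoding it, because every character contributes exactly one non-continuation byte. A sketch in Python (the helper name is ours):

def count_utf8_chars(data: bytes) -> int:
    # Continuation bytes match 10xxxxxx, i.e. (byte & 0xC0) == 0x80;
    # every other byte starts a new character
    return sum(1 for b in data if (b & 0xC0) != 0x80)

print(count_utf8_chars('Hé中😀'.encode('utf-8')))  # 4 characters in 1+2+3+4 = 10 bytes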

Working Through 'é' (U+00E9) Step by Step

Code point: U+00E9 = 233 decimal = 0xE9

Step 1: Identify byte count.
  233 is in range U+0080–U+07FF → 2 bytes
  Pattern: 110xxxxx 10xxxxxx (11 data bits available)

Step 2: Convert 233 to binary:
  233 = 11101001 (8 bits)

Step 3: Fit the bits into the pattern (11 data bits, right-aligned):
  11101001 → 00011101001 (pad to 11 bits)
  Split: 00011 | 101001
  First byte:  110 + 00011 = 11000011 = 0xC3
  Second byte: 10  + 101001 = 10101001 = 0xA9

Result: 0xC3 0xA9 → binary: 11000011 10101001
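
Python's encode() performs exactly this bit-fitting, so the result is easy to verify:

b = 'é'.encode('utf-8')
print(b.hex())  # c3a9
print(' '.join(format(byte, '08b') for byte in b))  # 11000011 10101001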

Text to Binary: "Hello, World!" Character by Character

All characters in "Hello, World!" are ASCII (U+0000–U+007F), so each is exactly 1 byte in UTF-8:

Char     Unicode  Decimal  Hex   Binary (8-bit)
H        U+0048   72       0x48  01001000
e        U+0065   101      0x65  01100101
l        U+006C   108      0x6C  01101100
l        U+006C   108      0x6C  01101100
o        U+006F   111      0x6F  01101111
,        U+002C   44       0x2C  00101100
(space)  U+0020   32       0x20  00100000
W        U+0057   87       0x57  01010111
o        U+006F   111      0x6F  01101111
r        U+0072   114      0x72  01110010
l        U+006C   108      0x6C  01101100
d        U+0064   100      0x64  01100100
!        U+0021   33       0x21  00100001

Full binary output (space-separated by character):

01001000 01100101 01101100 01101100 01101111 00101100 00100000 01010111 01101111 01110010 01101100 01100100 00100001

Text to Binary in Code: JavaScript, Python, Go

JavaScript — Browser and Node.js

Browser-compatible using TextEncoder (WHATWG standard)
// Text to binary (UTF-8 aware):
function textToBinary(text) {
  const encoder = new TextEncoder() // Always UTF-8
  const bytes = encoder.encode(text)
  return [...bytes]
    .map(byte => byte.toString(2).padStart(8, '0'))
    .join(' ')
}

console.log(textToBinary('Hello'))
// => "01001000 01100101 01101100 01101100 01101111"

console.log(textToBinary('é'))
// => "11000011 10101001"  (2 bytes for U+00E9)

console.log(textToBinary('😀'))
// => "11110000 10011111 10011000 10000000"  (4 bytes)

// Binary back to text:
function binaryToText(binaryString) {
  const bytes = binaryString.trim().split(/\s+/)
    .map(b => parseInt(b, 2))
  const decoder = new TextDecoder('utf-8')
  return decoder.decode(new Uint8Array(bytes))
}

console.log(binaryToText('01001000 01100101 01101100 01101100 01101111'))
// => "Hello"

Python

Python 3 — UTF-8 encoding built-in
def text_to_binary(text: str, encoding: str = 'utf-8') -> str:
    """Convert text to space-separated binary string."""
    return ' '.join(format(byte, '08b') for byte in text.encode(encoding))

def binary_to_text(binary_string: str, encoding: str = 'utf-8') -> str:
    """Convert space-separated binary string back to text."""
    byte_strings = binary_string.strip().split()
    byte_values = [int(b, 2) for b in byte_strings]
    return bytes(byte_values).decode(encoding)

# Examples:
print(text_to_binary('Hello'))
# 01001000 01100101 01101100 01101100 01101111

print(text_to_binary('é'))
# 11000011 10101001

print(text_to_binary('中'))
# 11100100 10111000 10101101

# Round-trip verification:
original = 'Hello, World! 🌍'
binary = text_to_binary(original)
recovered = binary_to_text(binary)
assert original == recovered, "Round-trip failed"
print(f"Round-trip verified: {len(binary.split())} bytes")

Go

Go — strings are UTF-8 natively
package main

import (
    "fmt"
    "strconv"
    "strings"
)

func textToBinary(s string) string {
    // Go strings are UTF-8 byte slices — iterate bytes directly
    parts := make([]string, 0, len(s))
    for _, b := range []byte(s) {
        parts = append(parts, fmt.Sprintf("%08b", b))
    }
    return strings.Join(parts, " ")
}

func binaryToText(binary string) (string, error) {
    parts := strings.Fields(binary)
    bytes := make([]byte, len(parts))
    for i, part := range parts {
        val, err := strconv.ParseUint(part, 2, 8)
        if err != nil {
            return "", fmt.Errorf("invalid binary at position %d: %w", i, err)
        }
        bytes[i] = byte(val)
    }
    return string(bytes), nil
}

func main() {
    fmt.Println(textToBinary("Hello"))
    // 01001000 01100101 01101100 01101100 01101111

    text, _ := binaryToText("01001000 01100101 01101100 01101100 01101111")
    fmt.Println(text) // Hello
}

Four Common Mistakes in Text-to-Binary Conversion

1. Treating Characters as Single Bytes

The most common error: using charCodeAt() in JavaScript and treating the result as a complete UTF-8 byte. 'é'.charCodeAt(0) returns 233 — the Unicode code point — not the UTF-8 encoding. For ASCII characters the code point and the UTF-8 byte are identical by design, so the bug hides until a non-ASCII character appears. Always use TextEncoder or equivalent.

2. Omitting the Leading Zero Pad

The space character (decimal 32) in binary is 00100000. Without the leading zeros, it becomes 100000 — only 6 bits. This makes it impossible to reconstruct the original byte stream unambiguously. Always pad to 8 bits: .padStart(8, '0') in JS, format(byte, '08b') in Python.

3. Confusing UTF-16 Surrogate Pairs

JavaScript strings are UTF-16 internally. Characters above U+FFFF (like emoji) are stored as a "surrogate pair" — two 16-bit code units. '😀'.length returns 2 in JavaScript, not 1, because the engine counts UTF-16 code units. TextEncoder handles this correctly — it converts the surrogate pair into the proper 4-byte UTF-8 sequence. Calling charCodeAt() on each character position produces the raw surrogates, not valid UTF-8.

4. Incorrect Separator Assumptions During Decode

Binary strings are often space-separated (one 8-bit group per character) but sometimes presented as continuous strings with no separators. When decoding, you must know the format: 01001000... (continuous) vs 01001000 01100101... (space-separated). Continuous binary should be split into 8-character chunks: binaryStr.match(/.{1,8}/g).
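
A small Python decoder that accepts both formats — a sketch (the helper name is ours) assuming continuous input is a clean multiple of 8 bits:

import re

def decode_binary(binary_string: str) -> str:
    s = binary_string.strip()
    # Space-separated groups, or split a continuous string into 8-bit chunks
    groups = s.split() if ' ' in s else re.findall('.{8}', s)
    return bytes(int(g, 2) for g in groups).decode('utf-8')

print(decode_binary('01001000 01101001'))  # Hi
print(decode_binary('0100100001101001'))   # Hi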

Real-World Applications of Binary Text Representation

Understanding text-as-binary is not an academic exercise. These are the actual software contexts where you will encounter and need to manipulate binary text:

Network Protocol Debugging

TCP/IP packets carry bytes, not characters. Protocol analyzers like Wireshark display packet payloads in hexadecimal — reading HTTP headers, WebSocket frames, or DNS records at the byte level requires knowing which byte sequences correspond to which ASCII characters. A GET request starts with bytes 0x47 0x45 0x54 (71, 69, 84 — 'G', 'E', 'T' in ASCII).

Cryptographic Hash Verification

Cryptographic functions like SHA-256 operate on byte arrays, not strings. The hash of "Hello" and the hash of "Hello\n" (the same text with a trailing newline) differ completely — which is why encoding matters. A security audit of hash-based verification code requires tracing exactly what bytes were fed to the hash function, which means tracing the UTF-8 encoding of the input string. See BytePane's hash functions explained guide for the full picture.
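
A quick demonstration with Python's standard hashlib — the digest is computed over UTF-8 bytes, so a single extra byte (the newline, 0x0A) changes it completely:

import hashlib

a = hashlib.sha256('Hello'.encode('utf-8')).hexdigest()
b = hashlib.sha256('Hello\n'.encode('utf-8')).hexdigest()
print(a == b)  # False — the two digests share nothing recognizable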

File Format Parsing

Binary file formats begin with "magic bytes" — fixed byte sequences that identify the file type. PDF files start with 25 50 44 46 hex (%PDF in ASCII). PNG files start with 89 50 4E 47 (‰PNG). Validating file uploads at the byte level rather than trusting file extensions is a security best practice — and it requires knowing the binary representation of ASCII characters.
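
A minimal sketch of that practice in Python — the magic-byte table below is an illustrative subset covering only the two formats just mentioned:

# Illustrative map of magic-byte prefixes to file types
MAGIC = {
    b'%PDF': 'pdf',     # 25 50 44 46
    b'\x89PNG': 'png',  # 89 50 4E 47
}

def sniff_file_type(path: str) -> str:
    with open(path, 'rb') as f:
        header = f.read(4)  # read the first four bytes only
    for magic, kind in MAGIC.items():
        if header.startswith(magic):
            return kind
    return 'unknown'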

Base64 Encoding

Base64 operates on the byte-level representation of text: it takes the UTF-8 bytes of a string and re-encodes them using only 64 ASCII-safe characters. "Hello" encodes to "SGVsbG8=" in Base64 — directly from the bytes 72, 101, 108, 108, 111. Understanding the UTF-8 byte layer clarifies why the same string in different encodings produces different Base64 output. For Base64 encoding/decoding tools and mechanics, see BytePane's Base64 encoding guide.
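
The byte layer is explicit in Python's standard base64 module, which accepts bytes, not strings:

import base64

data = 'Hello'.encode('utf-8')  # b'Hello' — bytes 72 101 108 108 111
print(base64.b64encode(data))   # b'SGVsbG8='
print(base64.b64decode('SGVsbG8=').decode('utf-8'))  # Hello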

Embedded Systems and Hardware Registers

Microcontroller communication protocols (UART, SPI, I²C) transmit bytes. Sending a string over a serial port means writing each character's byte value to a hardware register. Embedded developers work in binary and hexadecimal constantly — knowing that ASCII 'A' is 0x41 is as fundamental as knowing that 'A' is the first letter.

Binary vs Hexadecimal vs Decimal: Three Views of the Same Byte

Binary, hexadecimal, and decimal are three notations for the same underlying byte values. Each has its use case:

Notation     Base  Value of 'A'  Best For
Binary       2     01000001      Bit manipulation, protocol flags, embedded systems
Hexadecimal  16    0x41          Memory dumps, network packets, color codes, byte inspection
Decimal      10    65            Human-readable code point reference, ASCII tables
Octal        8     0101          Unix file permissions, legacy systems (now mostly hex)

The conversion between binary and hexadecimal is exact: every 4 binary bits correspond to one hex digit. 01000001 splits as 0100 (= 4 = hex 4) + 0001 (= 1 = hex 1) = 0x41. This grouping is why developers reach for hex over binary for most debugging — it is 4× more compact while preserving the exact bit pattern. If you need to convert between hexadecimal and decimal or binary programmatically, BytePane's hex to decimal converter guide covers the full conversion math.
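
Python's format() can render the same byte in all four notations:

b = ord('A')             # 65
print(format(b, '08b'))  # 01000001 (binary)
print(format(b, '#04x')) # 0x41 (hexadecimal)
print(b)                 # 65 (decimal)
print(format(b, '#o'))   # 0o101 (octal)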

Frequently Asked Questions

How does text to binary conversion work?

Text-to-binary conversion is a two-step process. First, each character maps to a numeric code point using Unicode — ASCII uses 7-bit code points (0–127), while Unicode defines over 1.1 million code points. Second, those code points serialize to bytes via UTF-8 encoding, where each byte is expressed as an 8-bit binary sequence. The letter 'A' → code point 65 → binary 01000001.

What is the binary for the letter 'A'?

The uppercase letter A has ASCII code point 65 (decimal). In binary, 65 = 01000001. This is also its UTF-8 representation since ASCII characters use the single-byte range of UTF-8. Lowercase 'a' is code point 97 = 01100001. The difference between uppercase and lowercase ASCII letters is always exactly bit 5 (the bit with value 32) — toggling it switches case.

What is the difference between ASCII and UTF-8 binary?

ASCII defines 128 characters (code points 0–127), each stored as a 7-bit value. UTF-8 is backward-compatible with ASCII for these 128 characters — their binary representations are identical. For code points 128 and above (accented characters, emoji, CJK), UTF-8 uses 2–4 bytes with leading continuation bits. Continuation bytes always start with the bit pattern 10xxxxxx.

How do I convert text to binary in JavaScript?

Use TextEncoder to get UTF-8 bytes, then convert each byte to binary: const encoder = new TextEncoder(); const bytes = encoder.encode('Hello'); const binary = [...bytes].map(b => b.toString(2).padStart(8, '0')).join(' '). The padStart(8,'0') is critical — it ensures each byte is represented as exactly 8 bits, including leading zeros for small values like space (00100000).

What does binary text look like for an emoji?

Emoji require 4 bytes in UTF-8. The 😀 emoji (U+1F600) encodes to bytes [0xF0, 0x9F, 0x98, 0x80]. In binary: 11110000 10011111 10011000 10000000. The first byte starts with 11110 (four 1s then a 0) indicating a 4-byte sequence. The remaining 3 bytes start with 10, marking them as UTF-8 continuation bytes.

Is binary the same as hexadecimal?

No, but they represent the same underlying byte values in different bases. Binary is base 2 (0s and 1s), hexadecimal is base 16 (0–9 and A–F). Each hex digit represents exactly 4 binary bits: hex F = binary 1111, hex 4 = 0100. An 8-bit binary byte is 2 hex digits: 01000001 = 0x41 = 65 decimal = ASCII 'A'. Hexadecimal is 4× more compact than binary.

What is binary text used for in real software?

Binary representation of text is fundamental to: network protocol debugging (inspecting raw TCP/IP packets in Wireshark), cryptography (verifying exact byte inputs to hash functions), file format parsing (identifying magic bytes at file headers), Base64 encoding (which operates on UTF-8 byte sequences), and embedded systems programming (UART/SPI communication transmits bytes, not characters).

How do I decode binary back to text?

Decoding reverses encoding: split the binary string into 8-bit groups, parse each group with parseInt(group, 2) to get byte values, then decode the bytes as UTF-8 using TextDecoder in JavaScript or bytes().decode('utf-8') in Python. For multi-byte UTF-8 sequences, TextDecoder handles grouping automatically — you must not attempt to decode each byte as a separate character.

Convert Text to Binary Instantly

Need to quickly see the binary representation of any text? BytePane's Base64 encoder shows the underlying byte encoding — or use our binary converter tool for direct text-to-binary output. Everything runs in your browser.

Open Base64 & Encoding Tool →