Learn Unicode
Interactive guides with live examples. Each article links to the Unicode Viewer tool so you can explore the concepts hands-on.
Fundamentals
Characters Are a Lie: Understanding Grapheme Clusters
Why string.length miscounts user-perceived characters, what grapheme clusters really are, and how Intl.Segmenter counts them correctly.
UTF-8 Byte by Byte: How Characters Become Bytes
A visual, byte-level walkthrough of UTF-8 encoding showing exactly how code points map to 1-4 bytes.
Unicode Normalization: NFC, NFD, NFKC, NFKD Demystified
Why the same-looking text can have different bytes, when each normalization form matters, and how to see the differences visually.
Surrogate Pairs: Why JavaScript Strings Break on Emoji
How UTF-16 surrogate pairs work, why they affect JavaScript/Java/C#, and how to handle them correctly.
Encoding & Legacy
Shift_JIS vs CP932: The Encoding Everyone Confuses
The precise technical differences between Shift_JIS and CP932 (Windows-31J), with byte-level evidence.
The Wave Dash Problem: 〜 vs ～ and 7 Other Mapping Conflicts
A complete reference to the 7 JIS-Unicode mapping discrepancies, with an interactive toggle to see both variants.
Legacy Encoding Survival Guide: From ASCII to GB18030
A practical overview of 20+ character encodings across languages, how they relate, and how to identify them.
CJK
Han Unification: How Unicode Merged 100,000 CJK Characters
How the IRG decided which characters from Japan, China, Taiwan, and Korea are 'the same,' with a tool to check any character's source.
IVS: How Unicode Represents 47 Versions of the Same Kanji
Understanding Ideographic Variation Sequences and Standardized Variation Sequences, with live font rendering of all registered variants.
Why One Font Isn't Enough: CJK Variant Coverage Across Fonts
How different CJK fonts implement different IVD collections, why a single font can't show every registered variant, and how this site combines three fonts to render every IVS faithfully.
JIS Levels and Kuten Codes: Japan's Character Classification System
How Japan classifies kanji into 4 levels across JIS X 0208 and JIS X 0213, with kuten positional codes.
Security & Edge Cases
Unicode Homoglyph Attacks: When Characters Lie About Who They Are
How visually identical characters from different scripts enable phishing and spoofing, and how to detect them.
Invisible Characters: Zero-Width Spaces, Bidi Overrides, and Hidden Text
A catalog of invisible Unicode characters that can break text or hide inside it, with a tool to reveal them.
Emoji Under the Hood: ZWJ Sequences, Skin Tones, and Flag Math
How complex emoji are built from multiple code points using ZWJ, variation selectors, and regional indicators.
WHATWG vs Unicode.org: Why Browsers and Standards Disagree on Encoding
A cross-encoding survey of mapping discrepancies between web standards and official Unicode/national standards.