Invisible Characters: Zero-Width Spaces, Bidi Overrides, and Hidden Text
A catalog of invisible Unicode characters that can break text or hide inside it, with the tool to reveal them.
Zero-Width Characters: ZWSP, ZWJ, and ZWNJ
Unicode includes several characters that occupy zero width — they are present in the text data but produce no visible glyph. The three most important are:
| Character | Code point | Name | Purpose |
|---|---|---|---|
| (invisible) | U+200B | Zero Width Space (ZWSP) | Optional line-break opportunity |
| (invisible) | U+200D | Zero Width Joiner (ZWJ) | Joins adjacent characters into ligatures/sequences |
| (invisible) | U+200C | Zero Width Non-Joiner (ZWNJ) | Prevents joining that would otherwise occur |
ZWSP (U+200B) marks where a line break may occur in scripts that do not put spaces between words, such as Thai and Khmer (CJK text usually needs no such hint, since lines may break between most ideographs). It is also frequently (mis)used as an “invisible space” in usernames and messages.
ZWJ (U+200D) is the glue behind complex emoji sequences. The family emoji 👨👩👧👦 is literally Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy. It is also essential in scripts like Devanagari where it controls consonant conjunct formation.
ZWNJ (U+200C) does the opposite: it prevents characters from joining. In Persian and Arabic, it is used to show the non-joining form of a letter mid-word, which changes meaning in some cases.
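To make the ZWJ mechanics concrete, the following sketch builds the family emoji from explicit code points and compares two ways of measuring its length. The variable names are illustrative only.

```javascript
// Build the family emoji from escapes so the joiners are visible in source.
const ZWJ = "\u200D";
const family = [
  "\u{1F468}", // MAN
  "\u{1F469}", // WOMAN
  "\u{1F467}", // GIRL
  "\u{1F466}", // BOY
].join(ZWJ);

// String.prototype.length counts UTF-16 code units:
// 4 emoji x 2 units + 3 ZWJ x 1 unit = 11.
const codeUnits = family.length;

// The string iterator walks code points: 4 emoji + 3 ZWJ = 7.
const codePoints = [...family].length;

// Removing the joiners leaves four separate people emoji.
const unjoined = family.split(ZWJ).join("");
```

A renderer with emoji support draws `family` as one glyph and `unjoined` as four, even though the only difference is three invisible characters.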
Bidi Overrides: Invisible Text Direction Control
Unicode supports bidirectional text (for mixing left-to-right scripts like English with right-to-left scripts like Arabic). This requires invisible control characters:
| Character | Code point | Name | Effect |
|---|---|---|---|
| (invisible) | U+200E | Left-to-Right Mark (LRM) | Acts as an invisible strong LTR character |
| (invisible) | U+200F | Right-to-Left Mark (RLM) | Acts as an invisible strong RTL character |
| (invisible) | U+202A | Left-to-Right Embedding (LRE) | Starts LTR embedding |
| (invisible) | U+202B | Right-to-Left Embedding (RLE) | Starts RTL embedding |
| (invisible) | U+202C | Pop Directional Formatting (PDF) | Ends embedding |
| (invisible) | U+202D | Left-to-Right Override (LRO) | Forces all text LTR |
| (invisible) | U+202E | Right-to-Left Override (RLO) | Forces all text RTL |
| (invisible) | U+2066 | Left-to-Right Isolate (LRI) | Isolates LTR text |
| (invisible) | U+2067 | Right-to-Left Isolate (RLI) | Isolates RTL text |
| (invisible) | U+2068 | First Strong Isolate (FSI) | Isolates text whose direction follows its first strong character |
| (invisible) | U+2069 | Pop Directional Isolate (PDI) | Ends isolation |
The Right-to-Left Override (U+202E) is particularly dangerous. It forces the text that follows it, up to the next Pop Directional Formatting character or the end of the paragraph, to render right-to-left, which can make filenames, code, and URLs appear to say something completely different from their actual content:
// Normal text:       "hello.txt"
// With RLO inserted: "\u202Ehello.txt"
// Renders as:        txt.olleh
// A file named "\u202Efdp.exe" could display as "exe.pdf"!
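One way to defend against this is to scan untrusted strings for bidi controls before displaying them. The sketch below is a minimal example, not a complete defense. The function names and the sample filename are hypothetical, and the character class also covers U+2068 (First Strong Isolate) because it sits inside the isolate range.

```javascript
// LRM, RLM, the embeddings/overrides U+202A-U+202E, and the isolates U+2066-U+2069.
const BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

// Format a character's code point as U+XXXX.
function toUPlus(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Return the position and code point of every bidi control in `str`.
function findBidiControls(str) {
  const hits = [];
  for (const match of str.matchAll(BIDI_CONTROLS)) {
    hits.push({ index: match.index, codePoint: toUPlus(match[0]) });
  }
  return hits;
}

// Replace each control with a visible escape, e.g. for log output.
function showBidiControls(str) {
  return str.replace(BIDI_CONTROLS, (ch) => "<" + toUPlus(ch) + ">");
}

const suspicious = "invoice\u202Efdp.exe"; // hypothetical filename
findBidiControls(suspicious); // [{ index: 7, codePoint: "U+202E" }]
showBidiControls(suspicious); // "invoice<U+202E>fdp.exe"
```

Whether to reject, strip, or merely flag such strings depends on the application. Legitimate right-to-left text can contain these characters, so flagging is often the safer default.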
Tag Characters: An Entire Hidden Alphabet
Unicode block U+E0000–U+E007F contains Tag characters — invisible versions of ASCII characters originally intended for language tagging. These were deprecated for that purpose but later repurposed for emoji flag subdivision sequences (like the flag of Scotland: 🏴).
| Tag character | Code point | Corresponds to |
|---|---|---|
| TAG LATIN SMALL LETTER A | U+E0061 | a |
| TAG LATIN SMALL LETTER B | U+E0062 | b |
| TAG DIGIT ZERO | U+E0030 | 0 |
| CANCEL TAG | U+E007F | (terminates sequence) |
The Scotland flag emoji is: 🏴 + TAG g + TAG b + TAG s + TAG c + TAG t + CANCEL TAG. That is 7 code points (14 UTF-16 code units) for one flag emoji, with 6 invisible characters.
Tag characters can be abused to hide arbitrary text within seemingly innocent strings. Since they are invisible and most tools do not display them, they can carry hidden messages or watermarks.
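// The Scotland flag decomposed:
const flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0073}\u{E0063}\u{E0074}\u{E007F}";
// = U+1F3F4 (black flag)
// + U+E0067 (tag g)
// + U+E0062 (tag b)
// + U+E0073 (tag s)
// + U+E0063 (tag c)
// + U+E0074 (tag t)
// + U+E007F (cancel tag)
flag.length;      // 14 UTF-16 code units (every code point needs a surrogate pair)
[...flag].length; // 7 code points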
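Because each tag character sits at a fixed offset (0xE0000) from its ASCII counterpart, hiding and recovering text takes only a few lines. The sketch below illustrates the idea. The function names and the carrier sentence are made up for this example, and `encodeAsTags` assumes its input is printable ASCII.

```javascript
// Tag characters U+E0020-U+E007E mirror printable ASCII U+0020-U+007E.
const TAG_OFFSET = 0xe0000;

// Hide a printable-ASCII message as tag characters (illustration only).
function encodeAsTags(ascii) {
  return [...ascii]
    .map((ch) => String.fromCodePoint(ch.codePointAt(0) + TAG_OFFSET))
    .join("");
}

// Recover any tag-encoded ASCII hidden inside `str`, ignoring everything else.
function decodeTags(str) {
  let hidden = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) {
      hidden += String.fromCodePoint(cp - TAG_OFFSET);
    }
  }
  return hidden;
}

const carrier = "Nice weather today" + encodeAsTags("secret");
carrier.length;      // 30 (18 visible characters + 6 tags x 2 code units)
decodeTags(carrier); // "secret"
```

Run against the Scotland flag sequence, `decodeTags` returns "gbsct", the subdivision code the flag is built from.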
The Space Zoo: 18 Different Space Characters
Beyond the regular space (U+0020) and the zero-width space, Unicode contains a menagerie of space characters with different widths:
| Name | Code point | Width |
|---|---|---|
| SPACE | U+0020 | Normal word space |
| NO-BREAK SPACE | U+00A0 | Same as space, prevents line break |
| EN QUAD | U+2000 | Width of an en (half em) |
| EM QUAD | U+2001 | Width of an em |
| EN SPACE | U+2002 | Width of an en |
| EM SPACE | U+2003 | Width of an em |
| THREE-PER-EM SPACE | U+2004 | 1/3 em |
| FOUR-PER-EM SPACE | U+2005 | 1/4 em |
| SIX-PER-EM SPACE | U+2006 | 1/6 em |
| FIGURE SPACE | U+2007 | Width of a digit |
| PUNCTUATION SPACE | U+2008 | Width of a period |
| THIN SPACE | U+2009 | 1/5 em (approximately) |
| HAIR SPACE | U+200A | Very thin space |
| ZERO WIDTH SPACE | U+200B | No width |
| NARROW NO-BREAK SPACE | U+202F | Narrow, no line break |
| MEDIUM MATHEMATICAL SPACE | U+205F | 4/18 em |
| IDEOGRAPHIC SPACE | U+3000 | CJK fullwidth space |
| OGHAM SPACE MARK | U+1680 | Ogham word separator |
The no-break space (U+00A0) is the most commonly encountered problem space. It looks identical to a regular space but prevents line breaks. It often appears when copying text from PDFs, Word documents, or web pages, and can cause string comparisons to fail silently.
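A common workaround is to fold these characters into a regular space before comparing strings. The following sketch assumes that is acceptable for the data at hand; `normalizeSpaces` is a hypothetical helper, and its character class covers the non-zero-width spaces from the table above.

```javascript
// Non-zero-width spaces from the table, excluding U+0020 itself.
const UNICODE_SPACES = /[\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/g;

// Replace every matched space variant with a plain U+0020.
function normalizeSpaces(str) {
  return str.replace(UNICODE_SPACES, " ");
}

const typed = "10 km";
const pasted = "10\u00A0km"; // NO-BREAK SPACE, e.g. copied from a web page

typed === pasted;                  // false
typed === normalizeSpaces(pasted); // true
```

This discards the line-breaking and width behavior the original characters carried, so it is better suited to comparison and search keys than to text that will be displayed again.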
Practical Impact: Where Invisible Characters Cause Bugs
Invisible characters cause real problems in software:
- String comparison failures: "hello" === "hello" can be false if one contains a hidden ZWSP, BOM, or non-breaking space.
- JSON/YAML parsing errors: A BOM (U+FEFF) at the start of a file can break parsers. A ZWSP in a key name makes it unmatchable.
- URL manipulation: Invisible characters in URLs can bypass security filters while appearing legitimate to users.
- Password fields: Copy-pasting a password with an invisible character means the user “knows” their password but it never matches.
- Code bugs: A ZWNJ or ZWJ in a variable name creates a different identifier: price and price (with hidden ZWJ) are two different variables.
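The JSON case from the list above can be reproduced in a few lines. This is a sketch under the assumption that the input is already a decoded string; `parseJsonIgnoringBom` is a hypothetical helper, not a standard API.

```javascript
// Parse JSON text that may start with a byte order mark (U+FEFF).
function parseJsonIgnoringBom(text) {
  const withoutBom = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  return JSON.parse(withoutBom);
}

const raw = '\uFEFF{"name":"value"}';

// JSON.parse rejects the raw text because the BOM is not JSON whitespace.
let failed = false;
try {
  JSON.parse(raw);
} catch (err) {
  failed = err instanceof SyntaxError;
}

const parsed = parseJsonIgnoringBom(raw); // { name: "value" }

// A ZWSP inside a key survives parsing, so a lookup by the visible name fails.
const keyed = JSON.parse('{"na\u200Bme": 1}');
const hasName = "name" in keyed; // false
```

The first failure is loud, but the second is silent: the object parses without complaint and simply lacks the key the code expects.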
// Common invisible character detection:
function hasInvisible(str) {
  const invisible = /[\u200B-\u200F\u2028-\u202F\u2060-\u206F\uFEFF]/;
  return invisible.test(str);
}
// Strip common invisible characters:
function stripInvisible(str) {
  return str.replace(
    /[\u200B-\u200F\u2028-\u202F\u2060-\u206F\uFEFF]/g,
    ""
  );
}
// Example:
const text = "hello\u200Bworld";
text.length // 11 (not 10!)
hasInvisible(text) // true
stripInvisible(text) // "helloworld"
How This Tool Reveals Them
The fundamental problem with invisible characters is that they are, by design, invisible. Most text editors, terminals, and web browsers do not show them by default. You need a specialized tool to detect their presence.
This tool solves the problem by:
- Showing every code point: Each code point gets its own cell in the grid, including invisible ones. You can see their Unicode name, code point value, and general category.
- Labeling control characters: Zero-width characters, bidi controls, and other invisible characters are shown with their abbreviated names so you can identify them instantly.
- Grapheme cluster awareness: When an invisible character combines with visible ones (like ZWJ in emoji), the tool shows the full cluster structure.
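The first two points can be approximated in a few lines. The sketch below is not this tool's actual implementation, only an illustration of the idea. The `LABELS` map is a hand-picked, incomplete list and `inspect` is a hypothetical name.

```javascript
// Abbreviated labels for a few invisible characters (incomplete by design).
const LABELS = new Map([
  [0x00a0, "NBSP"],
  [0x200b, "ZWSP"],
  [0x200c, "ZWNJ"],
  [0x200d, "ZWJ"],
  [0x200e, "LRM"],
  [0x200f, "RLM"],
  [0x202e, "RLO"],
  [0xfeff, "BOM"],
]);

// Return one entry per code point: its U+ notation plus either a label
// (for known invisible characters) or the character itself.
function inspect(str) {
  return [...str].map((ch) => {
    const cp = ch.codePointAt(0);
    const hex = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    return { codePoint: hex, display: LABELS.has(cp) ? LABELS.get(cp) : ch };
  });
}

inspect("a\u200Bb");
// [
//   { codePoint: "U+0061", display: "a" },
//   { codePoint: "U+200B", display: "ZWSP" },
//   { codePoint: "U+0062", display: "b" },
// ]
```

Iterating with the spread operator walks code points rather than UTF-16 code units, so characters outside the Basic Multilingual Plane each get a single entry.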
If you ever encounter text that behaves unexpectedly — comparisons fail, lengths are wrong, or copy-paste produces different results — paste it into this tool to see what is really there.