
Invisible Characters: Zero-Width Spaces, Bidi Overrides, and Hidden Text

A catalog of invisible Unicode characters that can break text or hide inside it, along with a tool to reveal them.

Zero-Width Characters: ZWSP, ZWJ, and ZWNJ

Unicode includes several characters that occupy zero width — they are present in the text data but produce no visible glyph. The three most important are:

| Character | Code point | Name | Purpose |
|---|---|---|---|
| (invisible) | U+200B | Zero Width Space (ZWSP) | Optional line-break opportunity |
| (invisible) | U+200D | Zero Width Joiner (ZWJ) | Joins adjacent characters into ligatures/sequences |
| (invisible) | U+200C | Zero Width Non-Joiner (ZWNJ) | Prevents joining that would otherwise occur |

ZWSP (U+200B) is used to indicate where a line break may occur in scripts that do not use spaces between words, such as Thai, Khmer, and CJK text. It is also frequently (mis)used as an “invisible space” in usernames and messages.

ZWJ (U+200D) is the glue behind complex emoji sequences. The family emoji 👨‍👩‍👧‍👦 is literally Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy. It is also essential in scripts like Devanagari where it controls consonant conjunct formation.

ZWNJ (U+200C) does the opposite: it prevents characters from joining. In Persian and Arabic, it is used to show the non-joining form of a letter mid-word, which changes meaning in some cases.
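To make the ZWJ and ZWSP behavior above concrete, here is a minimal sketch that lists the code points of the family emoji and compares two strings that look identical. The helper name `toCodePoints` is hypothetical, and the emoji is written with escapes so the joiners are visible in the source.

```javascript
// Family emoji: Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// Hypothetical helper: format every code point as "U+XXXX".
// Array.from iterates by code point, not by UTF-16 code unit.
function toCodePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const familyPoints = toCodePoints(family);
console.log(familyPoints.join(" "));
// U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

// 4 people + 3 ZWJ = 7 code points; each person takes 2 UTF-16 code units.
console.log(familyPoints.length, family.length); // 7 11

// A ZWSP makes two visually identical strings unequal.
const plain = "hello";
const withZwsp = "hel\u200Blo";
console.log(plain === withZwsp); // false
```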

Bidi Overrides: Invisible Text Direction Control

Unicode supports bidirectional text (for mixing left-to-right scripts like English with right-to-left scripts like Arabic). This requires invisible control characters:

| Character | Code point | Name | Effect |
|---|---|---|---|
| (invisible) | U+200E | Left-to-Right Mark (LRM) | Forces LTR direction |
| (invisible) | U+200F | Right-to-Left Mark (RLM) | Forces RTL direction |
| (invisible) | U+202A | Left-to-Right Embedding (LRE) | Starts LTR embedding |
| (invisible) | U+202B | Right-to-Left Embedding (RLE) | Starts RTL embedding |
| (invisible) | U+202C | Pop Directional Formatting (PDF) | Ends embedding |
| (invisible) | U+202D | Left-to-Right Override (LRO) | Forces all text LTR |
| (invisible) | U+202E | Right-to-Left Override (RLO) | Forces all text RTL |
| (invisible) | U+2066 | Left-to-Right Isolate (LRI) | Isolates LTR text |
| (invisible) | U+2067 | Right-to-Left Isolate (RLI) | Isolates RTL text |
| (invisible) | U+2069 | Pop Directional Isolate (PDI) | Ends isolation |

The Right-to-Left Override (U+202E) is particularly dangerous. It forces the text that follows it to render right-to-left, up to the next Pop Directional Formatting character (U+202C) or the end of the paragraph. This can make filenames, code, and URLs appear to say something completely different from their actual content:

// Normal text:
"hello.txt"

// With RLO inserted:
"\u202Ehello.txt"
// Renders as: txt.olleh
// A file named "\u202Efdp.exe" could display as "exe.pdf"!
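One way to guard against this kind of spoofing is to scan for the bidi controls listed in the table before trusting a displayed name. The sketch below is only an illustration: `findBidiControls` and `isSafeFilename` are hypothetical names, the reject-everything policy is an assumption, and the lookup covers only the characters from the table above.

```javascript
// Bidi controls from the table above, keyed by code point.
const BIDI_CONTROLS = new Map([
  [0x200e, "LRM"], [0x200f, "RLM"],
  [0x202a, "LRE"], [0x202b, "RLE"], [0x202c, "PDF"],
  [0x202d, "LRO"], [0x202e, "RLO"],
  [0x2066, "LRI"], [0x2067, "RLI"], [0x2069, "PDI"],
]);

// Hypothetical helper: report each bidi control and its UTF-16 index.
// All of these characters are in the BMP, so charCodeAt is sufficient.
function findBidiControls(str) {
  const hits = [];
  for (let i = 0; i < str.length; i++) {
    const name = BIDI_CONTROLS.get(str.charCodeAt(i));
    if (name) hits.push({ index: i, name });
  }
  return hits;
}

// Assumed policy: a filename containing any bidi control is rejected.
function isSafeFilename(name) {
  return findBidiControls(name).length === 0;
}

const spoofed = "invoice\u202Efdp.exe";
console.log(findBidiControls(spoofed));     // [ { index: 7, name: 'RLO' } ]
console.log(isSafeFilename(spoofed));       // false
console.log(isSafeFilename("invoice.pdf")); // true
```

Rejecting outright is the bluntest choice. Legitimate right-to-left filenames may need LRM or RLM, so a real policy might allow some marks and block only the overrides.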

Tag Characters: An Entire Hidden Alphabet

Unicode block U+E0000–U+E007F contains Tag characters — invisible versions of ASCII characters originally intended for language tagging. These were deprecated for that purpose but later repurposed for emoji flag subdivision sequences (like the flag of Scotland: 🏴󠁧󠁢󠁳󠁣󠁴󠁿).

| Tag character | Code point | Corresponds to |
|---|---|---|
| TAG LATIN SMALL LETTER A | U+E0061 | a |
| TAG LATIN SMALL LETTER B | U+E0062 | b |
| TAG DIGIT ZERO | U+E0030 | 0 |
| CANCEL TAG | U+E007F | (terminates sequence) |

The Scotland flag emoji is: 🏴 + TAG g + TAG b + TAG s + TAG c + TAG t + CANCEL TAG. That is 7 code points (14 UTF-16 code units) for one flag emoji, with 6 invisible characters.

Tag characters can be abused to hide arbitrary text within seemingly innocent strings. Since they are invisible and most tools do not display them, they can carry hidden messages or watermarks.

// The Scotland flag decomposed:
"🏴󠁧󠁢󠁳󠁣󠁴󠁿"
// = U+1F3F4 (black flag)
// + U+E0067 (tag g)
// + U+E0062 (tag b)
// + U+E0073 (tag s)
// + U+E0063 (tag c)
// + U+E0074 (tag t)
// + U+E007F (cancel tag)

[..."🏴󠁧󠁢󠁳󠁣󠁴󠁿"].length  // 7 code points (spread iterates by code point)
"🏴󠁧󠁢󠁳󠁣󠁴󠁿".length       // 14 UTF-16 code units (7 surrogate pairs)
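Because each tag character sits at a fixed offset of 0xE0000 from its ASCII counterpart, hidden tag text can be recovered with simple arithmetic. The following is a sketch under that assumption. `decodeTags` and `encodeTags` are hypothetical helper names, and only the range U+E0020 to U+E007E (the tag forms of printable ASCII) is handled.

```javascript
const TAG_BASE = 0xe0000; // tag code point = ASCII code + 0xE0000

// Hypothetical helper: recover the ASCII text carried by tag characters.
// CANCEL TAG (U+E007F) and everything outside the tag range is skipped.
function decodeTags(str) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) {
      out += String.fromCharCode(cp - TAG_BASE);
    }
  }
  return out;
}

// Hypothetical helper: turn printable ASCII into invisible tag characters.
function encodeTags(ascii) {
  let out = "";
  for (const ch of ascii) {
    const cp = ch.codePointAt(0);
    if (cp < 0x20 || cp > 0x7e) throw new RangeError("printable ASCII only");
    out += String.fromCodePoint(cp + TAG_BASE);
  }
  return out;
}

const scotland =
  "\u{1F3F4}\u{E0067}\u{E0062}\u{E0073}\u{E0063}\u{E0074}\u{E007F}";
console.log(decodeTags(scotland)); // "gbsct"

// Hiding a message after visible text:
const carrier = "hi" + encodeTags("secret");
console.log(carrier.length);      // 14: 2 visible + 6 tags x 2 code units
console.log(decodeTags(carrier)); // "secret"
```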

The Space Zoo: 18 Different Space Characters

Beyond the regular space (U+0020) and the zero-width space, Unicode contains a menagerie of space characters with different widths:

| Name | Code point | Width |
|---|---|---|
| SPACE | U+0020 | Normal word space |
| NO-BREAK SPACE | U+00A0 | Same as space, prevents line break |
| EN QUAD | U+2000 | Width of an en (half em) |
| EM QUAD | U+2001 | Width of an em |
| EN SPACE | U+2002 | Width of an en |
| EM SPACE | U+2003 | Width of an em |
| THREE-PER-EM SPACE | U+2004 | 1/3 em |
| FOUR-PER-EM SPACE | U+2005 | 1/4 em |
| SIX-PER-EM SPACE | U+2006 | 1/6 em |
| FIGURE SPACE | U+2007 | Width of a digit |
| PUNCTUATION SPACE | U+2008 | Width of a period |
| THIN SPACE | U+2009 | 1/5 em (approximately) |
| HAIR SPACE | U+200A | Very thin space |
| ZERO WIDTH SPACE | U+200B | No width |
| NARROW NO-BREAK SPACE | U+202F | Narrow, no line break |
| MEDIUM MATHEMATICAL SPACE | U+205F | 4/18 em |
| IDEOGRAPHIC SPACE | U+3000 | CJK fullwidth space |
| OGHAM SPACE MARK | U+1680 | Ogham word separator |

The no-break space (U+00A0) is the most commonly encountered problem space. It looks identical to a regular space but prevents line breaks. It often appears when copying text from PDFs, Word documents, or web pages, and can cause string comparisons to fail silently.
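A common defense is to normalize every space variant to U+0020 before comparing strings. The sketch below uses the Unicode property escape `\p{Zs}` (the Space Separator category), which matches every row of the table except ZERO WIDTH SPACE, since U+200B is classified as a format character. The name `normalizeSpaces` is hypothetical.

```javascript
// Replace every Space Separator (category Zs) with a regular space.
// ZWSP (U+200B) is category Cf, so it is deliberately left untouched here.
function normalizeSpaces(str) {
  return str.replace(/\p{Zs}/gu, " ");
}

// Text as it might arrive from a PDF or word processor:
// a no-break space after the colon and a thin space before the unit.
const pasted = "total:\u00A042\u2009kg";

console.log(pasted === "total: 42 kg");                  // false
console.log(normalizeSpaces(pasted) === "total: 42 kg"); // true

// The zero-width space survives, so strip it separately if needed.
console.log(normalizeSpaces("a\u200Bb").length);         // 3
```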

Practical Impact: Where Invisible Characters Cause Bugs

Invisible characters cause real problems in software:

  • String comparison failures: "hello" === "hello" can be false if one contains a hidden ZWSP, BOM, or non-breaking space.
  • JSON/YAML parsing errors: A BOM (U+FEFF) at the start of a file can break parsers. A ZWSP in a key name makes it unmatchable.
  • URL manipulation: Invisible characters in URLs can bypass security filters while appearing legitimate to users.
  • Password fields: Copy-pasting a password with an invisible character means the user “knows” their password but it never matches.
  • Code bugs: A ZWNJ or ZWJ in a variable name creates a different identifier: price and pri‍ce (with hidden ZWJ) are two different variables.
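
// Common invisible character detection.
// Note: this range does not cover U+00A0 (no-break space) or the
// tag characters (U+E0000–U+E007F); check for those separately.
function hasInvisible(str) {
  const invisible = /[\u200B-\u200F\u2028-\u202F\u2060-\u206F\uFEFF]/;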
  return invisible.test(str);
}

// Strip common invisible characters:
function stripInvisible(str) {
  return str.replace(
    /[\u200B-\u200F\u2028-\u202F\u2060-\u206F\uFEFF]/g,
    ""
  );
}

// Example:
const text = "hello\u200Bworld";
text.length          // 11 (not 10!)
hasInvisible(text)   // true
stripInvisible(text) // "helloworld"
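The JSON cases from the list above can be reproduced directly. This is a minimal sketch: `parseJsonWithBom` is a hypothetical helper, and the sample payloads are made up for illustration. It relies on the fact that `JSON.parse` does not treat U+FEFF as whitespace.

```javascript
// Hypothetical helper: drop one leading BOM (U+FEFF) before parsing.
function parseJsonWithBom(text) {
  const clean = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  return JSON.parse(clean);
}

// File contents as read from disk, with a BOM at the start.
const fileContents = '\uFEFF{"price": 10}';

let rawParseFailed = false;
try {
  JSON.parse(fileContents);
} catch (err) {
  rawParseFailed = err instanceof SyntaxError;
}
console.log(rawParseFailed);                       // true
console.log(parseJsonWithBom(fileContents).price); // 10

// A ZWSP inside a key parses fine but makes the lookup miss.
const config = JSON.parse('{"pri\u200Bce": 10}');
console.log(config.price);                  // undefined
console.log(Object.keys(config)[0].length); // 6, not 5
```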

How This Tool Reveals Them

The fundamental problem with invisible characters is that they are, by design, invisible. Most text editors, terminals, and web browsers do not show them by default. You need a specialized tool to detect their presence.

This tool solves the problem by:

  • Showing every code point: Each code point gets its own cell in the grid, including invisible ones. You can see their Unicode name, code point value, and general category.
  • Labeling control characters: Zero-width characters, bidi controls, and other invisible characters are shown with their abbreviated names so you can identify them instantly.
  • Grapheme cluster awareness: When an invisible character combines with visible ones (like ZWJ in emoji), the tool shows the full cluster structure.
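The labeling idea can be approximated in a few lines. The sketch below is not this tool's implementation; it is a simplified illustration with a hypothetical `reveal` function and a small, hand-picked label table that covers only some of the characters discussed in this article.

```javascript
// Hand-picked labels, keyed by code point (an illustrative subset).
const LABELS = {
  0x00a0: "NBSP", 0x200b: "ZWSP", 0x200c: "ZWNJ", 0x200d: "ZWJ",
  0x200e: "LRM", 0x200f: "RLM", 0x202e: "RLO", 0xfeff: "BOM",
};

// Hypothetical helper: replace invisible characters with visible labels.
function reveal(str) {
  let out = "";
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (LABELS[cp]) {
      out += `<${LABELS[cp]}>`;
    } else if (cp >= 0xe0020 && cp <= 0xe007e) {
      // Tag characters: show the ASCII character they correspond to.
      out += `<TAG ${String.fromCharCode(cp - 0xe0000)}>`;
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(reveal("hello\u200Bworld")); // hello<ZWSP>world
console.log(reveal("pri\u200Dce"));      // pri<ZWJ>ce
console.log(reveal("a\u{E0067}b"));      // a<TAG g>b
```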

If you ever encounter text that behaves unexpectedly — comparisons fail, lengths are wrong, or copy-paste produces different results — paste it into this tool to see what is really there.
