โœ๏ธ

IVS: How Unicode Represents 47 Versions of the Same Kanji

Understanding Ideographic Variation Sequences and Standardized Variation Sequences, with live font rendering of all registered variants.

The Problem: One Code Point, Many Shapes

Han Unification merged characters that share the same origin into single code points. But what happens when you need to specify an exact glyph variant? Japanese names, historical documents, and calligraphic traditions demand precise glyph control beyond what a font's default rendering provides.

For example, the character ่พป (U+8FBB, โ€œtsujiโ€, a common Japanese surname) has two accepted forms: one with one dot on the left radical (ไธ€็‚นใ—ใ‚“ใซใ‚‡ใ†) and one with two dots (ไบŒ็‚นใ—ใ‚“ใซใ‚‡ใ†). Both are โ€œcorrectโ€ โ€” but which one appears depends on the font, and there is no way to choose using the base code point alone.

Unicode's solution is Variation Sequences: a base character followed by a special variation selector character that specifies the exact glyph form.

How IVS Works: The E0100 Range

Ideographic Variation Sequences (IVS) use variation selectors from the range U+E0100 through U+E01EF (240 selectors, called VS17 through VS256). An IVS is a two-character sequence:

Base character + Variation Selector = IVS

Example:
่‘› (U+845B) + VS17 (U+E0100) = ่‘›๓ „€ (specific variant)
่‘› (U+845B) alone            = ่‘›  (default glyph)

The variation selector is invisible โ€” it produces no glyph of its own. But a font that supports IVS will render a different glyph when it encounters the sequence.

ComponentCode pointVisible?
Base character: ่‘›U+845BYes
Variation selector: VS17U+E0100No (invisible)
Sequence: ่‘›๓ „€U+845B U+E0100Yes (variant glyph)

In JavaScript, each variation selector in this range requires a surrogate pair (2 UTF-16 code units), so an IVS takes 3โ€“4 code units total despite being one grapheme cluster.

SVS: The Emoji and Symbol Variation Selectors

Standardized Variation Sequences (SVS) use a different, smaller set of variation selectors: U+FE00 through U+FE0F (VS1 through VS16). These are used for:

SelectorCommon useExample
VS1 (U+FE00)CJK compatibility variants่Šฆ + VS1 for specific form
VS15 (U+FE0E)Text presentationโ˜บ๏ธŽ (text style)
VS16 (U+FE0F)Emoji presentationโ˜บ๏ธ (emoji style)

The most widely known SVS usage is the text/emoji toggle. Many characters have both a text presentation (monochrome, simple) and an emoji presentation (colorful). VS15 forces text style, VS16 forces emoji style:

// Same base character, different presentations:
"\u2764"           // โค (default, usually emoji)
"\u2764\uFE0E"    // โค๏ธŽ (text presentation, VS15)
"\u2764\uFE0F"    // โค๏ธ (emoji presentation, VS16)

// The selectors are invisible but change rendering:
"โค๏ธ".length  // 2 (base + VS16, both in BMP)

Unlike IVS selectors (which are in the SMP and need surrogates), SVS selectors are in the BMP (U+FE00โ€“FE0F) and each take just one UTF-16 code unit.

IVD Collections: Adobe-Japan1 and Moji_Joho

Which variation selectors map to which glyphs is not arbitrary โ€” it is recorded in the Ideographic Variation Database (IVD), maintained by the Unicode Consortium. The IVD contains named collections:

CollectionScopeEntries
Adobe-Japan1Japanese typography (AJ1 CID)~14,700
Moji_JohoJapanese government character info~11,000
Hanyo-DenshiJapanese administrative systems~11,000
KRNameKorean personal name variants~2,200

Adobe-Japan1 is the most widely supported collection. It maps IVS sequences to specific CID (Character ID) numbers in the Adobe-Japan1-7 character collection, which professional Japanese fonts implement. A font that supports Adobe-Japan1 IVS can render thousands of glyph variants.

Moji_Joho (ๆ–‡ๅญ—ๆƒ…ๅ ฑ) is maintained by Japan's Information-technology Promotion Agency (IPA) and focuses on character variants used in official government documents and the family register system (ๆˆธ็ฑ).

Font Support: When IVS Actually Works

IVS only works if the font supports it. A font must contain:

  • The glyph variants for each supported IVS sequence
  • A cmap table (specifically format 14, Unicode Variation Sequences) that maps base+selector pairs to glyphs

Major fonts with IVS support include:

FontCollectionPlatform
IPAmj MinchoMoji_JohoCross-platform (free)
Noto Sans CJKAdobe-Japan1 (partial)Cross-platform (free)
Kozuka MinchoAdobe-Japan1Adobe products
Yu MinchoAdobe-Japan1 (partial)Windows / macOS
Hiragino MinchoAdobe-Japan1 (partial)macOS

If a font does not support a particular IVS, it simply renders the base character's default glyph and ignores the variation selector. This is a graceful fallback โ€” the text remains legible, just not in the specific variant requested.

The Record Holder: ้‚‰ and Its 47 Variants

The character ้‚‰ (U+9089) holds the record for the most registered IVS sequences. It has approximately 47 variant forms in the Moji_Joho collection, reflecting the many ways this character has been written in Japanese family registers over the centuries.

The surname ๆธก้‚‰ (Watanabe) is notorious in Japan for having dozens of variant spellings. Municipal offices maintaining family registers need to faithfully reproduce the exact variant used in each family's records, which is why the Moji_Joho collection registers so many forms.

CharacterIVS variants (Moji_Joho)Typical use
้‚‰ U+9089~47ๆธก้‚‰ surname variants
้‚Š U+908A~30ๆธก้‚Š surname variants
่พบ U+8FBA~10ๆธก่พบ surname variants
่‘› U+845B~8Place names (่‘›้ฃพ etc.)

This is a case where IVS is essential: without it, government systems could not accurately record the legally distinct name variants that Japanese law requires preserving.

// The ้‚‰ character with different IVS:
"้‚‰"                    // Default glyph
"้‚‰\u{E0100}"          // Variant 1 (VS17)
"้‚‰\u{E0101}"          // Variant 2 (VS18)
// ... up to ~47 registered variants

// Each is one grapheme cluster:
const seg = new Intl.Segmenter();
[...seg.segment("้‚‰\u{E0100}")].length  // 1

Related articles