IVS: How Unicode Represents 47 Versions of the Same Kanji
Understanding Ideographic Variation Sequences and Standardized Variation Sequences, with live font rendering of all registered variants.
The Problem: One Code Point, Many Shapes
Han Unification merged characters that share the same origin into single code points. But what happens when you need to specify an exact glyph variant? Japanese names, historical documents, and calligraphic traditions demand precise glyph control beyond what a font's default rendering provides.
For example, the character 辻 (U+8FBB, “tsuji”, a common Japanese surname) has two accepted forms: one with one dot on the left radical (一点しんにょう) and one with two dots (二点しんにょう). Both are “correct”, but which one appears depends on the font, and there is no way to choose using the base code point alone.
Unicode's solution is Variation Sequences: a base character followed by a special variation selector character that specifies the exact glyph form.
How IVS Works: The E0100 Range
Ideographic Variation Sequences (IVS) use variation selectors from the range U+E0100 through U+E01EF (240 selectors, called VS17 through VS256). An IVS is a two-character sequence:
Base character + Variation Selector = IVS
Example:
葛 (U+845B) + VS17 (U+E0100) = 葛󠄀 (specific variant)
葛 (U+845B) alone = 葛 (default glyph)
The variation selector is invisible: it produces no glyph of its own. But a font that supports IVS will render a different glyph when it encounters the sequence.
| Component | Code point | Visible? |
|---|---|---|
| Base character: 葛 | U+845B | Yes |
| Variation selector: VS17 | U+E0100 | No (invisible) |
| Sequence: 葛󠄀 | U+845B U+E0100 | Yes (variant glyph) |
In JavaScript, each variation selector in this range requires a surrogate pair (2 UTF-16 code units). An IVS therefore takes 3 code units when the base character is in the BMP, or 4 when the base character itself needs a surrogate pair, despite being one grapheme cluster.
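The selector arithmetic can be sketched in a few lines, assuming only the range given above: VS17 sits at U+E0100, so VSn sits at U+E0100 + (n - 17). The helper name `makeIvs` is made up for this illustration and is not a standard API.

```javascript
// Sketch: build an IVS from a base character and a selector number (VS17..VS256).
// makeIvs is a hypothetical helper name, not part of any library.
function makeIvs(base, vsNumber) {
  if (!Number.isInteger(vsNumber) || vsNumber < 17 || vsNumber > 256) {
    throw new RangeError("IVS selectors are VS17 through VS256");
  }
  // VS17 is U+E0100, VS18 is U+E0101, ... VS256 is U+E01EF.
  const selector = String.fromCodePoint(0xE0100 + (vsNumber - 17));
  return base + selector;
}

const ivs = makeIvs("\u845B", 17);            // U+845B followed by U+E0100
console.log(ivs.length);                      // 3 (1 unit for the base + 2 for the selector)
console.log([...ivs].length);                 // 2 (code points)
console.log(ivs.codePointAt(1).toString(16)); // "e0100"

// A base character outside the BMP adds one more code unit:
console.log(makeIvs("\u{20BB7}", 17).length); // 4
```

Note that this only builds the sequence. Whether a given base and selector pair is actually registered, and whether a font renders it, is a separate question.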
SVS: The Emoji and Symbol Variation Selectors
Standardized Variation Sequences (SVS) use a different, smaller set of variation selectors: U+FE00 through U+FE0F (VS1 through VS16). These are used for:
| Selector | Common use | Example |
|---|---|---|
| VS1 (U+FE00) | CJK compatibility variants | Base ideograph + VS1 for specific form |
| VS15 (U+FE0E) | Text presentation | ☺︎ (U+263A U+FE0E, text style) |
| VS16 (U+FE0F) | Emoji presentation | ☺️ (U+263A U+FE0F, emoji style) |
The most widely known SVS usage is the text/emoji toggle. Many characters have both a text presentation (monochrome, simple) and an emoji presentation (colorful). VS15 forces text style, VS16 forces emoji style:
// Same base character, different presentations:
"\u2764"        // ❤ (no selector; presentation is left to the platform)
"\u2764\uFE0E"  // ❤︎ (text presentation, VS15)
"\u2764\uFE0F"  // ❤️ (emoji presentation, VS16)
// The selectors are invisible but change rendering:
"\u2764\uFE0F".length  // 2 (base + VS16, both in BMP)
Unlike IVS selectors (which live in plane 14, the Supplementary Special-purpose Plane, and need surrogate pairs), SVS selectors are in the BMP (U+FE00 through U+FE0F) and each take just one UTF-16 code unit.
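To make the two selector ranges concrete, here is a minimal sketch that classifies a code point by range and appends VS15 or VS16 to a base character. The names `selectorKind` and `withPresentation` are invented for this example. The check is purely range-based and does not consult the Unicode data files that list which sequences are actually defined.

```javascript
// Sketch: classify a code point using the selector ranges from this article.
// selectorKind and withPresentation are hypothetical helper names.
function selectorKind(codePoint) {
  if (codePoint >= 0xFE00 && codePoint <= 0xFE0F) return "SVS";   // VS1..VS16, in the BMP
  if (codePoint >= 0xE0100 && codePoint <= 0xE01EF) return "IVS"; // VS17..VS256, plane 14
  return null; // not a variation selector
}

// Request text or emoji presentation by appending VS15 or VS16.
// A trailing VS15/VS16 that is already present is dropped first.
function withPresentation(base, style) {
  const bare = base.replace(/[\uFE0E\uFE0F]$/u, "");
  return bare + (style === "emoji" ? "\uFE0F" : "\uFE0E");
}

console.log(selectorKind(0xFE0F));  // "SVS"
console.log(selectorKind(0xE0100)); // "IVS"
console.log(selectorKind(0x2764));  // null

const heart = withPresentation("\u2764", "emoji");
console.log(heart.length);                      // 2
console.log(heart.charCodeAt(1).toString(16));  // "fe0f"
```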
IVD Collections: Adobe-Japan1 and Moji_Joho
Which variation selectors map to which glyphs is not arbitrary: it is recorded in the Ideographic Variation Database (IVD), maintained by the Unicode Consortium. The IVD contains named collections:
| Collection | Scope | Entries |
|---|---|---|
| Adobe-Japan1 | Japanese typography (AJ1 CID) | ~14,700 |
| Moji_Joho | Japanese government character info | ~11,000 |
| Hanyo-Denshi | Japanese administrative systems | ~11,000 |
| KRName | Korean personal name variants | ~2,200 |
Adobe-Japan1 is the most widely supported collection. It maps IVS sequences to specific CID (Character ID) numbers in the Adobe-Japan1-7 character collection, which professional Japanese fonts implement. A font that supports Adobe-Japan1 IVS can render thousands of glyph variants.
Moji_Joho (文字情報) is maintained by Japan's Information-technology Promotion Agency (IPA) and focuses on character variants used in official government documents and the family register system (戸籍).
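A registry like this is easy to process once it is in text form. The sketch below assumes a semicolon-separated layout of the form "base selector; collection; identifier", which is how the IVD's sequence data file is understood to be laid out; check the header of the actual file before relying on it. The sample rows, the collection name, and the identifiers are placeholders invented for this example, not real IVD entries.

```javascript
// Sketch: parse lines shaped like "<base> <selector>; <collection>; <identifier>".
// The rows below are placeholders for illustration, not real IVD data.
const sample = `
# comment lines start with '#'
9089 E0100; Example-Collection; EX0001
9089 E0101; Example-Collection; EX0002
845B E0100; Example-Collection; EX0003
`;

function parseIvdLines(text) {
  const entries = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const [sequence, collection, identifier] = line.split(";").map((f) => f.trim());
    const [base, selector] = sequence.split(/\s+/).map((hex) => parseInt(hex, 16));
    entries.push({ base, selector, collection, identifier });
  }
  return entries;
}

// Count how many registered sequences each base character has.
function countByBase(entries) {
  const counts = new Map();
  for (const { base } of entries) {
    counts.set(base, (counts.get(base) ?? 0) + 1);
  }
  return counts;
}

const entries = parseIvdLines(sample);
console.log(entries.length);                   // 3
console.log(countByBase(entries).get(0x9089)); // 2
```

Run against the real data file, a count like `countByBase` is one way to see which base characters carry the most registered variants.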
Font Support: When IVS Actually Works
IVS only works if the font supports it. A font must contain:
- The glyph variants for each supported IVS sequence
- A cmap table (specifically a format 14 subtable, Unicode Variation Sequences) that maps base+selector pairs to glyphs
Major fonts with IVS support include:
| Font | Collection | Platform |
|---|---|---|
| IPAmj Mincho | Moji_Joho | Cross-platform (free) |
| Noto Sans CJK | Adobe-Japan1 (partial) | Cross-platform (free) |
| Kozuka Mincho | Adobe-Japan1 | Adobe products |
| Yu Mincho | Adobe-Japan1 (partial) | Windows / macOS |
| Hiragino Mincho | Adobe-Japan1 (partial) | macOS |
If a font does not support a particular IVS, it simply renders the base character's default glyph and ignores the variation selector. This is a graceful fallback: the text remains legible, just not in the specific variant requested.
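The fallback behavior can be modeled with two lookup tables. This is a toy model only: real fonts store the mapping in a binary cmap subtable, and the glyph names, table contents, and the function name `lookupGlyph` here are all invented for illustration.

```javascript
// Toy model of a font's variation-sequence lookup with fallback.
// Glyph names and table contents are made up; this is not how a font file is read.
const defaultGlyphs = new Map([[0x845B, "glyph-default"]]);
const variationGlyphs = new Map([["845B-E0100", "glyph-variant-1"]]);

function lookupGlyph(base, selector) {
  if (selector !== undefined) {
    const key =
      base.toString(16).toUpperCase() + "-" + selector.toString(16).toUpperCase();
    const variant = variationGlyphs.get(key);
    if (variant !== undefined) return variant;
    // Unsupported sequence: ignore the selector and fall through to the default.
  }
  return defaultGlyphs.get(base);
}

console.log(lookupGlyph(0x845B, 0xE0100)); // "glyph-variant-1"
console.log(lookupGlyph(0x845B, 0xE0105)); // "glyph-default" (sequence not in this font)
console.log(lookupGlyph(0x845B));          // "glyph-default" (no selector)
```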
The Record Holder: 邉 and Its 47 Variants
The character 邉 (U+9089) holds the record for the most registered IVS sequences. It has approximately 47 variant forms in the Moji_Joho collection, reflecting the many ways this character has been written in Japanese family registers over the centuries.
The surname 渡邉 (Watanabe) is notorious in Japan for having dozens of variant spellings. Municipal offices maintaining family registers need to faithfully reproduce the exact variant used in each family's records, which is why the Moji_Joho collection registers so many forms.
| Character | IVS variants (Moji_Joho) | Typical use |
|---|---|---|
| 邉 U+9089 | ~47 | 渡邉 surname variants |
| 邊 U+908A | ~30 | 渡邊 surname variants |
| 辺 U+8FBA | ~10 | 渡辺 surname variants |
| 葛 U+845B | ~8 | Place names (葛飾 etc.) |
This is a case where IVS is essential: without it, government systems could not accurately record the legally distinct name variants that Japanese law requires preserving.
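One practical consequence for software that stores such names: the selector must survive storage, but search and comparison often want to ignore it. The sketch below shows one possible approach, with hypothetical helper names `findIvs` and `stripIvs`. It is an illustration of the idea, not a description of how any government system works.

```javascript
// Sketch: find IVS sequences in a string, and strip selectors for loose comparison.
// findIvs and stripIvs are hypothetical helper names.
const IVS_SELECTOR = /[\u{E0100}-\u{E01EF}]/gu;

function findIvs(text) {
  const found = [];
  const codePoints = [...text]; // iterate by code point, not by UTF-16 unit
  for (let i = 1; i < codePoints.length; i++) {
    const cp = codePoints[i].codePointAt(0);
    if (cp >= 0xE0100 && cp <= 0xE01EF) {
      found.push({
        base: codePoints[i - 1],
        vsNumber: cp - 0xE0100 + 17, // U+E0100 is VS17
      });
    }
  }
  return found;
}

function stripIvs(text) {
  return text.replace(IVS_SELECTOR, "");
}

const stored = "\u6E21\u9089\u{E0101}"; // U+6E21 U+9089 followed by VS18
const hits = findIvs(stored);
console.log(hits.length);                         // 1
console.log(hits[0].vsNumber);                    // 18
console.log(stripIvs(stored) === "\u6E21\u9089"); // true
```

Keeping the original string as the stored value and using the stripped form only as a search key preserves the exact variant while still letting a plain-text query match.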
// The 邉 character with different IVS:
"邉" // Default glyph
"邉\u{E0100}" // Variant 1 (VS17)
"邉\u{E0101}" // Variant 2 (VS18)
// ... up to ~47 registered variants
// Each is one grapheme cluster:
const seg = new Intl.Segmenter();
[...seg.segment("邉\u{E0100}")].length // 1