The Wave Dash Problem: 〜 vs ～ and 6 Other Mapping Conflicts
Complete reference on the 7 JIS-Unicode mapping discrepancies with an interactive toggle to see both variants.
Why the Same Byte Maps to Two Unicode Characters
In JIS X 0208, row 1 column 33 (区点 1-33) is the “wave dash” — a wavy horizontal line used in Japanese to indicate ranges (e.g., 3時〜5時). When JIS was mapped to Unicode, two different organizations made two different choices:
| Mapper | JIS 1-33 → | Character | Name |
|---|---|---|---|
| Unicode.org (JIS X 0208:1997 Annex) | U+301C | 〜 | WAVE DASH |
| Microsoft (CP932 / Windows-31J) | U+FF5E | ～ | FULLWIDTH TILDE |
The characters look similar but are semantically different. U+301C WAVE DASH is the standard Unicode mapping of the JIS wave dash. U+FF5E FULLWIDTH TILDE is a fullwidth form of the ASCII tilde (~), not originally intended to represent the JIS wave dash at all.
A commonly cited reason for Microsoft's choice is the reference glyph: early Unicode code charts drew U+301C with its curve inverted relative to the JIS glyph, so it looked wrong to Japanese users. Rather than depend on that glyph, Microsoft mapped to a different code point entirely.
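The "same byte" here is the Shift_JIS/CP932 encoding of 区点 1-33. The JavaScript sketch below applies the usual kuten-to-Shift_JIS arithmetic for JIS X 0208 rows 1 to 94, then looks the resulting byte pair up in two single-entry excerpts of the mappings. The names `kutenToShiftJis` and `MAPPINGS` are hypothetical, chosen for illustration; they are not part of any library.

```javascript
// Convert a JIS X 0208 kuten (row 1-94, cell 1-94) to its two Shift_JIS bytes.
function kutenToShiftJis(ku, ten) {
  if (!Number.isInteger(ku) || !Number.isInteger(ten) || ku < 1 || ku > 94 || ten < 1 || ten > 94) {
    throw new RangeError(`invalid kuten ${ku}-${ten}`);
  }
  // Each lead byte covers two rows. Rows 63-94 jump past 0xA0-0xDF,
  // which Shift_JIS reserves for single-byte (half-width katakana) codes.
  const lead = ku <= 62 ? (ku + 0x101) >> 1 : (ku + 0x181) >> 1;
  // Odd rows use trail bytes 0x40-0x9E (skipping 0x7F); even rows use 0x9F-0xFC.
  const trail = ku % 2 === 1 ? ten + (ten <= 63 ? 0x3f : 0x40) : ten + 0x9e;
  return [lead, trail];
}

// Hypothetical single-entry excerpts of each mapping: one byte pair, two targets.
const MAPPINGS = {
  unicodeOrg: new Map([["81 60", 0x301c]]), // WAVE DASH
  cp932: new Map([["81 60", 0xff5e]]), // FULLWIDTH TILDE
};

const key = kutenToShiftJis(1, 33)
  .map((b) => b.toString(16).padStart(2, "0"))
  .join(" ");
console.log(key); // 81 60
for (const [name, table] of Object.entries(MAPPINGS)) {
  console.log(name, "U+" + table.get(key).toString(16).toUpperCase());
}
// unicodeOrg U+301C
// cp932 U+FF5E
```

Both tables are keyed by the identical byte pair; the disagreement lives entirely in the value each mapper chose.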
The Complete Table: All 7 Discrepancies
The wave dash is the most famous case, but there are actually 7 JIS-to-Unicode mapping discrepancies between the Unicode.org/JIS standard mapping and Microsoft's CP932 mapping:
| JIS Kuten | JIS Name | Unicode.org | Microsoft CP932 | Description |
|---|---|---|---|---|
| 1-29 | EM DASH | U+2014 — | U+2015 ― | Em dash vs Horizontal bar |
| 1-33 | WAVE DASH | U+301C 〜 | U+FF5E ～ | Wave dash vs Fullwidth tilde |
| 1-34 | DOUBLE VERTICAL LINE | U+2016 ‖ | U+2225 ∥ | Double vertical line vs Parallel to |
| 1-61 | MINUS SIGN | U+2212 − | U+FF0D － | Minus sign vs Fullwidth hyphen-minus |
| 1-81 | CENT SIGN | U+00A2 ¢ | U+FFE0 ￠ | Cent sign vs Fullwidth cent sign |
| 1-82 | POUND SIGN | U+00A3 £ | U+FFE1 ￡ | Pound sign vs Fullwidth pound sign |
| 2-44 | NOT SIGN | U+00AC ¬ | U+FFE2 ￢ | Not sign vs Fullwidth not sign |
In every case, Microsoft chose a fullwidth or visually similar variant rather than the character that Unicode.org considers the correct semantic mapping.
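Because the discrepancies form a fixed set of code-point pairs, converting text from one mapping to the other is a per-character substitution. The following is a minimal sketch, assuming the input is known to have been decoded with one specific mapping; `DISCREPANCIES`, `remap`, `toJisMapping`, and `toMicrosoftMapping` are hypothetical names for illustration.

```javascript
// Each pair: [Unicode.org/JIS code point, Microsoft CP932 code point].
const DISCREPANCIES = [
  [0x2014, 0x2015], // EM DASH vs HORIZONTAL BAR
  [0x301c, 0xff5e], // WAVE DASH vs FULLWIDTH TILDE
  [0x2016, 0x2225], // DOUBLE VERTICAL LINE vs PARALLEL TO
  [0x2212, 0xff0d], // MINUS SIGN vs FULLWIDTH HYPHEN-MINUS
  [0x00a2, 0xffe0], // CENT SIGN vs FULLWIDTH CENT SIGN
  [0x00a3, 0xffe1], // POUND SIGN vs FULLWIDTH POUND SIGN
  [0x00ac, 0xffe2], // NOT SIGN vs FULLWIDTH NOT SIGN
];

const MS_TO_JIS = new Map(DISCREPANCIES.map(([jis, ms]) => [ms, jis]));
const JIS_TO_MS = new Map(DISCREPANCIES.map(([jis, ms]) => [jis, ms]));

// Swap every code point that appears as a key in `table`; copy all others as-is.
function remap(text, table) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += table.has(cp) ? String.fromCodePoint(table.get(cp)) : ch;
  }
  return out;
}

const toJisMapping = (text) => remap(text, MS_TO_JIS);
const toMicrosoftMapping = (text) => remap(text, JIS_TO_MS);

console.log(toJisMapping("3時～5時")); // 3時〜5時
console.log(toMicrosoftMapping("3時〜5時") === "3時～5時"); // true
```

The substitution is blind by design, which is also its limitation: a genuine fullwidth tilde or fullwidth hyphen-minus typed into Unicode-native text would be rewritten too, so it is only safe on data whose origin is known.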
Historical Context: How This Happened
The root cause traces back to the early 1990s:
- 1993: Microsoft shipped Windows 3.1J with CP932, creating mappings before Unicode glyph rendering was mature.
- 1997: JIS X 0208:1997 included an official Unicode mapping in its Annex that differed from Microsoft's.
- 2000s: By the time the discrepancy was widely recognized, enormous volumes of documents and databases already used each mapping.
Neither mapping is “wrong” in absolute terms: Microsoft prioritized visual appearance on its platform, while the JIS standard prioritized semantic correctness.
Practical Impact: Where It Breaks
The wave dash problem surfaces in several real scenarios:
- Database migration: Converting data between Oracle (which often used the JIS/Unicode.org mapping) and SQL Server (which used the Microsoft mapping) could silently swap characters.
- Email: JIS-encoded email decoded with different mapping tables would show wrong characters.
- Web forms: A user typing 〜 on macOS (which uses U+301C) and another on Windows (which historically used U+FF5E) would produce different data for the “same” character.
- Search: Searching for 〜 would not match ～, even though the user considers them identical.
```javascript
// These look similar but are different code points:
"〜".codePointAt(0).toString(16); // "301c" (WAVE DASH)
"～".codePointAt(0).toString(16); // "ff5e" (FULLWIDTH TILDE)

// Direct comparison fails:
"〜" === "～"; // false

// Even NFKC normalization doesn't help: NFKC folds U+FF5E to ASCII "~" (U+007E),
// but U+301C has no compatibility decomposition and is left unchanged.
"〜".normalize("NFKC") === "～".normalize("NFKC"); // false
```
Which Mapping Should You Use?
The answer depends on your context:
| Context | Recommended | Reason |
|---|---|---|
| New data / Unicode-native | Unicode.org (U+301C) | Semantically correct per JIS standard |
| Windows interop / legacy | Microsoft (U+FF5E) | Matches existing CP932 data |
| WHATWG Encoding Standard | Microsoft (U+FF5E) | Browsers use CP932-compatible mapping |
| Apple platforms | Unicode.org (U+301C) | macOS/iOS use the JIS standard mapping |
The WHATWG Encoding Standard (used by all web browsers) follows the Microsoft mapping for Shift_JIS decoding. This means that when a browser decodes a Shift_JIS page, JIS 1-33 becomes U+FF5E, not U+301C. This is a pragmatic choice: most Shift_JIS content was created on Windows.
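This behavior can be observed with `TextDecoder`, which implements the WHATWG Encoding Standard. The sketch assumes a browser or a Node.js build whose ICU data includes Shift_JIS (official Node.js binaries ship full ICU; a reduced-ICU build may throw a `RangeError` for this label).

```javascript
// "shift_jis" is one of the WHATWG labels for the Shift_JIS decoder.
const decoder = new TextDecoder("shift_jis");

// 0x81 0x60 is the Shift_JIS byte pair for JIS 1-33, the wave dash.
const text = decoder.decode(new Uint8Array([0x81, 0x60]));

console.log(text.codePointAt(0).toString(16)); // ff5e (FULLWIDTH TILDE), not 301c
console.log(text === "\uFF5E"); // true
```

`TextEncoder` only produces UTF-8, so the reverse direction (Unicode to Shift_JIS) needs a separate converter.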
This tool lets you toggle between the two mapping tables so you can see exactly how each byte sequence is interpreted.
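Choosing a row from the table above is easier once you know which variant existing data already contains. The sketch below uses a hypothetical `detectMappingVariant` helper that counts code points from each side of the seven discrepancy pairs. Treat its answer as a hint rather than proof: characters such as U+2014 or U+00A3 also appear in plenty of text that never passed through Shift_JIS.

```javascript
// Unicode.org/JIS-side and Microsoft-side code points of the seven pairs.
const JIS_SIDE = new Set([0x2014, 0x301c, 0x2016, 0x2212, 0x00a2, 0x00a3, 0x00ac]);
const MS_SIDE = new Set([0x2015, 0xff5e, 0x2225, 0xff0d, 0xffe0, 0xffe1, 0xffe2]);

// Report which mapping a string appears to have passed through.
function detectMappingVariant(text) {
  let jis = 0;
  let ms = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (JIS_SIDE.has(cp)) jis++;
    else if (MS_SIDE.has(cp)) ms++;
  }
  if (jis && ms) return "mixed";
  if (jis) return "jis";
  if (ms) return "microsoft";
  return "none";
}

console.log(detectMappingVariant("3時〜5時")); // jis
console.log(detectMappingVariant("3時～5時")); // microsoft
console.log(detectMappingVariant("〜 and ～")); // mixed
console.log(detectMappingVariant("3-5")); // none
```

A "mixed" result on a single column is usually the clearest sign that data from both mappings has already been merged.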
The Broader Pattern: Not Just Japanese
The wave dash problem is the most notorious example of a mapping discrepancy, but similar issues exist in other encodings:
- EUC-KR / CP949: Korean encoding has its own set of mapping disagreements between the KS standard and Microsoft's implementation.
- Big5 / CP950: Traditional Chinese encoding similarly diverges between the official standard and Microsoft's extensions.
- GB2312 / GBK / CP936: Simplified Chinese encodings have grown through multiple incompatible extensions.
The lesson is universal: whenever a character encoding was mapped to Unicode by multiple parties independently, discrepancies were nearly inevitable. Unicode itself is not at fault; the problem is that a single legacy code point can plausibly correspond to more than one Unicode character, and each mapper had to pick one.