The Wave Dash Problem: 〜 vs ～ and 6 Other Mapping Conflicts

Complete reference on the 7 JIS-Unicode mapping discrepancies, with an interactive toggle to see both variants.

Why the Same Byte Maps to Two Unicode Characters

In JIS X 0208, row 1 column 33 (区点 1-33) is the “wave dash” — a wavy horizontal line used in Japanese to indicate ranges (e.g., 3時〜5時). When JIS was mapped to Unicode, two different organizations made two different choices:

Mapper	JIS 1-33 →	Character	Name
Unicode.org (JIS X 0208:1997 Annex)	U+301C	〜	WAVE DASH
Microsoft (CP932 / Windows-31J)	U+FF5E	～	FULLWIDTH TILDE

The characters look similar but are semantically different. U+301C WAVE DASH is the standard Unicode mapping of the JIS wave dash. U+FF5E FULLWIDTH TILDE is a fullwidth form of the ASCII tilde (~), not originally intended to represent the JIS wave dash at all.

Microsoft chose U+FF5E because early Unicode fonts rendered U+301C with an inverted curve on Windows, making it look wrong to Japanese users. Rather than fix the glyph, Microsoft mapped to a different code point entirely.
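
To make "the same byte" concrete: in Shift_JIS, 区点 1-33 is the two-byte sequence 0x81 0x60. The following is a minimal sketch of the usual kuten-to-Shift_JIS arithmetic. The function name `kutenToShiftJis` is hypothetical, and the sketch assumes the plain 94×94 JIS X 0208 grid with no vendor extensions.

```javascript
// Convert a JIS X 0208 kuten pair (row 1-94, cell 1-94) to Shift_JIS bytes.
// kutenToShiftJis is an illustrative name, not part of any library.
function kutenToShiftJis(ku, ten) {
  if (!Number.isInteger(ku) || !Number.isInteger(ten) ||
      ku < 1 || ku > 94 || ten < 1 || ten > 94) {
    throw new RangeError("ku and ten must be integers in 1..94");
  }
  // Two rows share one lead byte; lead bytes skip the range 0xA0-0xDF.
  let lead = ((ku - 1) >> 1) + 0x81;
  if (lead > 0x9f) lead += 0x40;

  let trail;
  if (ku % 2 === 1) {
    // Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
    trail = ten + 0x3f;
    if (trail >= 0x7f) trail += 1;
  } else {
    // Even rows use trail bytes 0x9F-0xFC.
    trail = ten + 0x9e;
  }
  return [lead, trail];
}

// Format the bytes as hex for display.
const toHex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(kutenToShiftJis(1, 33))); // "81 60"
```

Both mapping tables agree on these bytes. They disagree only on which Unicode code point the bytes decode to.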


The Complete Table: All 7 Discrepancies

The wave dash is the most famous case, but there are actually 7 JIS-to-Unicode mapping discrepancies between the Unicode.org/JIS standard mapping and Microsoft's CP932 mapping:

JIS Kuten	JIS Name	Unicode.org	Microsoft CP932	Description
1-29	EM DASH	U+2014 —	U+2015 ―	Em dash vs Horizontal bar
1-33	WAVE DASH	U+301C 〜	U+FF5E ～	Wave dash vs Fullwidth tilde
1-34	DOUBLE VERTICAL LINE	U+2016 ‖	U+2225 ∥	Double vertical line vs Parallel to
1-61	MINUS SIGN	U+2212 −	U+FF0D －	Minus sign vs Fullwidth hyphen-minus
1-81	CENT SIGN	U+00A2 ¢	U+FFE0 ￠	Cent sign vs Fullwidth cent sign
1-82	POUND SIGN	U+00A3 £	U+FFE1 ￡	Pound sign vs Fullwidth pound sign
2-44	NOT SIGN	U+00AC ¬	U+FFE2 ￢	Not sign vs Fullwidth not sign

In every case, Microsoft chose a fullwidth or visually similar variant rather than the character that Unicode.org considers the correct semantic mapping.
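
When text has to move between the two conventions, the code point pairs in the table can be applied as a character-for-character substitution. The following is a minimal sketch, not a full transcoder. The names `MAPPING_PAIRS` and `convertMapping` and the target labels are hypothetical, and the sketch assumes the input is already a Unicode string that should be rewritten wholesale to one convention.

```javascript
// [Unicode.org / JIS code point, Microsoft CP932 code point], from the table above.
const MAPPING_PAIRS = [
  [0x2014, 0x2015], // EM DASH vs HORIZONTAL BAR
  [0x2212, 0xff0d], // MINUS SIGN vs FULLWIDTH HYPHEN-MINUS
  [0x301c, 0xff5e], // WAVE DASH vs FULLWIDTH TILDE
  [0x2016, 0x2225], // DOUBLE VERTICAL LINE vs PARALLEL TO
  [0x00a2, 0xffe0], // CENT SIGN vs FULLWIDTH CENT SIGN
  [0x00a3, 0xffe1], // POUND SIGN vs FULLWIDTH POUND SIGN
  [0x00ac, 0xffe2], // NOT SIGN vs FULLWIDTH NOT SIGN
];

const toMicrosoft = new Map(MAPPING_PAIRS.map(([std, ms]) => [std, ms]));
const toUnicodeOrg = new Map(MAPPING_PAIRS.map(([std, ms]) => [ms, std]));

// target is "microsoft" or "unicode.org" (illustrative labels only).
function convertMapping(text, target) {
  if (target !== "microsoft" && target !== "unicode.org") {
    throw new RangeError("target must be 'microsoft' or 'unicode.org'");
  }
  const table = target === "microsoft" ? toMicrosoft : toUnicodeOrg;
  let out = "";
  // Iterate by code point and swap only the seven affected characters.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += String.fromCodePoint(table.has(cp) ? table.get(cp) : cp);
  }
  return out;
}

console.log(convertMapping("3時〜5時", "microsoft") === "3時～5時"); // true
```

A blanket substitution like this also rewrites characters that were typed deliberately, such as a real fullwidth tilde, so it is best suited to data known to have come from one legacy source.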


Historical Context: How This Happened

The root cause traces back to the early 1990s:

1993: Microsoft shipped Windows 3.1J with CP932, creating mappings before Unicode glyph rendering was mature.
1997: JIS X 0208:1997 included an official Unicode mapping in its Annex that differed from Microsoft's.
2000s: By the time the discrepancy was widely recognized, a vast number of documents already existed under each mapping.

Neither mapping is “wrong” in absolute terms — Microsoft prioritized visual appearance on their platform, while the JIS standard prioritized semantic correctness.

Practical Impact: Where It Breaks

The wave dash problem surfaces in several real scenarios:

Database migration: Converting data between Oracle (which often used the JIS/Unicode.org mapping) and SQL Server (which used the Microsoft mapping) could silently swap characters.
Email: JIS-encoded email decoded with different mapping tables would show wrong characters.
Web forms: A user typing 〜 on macOS (which uses U+301C) and another on Windows (which historically used U+FF5E) would produce different data for the “same” character.
Search: Searching for 〜 would not match ～, even though the user considers them identical.

// These look similar but are different code points:
"〜".codePointAt(0).toString(16)  // "301c" (WAVE DASH)
"～".codePointAt(0).toString(16)  // "ff5e" (FULLWIDTH TILDE)

// Direct comparison fails:
"〜" === "～"  // false

// Even NFKC normalization doesn't help here:
"〜".normalize("NFKC") === "～".normalize("NFKC")  // false

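Since Unicode normalization does not unify the pair, a search feature has to fold the two characters itself. A small sketch of one way to do that follows. The helper names `foldWaveDash` and `includesIgnoringWaveDash` are hypothetical, and folding to U+301C rather than U+FF5E is an arbitrary choice made for illustration.

```javascript
// Matches either wave dash variant: U+301C WAVE DASH or U+FF5E FULLWIDTH TILDE.
const WAVE_DASH_VARIANTS = /[\u301c\uff5e]/g;

// Rewrite both variants to U+301C so they compare equal.
function foldWaveDash(text) {
  return text.replace(WAVE_DASH_VARIANTS, "\u301c");
}

// Substring search that treats the two variants as the same character.
function includesIgnoringWaveDash(haystack, needle) {
  return foldWaveDash(haystack).includes(foldWaveDash(needle));
}

console.log("3時～5時".includes("〜"));                    // false
console.log(includesIgnoringWaveDash("3時～5時", "〜"));   // true
```

Folding only at comparison time leaves the stored data untouched, which avoids committing to either mapping.
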
Which Mapping Should You Use?

The answer depends on your context:

Context	Recommended	Reason
New data / Unicode-native	Unicode.org (U+301C)	Semantically correct per JIS standard
Windows interop / legacy	Microsoft (U+FF5E)	Matches existing CP932 data
WHATWG Encoding Standard	Microsoft (U+FF5E)	Browsers use CP932-compatible mapping
Apple platforms	Unicode.org (U+301C)	macOS/iOS use the JIS standard mapping

The WHATWG Encoding Standard (used by all web browsers) follows the Microsoft mapping for Shift_JIS decoding. This means that when a browser decodes a Shift_JIS page, JIS 1-33 becomes U+FF5E, not U+301C. This is a pragmatic choice: most Shift_JIS content was created on Windows.
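
This can be observed directly with `TextDecoder`. The check below is a short sketch that assumes a runtime whose `TextDecoder` supports the `shift_jis` label and follows the Encoding Standard's table. Browsers do; a Node.js build with full ICU is assumed to behave the same way.

```javascript
// 0x81 0x60 is the Shift_JIS byte pair for JIS 1-33.
const waveDashBytes = new Uint8Array([0x81, 0x60]);

// Decode with the WHATWG "shift_jis" label.
const decoded = new TextDecoder("shift_jis").decode(waveDashBytes);
const decodedHex = decoded.codePointAt(0).toString(16);

// Under the Encoding Standard mapping this is "ff5e" (FULLWIDTH TILDE),
// not "301c" (WAVE DASH).
console.log(decodedHex);
```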

This tool lets you toggle between the two mapping tables so you can see exactly how each byte sequence is interpreted.

The Broader Pattern: Not Just Japanese

The wave dash problem is the most notorious example of a mapping discrepancy, but similar issues exist in other encodings:

EUC-KR / CP949: Korean encoding has its own set of mapping disagreements between the KS standard and Microsoft's implementation.
Big5 / CP950: Traditional Chinese encoding similarly diverges between the official standard and Microsoft's extensions.
GB2312 / GBK / CP936: Simplified Chinese encodings have grown through multiple incompatible extensions.

The lesson is universal: whenever a character encoding was mapped to Unicode by multiple parties independently, discrepancies were nearly inevitable. Unicode itself is not at fault. The problem is that a single legacy code point can plausibly correspond to more than one Unicode character, so independent mappers can each make a defensible but different choice.
