[Json] Json and U+08A1 and related cases (was: Re: Barry Leiba's Discuss on draft-ietf-json-i-json-05: (with DISCUSS and COMMENT))
cowan at ccil.org
Wed Jan 21 21:33:12 CET 2015
John C Klensin scripsit:
> But, while U+08A1 is abstract-character-identical and even
> plausible-name-identical to U+0628 U+0654, it does _not_
> decompose into the latter. Instead, NFD(U+08A1) = NFC(U+08A1) =
> U+08A1. NFC (U+0628 U+0654) is U+0628 U+0654 as one would
> expect from the stability rules; from that perspective, it is
> the failure of U+08A1 to have a (non-identity) decomposition
> that is the issue.
If U+08A1 had such a decomposition, it would violate Unicode's
no-new-NFC rule. What it violates is the (false) assumption that
base1 + combining is never confusable with a canonically
non-equivalent base2. Even outside Arabic there are already
such cases:
U+1D92 LATIN SMALL LETTER E WITH RETROFLEX HOOK is not canonically
equivalent to U+0065 LATIN SMALL LETTER E plus U+0322 COMBINING
RETROFLEX HOOK BELOW (and ditto for some other Latin letters).
U+047D CYRILLIC SMALL LETTER OMEGA WITH TITLO is not canonically
equivalent to U+0461 CYRILLIC SMALL LETTER OMEGA plus U+0483 COMBINING
CYRILLIC TITLO (and ditto for capital omega).
U+2A25 PLUS SIGN WITH DOT BELOW is not canonically
equivalent to U+002B PLUS SIGN plus U+0323 COMBINING DOT BELOW
(and ditto for many other math symbols).
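The non-equivalences listed above can be checked mechanically. What follows is a minimal sketch, assuming Python 3 with a standard-library unicodedata module built against Unicode 7.0 or later (needed for U+08A1). The helper name canonically_equivalent and the PAIRS table are illustrative only and not part of the original message. Two strings are treated as canonically equivalent when their NFD forms are identical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# (label, precomposed-looking character, base + combining sequence)
# Illustrative table built from the cases named in this message.
PAIRS = [
    ("ARABIC BEH WITH HAMZA ABOVE", "\u08a1", "\u0628\u0654"),
    ("LATIN SMALL E WITH RETROFLEX HOOK", "\u1d92", "\u0065\u0322"),
    ("CYRILLIC SMALL OMEGA WITH TITLO", "\u047d", "\u0461\u0483"),
    ("PLUS SIGN WITH DOT BELOW", "\u2a25", "\u002b\u0323"),
]

for label, single, sequence in PAIRS:
    # The single code point has no canonical decomposition, so NFD and NFC
    # both leave it unchanged.
    unchanged = (
        unicodedata.normalize("NFD", single) == single
        and unicodedata.normalize("NFC", single) == single
    )
    # The base + combining sequence does not compose into the single code point.
    stays_sequence = unicodedata.normalize("NFC", sequence) == sequence
    print(label, unchanged, stays_sequence,
          canonically_equivalent(single, sequence))

# Control case: U+00E9 does decompose to U+0065 U+0301, so this pair is
# canonically equivalent.
print(canonically_equivalent("\u00e9", "\u0065\u0301"))
```

For each of the four pairs the final column should come out False, while the control pair at the end should come out True; normalization alone therefore cannot be relied on to fold these visually confusable forms together.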
--
John Cowan http://www.ccil.org/~cowan cowan at ccil.org
What is the sound of Perl? Is it not the sound of a [Ww]all that people
have stopped banging their head against? --Larry