[Json] Json and U+08A1 and related cases (was: Re: Barry Leiba's Discuss on draft-ietf-json-i-json-05: (with DISCUSS and COMMENT))

cowan at ccil.org cowan at ccil.org
Wed Jan 21 21:33:12 CET 2015


John C Klensin scripsit:

> But, while U+08A1 is abstract-character-identical and even
> plausible-name-identical to U+0628 U+0654, it does _not_
> decompose into the latter.  Instead, NFD(U+08A1) = NFC(U+08A1) =
> U+08A1.  NFC (U+0628 U+0654) is U+0628 U+0654 as one would
> expect from the stability rules; from that perspective, it is
> the failure of U+08A1 to have a (non-identity) decomposition
> that is the issue.
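The normalization claims quoted above can be checked with Python's standard `unicodedata` module. This is a minimal sketch. It assumes a Python build whose Unicode database (`unicodedata.unidata_version`) is 7.0 or later, since U+08A1 was added in Unicode 7.0. The helper names `nfc` and `nfd` are illustrative only.

```python
import unicodedata

# U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE (a single code point)
precomposed = "\u08A1"
# U+0628 ARABIC LETTER BEH followed by U+0654 ARABIC HAMZA ABOVE
sequence = "\u0628\u0654"

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

# U+08A1 has no decomposition mapping in the Unicode Character Database,
# so both normalization forms leave it unchanged.
print(unicodedata.decomposition(precomposed) == "")          # True
print(nfd(precomposed) == nfc(precomposed) == precomposed)   # True

# NFC does not compose BEH + HAMZA ABOVE into U+08A1.
print(nfc(sequence) == sequence)                             # True

# So the two spellings are not canonically equivalent.
print(nfd(precomposed) == nfd(sequence))                     # False
```

In other words, no normalization form maps one spelling onto the other, even though the two look the same.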

If U+08A1 had such a decomposition, it would violate Unicode's
no-new-NFC rule.  What it violates is the (false) assumption that
base1 + combining is never confusable with a canonically
non-equivalent base2.  Even outside Arabic there are already
such cases:

U+1D92 LATIN SMALL LETTER E WITH RETROFLEX HOOK is not canonically
equivalent to U+0065 LATIN SMALL LETTER E plus U+0322 COMBINING
RETROFLEX HOOK BELOW (and ditto for some other Latin letters).

U+047D CYRILLIC SMALL LETTER OMEGA WITH TITLO is not canonically
equivalent to U+0461 CYRILLIC SMALL LETTER OMEGA plus U+0483 COMBINING
CYRILLIC TITLO (and ditto for capital omega).

U+2A25 PLUS SIGN WITH DOT BELOW is not canonically
equivalent to U+002B PLUS SIGN plus U+0323 COMBINING DOT BELOW
(and ditto for many other math symbols).
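The same test extends to every case listed above. Two strings are canonically equivalent exactly when their NFD forms are identical. The sketch below applies that rule to each precomposed character and its look-alike base-plus-combining sequence, using only Python's standard `unicodedata`. The names `canonically_equivalent`, `codepoints`, and `CASES` are hypothetical helpers chosen for illustration. The sketch again assumes a Unicode database of version 7.0 or later.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def codepoints(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# (precomposed character, visually similar base + combining sequence)
CASES = [
    ("\u08A1", "\u0628\u0654"),  # BEH WITH HAMZA ABOVE vs BEH + HAMZA ABOVE
    ("\u1D92", "\u0065\u0322"),  # E WITH RETROFLEX HOOK vs e + RETROFLEX HOOK BELOW
    ("\u047D", "\u0461\u0483"),  # OMEGA WITH TITLO vs omega + COMBINING CYRILLIC TITLO
    ("\u2A25", "\u002B\u0323"),  # PLUS SIGN WITH DOT BELOW vs + plus DOT BELOW
]

for single, seq in CASES:
    print(
        f"{codepoints(single)} {unicodedata.name(single)} "
        f"~ {codepoints(seq)}: canonically equivalent? "
        f"{canonically_equivalent(single, seq)}"
    )
```

For contrast, `canonically_equivalent("\u00E9", "e\u0301")` (LATIN SMALL LETTER E WITH ACUTE vs e + COMBINING ACUTE ACCENT) returns True, because U+00E9 does carry a canonical decomposition. The listed cases differ only in that their precomposed forms have none.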

-- 
John Cowan          http://www.ccil.org/~cowan        cowan at ccil.org
What is the sound of Perl?  Is it not the sound of a [Ww]all that people
have stopped banging their head against?  --Larry




More information about the Idna-update mailing list