Unicode 7.0.0, (combining) Hamza Above, and normalization
asmusf at ix.netcom.com
Sun Aug 10 22:55:41 CEST 2014
*Andrew Sullivan* ajs at anvilwalrusden.com
/Wed Aug 6 20:01:30 CEST 2014/
> The current problem we're talking about is one in which "the very same
> character" can be produced by a combining sequence and as a precomposed
> character, but where the normalization rules for the combining
> sequence and the precomposed character don't produce the same result.
> It is as if you produced o-diaeresis using U+006F and U+0308, and also
> produced it using U+00F6, but when you ran the results through NFC you
> didn't get a match. Also, this is not cross-script: it's in the very
> same script.
> The difference in this case, as I understand Mark's argument, is that
> in the present case
> 1. U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE
> 2. U+0628 ARABIC LETTER BEH + U+0654 ARABIC HAMZA ABOVE
> (1) and (2) are _not_ "the very same character";
as in the completely parallel case of
3. U+00F8 LATIN SMALL LETTER O WITH STROKE
4. U+006F LATIN SMALL LETTER O + U+0338 COMBINING LONG SOLIDUS OVERLAY
While U+0338 could in principle be used to "dummy up" the appearance of
U+00F8, it is not intended to be used that way. This was reinforced by
using a different term (stroke vs. solidus) in the character name, but
that naming difference is immaterial if you insist on looking at character
strings purely glyphically (that is, by appearance).
> A. U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
> B. U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS
> (A) and (B) _are_ "the very same character". So NFC(1) != NFC(2) but
> NFC(A) == NFC(B).
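The three pairs above can be checked mechanically. What follows is a minimal sketch using Python's standard `unicodedata` module. It assumes a Python build whose Unicode database is 7.0 or later, so that U+08A1 is assigned (check `unicodedata.unidata_version`). The helpers `nfc`, `nfc_equal` and the `PAIRS` table are hypothetical illustrations, not part of any IDNA tooling. Two strings count as canonically equivalent exactly when their NFC forms match.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return the NFC (canonical composition) form of s."""
    return unicodedata.normalize("NFC", s)


def nfc_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent, i.e. share one NFC form."""
    return nfc(a) == nfc(b)


# Hypothetical table of the pairs discussed: (precomposed, combining sequence).
PAIRS = {
    "1 vs 2 (beh with hamza above)": ("\u08A1", "\u0628\u0654"),
    "3 vs 4 (o with stroke)": ("\u00F8", "\u006F\u0338"),
    "A vs B (o with diaeresis)": ("\u00F6", "\u006F\u0308"),
}

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Expected: False for the first two pairs, True for the third.
    for label, (precomposed, sequence) in PAIRS.items():
        print(f"{label}: NFC equal = {nfc_equal(precomposed, sequence)}")
```

U+00F6 carries a canonical decomposition to U+006F U+0308, so NFC recomposes the sequence into it. Neither U+00F8 nor U+08A1 has a decomposition, so NFC leaves U+006F U+0338 and U+0628 U+0654 as two code points.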
> I understand this argument. I'm a little uncomfortable with the
> implications for IDNA, however.
The case of 3 and 4 has been in IDNA from the beginning and affects one
of the more computer-literate communities (Western Scandinavia).
It has not, apparently, been something that has led to massive issues;
otherwise it would be a well-known case.
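As a rough illustration that cases 3 and 4 have long yielded two distinct IDNA labels, here is a sketch built on Python's built-in `"idna"` codec. Note the assumption: that codec implements IDNA2003 (nameprep plus Punycode, RFCs 3490/3491), not IDNA2008, so it only illustrates the older protocol. The helper name `to_ace` is hypothetical.

```python
def to_ace(label: str) -> bytes:
    """Hypothetical helper: IDNA2003 ToASCII via the stdlib 'idna' codec."""
    return label.encode("idna")


if __name__ == "__main__":
    # Case 3 vs 4: no decomposition links them, so the ACE labels differ.
    stroke = to_ace("\u00F8")       # U+00F8 LATIN SMALL LETTER O WITH STROKE
    overlay = to_ace("o\u0338")     # U+006F + U+0338 COMBINING LONG SOLIDUS OVERLAY
    print(stroke, overlay, stroke == overlay)

    # Case A vs B: nameprep normalizes both to U+00F6, so the labels match.
    precomposed = to_ace("\u00F6")
    sequence = to_ace("o\u0308")
    print(precomposed, sequence, precomposed == sequence)
```

The stroke and overlay spellings produce different `xn--` labels, while the two diaeresis spellings collapse to one.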
> Best regards,
More information about the Idna-update