<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<b>Andrew Sullivan</b> <a
href="mailto:idna-update%40alvestrand.no?Subject=Re%3A%20Unicode%207.0.0%2C%20%28combining%29%20Hamza%20Above%2C%20and%20normalization&In-Reply-To=%3C20140806180130.GK37544%40mx1.yitter.info%3E"
title="Unicode 7.0.0, (combining) Hamza Above, and normalization">ajs
at anvilwalrusden.com </a><br>
<i>Wed Aug 6 20:01:30 CEST 2014</i><br>
<blockquote type="cite">
<pre>The current problem we're talking about is one in which "the very same
character" can be produced by a combining sequence and as a precomposed
character, but where the normalization rules for the combining
sequence and the precomposed character don't produce the same result.
It is as if you produced o-diaeresis using U+006F and U+0308, and also
produced it using U+00F6, but when you ran the results through NFC you
didn't get a match. Also, this is not cross-script: it's in the very
same script.
The difference in this case, as I understand Mark's argument, is that
in the present case
1. U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE
2. U+0628 ARABIC LETTER BEH + U+0654 ARABIC HAMZA ABOVE
(1) and (2) are _not_ "the very same character";</pre>
</blockquote>
<br>
as in the completely parallel case of <br>
<br>
<pre> 3. U+00F8 LATIN SMALL LETTER O WITH STROKE
4. U+006F LATIN SMALL LETTER O + U+0338 COMBINING LONG SOLIDUS OVERLAY</pre>
<br>
While U+0338 could in principle be used to "dummy up" the appearance
of U+00F8, it is not intended to be used that way. This was reinforced
by using a different term (stroke vs. solidus) in the character
names, but that distinction is immaterial if you insist on looking at
character strings purely by their glyphs (that is, by appearance).<br>
<br>
<blockquote type="cite">
<pre> but
A. U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
B. U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS
(A) and (B) _are_ "the very same character". So NFC(1) != NFC(2) but
NFC(A) == NFC(B).
I understand this argument. I'm a little uncomfortable with the
implications for IDNA, however.</pre>
</blockquote>
<br>
The case of 3 and 4 has been in IDNA from the beginning and affects
one of the more computer-literate communities (Western Scandinavia).<br>
It has not, apparently, led to massive issues;
otherwise it would be a well-known case.<br>
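<br>
For anyone who wants to check the three pairs directly, here is a minimal
sketch, assuming Python's standard unicodedata module. The helper name
nfc_equal and the dictionary labels are made up for illustration; they are
not taken from any IDNA tooling.<br>
<pre>
```python
import unicodedata


def nfc_equal(a, b):
    """Return True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The three pairs discussed in this thread: (single code point, combining sequence).
pairs = {
    "1/2 beh with hamza above": ("\u08A1", "\u0628\u0654"),
    "3/4 o with stroke": ("\u00F8", "\u006F\u0338"),
    "A/B o with diaeresis": ("\u00F6", "\u006F\u0308"),
}

# Map each label to whether NFC treats the two spellings as the same string.
results = {label: nfc_equal(pre, seq) for label, (pre, seq) in pairs.items()}

for label, same in results.items():
    print(label, "NFC-equal" if same else "NFC-distinct")
```
</pre>
Only the A/B pair comes out NFC-equal, because U+00F6 has a canonical
decomposition to U+006F U+0308, while U+00F8 and U+08A1 have none. That puts
1/2 in the same position as 3/4.<br>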
<br>
A./<br>
</body>
</html>