<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <b>Andrew Sullivan</b> <a

href="mailto:idna-update%40alvestrand.no?Subject=Re%3A%20Unicode%207.0.0%2C%20%28combining%29%20Hamza%20Above%2C%20and%20normalization&In-Reply-To=%3C20140806180130.GK37544%40mx1.yitter.info%3E"

      title="Unicode 7.0.0, (combining) Hamza Above, and normalization">ajs

      at anvilwalrusden.com </a><br>

    <i>Wed Aug 6 20:01:30 CEST 2014</i><br>

    <blockquote type="cite">

      <pre>The current problem we're talking about is one in which "the very same

character" can be produced by a combining sequence and as a precomposed

character, but where the normalization rules for the combining

sequence and the precomposed character don't produce the same result.

It is as if you produced o-diaeresis using U+006F and U+0308, and also

produced it using U+00F6, but when you ran the results through NFC you

didn't get a match.  Also, this is not cross-script: it's in the very

same script.  

The difference in this case, as I understand Mark's argument, is that

in the present case

   1. U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE

   2. U+0628 ARABIC LETTER BEH + U+0654 ARABIC HAMZA ABOVE

(1) and (2) are _not_ "the very same character";</pre>

    </blockquote>

    <br>

    as in the completely parallel case of <br>

    <br>

    <pre>   3. U+00F8 LATIN SMALL LETTER O WITH STROKE

   4. U+006F LATIN SMALL LETTER O + U+0338 COMBINING LONG SOLIDUS OVERLAY</pre>

    <br>

    While U+0338 could in principle be used to "dummy up" the appearance

    of U+00F8, it is not intended to be used that way. This is reinforced

    by the different terms (stroke vs. solidus) in the character

    names, but the naming is immaterial if you insist on looking at

    character strings purely glyphically (that is, by appearance).<br>

    <br>

    <blockquote type="cite">

      <pre> but

   A. U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

   B. U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS

(A) and (b) _are_ "the very same character".  So NFC(1) != NFC(2) but

NFC(A) == NFC(B).

I understand this argument.  I'm a little uncomfortable with the

implications for IDNA, however.</pre>

    </blockquote>
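    <br>

    The three pairs can be checked mechanically. What follows is only a

    minimal sketch, assuming Python's standard unicodedata module; the

    helper name nfc_equal and the case labels are made up for

    illustration and are not part of any IDNA tooling.<br>

    <pre>
```python
import unicodedata


def nfc_equal(a, b):
    """Return True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# (label, precomposed character, base letter + combining mark)
# The labels are illustrative; the code points are the ones listed above.
cases = [
    ("A/B o with diaeresis", "\u00f6", "\u006f\u0308"),
    ("3/4 o with stroke", "\u00f8", "\u006f\u0338"),
    ("1/2 beh with hamza above", "\u08a1", "\u0628\u0654"),
]

for label, precomposed, sequence in cases:
    # unicodedata.decomposition() returns "" when the character has no
    # decomposition mapping in the Unicode Character Database.
    mapping = unicodedata.decomposition(precomposed) or "(none)"
    print(label, "| decomposition:", mapping,
          "| NFC-equal:", nfc_equal(precomposed, sequence))
```
    </pre>

    Only the diaeresis pair should compare equal, since U+00F6 is the

    only one of the three precomposed characters with a canonical

    decomposition.<br>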

    <br>

    The case of 3 and 4 has been in IDNA from the beginning and affects

    one of the more computer-literate communities (Western Scandinavia).<br>

    It has not, apparently, led to massive issues;

    otherwise it would be a well-known case.<br>

    <br>

    A./<br>


  </body>

</html>