<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">I am responding to Vint's message,

      because, for some reason, I do not receive Andrew's messages via

      the list.<br>

      <br>

      On 8/11/2014 7:47 AM, Vint Cerf wrote:<br>

    </div>

    <blockquote

cite="mid:CAHxHggeA4-cERER2OVhRA7mbPwGf8fV6SypopgYf6rRQX_cP1Q@mail.gmail.com"

      type="cite">

      <div dir="ltr">Amen to Andrew's basic point.

        <div><br>

        </div>

        <div>v</div>

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">On Mon, Aug 11, 2014 at 10:42 AM,

          Andrew Sullivan <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:ajs@anvilwalrusden.com" target="_blank">ajs@anvilwalrusden.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div class=""> That behaviour is surprising to me given what

              I understood at<br>

            </div>

            the time we worked on and published IDNA2008.  (It is in

            fact<br>

            surprising to me even now when I read the text of the

            standard, but I<br>

            understand the argument that in fact the new character is

            somehow<br>

            unrelated enough to the former combining sequence that the

            combining<br>

            sequence never really worked, but that doesn't matter.  I

            would<br>

            probably find that argument more compelling if I understood

            why this<br>

            case is different from ö in Swedish vs. ö in German, but

            never mind<br>

            that, either.)<br>

          </blockquote>

        </div>

      </div>

    </blockquote>

    <br>

    First, the very same situation has long existed for ø in Danish (and

    Norwegian),<br>

    which looks like the sequence o + combining /, but is not deemed

    <br>

    identical to it.<br>

    <br>

    The combining / exists for a well-defined purpose, viz. mathematical<br>

    negation.<br>

    <br>

    However, for letters, marks that are overlays (stroke, bar, etc.)

    are<br>

    extremely problematic, because while the concept can be articulated<br>

    there is a wide variability of how the overlay could be applied.<br>

    Horizontal strokes, in particular, can be applied to any part of a <br>

    glyph (stem, bowl, part of a bowl, etc.) making a decomposition<br>

    not tractable. (For diagonal strokes you have similar issues with<br>

    angle and length.)<br>

    <br>

    As a result, Unicode has the principle of encoding all overlays<br>

    as precomposed forms (except for mathematics where only<br>

    those forms are precomposed where the negation is applied<br>

    irregularly). The exception for mathematics makes sense, because<br>

    there's a reasonably consistent semantics (negation) associated<br>

    with the combination, and the use is fully productive (can be<br>

    applied to essentially any symbol or operator).<br>

    <br>
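    A minimal sketch of the distinction drawn above, using Python's<br>
    standard unicodedata module. It assumes that "combining /" refers to<br>
    U+0338 COMBINING LONG SOLIDUS OVERLAY; the constant and variable<br>
    names are only illustrative.<br>
    <pre>
```python
import unicodedata

# Assumption: "combining /" is U+0338 COMBINING LONG SOLIDUS OVERLAY.
O_WITH_STROKE = "\u00F8"          # LATIN SMALL LETTER O WITH STROKE
O_PLUS_OVERLAY = "o\u0338"        # letter o followed by the overlay
NOT_EQUAL = "\u2260"              # NOT EQUAL TO
EQUALS_PLUS_OVERLAY = "=\u0338"   # equals sign followed by the overlay


def nfc(text):
    """Return the canonical composed form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Return the canonical decomposed form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# The letter carries no decomposition mapping in the character database.
letter_is_atomic = unicodedata.decomposition(O_WITH_STROKE) == ""

# Normalization does not turn the sequence into the letter.
letter_sequence_composes = nfc(O_PLUS_OVERLAY) == O_WITH_STROKE

# The mathematical symbol is canonically equivalent to its sequence.
math_sequence_composes = nfc(EQUALS_PLUS_OVERLAY) == NOT_EQUAL
math_symbol_decomposes = nfd(NOT_EQUAL) == EQUALS_PLUS_OVERLAY

print(letter_is_atomic, letter_sequence_composes)
print(math_sequence_composes, math_symbol_decomposes)
```
</pre>
    With the Unicode data shipped in current Python 3 releases, the<br>
    letter should come out as atomic and unrelated to the sequence, while<br>
    the negated operator and its sequence normalize to each other.<br>
    <br>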

    The case under consideration is rather similar. The combining<br>

    hamza exists for a particular use case (Koran), but is otherwise<br>

    not part of the orthography. As I understand it, the use of the <br>

    combined form for a non-Arabic language is unrelated to <br>

    applying a "hamza" even though it uses the same squiggle.<br>

    <br>
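    The same check can be sketched for the Arabic case. The text above<br>
    does not name the code point, so this assumes it is U+08A1 ARABIC<br>
    LETTER BEH WITH HAMZA ABOVE (added in Unicode 7.0) and that the<br>
    combining hamza is U+0654 ARABIC HAMZA ABOVE. The helper name is<br>
    hypothetical, and ALEF is included only as a contrasting case that<br>
    does have a canonical decomposition.<br>
    <pre>
```python
import unicodedata

# Assumption: the code point under discussion is U+08A1.
BEH_WITH_HAMZA = "\u08A1"         # ARABIC LETTER BEH WITH HAMZA ABOVE
BEH_PLUS_HAMZA = "\u0628\u0654"   # BEH followed by ARABIC HAMZA ABOVE

# Contrast: ALEF WITH HAMZA ABOVE decomposes canonically.
ALEF_WITH_HAMZA = "\u0623"        # ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_PLUS_HAMZA = "\u0627\u0654"  # ALEF followed by ARABIC HAMZA ABOVE


def canonically_equivalent(first, second):
    """Compare two strings after NFC normalization (hypothetical helper)."""
    normalized_first = unicodedata.normalize("NFC", first)
    normalized_second = unicodedata.normalize("NFC", second)
    return normalized_first == normalized_second


beh_forms_match = canonically_equivalent(BEH_WITH_HAMZA, BEH_PLUS_HAMZA)
alef_forms_match = canonically_equivalent(ALEF_WITH_HAMZA, ALEF_PLUS_HAMZA)

print(beh_forms_match, alef_forms_match)
```
</pre>
    Under these assumptions the BEH pair should stay distinct after<br>
    normalization, whereas the ALEF pair should compare equal.<br>
    <br>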

    It's really important to step back and realize that composition<br>

    in Unicode is not intended to work like a "glyph composition<br>

    toolkit". It is intended to handle certain systematic (productive)<br>

    cases, where a mark (for example breve or macron) can be<br>

    applied to many characters to indicate short/long pronunciation.<br>

    <br>

    In technical use, these combinations are unrestricted, which is<br>

    reflected in Unicode by the use of combining marks.<br>

    <br>
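    As a small illustration of the "productive" use of a combining mark,<br>
    the sketch below applies a macron to two base letters. The function<br>
    name is hypothetical. The point is only that the mark can follow any<br>
    base, whether or not a precomposed character happens to exist.<br>
    <pre>
```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def apply_mark(base, mark):
    """Append a combining mark to a base letter and normalize to NFC."""
    return unicodedata.normalize("NFC", base + mark)


# A precomposed form exists for a with macron, so NFC yields U+0101.
a_macron = apply_mark("a", MACRON)

# No precomposed q with macron exists, so the sequence stays as it is.
q_macron = apply_mark("q", MACRON)

print(len(a_macron), len(q_macron))
```
</pre>
    Both results are valid text; they differ only in whether<br>
    normalization has a single code point to fold the sequence into.<br>
    <br>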

    What this has to do with two letters (whether 'a' and 'a' or <br>

    ö and ö) being used in two different languages is a bit unclear<br>

    to me, so I don't understand Andrew's question.<br>

    <br>

    <blockquote

cite="mid:CAHxHggeA4-cERER2OVhRA7mbPwGf8fV6SypopgYf6rRQX_cP1Q@mail.gmail.com"

      type="cite">

      <div class="gmail_extra">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <br>

            What is important at least for me now is to understand the

            extent to<br>

            which this sort of thing happens, what our expectation ought

            to be in<br>

            the future about its recurrence, and what implications that

            has for<br>

            how we build network protocols atop Unicode.<br>

          </blockquote>

        </div>

      </div>

    </blockquote>

    <br>

    This "thing" happens regularly (but not really frequently) and<br>

    usually not in the context of two languages competing with<br>

    each other, but more often in the context of some technical<br>

    or limited use needing a combining approach (because in that<br>

    context, there really is an underlying combination or "apply<br>

    this mark to that character") and an orthographic use of a <br>

    fixed symbol which is deemed not analyzable in that context.<br>

    <br>

    For obvious reasons, this "thing" tends to happen for minority<br>

    languages, not to say "obscure" ones, if only for the simple<br>

    reason that the common, well-known, and prominent ones<br>

    are all known and accounted for - but not without this<br>

    "thing" already being part of existing Unicode (see the example above).<br>

    <br>

    I keep coming back to the question of why, with the <br>

    in-your-face Scandinavian example of long standing, <br>

    this is suddenly such an issue for a rather obscure language.<br>

    <br>

    Or, to put it in terms of expectations: I would not expect<br>

    this particular code point to be handled in a totally ad-hoc<br>

    fashion, if more prominent examples went unchallenged,<br>

    and, presumably, are being dealt with more systematically<br>

    by other means.<br>

    <br>

    A./<br>

    <blockquote

cite="mid:CAHxHggeA4-cERER2OVhRA7mbPwGf8fV6SypopgYf6rRQX_cP1Q@mail.gmail.com"

      type="cite">

      <div class="gmail_extra">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div class="HOEnZb">

              <div class="h5"><br>

                Best regards,<br>

                <br>

                A<br>

                <br>

                --<br>

                Andrew Sullivan<br>

                <a moz-do-not-send="true"

                  href="mailto:ajs@anvilwalrusden.com">ajs@anvilwalrusden.com</a><br>

                _______________________________________________<br>

                Idna-update mailing list<br>

                <a moz-do-not-send="true"

                  href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>

                <a moz-do-not-send="true"

                  href="http://www.alvestrand.no/mailman/listinfo/idna-update"

                  target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>

              </div>

            </div>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>