<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">I am responding to Vint's message,
because, for some reason, I do not receive Andrew's messages via
the list.<br>
<br>
On 8/11/2014 7:47 AM, Vint Cerf wrote:<br>
</div>
<blockquote
cite="mid:CAHxHggeA4-cERER2OVhRA7mbPwGf8fV6SypopgYf6rRQX_cP1Q@mail.gmail.com"
type="cite">
<div dir="ltr">Amen to Andrew's basic point.
<div><br>
</div>
<div>v</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Aug 11, 2014 at 10:42 AM,
Andrew Sullivan <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:ajs@anvilwalrusden.com" target="_blank">ajs@anvilwalrusden.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class=""> That behaviour is surprising to me given what
I understood at<br>
</div>
the time we worked on and published IDNA2008. (It is in
fact<br>
surprising to me even now when I read the text of the
standard, but I<br>
understand the argument that in fact the new character is
somehow<br>
unrelated enough to the former combining sequence that the
combining<br>
sequence never really worked, but that doesn't matter. I
would<br>
probably find that argument more compelling if I understood
why this<br>
case is different from ö in Swedish vs. ö in German, but
never mind<br>
that, either.)<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
First, the very same situation has long existed for ø in Danish (and<br>
Norwegian): it looks like the sequence o + combining /, but is not<br>
deemed identical to it.<br>
<br>
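To make that concrete, here is a minimal Python sketch using the<br>
standard unicodedata module. It assumes that "combining /" means<br>
U+0338 COMBINING LONG SOLIDUS OVERLAY; the variable names are only<br>
for illustration.<br>
<br>
```python
import unicodedata

O_WITH_STROKE = "\u00f8"     # LATIN SMALL LETTER O WITH STROKE
O_PLUS_OVERLAY = "o\u0338"   # o + COMBINING LONG SOLIDUS OVERLAY (assumed mark)

# The letter has no canonical decomposition, so normalization never
# maps one spelling onto the other, in either direction.
decomposition = unicodedata.decomposition(O_WITH_STROKE)
nfc_of_sequence = unicodedata.normalize("NFC", O_PLUS_OVERLAY)
nfd_of_letter = unicodedata.normalize("NFD", O_WITH_STROKE)

print(repr(decomposition))                # empty string
print(nfc_of_sequence == O_WITH_STROKE)   # False
print(nfd_of_letter == O_PLUS_OVERLAY)    # False
```
<br>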
The combining / exists for a well-defined purpose, viz. mathematical<br>
negation.<br>
<br>
However, for letters, marks that are overlays (stroke, bar, etc.)
are<br>
extremely problematic, because while the concept can be articulated,<br>
there is wide variability in how the overlay could be applied.<br>
Horizontal strokes, in particular, can be applied to any part of a <br>
glyph (stem, bowl, part of a bowl, etc.) making a decomposition<br>
not tractable. (For diagonal strokes you have similar issues with<br>
angle and length.)<br>
<br>
As a result, Unicode has the principle of encoding all overlays<br>
as precomposed forms (except for mathematics where only<br>
those forms are precomposed where the negation is applied<br>
irregularly). The exception for mathematics makes sense, because<br>
there's a reasonably consistent semantics (negation) associated<br>
with the combination, and the use is fully productive (can be<br>
applied to essentially any symbol or operator).<br>
<br>
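A rough sketch of the mathematical case, again in Python with the<br>
standard unicodedata module. The choice of U+2295 CIRCLED PLUS as an<br>
operator without a precomposed negated form is mine, picked only as<br>
an illustration of productive use.<br>
<br>
```python
import unicodedata

OVERLAY = "\u0338"  # COMBINING LONG SOLIDUS OVERLAY

# "=" followed by the overlay is canonically equivalent to
# U+2260 NOT EQUAL TO, so NFC composes the pair.
not_equal = unicodedata.normalize("NFC", "=" + OVERLAY)

# U+2295 CIRCLED PLUS has no precomposed negated form (illustrative
# choice), so the combining sequence is left as two code points.
negated_circled_plus = unicodedata.normalize("NFC", "\u2295" + OVERLAY)

print(not_equal == "\u2260")         # True
print(len(negated_circled_plus))     # 2
```
<br>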
The case under consideration is rather similar. The combining<br>
hamza exists for a particular use case (Koran), but is otherwise<br>
not part of the orthography. As I understand it, the use of the <br>
combined form for a non-Arabic language is unrelated to <br>
applying a "hamza" even though it uses the same squiggle.<br>
<br>
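For readers who want to check this against the data, a hedged sketch<br>
follows. It assumes the code point under discussion is U+08A1 (beh<br>
with hamza above), which is not named in this message, and it assumes<br>
a Python whose unicodedata tables are at Unicode 7.0 or later. U+0623<br>
is included only as a contrasting letter that does have a canonical<br>
decomposition.<br>
<br>
```python
import unicodedata

HAMZA_ABOVE = "\u0654"       # ARABIC HAMZA ABOVE (combining mark)
ALEF_WITH_HAMZA = "\u0623"   # decomposes to U+0627 + U+0654
BEH_WITH_HAMZA = "\u08a1"    # assumed to be the code point at issue

# Alef + hamza above composes to the precomposed letter under NFC.
alef_sequence = unicodedata.normalize("NFC", "\u0627" + HAMZA_ABOVE)

# Beh + hamza above does not compose; it stays a two-code-point
# sequence that is distinct from the atomic letter.
beh_sequence = unicodedata.normalize("NFC", "\u0628" + HAMZA_ABOVE)

print(alef_sequence == ALEF_WITH_HAMZA)   # True
print(beh_sequence == BEH_WITH_HAMZA)     # False
```
<br>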
It's really important to step back and realize that composition<br>
in Unicode is not intended to work like a "glyph composition<br>
toolkit". It is intended to handle certain systematic (productive)<br>
cases, where a mark (for example breve or macron) can be<br>
applied to many characters to indicate short/long pronunciation.<br>
<br>
In technical use, these combinations are unrestricted, which is<br>
reflected in Unicode by the use of combining marks.<br>
<br>
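As an illustration of "productive" use, the sketch below applies a<br>
macron to two base letters with Python's unicodedata module. The<br>
helper name apply_mark and the choice of "a" and "q" are mine. One<br>
combination happens to have a precomposed form and the other does<br>
not, yet both are valid text.<br>
<br>
```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def apply_mark(base: str, mark: str) -> str:
    """Return base + mark in NFC (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFC", base + mark)


for base in ("a", "q"):
    result = apply_mark(base, MACRON)
    # List the resulting code points so that the difference is visible.
    print(base, [f"U+{ord(ch):04X}" for ch in result])
```
<br>
Here "a" comes back as the single code point U+0101, while "q" stays<br>
as the base letter followed by the combining macron.<br>
<br>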
What this has to do with two letters (whether 'a' and 'a' or <br>
ö and ö) being used in two different languages is a bit unclear<br>
to me, so I don't understand Andrew's question.<br>
<br>
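For reference, at the encoding level ö is a single code point no<br>
matter which language it is used for, and the sequence o + combining<br>
diaeresis normalizes to it. A small sketch with Python's unicodedata<br>
module (variable names are mine):<br>
<br>
```python
import unicodedata

O_DIAERESIS = "\u00f6"          # LATIN SMALL LETTER O WITH DIAERESIS
O_PLUS_DIAERESIS = "o\u0308"    # o + COMBINING DIAERESIS

# The two spellings are canonically equivalent: NFC composes the
# sequence and NFD decomposes the letter. Language plays no part.
composed = unicodedata.normalize("NFC", O_PLUS_DIAERESIS)
decomposed = unicodedata.normalize("NFD", O_DIAERESIS)

print(composed == O_DIAERESIS)          # True
print(decomposed == O_PLUS_DIAERESIS)   # True
```
<br>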
<blockquote
cite="mid:CAHxHggeA4-cERER2OVhRA7mbPwGf8fV6SypopgYf6rRQX_cP1Q@mail.gmail.com"
type="cite">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
What is important at least for me now is to understand the
extent to<br>
which this sort of thing happens, what our expectation ought
to be in<br>
the future about its recurrence, and what implications that
has for<br>
how we build network protocols atop Unicode.<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
This "thing" happens regularly (but not really frequently) and<br>
usually not in the context of two languages competing with<br>
each other, but more often in the context of some technical<br>
or limited use needing a combining approach (because in that<br>
context, there really is an underlying combination or "apply<br>
this mark to that character") and an orthographic use of a <br>
fixed symbol which is deemed not analyzable in that context.<br>
<br>
For obvious reasons, this "thing" tends to happen for minority<br>
languages, not to say "obscure" ones, if only for the simple<br>
reason that the common, well-known, and prominent ones<br>
are all known and accounted for - but not without having<br>
this "thing" be part of existing Unicode. (See the ø example above.)<br>
<br>
I keep coming back to the question of why, with the <br>
in-your-face Scandinavian example of long standing, <br>
this is suddenly such an issue for a rather obscure language.<br>
<br>
Or, to put it in terms of expectations: I would not expect<br>
this particular code point to be handled in a totally ad-hoc<br>
fashion, if more prominent examples went unchallenged,<br>
and, presumably, are being dealt with more systematically<br>
by other means.<br>
<br>
A./<br>
<blockquote
cite="mid:CAHxHggeA4-cERER2OVhRA7mbPwGf8fV6SypopgYf6rRQX_cP1Q@mail.gmail.com"
type="cite">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb">
<div class="h5"><br>
Best regards,<br>
<br>
A<br>
<br>
--<br>
Andrew Sullivan<br>
<a moz-do-not-send="true"
href="mailto:ajs@anvilwalrusden.com">ajs@anvilwalrusden.com</a><br>
_______________________________________________<br>
Idna-update mailing list<br>
<a moz-do-not-send="true"
href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
<a moz-do-not-send="true"
href="http://www.alvestrand.no/mailman/listinfo/idna-update"
target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>