<div dir="ltr">&gt;&nbsp;<span class="Apple-style-span" style="border-collapse: collapse; ">For Korean, there is no<br>equivalent because NFC doesn&#39;t produce the relevant precomposed<br>forms.</span><div><span class="Apple-style-span" style="border-collapse: collapse; ">&gt; And, because it doesn&#39;t, our problem is not one of<br>

confusing similarity (a registry problem) but one of having<br>comparisons work correctly (a much deeper issue which we have<br>generally dealt with in the protocol, in the analogous case by<br>the requirement for NFC.</span><div>

<span class="Apple-style-span" style="border-collapse: collapse;"><br></span></div><div><span class="Apple-style-span" style="border-collapse: collapse;">John, your first premise, and thus your whole argument, is incorrect. The combining Jamo *do* form composed characters under NFC. Here is an example:</span></div>

<div><br></div><div><span class="Apple-style-span" style="font-family: Times; font-size: 16px; "><code><a target="c" href="http://unicode.org/cldr/utility/character.jsp?a=1100">U+1100</a></code>&nbsp;(&nbsp;ᄀ&nbsp;) HANGUL CHOSEONG KIYEOK<br>

<code><a target="c" href="http://unicode.org/cldr/utility/character.jsp?a=1161">U+1161</a></code>&nbsp;(&nbsp;ᅡ&nbsp;) HANGUL JUNGSEONG A<br><code><a target="c" href="http://unicode.org/cldr/utility/character.jsp?a=11A8">U+11A8</a></code>&nbsp;(&nbsp;ᆨ&nbsp;) HANGUL JONGSEONG KIYEOK</span><br>

</div><div><span class="Apple-style-span" style="border-collapse: collapse;">=&gt;</span></div><div><span class="Apple-style-span" style="border-collapse: collapse;"><span class="Apple-style-span" style="border-collapse: separate; font-family: Times; font-size: 16px; "><code><a target="c" href="http://unicode.org/cldr/utility/character.jsp?a=AC01">U+AC01</a></code>&nbsp;(&nbsp;각&nbsp;) HANGUL SYLLABLE GAG</span><br>

</span></div><br>That is, each of the Hangul precomposed syllables decomposes into two or three combining jamo under NFD, and under NFC that sequence of combining jamo composes back into that syllable. The comparisons *do* work correctly, since IDNA labels have to be in NFC.</div>
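<div><br>A minimal sketch of the round trip described above, assuming Python's standard <code>unicodedata</code> module. The helper names <code>to_nfc</code> and <code>to_nfd</code> are illustrative only and are not part of any IDNA implementation:</div>

```python
import unicodedata

# Conjoining jamo from the example:
# choseong KIYEOK, jungseong A, jongseong KIYEOK.
JAMO_SEQUENCE = "\u1100\u1161\u11a8"
# The precomposed syllable U+AC01.
SYLLABLE = "\uac01"


def to_nfc(text):
    """Return the NFC form of text (hypothetical helper name)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text):
    """Return the NFD form of text (hypothetical helper name)."""
    return unicodedata.normalize("NFD", text)


composed = to_nfc(JAMO_SEQUENCE)
decomposed = to_nfd(SYLLABLE)

print([f"U+{ord(c):04X}" for c in composed])    # ['U+AC01']
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+1100', 'U+1161', 'U+11A8']

# Both spellings compare equal once they are in NFC.
print(to_nfc(JAMO_SEQUENCE) == to_nfc(SYLLABLE))  # True
```

<div>Because both spellings normalize to the same code point, a comparison performed on NFC labels does not distinguish them.</div>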

<div><br>For characters not in modern use, the NFC form may not combine all of the characters, simply because there may not be a corresponding precomposed form to combine them into. That is not a problem. It is similar to cases with accents: the NFC form composes as much as it can, but where it can&#39;t compose, it leaves the code points separate.</div>
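<div><br>The partial-composition behavior can be sketched the same way, again assuming Python's <code>unicodedata</code> module. The sample sequences are chosen here for illustration: U+119E is an archaic vowel jamo with no precomposed syllable, and q with an acute accent has no precomposed Latin form:</div>

```python
import unicodedata


def nfc(text):
    """Return the NFC form of text (hypothetical helper name)."""
    return unicodedata.normalize("NFC", text)


# Modern jamo pair: composes to the single syllable U+AC00.
modern = "\u1100\u1161"
# Archaic vowel U+119E (ARAEA) after U+1100: nothing to compose into.
archaic = "\u1100\u119e"
# Accent analogy: e + combining acute composes, q + combining acute does not.
e_acute = "e\u0301"
q_acute = "q\u0301"

for label in (modern, archaic, e_acute, q_acute):
    once = nfc(label)
    # NFC is idempotent: normalizing again changes nothing, so each
    # input ends up with exactly one normalized spelling.
    assert nfc(once) == once
    print(len(label), len(once))
```

<div>The sequences that cannot compose come back unchanged, and they are still in NFC.</div>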

<div><br></div><div>The key point is that the result is still unique and does not cause a problem for comparison.</div><div><br><div>Mark<br>

<br><br><div class="gmail_quote">On Tue, Oct 14, 2008 at 10:17 PM, John C Klensin <span dir="ltr">&lt;<a href="mailto:klensin@jck.com">klensin@jck.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

<br>

--On Tuesday, 14 October, 2008 13:22 -0400 Andrew Sullivan<br>

<div class="Ih2E3d">&lt;<a href="mailto:ajs@commandprompt.com">ajs@commandprompt.com</a>&gt; wrote:<br>

<br>

&gt; On Sun, Oct 12, 2008 at 05:25:27AM -0400, Vint Cerf wrote:<br>

&gt;&gt; Consensus Call Tranche 8 (character adjustments)<br>

&gt;&gt;<br>

&gt;&gt; Place your reply here: [NO]<br>

&gt;&gt;<br>

&gt;&gt; COMMENTS:<br>

&gt;<br>

&gt;&gt; (8.a) Make Eszett Protocol-Valid per list discussion.<br>

&gt;&gt;<br>

&gt;&gt; (8.b) Make Greek final sigma Protocol-Valid per list<br>

&gt;&gt; discussion.<br>

&gt;<br>

&gt; Since the call is all-or-nothing, I have to respond &quot;no&quot;. &nbsp;On<br>

&gt; these two, I have no opinion; I don&#39;t feel sufficiently<br>

&gt; qualified to say whether these individual characters should be<br>

&gt; altered. &nbsp;My understanding is that, because they are<br>

&gt; consistent with the tables approach that we are taking, the<br>

&gt; only reason to exclude them would be historical.<br>

<br>

</div>For whatever my opinion is worth, exactly.<br>

<div class="Ih2E3d"><br>

&gt; &nbsp;Since the<br>

&gt; unhappiness with some of those historical decisions is part of<br>

&gt; the justification for the current work, it seems to me that<br>

&gt; these ought to be allowed (although I wonder whether 8.b ought<br>

&gt; to have a context rule).<br>

<br>

</div>Could you explain why you would require a context rule for Final<br>

Sigma without requiring one for Eszett? &nbsp;Certainly it would be<br>

easier to specify a rule for the former (&quot;Script=Greek&quot;) while<br>

the latter would presumably require either &quot;Script=Latin&quot;<br>

(which wouldn&#39;t do much good) or an enumerated list of<br>

characters. &nbsp;One can&#39;t require that the character actually<br>

appear in the last position in a label without preventing people<br>

from constructing labels by cramming words together... any<br>

prohibition along _those_ lines should certainly be a registry<br>

decision, IMO.<br>

<br>

For the record (and context when that discussion re-emerges on<br>

the list), at least some of the Greek IDN community would prefer<br>

that we preserve the IDNA2003 mapping / case-folding behavior<br>

for final sigma even if that is the only required mapping in<br>

IDNA2008.<br>

<div class="Ih2E3d"><br>

&gt;&gt; (8.c) Disallow conjoining Hangul jamo per recommendation from<br>

&gt;&gt; KRNIC and others, permitting only precomposed syllables.<br>

&gt;<br>

&gt; This appears to open the character-by-character decision<br>

&gt; making that we already ruled out. &nbsp;As Mark Davis argues, if we<br>

&gt; accept this restriction then we probably need to re-open the<br>

&gt; discussions about obsolete scripts, &amp;c. &nbsp;It sounds to me very<br>

&gt; like a registry policy.<br>

<br>

</div>Let me try to explain the other point of view, to the extent to<br>

which I understand the issues as they have been explained to me<br>

by the group associated with the Korean registry (if I have it<br>

wrong, I hope they will step in directly). &nbsp;I am going to try to<br>

write this so as to not be inflammatory. &nbsp;If I fail, I want to<br>

stress that being inflammatory is not my intent and ask<br>

forgiveness in advance.<br>

<br>

Unicode classifies characters in various ways using a collection<br>

of categories and properties. &nbsp;Those categories and properties<br>

(or at least the vast majority of them) were designed long<br>

before the IETF started thinking about IDNs; they were certainly<br>

not optimized for IDNA requirements. &nbsp;Given that, we should be<br>

grateful and pleasantly surprised that the properties work as<br>

well as they do for our purposes. &nbsp;On the other hand, we should<br>

not be surprised when, for some group of characters, they do<br>

not... and that has nothing to do with character by character<br>

decisions, at least as I understand that term.<br>

<br>

Before addressing the Hangul question, let me invent an example<br>

that is counterfactual, i.e., barring something unforeseen, we<br>

are unlikely to ever have to deal with it directly. &nbsp; There is a<br>

proposal pending for ISO/IEC JTC1/SC2/WG2 to add a number of<br>

annotation marks for Arabic. &nbsp;These marks are, according to the<br>

proposal (with confirmation from independent experts) used<br>

strictly for pedagogical purposes. &nbsp; Obviously, if one were<br>

going to transmit the instructional texts electronically in<br>

other than page image form, they have to have code points. &nbsp;They<br>

are identified in the proposal with General Category &quot;Sk&quot;<br>

(modifier symbols). &nbsp;With that classification, the rules in<br>

&quot;Tables&quot; would automatically place them in DISALLOWED. &nbsp;But<br>

suppose the proposal had identified them as modifier letters<br>

instead (I&#39;m told there is a case to be made for that, even<br>

though the relevant Unicode folks have --wisely from our point<br>

of view but perhaps not others-- decided otherwise). &nbsp;Then we<br>

would need to exclude them (the whole group, not<br>

character-by-character) as a backward-compatibility issue<br>

because otherwise, to quote a colleague, we would have a huge<br>

mess on our hands, with all sorts of equivalences failing.<br>

Again, this is _not_ an issue, but it may help in thinking about<br>

the Hangul problem.<br>

<br>

For Hangul, the individual Jamo (again, a clearly-identified<br>

group of characters, not a character-by-character decision) are<br>

used to construct conventional (and precomposed) characters<br>

(&quot;Hangul syllables&quot;). &nbsp;To the extent to which there is an<br>

analogy in Latin-based script, they would be combining<br>

characters that combine without a base character. &nbsp;For<br>

Latin-based scripts, we don&#39;t need to worry about conflicts<br>

between precomposed characters and composing (base+combining<br>

character) forms of the same characters because the NFC<br>

requirement deals with the problem. &nbsp; For Korean, there is no<br>

equivalent because NFC doesn&#39;t produce the relevant precomposed<br>

forms. &nbsp; And, because it doesn&#39;t, our problem is not one of<br>

confusing similarity (a registry problem) but one of having<br>

comparisons work correctly (a much deeper issue which we have<br>

generally dealt with in the protocol, in the analogous case by<br>

the requirement for NFC. &nbsp;If Unicode had assigned properties<br>

that treated the Syllables differently from the Jamo, we would<br>

simply build a rule using those categories and we would not be<br>

having a discussion about, e.g., &quot;character by character<br>

decisions&quot;. &nbsp;But there is apparently no such property --both the<br>

Jamo and the Syllables are in General Category &quot;Lo&quot; and the rest<br>

of the properties appear to match as well.<br>

<br>

I think the situation --and the comparison failures that would<br>

result if we don&#39;t deal with it-- makes a strong case for our<br>

disallowing either the Jamo or the Syllables. &nbsp;The ccTLD<br>

registry and local experts strongly prefer that we disallow the<br>

Jamo, even though it means that some archaic Syllables and<br>

fanciful forms are disallowed as a consequence. &nbsp; I think we<br>

just defer to them.<br>

<br>

Just my opinion, of course.<br>

<div class="Ih2E3d"><br>

&gt; The argument that some people will get<br>

&gt; that registry policy wrong has already been floated, and we<br>

&gt; rejected it. &nbsp;Indeed, if we don&#39;t reject that premise, then<br>

&gt; all of the local mapping approach that we&#39;ve taken should be<br>

&gt; tossed out, and we should go back to strict mapping in the<br>

&gt; protocol.<br>

<br>

</div>Again, the issue here is one of comparison failures, not of<br>

confusability or other registry policy questions.<br>

<font color="#888888"><br>

 &nbsp; &nbsp;john<br>

</font><div><div></div><div class="Wj3C7c"><br>

<br>


</div></div></blockquote></div><br></div></div></div>