<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Thank you, YungJin Suh. This recommendation was accepted in IDNA2008 discussions. That suggests to me that for registration purposes, your recommendation would still apply. The new exploration of a mapping function, applied prior to looking up a domain name, might end up mapping out any Jamo characters appearing in a query.<div><br></div><div>The WG now needs to define more precisely which characters are mapped, and into which other characters (or into "nothing").&nbsp;</div><div><br></div><div>There is also a question of when to apply such mappings prior to a query. One suggestion contained in the draft protocol document of IDNA2008 would perform an IDNA2008-style lookup and, if that failed, would then map the query under IDNA2003-like rules and do the lookup again. &nbsp;I use the term "IDNA2003-like" above only because of the possibility that the WG will conclude that the IDNA2008 mapping function is similar to the IDNA2003 one but possibly excludes some of the characters mapped under IDNA2003 rules.&nbsp;</div><div><br></div><div>Vint<br><div apple-content-edited="true"> <div><div><br>Vint Cerf</div><div>Google</div><div>1818 Library Street, Suite 400</div><div>Reston, VA 20190</div><div>202-370-5637</div><div><a href="mailto:vint@google.com">vint@google.com</a></div><div><br></div></div> </div><br><div><div>On Mar 25, 2009, at 5:59 AM, YungJin Suh wrote:</div><br><blockquote type="cite"> <div style="WORD-WRAP: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space"> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">Dear all WG members,</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">About the JAMO characters in Korean, we still&nbsp;strongly recommend disallowing these characters.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">We want to allow&nbsp;ONLY Hangul 
Syllables (U+AC00 ~ U+D7A3)&nbsp;in&nbsp;the revised IDNA.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">I&nbsp;have attached&nbsp;the letter from&nbsp;the Korean government. (I think some of you may have already&nbsp;read it.)</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">I hope&nbsp;this document helps you to understand&nbsp;our situation.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">With regard to the local mapping draft,</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">About&nbsp;'dots' as label separators: Korean actually doesn't have mapping problems, but Chinese and Japanese do. 
</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">So I hope this problem will be solved in some way or other.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">And about 'Compatibility characters', we defined Korean IDN in the draft as follows:</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The term "Korean IDN" stands for "IDN consists from CJK scripts&nbsp;marked with 'Y' in 'K' column,&nbsp;which is Hangul Syllables(U+AC00 ~ U+D7A3),&nbsp;&nbsp;and LDH".&nbsp;&nbsp;</span></font><font face="굴림"><span class="996431108-25032009"> Permitted characters in&nbsp;Korean IDN are listed in [IANA-IDN-Language-ko-KR].</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">Even though IDNA2003 allowed JAMO in Korean, we defined it like this. 
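<br><br>As a minimal sketch of this definition (an illustration only, not part of the draft): the helper name below is hypothetical, and LDH is assumed to mean lower-case ASCII letters, digits and the hyphen. The last lines show that NFC normalization composes conjoining Jamo into a single Hangul Syllable.<br>
<pre>
```python
import unicodedata

# Hypothetical helper for illustration; not taken from the draft.
HANGUL_SYLLABLES = range(0xAC00, 0xD7A4)  # U+AC00 .. U+D7A3 inclusive
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")  # assumed lower-case LDH


def is_korean_idn_label(label):
    """Return True if the label is non-empty and uses only Hangul Syllables and LDH."""
    if not label:
        return False
    return all(ord(ch) in HANGUL_SYLLABLES or ch in LDH for ch in label)


# Conjoining Jamo (U+1100, U+1161) are rejected as they stand ...
jamo = "\u1100\u1161"
# ... but NFC normalization composes them into the single syllable U+AC00.
composed = unicodedata.normalize("NFC", jamo)

print(is_korean_idn_label("\ud55c\uad6d-1"))  # Hangul Syllables plus LDH
print(is_korean_idn_label(jamo))
print(is_korean_idn_label(composed))
```
</pre>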
</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">This is because, if we restrict the range of allowed characters to only Hangul Syllables (U+AC00 ~ U+D7A3),&nbsp;normalization is no longer an issue for&nbsp;Korean IDN.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">Again, allowing only&nbsp;Hangul Syllables in IDNA is recommended.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">Thank you.</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">With regards,</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"></span></font>&nbsp;</div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009">YungJin Suh</span></font></div> <div dir="ltr" align="left"><font face="굴림"><span class="996431108-25032009"> </span></font><div align="left"><font face="굴림"> <div><font face="Times New Roman" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009">=======================</span></span></font></div> <div><font face="Times New Roman" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009"></span></span></font><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009"><font face="Times New Roman" size="2">YungJin Suh</font></span></span></div> <div><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009"><font face="Times New Roman" size="2">Head of DNS section, KRNIC, NIDA</font></span></span></div> <div><font face="Times New Roman" 
color="#0000ff" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009"><a href="blocked::mailto:yjsuh@nida.kr">yjsuh@nida.kr</a></span></span></font></div> <div><font face="Times New Roman" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009">+82-2-2186-4562(O)</span></span></font></div> <div><font face="Times New Roman" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009">+82-10-4820-8291(M)</span></span></font></div> </font><div><font face="굴림"><font face="Times New Roman" color="#000000" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009"> </span></span></font></font><div><font face="굴림"><font face="Times New Roman" color="#000000" size="2"><span lang="EN-US" style="FONT-SIZE: 12pt"><span class="850041301-29012009">========================</span></span></font></font></div></div></div></div> <div dir="ltr" align="left"><font size="+0"><span class="996431108-25032009"><font face="굴림"><span class="h1"><strong></strong></span></font></span></font>&nbsp;</div><br> <div class="OutlookMessageHeader" lang="ko" dir="ltr" align="left"> <hr tabindex="-1"> <font face="Tahoma" size="2"><b>From:</b> idna-update-bounces@alvestrand.no [<a href="mailto:idna-update-bounces@alvestrand.no">mailto:idna-update-bounces@alvestrand.no</a>] <b>On Behalf Of </b>Vint Cerf<br><b>Sent:</b> Monday, March 23, 2009 5:31 AM<br><b>To:</b> Mark Davis<br><b>Cc:</b> <a href="mailto:idna-update@alvestrand.no">idna-update@alvestrand.no</a><br><b>Subject:</b> Re: DRAFT Status of Work on IDNA2008 + IDNAv2<br></font><br></div> <div></div>thanks Mark - I re-issued version 5 of the material so some of your comments have crossed in the mail. I will try to update once more to take into account your comments or at least annotate as reminders for discussion. 
<div><br></div> <div>v</div> <div><br> <div apple-content-edited="true"> <div> <div><br>Vint Cerf</div> <div>Google</div> <div>1818 Library Street, Suite 400</div> <div>Reston, VA 20190</div> <div>202-370-5637</div> <div><a href="mailto:vint@google.com">vint@google.com</a></div> <div><br></div></div></div><br> <div> <div>On Mar 22, 2009, at 3:51 PM, Mark Davis wrote:</div><br> <blockquote type="cite">Here are comments on the status. (I tried to update to the later doc, but because it was only distributed in pdf, I had to do it manually, so I may have missed something.)   
<div><br>  <div><br clear="all">Mark<br><br><br>  <div class="gmail_quote">On Fri, Mar 20, 2009 at 06:12, Vint Cerf <span dir="ltr">&lt;<a href="mailto:vint@google.com" target="_blank">vint@google.com</a>></span> wrote:<br>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">DRAFT     Status of work on IDNA2008<br><br>3/21/2009 0523 PDT<br><br><br>Vint     Cerf<br><br><br>This brief summary is intended to provide some focus for the     IDNABIS WG meetings<br>scheduled for Monday and Tuesday, March 23     (1740-1940) and March 24 (0900-1130).<br><br>One goal is to try to assess     rough consensus about the present documentation on the<br>presumption that     we are abiding by the ground-rules set forth in the charter of the     WG.<br>Another is to assess what the implications are for users, registries,     registrars if<br>IDNA2008 is adopted as it presently stands. &nbsp;A third     goal is to examine the implications<br>of the IDNAV2 proposal from Paul     Hoffman and contrast with adoption of IDNA2008.<br><br>I fully recognize     that consensus has to be assessed from mailing list exchanges, not<br>merely     from appearances at our face to face meetings.<br><br>The material presented     below is by no means intended to be more than a basis for<br>discussion, and     is not intended as a penultimate recommendation.</blockquote>  <div><br></div>  <div>I think it would also be useful to mark where each is different from   IDNA2003.&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>Background<br><br><br></blockquote>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>Under     the IDNABIS charter, the IDNA2008 design as it now stands makes     several<br>specific assumptions or makes specific propositions to 
achieve a number of goals:<br><br>0. Avoid dependence on any specific version of Unicode through the use of rules<br>&nbsp; &nbsp;for determining PVALID characters based on Unicode character properties</blockquote>  <div>&nbsp;</div>  <div>add: "as much as possible". Exceptions may be necessary in some cases (and are included in the draft tables).</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>1. No change to the deployed DNS server functionality (domain name labels limited to<br>&nbsp; &nbsp;ASCII and case-insensitive matching only)&nbsp;</blockquote>  <div>[no change from IDNA2003]&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>2. Eszett, Final Sigma, ZWJ and ZWNJ, geresh and gershayim are PVALID characters<br>&nbsp; &nbsp; some of which are treated through contextual rules (there is still ongoing discussion<br>&nbsp; &nbsp; about the implications of these choices)</blockquote>  <div><br>This is also a current feature of the drafts, but not required by the charter.&nbsp;It is unclear whether this is actually consistent with the charter or not. "This work is intended to specify an improved means to produce and use stable and unambiguous IDN identifiers." Effectively, any IDN containing one of the first four characters is ambiguous between versions of IDNA in that it will lead to different addresses.&nbsp;<br><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>3.     
Unassigned Unicode characters will not be looked up</blockquote>  <div>&nbsp;</div>  <div>Just a comment (no change): IDNA2003 had a slightly different goal: unassigned Unicode characters will not be returned from the DNS.&nbsp;</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>4. No mapping of characters, at least within the protocol specification&nbsp;</blockquote>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>5. No modification of or dependence on Nameprep &nbsp;(and thus no impact<br>&nbsp; on other protocols relying on Nameprep or Stringprep.)&nbsp;</blockquote>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>6. Clear specification of valid "dot" form in a way that is consistent with DNS<br>&nbsp; &nbsp; protocol requirements.</blockquote>  <div><br></div>  <div>IDNA2003 specified the dot form in a way that is consistent with DNS; that is, it required no change of the DNS protocol, so this is no change. That is, once in the ACE form, dots are dots.</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>7. Symmetry between native-character ("Unicode") and ACE ("Punycode")<br>&nbsp; &nbsp; forms of a label.</blockquote>  <div><br>This may be a goal, but it is not achieved by the current drafts. There is a strong asymmetry between them in that, in lookup, an implementation need not check that what appears to be an A-Label is one, but it must check that a U-Label is one (mostly). 
(Comment: I believe that this should be a goal: if it is important to check those requirements, then it is important to test both A and U Labels; if it is not important to test them, then it should not be a requirement for either one.)<br><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>8. Conversion to an inclusion list of PVALID characters (as distinct from the<br>&nbsp; &nbsp;IDNA2003 posture that excluded only a few Unicode characters)&nbsp;</blockquote>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>9. Improved terminology to make categories and types of labels clearer.<br>&nbsp; &nbsp;(Definitions)&nbsp;</blockquote>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>10. Provide explanation for decisions and their motivations (Rationale) to<br>&nbsp; &nbsp; aid implementors, registries, registrants and users in understanding IDNA.</blockquote>  <div>&nbsp;</div>  <div>Rationale doesn't really provide explanation for motivations in enough detail to be useful. I'd recast this as: "Provide informative background material (Rationale) to aid ..."</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>11. Separately describe registration and lookup procedures to improve clarity</blockquote>  <div>&nbsp;</div>  <div>The goal is good, but the current drafts don't meet the goal. Whether it increases clarity or not is unclear, since doing so makes it difficult to determine what the similarities and differences are between the two processes. So drop "to improve clarity". 
(A relatively small recasting of the text to make the two precisely parallel (including numbering), and to point out precisely where the differences are, would meet this goal.)<br><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>12. Specify tests to be applied at lookup time in an attempt to limit abuse of<br>&nbsp; &nbsp; &nbsp; IDNA at all levels of registration</blockquote>  <div><br>That is not a change from IDNA2003. The tests are different, and are expanded, but it is a quantitative difference, not qualitative. For example, IDNA2003 did test bidi; we just think the IDNA2008 tests are better. And the "in an attempt to limit abuse" is not true; the changes in IDNA2008 will have a trifling effect on abuse at the very best, and introduce significant opportunities for spoofing because of the 4 ambiguous characters. And affecting the "phishing" problem is not a requirement of the charter. So this item should be removed.<br>&nbsp;<br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>13. Clarify what is expected of IDNA-aware applications and domain name<br>&nbsp; &nbsp; &nbsp; "slots" with regard to invalid labels and future extensibility</blockquote>  <div><br>These are still not nailed down in the current drafts. My expectation is that once a domain name is valid, it remains valid for all time -- that is, we are doing a one-time massive compatibility change, but there will be no more changes that would affect compatibility. 
However, that is not captured in the text, despite the charter requirement "This work is intended to specify an improved means to produce and use stable and unambiguous IDN identifiers."</div>  <div><br></div>  <div>Another major change is the introduction of a mechanism for changing IDNAs on the fly via the context mechanism, with an associated process.<br><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br><br>Chartering and Re-Chartering<br><br>(1) A Re-charter is needed if we abandon a significant fraction of the IDNA2008 goals<br>and methods. IDNAv2, as described by Paul Hoffman, requires a re-charter.<br><br>(2) A Re-charter is needed if the WG decides to introduce mappings into the IDNA2008<br>specifications since the basic assumption in IDNA2008 was that mapping would not<br>be part of the specification.<br><br>(3) It is possible that a re-charter might not be needed if IDNA2008 adopts some<br>IDNA2003 operations under a restricted set of conditions and only at lookup<br>time for purposes of easing the transition to IDNA2008. This would presumably be up to the<br>AD and IESG to decide.<br><br>Basics for IDNA2003 and IDNA2008<br><br>Both of these specifications use the Punycode algorithm to generate what<br>IDNA2008 would call an A-label (i.e., "xn-- &lt;LDH compliant string>") from</blockquote>  <div><br></div>  <div>Better expressed as an XN label. 
That terminology can be applied to both, while A-Label only makes sense for IDNA2008.&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>labels expressed as a string of characters drawn from a subset of Unicode<br>defined characters.<br><br>DNS matching is done in the servers by comparing the query string to the<br>registered string in a case-independent fashion. &nbsp;For IDNs, these comparisons<br>are done after conversion into the "xn--" prefix form. For IDNs, the case-insensitive<br>matching of the DNS servers applies only to the A-label form and not to the<br>Unicode form. This means that the case-insensitive matching behavior of<br>traditional ASCII labels is not conferred on IDNs in their Unicode form.<br><br>The case-insensitive comparison between traditional LDH domain names is<br>approximated under IDNA2003 by using CaseFold as a mapping guide on the<br>Unicode strings being looked up. In addition, IDNA2003 also maps the so-called<br>"compatibility" characters of Unicode into their counterparts.     
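<br><br>A rough Python approximation of these two mappings, offered only as a sketch: the function name is hypothetical, and real Nameprep (RFC 3491) uses its own tables fixed at Unicode 3.2 rather than the current Unicode data that Python ships with.<br>
<pre>
```python
import unicodedata


def idna2003_like_map(label):
    """Approximate the IDNA2003 mapping step: case folding, then NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())


print(idna2003_like_map("B\u00fccher"))         # the capital B is folded
print(idna2003_like_map("\uff21\uff22\uff23"))  # fullwidth letters become ASCII
print(idna2003_like_map("Stra\u00dfe"))         # sharp s folds to "ss"
```
</pre>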
The same actions</blockquote>  <div>&nbsp;</div>  <div>=> "compatibility decomposable" characters&nbsp;of Unicode into their counterparts</div>  <div>[Not all compatibility characters are decomposable and vice versa.]</div>  <div>&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>precede the registration of new domain names under IDNA2003.<br><br>Unicode CaseFold maps to upper case and then maps back to lower case.</blockquote>  <div>&nbsp;</div>  <div>This is not quite accurate; better would be to say "Unicode CaseFold maps characters to&nbsp;lowercase values based on an equivalence class formed by including lowercase, uppercase, and titlecase mappings."</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>Prior to Unicode 5.0, Eszett became "SS" because there was no upper<br>case, then became "ss" in the lower case mapping. &nbsp;Under Unicode 5.0<br>CaseFold was unchanged for stability reasons. Consequently<br>CaseFold (ESZETT) is "ss" rather than lower case eszett even after<br>the introduction of upper case ESZETT in Unicode 5.0.</blockquote>  <div><br></div>  <div>=></div>  <div>The uppercase of eszett in Unicode is "SS", following national standards and practices. As of Unicode 5.1, an uppercase version of eszett became available. Under the Unicode case folding, both map to "ss".</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>Under IDNA2003, both DISALLOWED and UNASSIGNED characters<br>are looked up. 
If abusive registrations are made using DISALLOWED<br>or UNASSIGNED characters, these registered domain names may<br>be found on lookup by IDNA2003-compliant clients.</blockquote>  <div><br></div>  <div>This is not correct, as Erik points out.</div>  <div>&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>Under IDNA2008, UNASSIGNED and DISALLOWED characters are not looked up.<br>If new characters become defined under a new version of Unicode,<br>an old client will not look them up until it is updated. Abusive registrations<br>using UNASSIGNED characters will not be looked up.<br><br>Script mixing is not banned under IDNA2003. Under IDNA2008, BiDi<br>bans mixing of European and Extended Arabic-Indic numbers with<br>Arabic numbers. &nbsp;That is, AN and EN characters may not be present in<br>the same label. Otherwise, mixing is permitted in IDNA2008.<br><br>IMPLICATIONS OF ADOPTING IDNA2008 AS CURRENTLY SPECIFIED<br><br><br>1. IDNA2008 is case sensitive for labels with non-LDH characters in them but &nbsp;is</blockquote>  <div>... with at least one non-LDH character...&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>case-insensitive for LDH characters<br><br>For example, "buecher" is all ASCII and could be matched with "Buecher" or "bUecher"<br>under IDNA2008.<br><br>However, "B&lt;u-umlaut>cher" would not be allowed because Tables (see 4.2.2) would<br>disallow Latin Capital letters. 
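<br><br>For comparison, the IDNA2003 behaviour, in which case differences are mapped away before lookup, can be observed with Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490). This is a sketch for illustration only and says nothing about how an IDNA2008 implementation would be written.<br>
<pre>
```python
# Python's built-in "idna" codec applies Nameprep, so case is folded
# before the label is converted with Punycode.
lower = "b\u00fccher".encode("idna")
mixed = "B\u00fccher".encode("idna")
print(lower, mixed)

# Raw Punycode performs no mapping at all, so a change of case in a
# non-ASCII letter produces a different encoding.
raw_lower = "\u00fcber".encode("punycode")
raw_upper = "\u00dcber".encode("punycode")
print(raw_lower, raw_upper)
```
</pre>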
Some users accustomed to LDH-label behavior<br>may be surprised that "B&lt;u-umlaut>cher" and "b&lt;u-umlaut>cher" do not match.<br><br>On the other hand, the symmetric relationship between the IDNA2008-defined<br>A-Label and U-Label has the benefit that one can use exact matching for either the<br>U-label or the A-label form, since they are directly and unambiguously<br>transformable into each other.</blockquote>  <div>&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">    <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">However, this symmetry will not exist for<br>cases where the IDNA2003 A-Label and IDNA2008 A-label for the same<br>U-Label differ. [Query: will this be a material problem only for actual<br>registrations under IDNA2003 that differ in A-label form from IDNA2008?]</blockquote></blockquote>  <div><br></div>  <div>For registries, this is an advantage (equivalent to disallowing mapping), but it is not so clear that it is a "benefit" for lookup.</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">    <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"></blockquote><br><br><br>2.     
IDNA2008 does not ban script mixing even within labels.<br><br>Attempts to fashion rules along these lines have run into problems<br>in which characters that may be confused with others are needed<br>to express strings in particular languages. The International Phonetic<br>Alphabet (IPA) characters are a case in point. Some are used for<br>certain (e.g. African) languages but some of these characters<br>can be confused with others in the Latin alphabet. Other examples<br>exist in Arabic, Cyrillic and Greek, among others.<br><br>Even in the absence of intra-label script mixing, inter-script confusion<br>such as the Russian word for "restaurant" looking like &nbsp;"pectopah" in<br>Latin characters is quite possible.<br><br>Despite the apparent desirability of such a ban at protocol level, there<br>are simply too many combinations of confusion within scripts and between<br>scripts to benefit significantly from a protocol-level ban. On the other hand,<br>registry-level constraints that may be more script-aware appear to be<br>the most effective tool we have.</blockquote>  <div><br></div>  <div>I think client-level warnings are the most effective constraint. After all, if we could always trust the registries, we would need *no* constraints on the client side in the protocol.</div>  <div>&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>3. Eszett is permitted and its usage appears to be geographically and language<br>specific. Under IDNA2003, this character is mapped into "ss". To deal with the<br>potential conflict with previously mapped registrations in which Eszett is mapped<br>to "ss", registries would need to appeal to<br>Rationale 7.2 options, for example. 
Note that not all collisions are necessarily a consequence of mapping, i.e.,<br>many occurrences of "ss" in German text are not typographic variations of<br>Eszett and very few occurrences in Latin script, without consideration of language,<br>are variations of Eszett either.<br><br>4. Final Sigma is permitted and raises similar issues to Eszett with regard to<br>collisions and the same remedies would apply.<br><br>5. ZWJ/ZWNJ<br><br>In IDNA2003, these characters were mapped to "nothing". It has become apparent,<br>however, that some Indic scripts need them. Persian registries currently<br>reject registration of labels including ZWJ/ZWNJ although ZWNJ is used in<br>writing Persian languages. The Arabic language does not need ZWJ/ZWNJ.<br>Mapping to "nothing" in IDNA2003 has the side<br>effect of inhibiting domain name expression in some Indic scripts including<br>Tamil and Devanagari. Permitting either or both as valid characters creates<br>a compatibility problem similar to the Eszett one; i.e., one cannot tell<br>whether a DNS label, when converted back to native character form, was<br>intended to be written with ZWJ, ZWNJ or neither.<br><br>Elaboration: Suppose that "ab" is a string in one of the scripts in which we now<br>propose to permit ZWNJ. 
&nbsp;All we have in the     DNS is the A-label equivalent of "ab".<br>We can't tell from looking at it     whether the starting string, as seen/preferred by the<br>registrant,     was<br>&nbsp;ab &nbsp; &nbsp;or<br>&nbsp;aZWJb<br>since both map to the same     A-label.<br><br>Under IDNA2008, if the user enters "ab", she gets one     A-label<br>while, if she enters "aZWJb", she gets a different     A-label.<br>That is exactly the same as the Eszett problem -- you can't     tell<br>from the IDNA2003 A-label what the original intention was and<br>use     of the string under IDNA2008 gets you a different A-label<br>than it does     under IDNA2003.<br><br>Joiner characters become invisible if inserted in     strings written in scripts<br>that do not use them. </blockquote>  <div>=> in&nbsp;strings where they make no visual difference. This includes   scripts</div>that do not use them, and many positions in scripts that do use   them.   <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">Unicode     classifies these characters<br>as "COMMON" so they also end up passing any     plausible tests to prevent<br>mixing of scripts in a label. Contextual rules     are needed to restrict their use<br>to strings in scripts where they have     some effect. </blockquote>  <div><br></div>  <div>where they could have some effect (they won't always, and even when they   commonly have an effect, it depends on the font).</div>  <div>&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">We     end up relying on<br>registries to adopt their use judiciously within those     scripts. See also the<br>Rationale document for further     commentary.<br><br>6. 
Symbols and punctuation are NOT PVALID under IDNA2008     but are valid</blockquote>  <div>Most symbols...&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br>under     IDNA2003 leading to a variety of potential confusions with     "slash-like"<br>symbols or other symbols used in URIs for example. IDNA2008     rules reduce<br>confusion potential by making all characters with these     Unicode properties<br>invalid for use with Domain labels.</blockquote>  <div>either most, or add at the end "with certain exceptions"&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>It     is not clear that such symbols are critically needed for domain     names.<br><br>Another reason for banning these characters is that they     complicate<br>references, discussions and databases (such as WHOIS) because     it is<br>not clear how to describe them in common, informal usage.<br>What     is the correct way to refer to "-" ? Is it "hyphen", "minus sign",     "hyphen-<br>minus" or "short middle horizontal bar?" And is "." "period",     "dot", "full stop",<br>or something else? What about "#" - is it "pound",     "hash", "number sign" or<br>"tic-tac-toe"? "Heart" is another example: which     one is it?</blockquote>  <div><br></div>  <div>This is just not an issue; there are thousands of letters that have   ambiguous or multiple names. This paragraph just can't be fixed; it needs   removal.</div>  <div>&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>To     be fair, one could refer to the Unicode long name for the character or     even<br>the "U+" form although this sounds pretty awkward in practical     terms.<br><br>7. 
JAMO characters in Korean have been made Protocol Invalid     (DISALLOWED)<br>for reasons similar to (6) above. They introduce a     combinatorial explosion of different<br>string representations built from     JAMO primitive characters. They are valid<br>under IDNA2003.</blockquote>  <div><br></div>  <div>This is debatable. The only reasonable rationale we can include is that   they are only used in historic Korean.&nbsp;</div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>8.     Under IDNA2008, when a new version of Unicode is released the     following<br>steps can be taken:<br><br>a. A review of changes might     require new rules in the IDNA2008 framework.<br>Such a conclusion would     assuredly require formation of a &nbsp;WG to facilitate new     RFC<br>production. This is thought to be extremely unlikely to     happen.<br><br>b. A review of changes might only require exception rules to     preserve<br>compatibility. It is possible that the required changes might be     delegated<br>to an IANA action possibly in consultation with an expert     committee<br>to generate new tables.</blockquote>  <div><br></div>  <div>The current drafts require new RFCs in order to change the exception   tables, I believe. It would be better to change that to have the exception   table governed by the same process as the context tables (under stability   provisions).</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br><br>c.     Generate new tables for IANA registry (suitable for downloading as     needed).<br><br>During the transition some clients will have the older tables     and<br>some registries the newer ones. 
Lookups of Domain Names     containing<br>new PVALID characters by older clients will fail under IDNA2008     because<br>the client will reject UNASSIGNED characters until the clients     are updated<br>with the new PVALID characters.</blockquote>  <div><br></div>  <div>That is not the bad part of the transition. The bad part is that old   characters may transform from DISALLOWED to PVALID <span class="Apple-style-span" style="FONT-STYLE: italic">only during the   transition</span>, then be corrected, or transform from PVALID to DISALLOWED only   during the transition, then be corrected. And the correction period may be long,   depending on when software is updated. That is, if a program ships every two   years, and is updated during the correction, it will be wrong for 2   years.</div>  <div><br></div>  <blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid"><br></blockquote></div>...   </div></div></blockquote></div><br></div></div> <span>&lt;Comments for Korean IDN.pdf></span></blockquote></div><br></div></body></html>