Thanks. I&#39;m sorry I had to be a bit short in my previous comments; I had to have surgery shortly afterward and am still recuperating. I&#39;ll get back to this when I&#39;m feeling a bit more intelligent...<br><br>Mark

<br><br><div><span class="gmail_quote">On 3/1/07, <b class="gmail_sendername">John C Klensin</b> &lt;<a href="mailto:klensin@jck.com">klensin@jck.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

--On Tuesday, 27 February, 2007 10:40 -0800 Mark Davis<br>&lt;<a href="mailto:mark.davis@icu-project.org">mark.davis@icu-project.org</a>&gt; wrote:<br><br>&gt; Some very quick notes on<br>&gt; <a href="http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt">http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt</a><br>&gt; (I&#39;ll be out the rest of this week).<br><br>Some equally quick responses...&nbsp;&nbsp;&quot;#&quot; denotes quotes from the I-D text, and &quot;&gt;&quot; denotes your comments.

<br><br># 2.1<br>#<br>#&nbsp;&nbsp;&nbsp;&nbsp;The registrant or user typically produces the request<br>#&nbsp;&nbsp;&nbsp;&nbsp;string by keyboard entry of a character sequence.&nbsp;&nbsp;That<br>#&nbsp;&nbsp;&nbsp;&nbsp;sequence is validated only on the basis of its displayed<br>#&nbsp;&nbsp;&nbsp;&nbsp;appearance, without knowledge of the character coding used

<br>#&nbsp;&nbsp;&nbsp;&nbsp;for its internal representation or other local details of<br>#&nbsp;&nbsp;&nbsp;&nbsp;the way the operating system processes it.<br><br>&gt; This makes it sound like software validates the sequence,<br>&gt; which is incorrect.<br><br>

I certainly did not read that into it and note that &quot;software&quot;<br>does not appear anywhere in the statement.&nbsp;&nbsp;The text was<br>intended to indicate that whatever validation is performed is<br>performed by the user or not at all.&nbsp;&nbsp;Suggestions for better

<br>wording would be appreciated, but see below.<br><br>&gt; No software validates input character<br>&gt; sequences on the basis of displayed appearance. The user<br>&gt; might look at the sequence and &quot;validate&quot; it (although that<br>&gt; is odd phrasing; &quot;inspect&quot; might be better); there is also some<br>&gt; &quot;validation&quot; in that the user is typing, and generally knows<br>&gt; what keys are hit -- the validation is only in the sense of<br>&gt; verifying that the correct keys are hit.<br><br>

And the notion of &quot;correctness&quot; in hitting keys, and in what is echoed on the screen (which may not be the same thing as the exact keys hit, of course), is very much a matter of user validation (or, if you prefer, inspection and approval).

<br><br><br># 2.3.&nbsp;&nbsp;Character Mappings<br>#<br>#&nbsp;&nbsp;&nbsp;&nbsp;NFKC [Unicode-UAX15] which converts compatibility<br>#&nbsp;&nbsp;&nbsp;&nbsp;characters to their base forms, resolves the different<br>#&nbsp;&nbsp;&nbsp;&nbsp;ways in which some characters can be represented in

<br>#&nbsp;&nbsp;&nbsp;&nbsp;Unicode into a canonical form, and performs one-way case<br>#&nbsp;&nbsp;&nbsp;&nbsp;mapping (partially simulating the query-time folding<br>#&nbsp;&nbsp;&nbsp;&nbsp;operation that the DNS provides for ASCII strings).<br><br>&gt; NFKC does not perform case mapping.
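<br><br>A minimal Python sketch of that distinction, assuming Python's unicodedata module (which uses the current Unicode version, not the 3.2 tables that Nameprep pins). NFKC folds compatibility variants such as fullwidth letters and ligatures but leaves case alone, so case folding has to be a separate step. The helper names are hypothetical, and str.casefold() only approximates the RFC 3491 mapping table.

```python
import unicodedata


def nfkc_only(label: str) -> str:
    """Unicode Normalization Form KC alone: compatibility folding, no case mapping."""
    return unicodedata.normalize("NFKC", label)


def fold_then_nfkc(label: str) -> str:
    """Rough stand-in for Nameprep's order of operations (map, then normalize).
    Using str.casefold() is an assumption; RFC 3491 uses its own table B.2."""
    return unicodedata.normalize("NFKC", label.casefold())


# U+FF22 FULLWIDTH LATIN CAPITAL LETTER B followed by "ücher".
fullwidth = "\uff22\u00fccher"
# nfkc_only(fullwidth)      == "B\u00fccher"  (width folded, still uppercase)
# fold_then_nfkc(fullwidth) == "b\u00fccher"  (case folded as a separate step)
```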

<br><br>Cut and paste error.&nbsp;&nbsp;Will be fixed -- my apologies.<br><br># 3.2.1.2.&nbsp;&nbsp;Conversion to Unicode<br><br>&gt; This is still a bit much, and is not substantiated with<br>&gt; examples.<br><br>

Your opinion is noted.&nbsp;&nbsp; Ultimately, we have different<br>perspectives on this and on the related topic of what is in<br>scope.&nbsp;&nbsp;Based on your earlier comments, you believe that almost<br>all systems these days are Unicode-based or, when they use

<br>a local CCS, use ones that were incorporated directly enough<br>into Unicode that precise mapping is trivial.&nbsp;&nbsp;We keep hearing<br>about CCSs based on ISO 2022 switching, local adaptations for what<br>you consider presentation forms, the need to distinguish

<br>characters that you have unified, and so on.&nbsp;&nbsp; I suspect that<br>at least part of this has to do with hearing from different<br>parties based on their perceptions of what you/we are willing<br>to hear.<br><br>I don&#39;t want to be harsh about this or have discussion of it

<br>turn into a distraction, but there appears to be a perception in<br>some parts of the Internet and character-set-using community<br>that the Unicode Consortium is sufficiently confident about the<br>knowledge and skills of the core Unicode technical team that

<br>there is little point identifying problematic issues or<br>identifying critiques of Unicode to the Consortium generally or<br>to UTC in particular: at best, such comments are rejected or<br>ignored; at worst, they result in the individual or organization

<br>raising them being abused or set up in ways that are not<br>directly related to the substance of the concern but that permit<br>them to be dismissed as uninformed idiots.&nbsp;&nbsp;Whether the<br>perception is accurate or not, its existence might explain why

<br>we are hearing some complaints, and about some problems, that<br>aren&#39;t making it through to you or the UTC.<br><br><br># 3.2.1.4.&nbsp;&nbsp;Nameprep Mappings<br><br>&gt; This is referenced elsewhere, but it should be made clear here that

<br>&gt; if all characters cp such that NFKC(cp) != cp are removed,<br>&gt; then NFC can be used instead of NFKC.<br><br>As we have discussed in other contexts, at least some of us<br>with experience with poor-quality implementations of, or

<br>deliberate variations on, Internet protocols generally and IDNA<br>in particular are inclined to be extremely conservative about<br>some of these steps.&nbsp;&nbsp;If, in fact, we succeed in eliminating<br>all cp such that NFKC(cp) != cp, then NFC can be used instead,

but use of NFKC is, at worst, harmless.&nbsp;&nbsp;Retaining NFKC provides robustness against future compatibility characters that, for one reason or another, are not eliminated by the tabulated &quot;Never&quot; rules.
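<br><br>One way to spot-check the NFC/NFKC equivalence claim, sketched in Python on the assumption that Python's unicodedata tables are close enough to the Unicode version in question. A hypothetical filter rejects any label containing a code point that NFKC changes on its own, and for labels that pass, NFC and NFKC agree. This checks two sample labels; it is an illustration, not a proof.

```python
import unicodedata


def nfkc_stable(label: str) -> bool:
    """True if every code point is unchanged by NFKC on its own, i.e. the
    label would pass a hypothetical rule excluding every cp with
    NFKC(cp) != cp."""
    return all(unicodedata.normalize("NFKC", cp) == cp for cp in label)


def nfc_equals_nfkc(label: str) -> bool:
    """True if NFC and NFKC produce the same result for this label."""
    return unicodedata.normalize("NFC", label) == unicodedata.normalize("NFKC", label)


# "bu" + U+0308 COMBINING DIAERESIS + "cher": every code point is
# NFKC-stable, so it passes the filter; NFC and NFKC both compose to "bücher".
decomposed = "bu\u0308cher"

# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character: NFKC turns
# it into "fi" but NFC leaves it alone, so the two forms differ.
ligature = "\ufb01le"
```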

<br><br>Of course, an argument against it and in favor of specifying<br>things in terms of NFC only would be the possibility of having<br>future characters introduced that UTC identifies as<br>compatibility characters for existing ones but that the relevant

language community considers distinct from them.&nbsp;&nbsp;If NFKC is used, those characters are effectively permanently banned.&nbsp;&nbsp;If only NFC is used, then those characters can be permitted by appropriate decisions about the &quot;IDN-permitted&quot; property.

<br>Thinking about that tradeoff is leading me to believe that the<br>document should reference NFC (watch for separate note to the<br>list).&nbsp;&nbsp;However, FWIW, any argument that NFC would provide the<br>potential for an additional check and flexibility in the use of

&quot;IDN-permitted&quot; would argue against having UTC make final decisions about the content of &quot;IDN-permitted&quot;.<br><br># 3.2.2.&nbsp;&nbsp;Flow Model for Domain Name Resolution (Lookup)<br><br>&gt; This needs an example to help substantiate the claims.<br><br>

In this case, and elsewhere, while examples might help clarify the text, it does not seem helpful to get into a battle of examples and counterexamples.&nbsp;&nbsp;Terms like &quot;substantiate&quot; seem to imply an invitation to the latter.

<br><br># 3.2.2.3.&nbsp;&nbsp;User Interface Character Changes<br><br>&gt; I find the MAY here quite troublesome for backwards<br>&gt; compatibility. If a webpage right now has &lt;a<br>&gt; href=&quot;http://Bücher.de&quot;&gt;... then any IDN-compliant browser<br>&gt; will work correctly. With the proposed change, it may or may<br>&gt; not fail, depending on the browser (or other interpreter of<br>&gt; the HTML). I am less concerned by compatibility (NFKC)<br>&gt; variants not mapping, simply because of their low frequency of<br>&gt; use, but case changes are not uncommon. We really need to see<br>&gt; evidence that this will not cause problems before we make

<br>&gt; case mapping a MAY.<br><br>On the other hand, we received a good deal of input from parts<br>of the browser vendor community that they wanted to be able to<br>treat a domain name as suspicious if the name in either the text<br>associated with a link or the link itself was different from<br>the name that actually appeared in the DNS.&nbsp;&nbsp;That position was reinforced by<br>input from you and your colleagues that strings that were all<br>lower case were less subject to spoofing than strings that<br>contained upper-case characters.&nbsp;&nbsp;So we are in a difficult<br>position, one in which<br>&nbsp;&nbsp;&lt;a href=&quot;http://bücher.de&quot;&gt;<br>is likely, at least in those browsers and other applications<br>that imitate them, to be displayed to the user that way if the<br>link is displayed, while<br>&nbsp;&nbsp;&lt;a href=&quot;http://Bücher.de&quot;&gt;<br>is likely to be displayed as<br>&nbsp;&nbsp;http://xn--bcher-kva.de/<br>This gets even worse if the context for that link is, e.g.,<br>&nbsp;&nbsp;... please click http://Bücher.de<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;a href=&quot;http://Bücher.de&quot;&gt; ...<br>which may result in Punycode display or a nasty warning pop-up<br>whether the link is written in terms of Bücher.de or bücher.de,<br>since only &quot;bücher&quot; can actually be a U-label.<br><br>There is nothing that makes any of these UI behaviors<br>non-conforming (unless one interprets the standards such that

_any_ display of Punycode in a Unicode-capable environment is non-conforming), so the browsers are IDN-compliant.&nbsp;&nbsp;But, from a user point of view, Punycode display certainly defeats the intent of having IDNs.
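<br><br>To make the two behaviors concrete, here is a minimal Python sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490) with Nameprep case folding, not the revised protocol under discussion. The display_form helper is a hypothetical anti-spoofing policy, not any real browser's logic: it shows the Unicode form only if converting to an A-label and back reproduces exactly what the page author wrote.

```python
# Assumption: Python's "idna" codec is IDNA2003 (RFC 3490). Its Nameprep
# step case-folds labels, so "Bücher.de" and "bücher.de" share one A-label.


def a_label_form(host: str) -> str:
    """IDNA2003 ToASCII applied label by label."""
    return host.encode("idna").decode("ascii")


def display_form(host: str) -> str:
    """Hypothetical UI policy: show the Unicode form only when the
    A-label round trip reproduces the host exactly as written;
    otherwise fall back to showing the A-label."""
    a_label = host.encode("idna")      # IDNA2003 ToASCII, per label
    u_label = a_label.decode("idna")   # IDNA2003 ToUnicode, per label
    return u_label if u_label == host else a_label.decode("ascii")


# a_label_form("B\u00fccher.de") == a_label_form("b\u00fccher.de") == "xn--bcher-kva.de"
# display_form("b\u00fccher.de") == "b\u00fccher.de"
# display_form("B\u00fccher.de") == "xn--bcher-kva.de"
```

The lookup succeeds for both spellings, which is the backward-compatibility point. The displayed form differs only because the round trip yields lower-case "bücher.de", which is the UI problem described above.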

<br><br>Note also that exactly the same arguments apply to the use of<br>compatibility characters in IRIs.&nbsp;&nbsp;Although we can quibble about<br>the intent of those doing so, we know that some of them are used<br>today.&nbsp;&nbsp; So there is some risk that this will constitute an

<br>incompatible and surprising change.&nbsp;&nbsp;Making UIs take<br>responsibility for case-mapping (or compatibility character<br>mapping) and their consequences (and for being consistent about<br>how they are handled) and keeping upper-case characters out of

<br>IRIs will actually yield better interoperability and<br>compatibility in the long term than having in-protocol<br>requirements about case-mapping.<br><br>The problem discussed in 8.2 reinforces this point, even though

<br>you apparently disagree with it also.<br><br># 6.1.&nbsp;&nbsp;Display and Network Order<br>#<br>#&nbsp;&nbsp;&nbsp;&nbsp;Questions remain about protocol constraints implying that<br>#&nbsp;&nbsp;&nbsp;&nbsp;the overall<br><br>&gt; This is all out of scope; if present, it should probably be

<br>&gt; in an appendix (and needs some work).<br><br>What makes it out of scope, Mark?&nbsp;&nbsp;Users of IDNs in context believe that it is an issue, and that puts it in scope, even if nothing can be done about it.

<br><br># 6.2.&nbsp;&nbsp;The Ligature and Digraph Problem<br><br>&gt; ditto. Also, the definition and usage of ligature, digraph,<br>&gt; phoneme needs considerable work.<br><br>See above.&nbsp;&nbsp;If you have specific suggestions about improvements

<br>to those definitions, please make them.&nbsp;&nbsp;However, note that we<br>have been led to understand that the definitions used are<br>appreciably closer to those traditionally used in linguistics<br>and the study of typography and writing systems over the

<br>centuries than the subset of them that appear in the Unicode<br>Standard.<br><br># 7.&nbsp;&nbsp;IDNs and the Robustness Principle<br><br>#&nbsp;&nbsp; Registries, registrars, or other actors who do not do so,<br>#&nbsp;&nbsp; or who get too liberal, too greedy, or too weird may

<br>#&nbsp;&nbsp; deserve punishment that will primarily be meted out in the<br>#&nbsp;&nbsp; marketplace or by consumer protection rules and<br>#&nbsp;&nbsp; legislation.<br><br>&gt; This language seems inappropriate, and examples need to be<br>&gt; provided.

<br><br>See comments above about examples.&nbsp;&nbsp;As far as appropriateness<br>of language is concerned, this language is generally consistent<br>with language used in discussions about &quot;enforcement&quot; of<br>Internet Standards in contexts in which there is no

<br>regulation-based enforcement mechanism.&nbsp;&nbsp; Again, if you have<br>specific alternate suggestions, please make them.<br><br># 8.1.&nbsp;&nbsp;Design Criteria<br><br>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *&nbsp;&nbsp;Characters that are unassigned in the version of<br>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Unicode being

<br>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;used by the registry or application are not<br>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;permitted, even on resolution (lookup).<br><br>&gt; This needs better justification, with examples.<br><br>&gt; There is a general problem with the lack of substantiation,

<br>&gt; or at least of examples of perceived problems motivating the<br>&gt; changes.<br><br>See above.<br><br># 8.2.&nbsp;&nbsp;More Flexibility in User Agents<br><br>#&nbsp;&nbsp; For example, an essential element of the ASCII<br>#&nbsp;&nbsp; case-mapping functions, that uppercase(character) =

<br>#&nbsp;&nbsp; uppercase(lowercase(character)),<br><br>&gt; Replace character by string, and you see that this is false<br>&gt; for ASCII (and it is not clear what the relevance is).<br><br>I have made that replacement and, if it is not true for ASCII

even after it, I&#39;m missing something very fundamental.&nbsp;&nbsp;Unless I have somehow mis-stated the condition, it is essential to the matching rules of the DNS, so, if there is a flaw, it hasn&#39;t been obvious.
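<br><br>A small Python check of the property as stated, assuming that Python's str.upper and str.lower (full Unicode default case mappings) are acceptable stand-ins for "uppercase" and "lowercase". The helper name is hypothetical. Under that assumption, the identity holds for every ASCII string but fails for some non-ASCII characters.

```python
def case_identity_holds(s: str) -> bool:
    """Check uppercase(s) == uppercase(lowercase(s)) using Python's
    full Unicode case mappings (str.upper / str.lower)."""
    return s.upper() == s.lower().upper()


# For ASCII, Python's case mapping works one character at a time with no
# context, so checking all 128 ASCII code points covers every ASCII string.
ASCII_OK = all(case_identity_holds(chr(cp)) for cp in range(128))

# Outside ASCII the identity can fail for single characters:
#   U+212A KELVIN SIGN lowercases to ASCII "k", which uppercases to "K";
#   U+0130 (capital I with dot above) lowercases to "i" + U+0307, which
#   uppercases to "I" + U+0307 rather than back to U+0130.
UNICODE_FAILURES = [c for c in ("\u212a", "\u0130") if not case_identity_holds(c)]
```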

<br><br>&gt; (I ran out of time, and will try to get to this next week.)<br><br>I look forward to your additional comments.<br><br>regards,<br>&nbsp;&nbsp; john<br><br></blockquote></div><br><br clear="all"><br>-- <br>Mark