I-D ACTION:draft-klensin-idnabis-issues-01.txt

Mark Davis mark.davis at icu-project.org
Fri Mar 2 00:21:22 CET 2007


Thanks. I'm sorry I had to be a bit short in my previous comments; I had to
have surgery shortly afterward and am still recuperating. I'll get back to
this when I'm feeling a bit more intelligent...

Mark

On 3/1/07, John C Klensin <klensin at jck.com> wrote:
>
> --On Tuesday, 27 February, 2007 10:40 -0800 Mark Davis
> <mark.davis at icu-project.org> wrote:
>
> > Some very quick notes on
> > http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt
> > (I'll be out the rest of this week).
>
> Some equally quick responses...  "#" denotes quotes from the
> I-D text, and ">" denotes your comments.
>
> # 2.1
> #
> #    The registrant or user typically produces the request
> #    string by keyboard entry of a character sequence.  That
> #    sequence is validated only on the basis of its displayed
> #    appearance, without knowledge of the character coding used
> #    for its internal representation or other local details of
> #    the way the operating system processes it.
>
> > This makes it sound like software validates the sequence,
> > which is incorrect.
>
> I certainly did not read that into it and note that "software"
> does not appear anywhere in the statement.  The text was
> intended to indicate that whatever validation is performed is
> performed by the user or not at all.  Suggestions for better
> wording would be appreciated, but see below.
>
> > No software validates input character
> > sequences on the basis of displayed appearance. The user
> > might look at the sequence and "validate" it (although that
> > is odd phrasing; "inspect" might be better); there is also some
> > "validation" in that the user is typing, and generally knows
> > what keys are hit -- the validation is only in the sense of
> > verifying that the correct keys are hit.
>
> And the notion of "correctness" in hitting keys, and in what is
> echoed on the screen (which may not be the same thing as the
> exact keys hit, of course), is very much a matter of user
> validation (or, if you prefer, inspection and approval).
>
>
> # 2.3.  Character Mappings
> #
> #    NFKC [Unicode-UAX15] which converts compatibility
> #    characters to their base forms, resolves the different
> #    ways in which some characters can be represented in
> #    Unicode into a canonical form, and performs one-way case
> #    mapping (partially simulating the query-time folding
> #    operation that the DNS provides for ASCII strings).
>
> > NFKC does not perform case mapping.
>
> Cut and paste error.  Will be fixed -- my apologies.
>
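The point that NFKC does not perform case mapping can be checked with a minimal sketch, assuming Python's standard `unicodedata` module as the normalizer. The helper names `nfkc` and `nfkc_then_casefold` are hypothetical, and the two-step folding is only an illustration of keeping the operations separate, not the Nameprep algorithm itself.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode normalization form KC only; no case mapping happens here."""
    return unicodedata.normalize("NFKC", text)


def nfkc_then_casefold(text: str) -> str:
    """Illustrative two-step mapping: normalize, fold case, normalize again."""
    return unicodedata.normalize("NFKC", nfkc(text).casefold())


# FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) maps to ASCII "A": still upper case.
fullwidth_upper = nfkc("\uFF21")
# The "fi" ligature (U+FB01) is expanded to two separate letters.
ligature = nfkc("\uFB01")
# An ordinary upper-case letter passes through NFKC unchanged.
unchanged = nfkc("B\u00FCcher.de")
```

Case folding has to be requested as a separate step, which is what `nfkc_then_casefold` does.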
> # 3.2.1.2.  Conversion to Unicode
>
> > This is still a bit much, and is not substantiated with
> > examples.
>
> Your opinion is noted.   Ultimately, we have different
> perspectives on this and on the related topic of what is in
> scope.  Based on your earlier comments, you believe that almost
> all systems these days are Unicode-based or, when they use
> a local CCS, use ones that were incorporated directly enough
> into Unicode that precise mapping is trivial.  We keep hearing
> about CCSs based on 2022-switching, local adaptations for what
> you consider as presentation forms, need to distinguish
> characters that you have unified, and so on.   I suspect that
> at least part of this has to do with hearing from different
> parties based on their perceptions of what you/we are willing
> to hear.
>
> I don't want to be harsh about this or have discussion of it
> turn into a distraction, but there appears to be a perception in
> some parts of the Internet and character-set-using community
> that the Unicode Consortium is sufficiently confident about the
> knowledge and skills of the core Unicode technical team that
> there is little point identifying problematic issues or
> identifying critiques of Unicode to the Consortium generally or
> to UTC in particular: at best, such comments are rejected or
> ignored; at worst, they result in the individual or organization
> raising them being abused or set up in ways that are not
> directly related to the substance of the concern but that permit
> them to be dismissed as uninformed idiots.  Whether the
> perception is accurate or not, its existence might explain why
> we are hearing some complaints, and about some problems, that
> aren't making it through to you or the UTC.
>
>
> # 3.2.1.4.  Nameprep Mappings
>
> > There is reference elsewhere, but should be clear here, that
> > if all characters cp such that NFKC(cp) != cp are removed,
> > then NFC can be used instead of NFKC.
>
> As we have discussed in other contexts, at least some of us
> with experience with poor-quality implementations of, or
> deliberate variations on, Internet protocols generally and IDNA
> in particular are inclined to be extremely conservative about
> some of these steps.  If, in fact, we succeed in eliminating
> all cp such that NFKC(cp) != cp, then NFC can be used instead
> but use of NFKC is, at worst, harmless.  Retaining NFKC
> provides robustness against future compatibility characters
> that, for one reason or another, are not eliminated by the
> tabulated "Never" rules.
>
> Of course, an argument against it and in favor of specifying
> things in terms of NFC only would be the possibility of having
> future characters introduced that UTC identifies as
> compatibility characters for existing ones but that the relevant
> language community considers distinct from them.  If NFKC is
> used, those characters are effectively permanently banned.  If
> only NFC is used, then those characters can be permitted by
> appropriate decisions about the "IDN-permitted" property.
> Thinking about that tradeoff is leading me to believe that the
> document should reference NFC (watch for separate note to the
> list).  However, FWIW, any argument that NFC would provide the
> potential for an additional check and flexibility in the use of
> "IDN-permitted" would argue against having UTC make final
> decisions about the content of "IDN-permitted".
>
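The relationship between NFC and NFKC discussed above can be spot-checked with a small sketch, assuming Python's `unicodedata` module. The helper names are hypothetical, and a handful of sample strings is an illustration, not a proof that the two forms agree on every string built from NFKC-stable characters.

```python
import unicodedata


def is_nfkc_stable(ch: str) -> bool:
    """True if the single character cp satisfies NFKC(cp) == cp."""
    return unicodedata.normalize("NFKC", ch) == ch


def only_stable(text: str) -> bool:
    """True if every character in the string is individually NFKC-stable."""
    return all(is_nfkc_stable(ch) for ch in text)


def forms_agree(text: str) -> bool:
    """True if NFC and NFKC produce the same result for this string."""
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    return nfc == nfkc


# "u" followed by COMBINING DIAERESIS: both characters are NFKC-stable,
# and both forms compose them to the same precomposed letter.
decomposed = "bu\u0308cher"
# The "fi" ligature (U+FB01) is a compatibility character: NFC keeps it,
# NFKC expands it, so the two forms differ.
with_ligature = "\uFB01le"
```

In this sketch, `only_stable(decomposed)` and `forms_agree(decomposed)` are both true, while the string containing the ligature fails both checks.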
> # 3.2.2.  Flow Model for Domain Name Resolution (Lookup)
>
> > This needs an example to help substantiate the claims.
>
> In this case, and elsewhere, while examples might help clarify
> the text, it does not seem helpful to get into a battle of
> examples and counterexamples.  Terms like "substantiate" seem
> to imply an invitation to the latter.
>
> # 3.2.2.3.  User Interface Character Changes
>
> > I find the MAY here quite troublesome for backwards
> > compatibility. If a webpage right now has <a
> > href="http://Bücher.de">... then any IDN compliant browser
> > will work correctly. With the proposed change, it may or may
> > not fail, depending on the browser (or other interpreter of
> > the HTML). I am less concerned by compatibility (NFKC)
> > variants not mapping, simply because of their frequency of
> > use, but case changes are not uncommon. We really need to see
> > evidence that this will not cause problems before we make
> > case mapping a MAY.
>
> On the other hand, we received a good deal of input from parts
> of the browser vendor community that, if the domain name in
> either the text associated with a link or the link itself was
> different from the name that actually appeared in the DNS, they
> wanted to be able to treat the name as suspicious.  That
> position was reinforced by
> input from you and your colleagues that strings that were all
> lower case were less subject to spoofing than strings that
> contained upper-case characters.  So we are in a difficult
> position, one in which
>   <a href="http://bücher.de">
> is likely, at least in those browsers and other applications
> that imitate them, to be displayed to the user that way if the
> link is displayed, while
>   <a href="http://Bücher.de">
> is likely to be displayed as
>   http://xn--bcher-kva.de/
> This gets even worse if the context for that link is, e.g.,
>   ... please click http://Bücher.de
>       <a href="http://Bücher.de"> ...
> which may result in Punycode display or a nasty warning pop-up
> whether the link is written in terms of Bücher.de or bücher.de,
> since only "bücher" can actually be a U-label.
>
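The mapping between the two spellings and the Punycode form quoted above can be reproduced with a minimal sketch, assuming Python's built-in "idna" codec, which implements the existing IDNA2003 rules (including Nameprep case folding) and not the changes under discussion. The function names are hypothetical.

```python
def to_ascii_idna2003(name: str) -> str:
    """Convert a domain name to its ASCII form using Python's IDNA2003 codec."""
    return name.encode("idna").decode("ascii")


def to_unicode_idna2003(name: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return name.encode("ascii").decode("idna")


# Both spellings map to the same ASCII form under the IDNA2003 mapping.
upper_form = to_ascii_idna2003("B\u00FCcher.de")
lower_form = to_ascii_idna2003("b\u00FCcher.de")

# The reverse conversion yields only the lower-case spelling, so the
# upper-case original is not recoverable from what is stored in the DNS.
round_trip = to_unicode_idna2003(upper_form)

# The bare Punycode encoding of the label, without the "xn--" prefix.
label = "b\u00FCcher".encode("punycode").decode("ascii")
```

Under a rule where case mapping is optional, an application that skipped the mapping step would have no valid label to look up for the upper-case spelling.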
> There is nothing that makes any of these UI behaviors
> non-conforming (unless one interprets the standards such that
> _any_ display of Punycode in a Unicode-capable environment is
> non-conforming), so the browsers are IDN-compliant.  But, from
> a user point of view, Punycode display certainly defeats the
> intent of having IDNs.
>
> Note also that exactly the same arguments apply to the use of
> compatibility characters in IRIs.  Although we can quibble about
> the intent of those doing so, we know that some of them are used
> today.   So there is some risk that this will constitute an
> incompatible and surprising change.  Making UIs take
> responsibility for case-mapping (or compatibility character
> mapping) and their consequences (and for being consistent about
> how they are handled) and keeping upper-case characters out of
> IRIs will actually yield better interoperability and
> compatibility in the long term than having in-protocol
> requirements about case-mapping.
>
> The problem discussed in 8.2 reinforces this point, even though
> you apparently disagree with it also.
>
> # 6.1.  Display and Network Order
> #
> #    Questions remain about protocol constraints implying that
> #    the overall
>
> > This is all out of scope; if present, it should probably be
> > in an appendix (and needs some work).
>
> What makes it out of scope, Mark?  Users of IDNs in context
> believe that it is an issue and that puts it in scope, even if
> it were the case that nothing can be done about it.
>
> # 6.2.  The Ligature and Digraph Problem
>
> > ditto. Also, the definition and usage of ligature, digraph,
> > phoneme needs considerable work.
>
> See above.  If you have specific suggestions about improvements
> to those definitions, please make them.  However, note that we
> have been led to understand that the definitions used are
> appreciably closer to those traditionally used in linguistics
> and the study of typography and writing systems over the
> centuries than the subset of them that appear in the Unicode
> Standard.
>
> # 7.  IDNs and the Robustness Principle
>
> #   Registries, registrars, or other actors who do not do so,
> #   or who get too liberal, too greedy, or too weird may
> #   deserve punishment that will primarily be meted out in the
> #   marketplace or by consumer protection rules and
> #   legislation.
>
> > This language seems inappropriate; and examples need to be
> > provided.
>
> See comments above about examples.  As far as appropriateness
> of language is concerned, this language is generally consistent
> with language used in discussions about "enforcement" of
> Internet Standards in contexts in which there are no
> regulation-based enforcement mechanisms.   Again, if you have
> specific alternate suggestions, please make them.
>
> # 8.1.  Design Criteria
>
> #       *  Characters that are unassigned in the version of
> #          Unicode being
> #          used by the registry or application are not
> #          permitted, even on resolution (lookup).
>
> > This needs better justification, with examples.
>
> > There is a general problem with the lack of substantiation,
> > at least examples of perceived problems motivating the
> > changes.
>
> See above.
>
> # 8.2.  More Flexibility in User Agents
>
> #   For example, an essential element of the ASCII
> #   case-mapping functions, that uppercase(character) =
> #   uppercase(lowercase(character)),
>
> > Replace character by string, and you see that this is false
> > for ASCII (and it is not clear what the relevance is).
>
> I have made that replacement and, if it is not true for ASCII
> even after it, I'm missing something very fundamental.  Unless
> I have somehow mis-stated the condition, it is essential to
> the matching rules of the DNS, so, if there is a flaw, it
> hasn't been obvious.
>
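The identity quoted from section 8.2 can be tested mechanically with a short sketch, assuming Python's built-in `str.upper()` and `str.lower()`, which apply the full Unicode case mappings. The helper name `identity_holds` is hypothetical, and the check says nothing about what either correspondent had in mind beyond the literal formula.

```python
def identity_holds(text: str) -> bool:
    """Check uppercase(text) == uppercase(lowercase(text))."""
    return text.upper() == text.lower().upper()


# Exhaustive check over every single ASCII character.
ascii_chars_ok = all(identity_holds(chr(cp)) for cp in range(128))

# ASCII case mapping works character by character, so the per-character
# result carries over to whole strings; a few samples as a sanity check.
ascii_strings_ok = all(
    identity_holds(s) for s in ("ExAmPlE.COM", "www.IETF.org", "a-B-c-1-2-3")
)

# Outside ASCII the identity can fail: KELVIN SIGN (U+212A) is already
# upper case, but lowercases to "k", which uppercases to ASCII "K".
kelvin_ok = identity_holds("\u212A")
```

With these mappings the identity holds for ASCII characters and for the sample ASCII strings, and fails for the Kelvin sign.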
> > (I ran out of time, and will try to get to this next week.)
>
> I look forward to your additional comments.
>
> regards,
>    john
>
>


-- 
Mark


More information about the Idna-update mailing list