I-D ACTION:draft-klensin-idnabis-issues-01.txt

Kenneth Whistler kenw at sybase.com
Sat Mar 10 01:44:33 CET 2007


John,

> # 8.2.  More Flexibility in User Agents
> 
> #   For example, an essential element of the ASCII
> #   case-mapping functions, that uppercase(character) =
> #   uppercase(lowercase(character)),
> 
> > Replace character by string, and you see that this is false
> > for ASCII (and it is not clear what the relevance is).
> 
> I have made that replacement and, if it is not true for ASCII
> even after it, I'm missing something very fundamental.

O.k., exegesis follows. When Mark says, "replace character
by string", he means, consider the following text:

   For example, an essential element of the ASCII
   case-mapping functions, that uppercase(string) =
   uppercase(lowercase(string)), ...
   
But I think he misread that as implying roundtripping of
casing -- which clearly is not the case for ASCII strings.

In other words:
 
   uppercase(lowercase(C)) = C  is true for ASCII
   uppercase(lowercase(S)) = S  is false for ASCII

I agree with you, however, that for ASCII-only, the
statement as it stands would be true: in other words, it
doesn't matter if you uppercase a string or uppercase
the lowercase of a string -- you end up with the same
result either way.
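
To make the two properties concrete, here is a rough Python sketch.
The helper names are invented for this illustration, and Python's
built-in str.upper()/str.lower() stand in for the ASCII case-mapping
functions, which is an assumption that only matters outside ASCII.

```python
# Sketch only: the helper names below are invented for illustration.

def upper_is_stable(s: str) -> bool:
    """The draft's property: uppercase(S) == uppercase(lowercase(S))."""
    return s.upper() == s.lower().upper()

def roundtrips(s: str) -> bool:
    """The stronger (misread) property: uppercase(lowercase(S)) == S."""
    return s.lower().upper() == s

ascii_chars = [chr(cp) for cp in range(128)]
samples = ["example", "Example", "EXAMPLE", "www.Example.COM"]

# The draft's property holds for every ASCII character and each sample string.
assert all(upper_is_stable(c) for c in ascii_chars)
assert all(upper_is_stable(s) for s in samples)

# Roundtripping only survives when the input was already all uppercase.
print([(s, roundtrips(s)) for s in samples])
```

Only the all-uppercase sample roundtrips; every sample satisfies the
property the draft actually states.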

>  Unless
> I have somehow mis-stated the condition, it is essential to
> the matching rules of the DNS, so, if there is a flaw, it
> hasn't been obvious.

However, that said, I agree with the thrust of Mark's comment
that it isn't clear what the relevance is in this section.
In fact, I find the entire paragraph in the draft
obscure:

   As suggested earlier in this section, it appears to be desirable to
   do as little character mapping as possible consistent with having
   Unicode work correctly (e.g., NFC mapping to resolve different
   codings for the same character is still necessary) and to make the
   mapping between A-labels and U-labels idempotent.  Case-mapping is
   not an exception to this principle: if only lower case characters can
   be registered in the DNS (i.e., present in a U-label), then IDNA200x
   should prohibit upper-case characters as input.  Some other
   considerations reinforce this conclusion.  For example, an essential
   element of the ASCII case-mapping functions, that
   uppercase(character) = uppercase(lowercase(character)), may not be
   satisfied with IDNs: the relationship may even be language-dependent.
   Of course, the expectations of users who are accustomed to a case-
   insensitive DNS environment will probably be well-served if user
   agents perform case mapping prior to IDNA processing, but the IDNA
   procedures themselves should neither require such mapping nor expect
   it when it isn't natural to the localized environment.

This is intended in the draft to serve as justification for
not doing casefolding as part of IDNAprep (or whatever the
process is called) and, in keeping with the title of the
section, presumably as an argument that user agents should be
flexible in their handling of casing of IDNs.

But I have been reading, re-reading, and re-re-reading, and
my conclusion is that this comes down to essentially:

   As suggested earlier in this section, it appears to be desirable to
   do as little character mapping as possible. mumbo jumbo mumbo
   jumbo. Case-mapping is character mapping. mumbo jumbo mumbo
   jumbo. Therefore, IDNA procedures themselves should not
   require case-mapping. User agents can take care of it.
   
Now maybe the intent here is to keep the text hard to interpret,
I don't know. But even the advice doesn't seem well-structured.
Here is a crack at rewriting the text to do this better:

==================================================================

  As suggested earlier in this section, it appears to be desirable
  to do as little character mapping as possible in the IDNA
  procedures themselves. Some character mapping is required
  to ensure that the procedures mapping A-labels to U-labels
  and back are idempotent, and to ensure that canonical
  equivalence requirements for the use of Unicode itself are
  followed (e.g., NFC normalization of input), but other character
  mapping should be avoided.
  
  With regard to case folding, the situation is as follows.
  If only lowercase letters can be registered in the DNS (i.e.,
  be present in a U-label), then the character mappings implied
  by case folding can be avoided in the IDNA procedures by
  simply prohibiting uppercase letters as input. This keeps
  the IDNA procedures simpler, but at the cost of requiring
  some greater degree of flexibility in user agents.
  
  [[ Note: remember that "more flexibility in user agents" is
  nominally the topic of this section! ]]
  
  The expectations of users who are accustomed to a case-insensitive
  DNS environment will probably be well-served if user agents
  perform case folding (to lowercase) prior to IDNA processing,
  even though the IDNA procedures themselves should neither
  require nor expect such mappings. And due caution is in
  order. It is not advisable to perform language-specific
  case mappings on IDNs, as this potentially could result in
  different resolutions for the same input. For example, the
  string "III", if lowercased by Turkish casing rules, would
  result in a different U-label than if lowercased by English
  casing rules.
 
===================================================================

I think something like that is much clearer. Note that it
doesn't change the recommendation you are trying to make
in idnabis-issues, namely that IDNA should not do casefolding,
but leave any casefolding to the user agents before they
call the IDNA procedures to do IDN resolving. However, I
really think the recommendation on flexibility for the
user agents needs the caution about this spelled out.

Instead of vague text that "the [casing] relationship may even
be language-dependent" as putatively contributing to the
argument to keep casefolding out of IDNA procedures (which
doesn't hold any water), the *real* issue here is that
if IDNA procedures don't do language-*in*dependent casefolding
as a matter of course, and you leave this up to user agents,
then you need to caution them *not* to go down the garden
path of applying language-dependent casefolding to URIs
before doing domain name resolution, or you are opening
yourself up for another entire class of spoofings and
incomprehensible label behavior, where the exact same
input string resolves or not (or worse, resolves *differently*)
depending on a localization setting in a browser.
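
As a rough illustration of that hazard, here is a Python sketch of
the "III" case. Everything named here is hypothetical: turkish_lower()
only approximates Turkish lowercasing of the two letters I and
U+0130, and rough_a_label() just prefixes the output of Python's
"punycode" codec with "xn--"; it is not the IDNA procedure.

```python
# Sketch only: both helpers are invented for illustration.

def turkish_lower(s: str) -> str:
    """Approximate Turkish lowercasing: I -> U+0131 (dotless i),
    U+0130 (dotted capital I) -> i, then the default lowercase mapping."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

def default_lower(s: str) -> str:
    """Language-independent lowercasing, as Python's str.lower() does it."""
    return s.lower()

def rough_a_label(u_label: str) -> str:
    """Very rough stand-in for U-label to A-label conversion (assumption)."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")

label = "III"
english = default_lower(label)   # three ASCII letters i
turkish = turkish_lower(label)   # three U+0131 dotless i letters

# Same user input, two different labels to look up.
print(repr(english), rough_a_label(english))
print(repr(turkish), rough_a_label(turkish))
assert english != turkish
assert rough_a_label(english) != rough_a_label(turkish)
```

A user agent that picked between these two functions based on a
locale setting would look up different names for identical input.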

--Ken


