Standards and localization (was Dot-mapping)

John C Klensin klensin at jck.com
Sat Dec 8 14:15:45 CET 2007



--On Saturday, 08 December, 2007 12:06 +0800 YAO Jiankang
<yaojk at cnnic.cn> wrote:

>> Without that mapping, the string cannot be parsed into labels
>> since conventional (legacy) FQDN parsers separate labels
>> _only_ on ASCII period, 0x2E, aka U+002E.
> 
> True. Non-IDNA-aware software cannot parse an IDN.
> 
>> 
>> Not being able to parse the string into labels would result in
>> rather serious lookup failures,  but the problem is even worse
>> because:
> 
> If I understand correctly, you seem to be making the following
> assumption: a domain name whose dot is U+3002 (ideographic full
> stop), U+FF0E (fullwidth full stop), or U+FF61 (halfwidth
> ideographic full stop) is not an IDN, so it will be sent to the
> DNS lookup server without IDNA processing. Actually, according
> to RFC 3490, it is an IDN. Since it is an IDN, it must be
> processed by IDNA before being sent to DNS lookup. If that
> happens, the problem you describe does not arise.

That is not my assumption.  Perhaps I can explain this better by
means of an example.  I can't reproduce the character exactly
here, so suppose that the character "?" is actually U+3002
(ideographic full stop).

Someone sends me a URL in email.  The URL consists of

  http://www.xn--0xaat.example.com/

where the A-label corresponds to the U-label φοο.

That example uses standard dots.  Suppose I do not have an
IDNA-aware browser.  I can still take the string from the mail,
paste it in, have it parsed into
  "www", "xn--0xaat", "example", and "com",
look things up, and obtain the page.  That is how IDNA is
supposed to work: as long as users stick to passing the
ACE form around, applications do not need to be IDNA-aware.
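
To make the ACE case concrete, here is a minimal Python sketch of what
a non-IDNA-aware parser does, assuming it simply splits on U+002E.
The name legacy_labels is made up for illustration; the decoding step
at the end is only there to check the A-label against the U-label and
uses Python's built-in punycode codec.

```python
# Sketch of a legacy (non-IDNA-aware) parser: labels are separated
# only on ASCII period, U+002E. legacy_labels is a hypothetical name.
def legacy_labels(fqdn):
    # Drop an optional trailing root dot, then split on "." only.
    return fqdn.rstrip(".").split(".")

labels = legacy_labels("www.xn--0xaat.example.com")
print(labels)   # ['www', 'xn--0xaat', 'example', 'com']

# The A-label is plain ASCII, so nothing above needed IDNA.
# Only for display would one strip the "xn--" prefix and decode
# the remainder as Punycode to recover the U-label.
ace = labels[1]
u_label = ace[len("xn--"):].encode("ascii").decode("punycode")
print(u_label)  # φοο
```

The lookup itself never needs the last three lines; the four ASCII
labels are all a resolver has to see.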

However, assume that you send me a URL that looks (substituting
"?" as above) like:

  http://www.xn--0xaat?example.com/

I copy that out and paste it into my browser, which we are still
assuming is not IDNA-aware.  Because the browser is not
IDNA-aware, the domain name is parsed into
   
   "www", "xn--0xaat?example" and "com"

This is obviously wrong and will result in a failure to find
the name in a query.  Worse, that parsing is performed in
places and with software other than DNS resolvers.  For
example, there are several security-related protocols that use
DNS names as identifiers but keep them in internal DNS form (a
list of labels stored with lengths and values, not separated by
dots).  Depending on how they are designed, even modern
implementations are not required to be IDNA-aware (because IDNA
is transparent to software that sees only the ACE form).  But
the dot-mappings cannot be transparent: every system, module,
or application that has to parse an FQDN into components must
know what is, and is not, a label-separation character.
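
The difference can be sketched in Python as follows.  This is an
illustration under assumptions, not anyone's actual implementation:
the function names are invented, the separator set is the one listed
in RFC 3490 (U+002E, U+3002, U+FF0E, U+FF61), and to_wire is a
simplified version of the length-prefixed internal DNS form, with no
compression and no total-length check.  The test name has a real
U+3002 between the second and third labels.

```python
import re

ASCII_DOT = "\u002e"
# Characters RFC 3490 tells IDNA-aware software to treat as dots.
IDNA_DOTS = "\u002e\u3002\uff0e\uff61"

def split_legacy(name):
    # Legacy parser: only U+002E separates labels.
    return name.split(ASCII_DOT)

def split_idna_aware(name):
    # IDNA-aware parser: any of the four separators ends a label.
    return re.split("[" + re.escape(IDNA_DOTS) + "]", name)

def to_wire(labels):
    # Internal DNS form: each label as <length octet><octets>,
    # terminated by the zero-length root label.
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # fails on a non-ASCII label
        if not 1 <= len(raw) <= 63:
            raise ValueError("bad label length: %r" % label)
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)

name = "www.xn--0xaat\u3002example.com"

print(split_legacy(name))      # ['www', 'xn--0xaat。example', 'com']
print(split_idna_aware(name))  # ['www', 'xn--0xaat', 'example', 'com']

print(to_wire(split_idna_aware(name)))
# b'\x03www\txn--0xaat\x07example\x03com\x00'

try:
    to_wire(split_legacy(name))
except UnicodeEncodeError:
    print("legacy split yields a label that is not valid ASCII")
```

Only the code that knows about all four separators produces the
labels the sender intended; anything that stores or compares names
in label form inherits whichever split was done upstream.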

    john

   




