Standards and localization (was Dot-mapping)

Sat Dec 8 20:28:57 CET 2007

I'm a bit puzzled. If I take a "raw" IDN, like

http://Bücher.com

and paste it into an IDNA-unaware browser, it won't work. We should expect
that of browsers that don't handle IDN. To make it work, we'd need to paste
in the punycode version: xn--bcher-kva.com
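
For anyone who wants to reproduce that conversion, here is a minimal sketch in Python. It assumes the standard library's built-in "idna" codec, which follows the IDNA 2003 rules of RFC 3490; the helper name to_ace is only an illustration, not anything from the specs.

```python
# Sketch: convert a raw IDN to its ACE (punycode) form before lookup.
# Assumes Python's built-in "idna" codec (IDNA 2003 / RFC 3490 rules).
# to_ace is a hypothetical helper name used only for this example.

def to_ace(hostname: str) -> str:
    """Return the ASCII-compatible (ACE) form of a host name."""
    # The codec case-folds and normalizes each label (nameprep),
    # then punycode-encodes any label that is not plain ASCII.
    return hostname.encode("idna").decode("ascii")


print(to_ace("Bücher.com"))  # xn--bcher-kva.com
```

The result is plain ASCII, so it can be handed to software that knows nothing about IDNA.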

If I take a "raw" IDN, like

http://Buecher．com               // that dot is a full-width dot

and paste it into an IDNA-unaware browser, it also won't work. We should
also expect that of browsers that don't handle IDN. To make it work, we'd
need to paste in a normalized version: http://Buecher.com

That is, the dot conversion doesn't appear to be much different from
the punycode conversion (and case/normalization folding): it is something
that has to be done before passing the name off to DNS for it to work
correctly.
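
To make the comparison concrete, here is a rough Python sketch of the two ways of splitting a name into labels. The three extra separators are the ones RFC 3490 says must be recognized as dots. The function names legacy_split and idna_split are made up for this illustration and do not come from any real resolver or browser.

```python
import re

# The extra label separators that RFC 3490 treats like U+002E:
# U+3002 ideographic full stop, U+FF0E fullwidth full stop,
# U+FF61 halfwidth ideographic full stop.
IDNA_DOTS = re.compile("[\u3002\uff0e\uff61]")


def legacy_split(name: str) -> list:
    """Split as a non-IDNA-aware parser would: on ASCII period only."""
    return name.split(".")


def idna_split(name: str) -> list:
    """Map the extra dots to U+002E first, then split into labels."""
    return IDNA_DOTS.sub(".", name).split(".")


name = "Buecher\uff0ecom"  # the dot is a full-width dot
print(len(legacy_split(name)))  # 1: the whole string is taken as one label
print(idna_split(name))         # ['Buecher', 'com']
```

The mapping step is a single substitution, but it has to run before the split, which is the same position in the pipeline that the punycode conversion occupies.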

Mark

On Dec 8, 2007 5:15 AM, John C Klensin <klensin at jck.com> wrote:

>
>
> --On Saturday, 08 December, 2007 12:06 +0800 YAO Jiankang
> <yaojk at cnnic.cn> wrote:
>
> >> Without that mapping, the string cannot be parsed into labels
> >> since conventional (legacy) FQDN parsers separate labels
> >> _only_ on ASCII period, 0x2E, aka U+002E.
> >
> > True. Non-IDNA-aware software cannot parse an IDN.
> >
> >>
> >> Not being able to parse the string into labels would result in
> >> rather serious lookup failures,  but the problem is even worse
> >> because:
> >
> > If I understand it correctly, you seem to be making the following
> > assumption: a domain name containing U+3002 (ideographic full
> > stop), U+FF0E (fullwidth full stop), or U+FF61 (halfwidth
> > ideographic full stop) is not an IDN, so it will be sent to the
> > DNS lookup server without IDNA processing. Actually, according to
> > RFC 3490, it is an IDN.
> > Since it is an IDN, it must be processed by IDNA before being sent
> > to DNS lookup. If that happens, the problem you describe does not
> > arise.
>
> That is not my assumption.  Perhaps I can explain this better by
> means of an example.   I can't do this exactly, so suppose that
> the character "?" is actually U+3002 (ideographic full stop).
>
> Someone sends me a URL in email.  The URL consists of
>
>  http://www.xn--0xaat.example.com/
>
> where the A-label corresponds to the U-label φοο.
>
> That example uses standard dots.   Suppose I do not have an
> IDNA-aware browser.   But I can take the string from your mail,
> paste it in, parse it into
>  "www", "xn--0xaat", "example", and "com",
> look things up, and obtain the page.   That is how IDNA is
> supposed to work.   As long as the user sticks to passing the
> ACE form around, applications do not need to be IDNA-aware.
>
> However, assume that you send me a URL, that looks (substituting
> "?" as above) like:
>
>  http://www.xn--0xaat?example.com/
>
> I copy that out and paste it into my browser, which we are still
> assuming is not IDNA-aware.  Because the browser is not
> IDNA-aware, the domain name is parsed into
>
>   "www", "xn--0xaat?example" and "com"
>
> This is obviously wrong and will obviously result in a failure
> to find the name in a query.   Worse, that parsing is performed
> in places and with software other than DNS resolvers.  For
> example, there are several security-related protocols that use
> DNS names as identifiers but keep them in internal DNS form (a
> list of labels stored with lengths and values, not separated by
> dots).   Depending on how they are designed, even modern
> implementations are not required to be IDNA-aware (because IDNA
> is transparent).  But the dot-mappings cannot be transparent:
> every system, module, or application that has to parse an FQDN
> into components must know what is, and is not, a
> label-separation character.
>
>    john
>

-- 
Mark