Standards and localization (was Dot-mapping)
mark.davis at icu-project.org
Sat Dec 8 20:28:57 CET 2007
I'm a bit puzzled. If I take a "raw" IDN, like
and paste it into an IDNA-unaware browser, it won't work. We should expect
that of browsers that don't handle IDN. We'd need to paste in a Punycode
version for it to work: xn--bcher-kva.com
If I take a "raw" IDN, like
http://Buecher．com // that dot is a full-width dot
and paste it into an IDNA-unaware browser, it also won't work. We should
also expect that of browsers that don't handle IDN. We'd need to paste in
a normalized version for it to work: http://Buecher.com
That is, the dot conversion doesn't appear to be much different from
the Punycode conversion (and case/normalization folding) -- each is something
that has to be done before passing the name off to DNS for it to work correctly.
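
A rough sketch of that point, assuming Python's built-in punycode codec and using NFKC plus lowercasing as a crude stand-in for Nameprep (the function name to_ascii_hostname is made up for illustration, and this is not a complete ToASCII implementation). The dot mapping is simply one more step applied before the name goes to DNS:

```python
import unicodedata

# The three non-ASCII label separators that RFC 3490 maps to U+002E.
IDNA2003_DOTS = ("\u3002", "\uff0e", "\uff61")


def to_ascii_hostname(host: str) -> str:
    """Hypothetical helper: map dots, fold, then Punycode-encode each label."""
    # Step 1: dot mapping, so that labels can be separated at all.
    for dot in IDNA2003_DOTS:
        host = host.replace(dot, ".")

    ascii_labels = []
    for label in host.split("."):
        # Step 2: case/normalization folding (assumption: NFKC + lower()
        # only approximates Nameprep).
        label = unicodedata.normalize("NFKC", label).lower()
        # Step 3: Punycode conversion for labels that are not pure ASCII.
        if label.isascii():
            ascii_labels.append(label)
        else:
            ascii_labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(ascii_labels)


print(to_ascii_hostname("b\u00fccher.com"))    # non-ASCII label
print(to_ascii_hostname("Buecher\uff0ecom"))   # full-width dot
```

Skipping any one of the three steps leaves a string that a legacy resolver cannot look up, which is why the dot mapping sits in the same place as the other two.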
On Dec 8, 2007 5:15 AM, John C Klensin <klensin at jck.com> wrote:
> --On Saturday, 08 December, 2007 12:06 +0800 YAO Jiankang
> <yaojk at cnnic.cn> wrote:
> >> Without that mapping, the string cannot be parsed into labels
> >> since conventional (legacy) FQDN parsers separate labels
> >> _only_ on ASCII period, 0x2E, aka U+002E.
> > True. Non-IDNA-aware software cannot parse an IDN.
> >> Not being able to parse the string into labels would result in
> >> rather serious lookup failures, but the problem is even worse
> >> because:
> > If I understand it correctly, you seem to be making the
> > following assumption:
> > A domain name whose dot is U+3002 (ideographic full
> > stop), U+FF0E (fullwidth full stop), or U+FF61 (halfwidth
> > ideographic full stop) is not an IDN, so the name will be
> > sent to the DNS lookup server without IDNA processing. Actually,
> > according to RFC 3490, it is an IDN.
> > Since it is an IDN, it must be processed with IDNA before being
> > sent to DNS lookup. If that happens, the problem you describe
> > does not arise.
> That is not my assumption. Perhaps I can explain this better by
> means of an example. I can't do this exactly, so suppose that
> the character "?" is actually U+3002 (ideographic full stop).
> Someone sends me a URL in email. The URL consists of
> where the A-label corresponds to the U-label φοο.
> That example uses standard dots. Suppose I do not have an
> IDNA-aware browser. But I can take the string from your mail,
> paste it in, parse it into
> "www", "xn--0xaat", "example", and "com",
> look things up, and obtain the page. That is how IDNA is
> supposed to work. As long as the user sticks to passing the
> ACE form around, applications do not need to be IDNA-aware.
> However, assume that you send me a URL, that looks (substituting
> "?" as above) like:
> I copy that out and paste it into my browser, which we are still
> assuming is not IDNA-aware. Because the browser is not
> IDNA-aware, the domain name is parsed into
> "www?xn--0xaat", "example" and "com"
> This is obviously wrong and will obviously result in a failure
> to find the name in a query. Worse, that parsing is performed
> in places and with software other than DNS resolvers. For
> example, there are several security-related protocols that use
> DNS names as identifiers but keep them in internal DNS form (a
> list of labels stored with lengths and values, not separated by
> dots). Depending on how they are designed, even modern
> implementations are not required to be IDNA-aware (because IDNA
> is transparent). But the dot-mappings cannot be transparent:
> every system, module, or application that has to parse an FQDN
> into components must know what is, and is not, a
> label-separation character.
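
The parsing difference described in the quoted message can be sketched as follows. This is only an illustration: the function names are hypothetical, "\u3002" stands in for the "?" placeholder, and the host name is reconstructed from the labels listed in the message. The last function shows the internal DNS form (length-prefixed labels) mentioned above, restricted to ASCII labels.

```python
import re

# Label separators recognized by IDNA (RFC 3490): U+002E plus three others.
IDNA_SEPARATORS = re.compile("[\u002e\u3002\uff0e\uff61]")


def legacy_labels(host: str) -> list[str]:
    """Split the way a legacy FQDN parser does: on ASCII period only."""
    return host.split(".")


def idna_aware_labels(host: str) -> list[str]:
    """Split on every separator an IDNA-aware parser has to know about."""
    return IDNA_SEPARATORS.split(host)


def wire_format(labels: list[str]) -> bytes:
    """Internal DNS form: each label prefixed by its length, then a zero byte."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # raises for labels that are not ASCII
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


host = "www\u3002xn--0xaat.example.com"
print(legacy_labels(host))       # first label still contains U+3002
print(idna_aware_labels(host))   # four labels
print(wire_format(idna_aware_labels(host)))
```

Once the name is in the length-prefixed form there are no dots left to map, so any mistake made while splitting is carried along unchanged.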