Standards and localization (was Dot-mapping)
John C Klensin
klensin at jck.com
Sat Dec 8 19:39:16 CET 2007
--On Saturday, 08 December, 2007 23:40 +0800 YAO Jiankang
<yaojk at cnnic.cn> wrote:
> Dear John,
> Thanks a lot for your good example.
> At first, we must be sure that the domain
> "xn--0xaat?example.com" is an IDN since it includes the
> non-ascii character.
No, that is irrelevant. The problem with dots is that
applications that are _not_ IDNA-aware, much less
IDNA-conformant, must understand exactly what the label
separators are in order to parse a domain name into labels.
There is _no_ problem for IDNA2003-conformant software, because
it presumably knows what to do (including figuring out whether a
string that starts in "xn--" is, or is not, a valid IDN).
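A minimal sketch of the parsing difference described above. It assumes U+3002 (ideographic full stop) as the alternate dot, since the message shows only "?" in that position; the function names are hypothetical. The four separator characters are the ones RFC 3490 section 3.1 lists as dots.

```python
# Characters that RFC 3490 (IDNA2003) section 3.1 requires to be
# recognized as dots: U+002E, U+3002, U+FF0E, U+FF61.
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"


def split_labels_legacy(name):
    """Split as a non-IDNA-aware (RFC 1034/1035/1123) application
    would: only the ASCII full stop separates labels."""
    return name.split(".")


def split_labels_idna2003(name):
    """Split as an IDNA2003-aware application would: map every
    alternate dot to the ASCII dot first, then split."""
    for dot in IDNA2003_DOTS[1:]:
        name = name.replace(dot, ".")
    return name.split(".")


ascii_form = "xn--0xaat.example.com"
# Assumption: U+3002 stands in for the "?" shown in the message.
alt_form = "xn--0xaat\u3002example.com"

print(split_labels_legacy(ascii_form))   # three labels
print(split_labels_legacy(alt_form))     # two labels, first one is not LDH
print(split_labels_idna2003(alt_form))   # three labels again
```

The legacy splitter sees only two labels in the second string, which is the failure being described: the software has no way to know that the extra character was meant as a separator.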
> Second,
> RFC3490 said "Whenever a domain name is put into an
> IDN-unaware domain name slot (see section 2), it MUST
> contain only ASCII characters." Your browser in your
> example is an IDN-unaware domain name slot. "I copy that
> out and paste it into my browser" is trying to put the IDN
> into an IDN-unaware domain name slot. This is not
> allowed by RFC3490.
That is correct. But remember that 3490 is very specific that
applications and DNS interfaces are _not_ required to conform to
it. Someone creating, registering, or displaying an
IDNA-conformant FQDN or label has no way to know whether or not
the name string will be accessed and processed exclusively by
3490-conformant software. If an application doesn't know
anything at all about 3490, then it still must be able to parse
any domain name it receives into labels. And use of anything
but ASCII dot (full stop) prevents such applications from doing
so.
> So you cannot do so by putting an IDN into an IDN-unaware
> domain name slot. It violates the requirement of RFC3490.
Applications that conform (only) to RFC 1034/1035/1123 are not
required to conform to 3490.
> If you put an IDN into an IDN-unaware domain name slot, I am
> sure that it will cause some problems.
Of course. But this isn't about name slots. Contrast the
following problems, assuming a domain name placed in running
text and thinking _only_ about applications and resolvers that
do not support IDNA:
Applications that are not IDNA-aware will pick this up,
parse it correctly into labels, look things up in the
DNS, and find them if they are there. This is how IDNA
is expected to work, so no problem.
Some applications prohibit this as part of their own
syntax, but the application is not part of either the
DNS or IDNA. In the general case and absent such
prohibitions, they are required to parse this into
labels (which they have no trouble doing) and look those
labels up in the DNS. That is a requirement under RFC
1034, and RFC 1123 is, IMO, _very_ explicit about that.
Of course, if the domain is properly run, they won't
find anything, but that is ok.
(Where ? is an alternate dot). Now, IDNA2003-aware
applications will probably work exactly the same way
applications that are not IDNA2003-aware will work in
the first case above. But applications that are not
IDNA-aware will be unable to parse the string into
labels as you presumably intended. Probably the lookup
will fail, but, since a number of protocols and
processing models assume that any system can convert
FQDNs from external (dot-separated labels) to internal
(lengths and strings) form and back without loss of
information, there are _far_ more opportunities
for damage because information is lost from the label
structure itself, not just from the interpretation of
the labels.
In addition, if the ACE strings in the first example and the ACE
strings in the third one are identical, the parsing and lookup
in the first example will succeed and those in the third will
presumably fail. This also gives us two FQDNs, both in ACE
form, that IDNA considers equivalent and the 1034 DNS does not.
That is bad news. And, again, while it may provide some
additional spoofing opportunities, spoofing has nothing to do
with the problem.
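To illustrate the external-to-internal conversion mentioned above, here is a rough sketch of the RFC 1035 internal form (a length octet before each label, terminated by a zero octet) as a non-IDNA-aware system might produce it. The function names are hypothetical, U+3002 is again assumed as the alternate dot, and passing non-ASCII octets through as UTF-8 is an assumption about what unaware software might do, not something any RFC specifies.

```python
def to_wire(name):
    """Convert a dot-separated name to RFC 1035 internal form,
    splitting on the ASCII dot only."""
    if name.endswith("."):
        name = name[:-1]  # drop the root dot of a fully qualified name
    out = bytearray()
    for label in name.split("."):
        # Assumption: unaware software passes the octets through as-is.
        raw = label.encode("utf-8")
        if not 0 < len(raw) <= 63:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(raw))  # length octet
        out += raw            # label octets
    out.append(0)             # zero-length root label ends the name
    return bytes(out)


def from_wire(wire):
    """Convert internal form back to a list of label octet strings."""
    labels = []
    i = 0
    while wire[i] != 0:
        n = wire[i]
        labels.append(wire[i + 1:i + 1 + n])
        i += 1 + n
    return labels


first = to_wire("xn--0xaat.example.com")
third = to_wire("xn--0xaat\u3002example.com")  # assumed alternate dot

print(len(from_wire(first)))  # number of labels seen in the first form
print(len(from_wire(third)))  # number of labels seen in the third form
print(first == third)         # whether the two internal forms match
```

Under these assumptions the two strings produce different internal forms with different label counts, so a 1034-style comparison treats them as different names even though IDNA2003 considers them equivalent.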