Standards and localization (was Dot-mapping)

John C Klensin klensin at jck.com
Sat Dec 8 19:39:16 CET 2007



--On Saturday, 08 December, 2007 23:40 +0800 YAO Jiankang
<yaojk at cnnic.cn> wrote:

> Dear John,
>       
>      Thanks a lot for your good example.
>      At first, we must be sure that the domain
> "xn--0xaat?example.com" is an IDN since it includes the
> non-ascii character.  

No, that is irrelevant.   The problem with dots is that
applications that are _not_ IDNA-aware, much less
IDNA-conformant, must understand exactly what the label
separators are in order to parse a domain name into labels.
There is _no_ problem for IDNA2003-conformant software, because
it presumably knows what to do (including figuring out whether a
string that starts in "xn--" is, or is not, a valid IDN).

>     at second ,
>   
>      RFC3490 said  " Whenever a domain name is put into an
> IDN-unaware domain name slot       (see section 2), it MUST
> contain only ASCII characters. "        Your browser in your
> example is an IDN-unaware domain name slot.       "I copy that
> out and paste it into my browser"  is trying to put the IDN
> into an IDN-unaware domain name slot.        This is not
> allowed by RFC3490.

That is correct.   But remember that 3490 is very specific that
applications and DNS interfaces are _not_ required to conform to
it.   Someone creating, registering, or displaying an
IDNA-conformant FQDN or label has no way to know whether or not
the name string will be accessed and processed exclusively by
3490-conformant software.  If an application doesn't know
anything at all about 3490, then it still must be able to parse
any domain name it receives into labels.   And use of anything
but ASCII dot (full stop) prevents such applications from doing
that.
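
A minimal sketch of the parsing step described above, assuming a
hypothetical application that knows only RFC 1034/1035 syntax.  The
function name is invented for illustration, and U+3002 (ideographic
full stop) stands in for the alternate dot; it is one of the extra
characters that RFC 3490 tells IDNA-aware software to treat as a dot.

```python
def split_labels_legacy(name: str) -> list[str]:
    """Split a domain name the way a pre-IDNA parser would.

    Only the ASCII full stop (U+002E) is treated as a label separator.
    Hypothetical helper, not taken from any real resolver library.
    """
    if name.endswith("."):        # drop a single trailing root dot
        name = name[:-1]
    return name.split(".")


ascii_dots = "xn--0xaat.example.com"
alt_dot = "xn--0xaat\u3002example.com"   # U+3002 instead of the first "."

print(len(split_labels_legacy(ascii_dots)))  # three labels
print(len(split_labels_legacy(alt_dot)))     # two labels: one boundary is missed
```

The second name comes back with one label fewer, because the parser
has no reason to look at U+3002 at all.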

>      So you can not do so by putting IDN into an IDN-unaware
> domain name slot.      It violates the requirement of RFC3490.

Applications that conform (only) to RFC 1034/ 1035/ 1123 are not
required to conform to 3490. 

>     if you put IDN into an IDN-unaware domain name slot, I am
> sure that it will cause some problems.

Of course.  But this isn't about name slots.  Contrast the
following problems, assuming a domain name placed in running
text and thinking _only_ about applications and resolvers that
do not support IDNA:

* ACE-string.ACE-string2.ASCII-string

	Applications that are not IDNA-aware will pick this up,
	parse it correctly into labels, look things up in the
	DNS, and find them if they are there.   This is how IDNA
	is expected to work, so no problem.

* non-ASCII-string.non-ASCII-string2.ASCII-string

	Some applications prohibit this as part of their own
	syntax, but the application is not part of either the
	DNS or IDNA.  In the general case and absent such
	prohibitions, they are required to parse this into
	labels (which they have no trouble doing) and look those
	labels up in the DNS.  That is a requirement under RFC
	1034, and RFC 1123 is, IMO, _very_ explicit about that.
	Of course, if the domain is properly run, they won't
	find anything, but that is ok.
	
* ACE-string?ACE-string2.ASCII-string

	(Where ? is an alternate dot).  Now, IDNA2003-aware
	applications will probably work exactly the same way
	applications that are not IDNA2003-aware will work in
	the first case above.  But applications  that are not
	IDNA-aware will be unable to parse the string into
	labels as you presumably intended.  Probably the lookup
	will fail, but, since a number of protocols and
	processing models assume that FQDNs can be converted
	from external (dot-separated labels) to internal
	(lengths and strings) form and back by any system without
	loss of information, there are _far_ more opportunities
	for damage because information is lost from the label
	structure itself, not just from the interpretation of
	individual labels.  
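
The loss of label structure in the third case can be sketched as
follows.  This assumes a hypothetical IDNA-unaware application that
converts the external form to length-prefixed labels by splitting on
ASCII dot only and passing any non-ASCII octets through as UTF-8; both
function names and the UTF-8 choice are assumptions made for
illustration.

```python
def to_wire(name: str) -> bytes:
    """External (dot-separated) form -> internal (length, octets) form.

    Splits on ASCII "." only, as an IDNA-unaware application would.
    Encoding labels as UTF-8 is an assumption of this sketch.
    """
    out = bytearray()
    for label in name.split("."):
        raw = label.encode("utf-8")
        if not 0 < len(raw) <= 63:          # DNS label length limit
            raise ValueError("bad label length")
        out.append(len(raw))                # one length octet per label
        out += raw
    out.append(0)                           # root label terminates the name
    return bytes(out)


def from_wire(wire: bytes) -> list[bytes]:
    """Internal form -> list of raw labels."""
    labels, i = [], 0
    while wire[i] != 0:
        n = wire[i]
        labels.append(wire[i + 1:i + 1 + n])
        i += 1 + n
    return labels


first = "ACE-string.ACE-string2.ASCII-string"
third = "ACE-string\u3002ACE-string2.ASCII-string"   # U+3002 as alternate dot

print(len(from_wire(to_wire(first))))   # label count for the first case
print(len(from_wire(to_wire(third))))   # label count for the third case
```

Once the third name is in internal form, the boundary between
ACE-string and ACE-string2 is no longer a boundary; it is three octets
inside a single label, and nothing downstream can tell that a
separator was intended.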

In addition, if the ACE strings in the first example and the ACE
strings in the third one are identical, the parsing and lookup
in the first example will succeed and that in the third one will
presumably fail.  This also gives us two FQDNs, both in ACE
form, that IDNA considers equivalent and the 1034 DNS does not.
That is bad news.  And, again, while it may provide some
additional spoofing opportunities, spoofing has nothing to do
with the problem.
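
The equivalence mismatch can be illustrated with a small comparison.
It is a sketch only: the two functions are hypothetical, and the
IDNA2003 side implements nothing beyond the dot-mapping step (treating
U+3002, U+FF0E and U+FF61 as dots, as RFC 3490 specifies), not
ToASCII or any other part of the protocol.

```python
# Characters that RFC 3490 says must be recognized as dots, besides U+002E.
IDNA2003_EXTRA_DOTS = "\u3002\uff0e\uff61"


def labels_idna2003(name: str) -> list[str]:
    """Label split with IDNA2003 dot-mapping applied first (sketch only)."""
    for dot in IDNA2003_EXTRA_DOTS:
        name = name.replace(dot, ".")
    return name.split(".")


def labels_rfc1034(name: str) -> list[str]:
    """Label split as seen by software that knows only ASCII dot."""
    return name.split(".")


a = "xn--0xaat.example.com"
b = "xn--0xaat\u3002example.com"

print(labels_idna2003(a) == labels_idna2003(b))  # same name under dot-mapping
print(labels_rfc1034(a) == labels_rfc1034(b))    # different names to plain DNS software
```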

thanks,
   john


