Standards and localization (was Dot-mapping)

John C Klensin klensin at jck.com
Sat Dec 8 01:21:53 CET 2007



--On Saturday, 08 December, 2007 05:33 +0900 Yangwoo Ko
<newcat at icu.ac.kr> wrote:
 
> By removing dot-mappings, handling of dot-like characters is
> now left to developers' discretion. Developers are
> encouraged to apply as much local context as possible when
> they encounter dot-like characters. In several local
> environments with which I am familiar, the decision on which
> are (and are not) dots is quite evident. Thus, as a user, I
> don't care too much about this. But, ...
> 
> Still, I have at least two concerns.
> 
> (1) It is not clear what the right thing to do is if a
> dot-like character is encountered in a situation where local
> context is vague. I don't have a concrete example of this
> situation, but that does not imply that it does not exist.

I would not be surprised to discover this situation.  It will
arise less often as localization increases because, if the
software is sufficiently localized, any ambiguous dot-like
character almost certainly should not be mapped.  The problem is
more likely to arise if something is localized less, or localized
to a more general context, and it probably converges on your
second case.

> (2) Even in a very clear local context, there can exist
> multiple (and hence incompatible) practices. (I think we have
> many examples of this.) In such a situation, it may take
> quite a long time to converge on a consensus, and user
> experiences are not that good.

Yes.

But I think everyone is missing something in this discussion,
something that I didn't start to realize until a few days ago.
The idea of mapping dots is a fundamental bug in the IDNA spec.
It violates the fundamental principle of IDNA, i.e., that IDNs,
in ACE form, work transparently with legacy DNS servers,
resolvers, caches, tunnels, etc.

Suppose a DNS processor or interface that is not IDNA-aware
encounters one of the three "recognize as dot" characters
specified in Section 3.1 of RFC 3490: U+3002 (ideographic full
stop), U+FF0E (fullwidth full stop), or U+FF61 (halfwidth
ideographic full stop).  Because that legacy processor is, by
definition, not IDNA-aware or IDNA-conformant, it sees those
characters as one or more ordinary octets with undefined
semantics.  (See RFC 2181 for a discussion of this and note
that the DNS does not enforce the LDH rule.)  Certainly it cannot
map them into U+002E (full stop) or recognize them as equivalent
to U+002E because it doesn't know anything about IDNA.

Without that mapping, the string cannot be parsed into labels
since conventional (legacy) FQDN parsers separate labels _only_
on ASCII period, 0x2E, aka U+002E.
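
To make the parsing difference concrete, here is a minimal sketch
in Python.  It is an illustration only, not something taken from
RFC 3490: the function names are hypothetical, and it assumes the
dot-like characters reach the legacy parser as UTF-8 octets (a
real legacy system might see some other encoding entirely).

```python
import re

# The four characters RFC 3490 Section 3.1 treats as label separators:
# U+002E plus the three "recognize as dot" characters.
IDNA_DOTS = "\u002e\u3002\uff0e\uff61"
_IDNA_DOT_RE = re.compile("[" + re.escape(IDNA_DOTS) + "]")


def split_labels_legacy(name: str) -> list:
    """Hypothetical legacy parser: sees octets, splits only on 0x2E.

    Assumption: the name arrives UTF-8 encoded.
    """
    return name.encode("utf-8").split(b"\x2e")


def split_labels_idna(name: str) -> list:
    """Hypothetical IDNA-aware parser: splits on any of the four dots."""
    return _IDNA_DOT_RE.split(name)
```

Given "example\u3002com", the legacy function returns a single
label with the octets E3 80 82 embedded in it, while the
IDNA-aware function returns the two labels "example" and "com".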

Not being able to parse the string into labels would result in
rather serious lookup failures, but the problem is even worse
because:

	(1) If anyone is using short-form domain names (e.g.,
	names that are completed with additional components in a
	full-service resolver or surrogate for it), this will
	almost certainly cause problems and very confusing
	errors if that resolver is not IDNA-aware.  Because IDNA
	is supposed to occur entirely on the client, there has
	been no requirement that such servers be IDNA-aware.
	
	(2) There is also a security problem.  By mixing actual
	dots (U+002E) with one of the matching-character dots in
	a putative FQDN, an attacker (phishers included, but not
	the only group) can create a string that identifies a
	different DNS leaf node depending on whether the
	resolver with which it is used is IDNA-aware or not.
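
Both cases can be sketched in a few lines of Python.  This is a
rough illustration under stated assumptions, not a model of any
particular resolver: the function names, the sample names, and
the "corp.test" search suffix are all hypothetical, and the
completion rule (append the suffix only to a one-label name) is a
simplification of what real search lists do.

```python
import re

# Separators per RFC 3490 Section 3.1: U+002E plus the three
# dot-like characters.
_IDNA_DOT_RE = re.compile("[\u002e\u3002\uff0e\uff61]")


def labels_idna_aware(name):
    """Split on any of the four IDNA dot characters."""
    return _IDNA_DOT_RE.split(name)


def labels_legacy(name):
    """Legacy view: only the ASCII period separates labels."""
    return name.split("\u002e")


def complete(name, search_suffix, split):
    """Hypothetical short-name completion.

    Assumption: a name that parses to a single label is treated as a
    short form and gets the search suffix appended.
    """
    labels = split(name)
    if len(labels) == 1:
        labels = labels + split(search_suffix)
    return labels


# Case (1): a name written with U+3002 looks like a short form to a
# resolver that is not IDNA-aware.
short = "bank\u3002example"

# Case (2): real dots mixed with a dot-like character.
mixed = "www.bank\u3002example.evil.test"
```

Here complete(short, "corp.test", labels_legacy) yields three
labels ending in "corp" and "test", while the IDNA-aware split
yields just "bank" and "example".  For the mixed string, the
legacy split produces four labels and the IDNA-aware split
produces five, so the two parsers name different nodes in the
tree.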

These problems are not just theoretical and can be fairly easily
demonstrated.  I believe that their implications are such that
the provisions of Section 3.1(1) of IDNA (RFC 3490) MUST be
removed.

Simply removing them would leave those who need these
dot-variations in an odd position: they can either move to
IDNA200X or some other update that permits handling the dots in
the user interface and provides a framework for doing so, or be
left without the other dots.  Put differently, there is no way
to conform interoperably to both that provision of RFC 3490 and
RFC 1034/1035.  That is a fairly bad state to be in.

> If these two concerns are real (though I hope not), the removed
> dot-mappings might be resurrected in a separate guideline
> document.

Indeed.  I think that, in general, guideline documents that
accumulate and report on successful practices and risks are
going to be very useful in this area.

thanks and regards,
    john


