exact match vs mapping

John C Klensin klensin at jck.com
Tue Mar 31 18:07:43 CEST 2009



--On Monday, March 30, 2009 22:16 -0700 Erik van der Poel
<erikv at google.com> wrote:

>...
>> A variation, which I'm finding increasingly interesting for
>> other reasons, is to consider mapping part of the IRI/URI
>> boundary, thereby permitting it to be different for different
>> protocols if that is useful (and it may be).
>
> Until recently, I had been thinking that having different
> mappings for different protocols (e.g. HTTP, email, etc) could
> lead to incompatibilities between the domain names used in the
> different protocol stacks, somewhat akin to the "balkanization"
> problem that people have mentioned whenever someone appears to
> want to do something differently for a single language (such as
> a European Latin-based language).

The main problem with trying to do things that are linked to
languages is that we have no deterministic way to identify which
language is relevant at lookup time.  Domain name labels are
short and many of them are not "words", so even the common
language-determining heuristics are unlikely to work well.  With
URIs and IRIs, by contrast, we have precise protocol identifiers
(until someone tries to internationalize those, see below).

>...
> I may have misunderstood your phrase "permitting it to be
> different for different protocols". Did you mean, permitting
> the mapping to be different for different protocols? Or
> permitting some protocols to include mapping, and others not?

I was deliberately vague because it is either reasonable to push
things out to that level or it is not.  If it is not, it is
unwise to spend energy on that discussion, especially since your
second option is a proper subset of the first.

However, note again that the strongest and most-repeated
argument for doing extensive, IDNA2003-like, mappings is
compatibility in web applications, including some violations of
existing standards.  If that is really our main reason and we
wanted to encourage continuation of those non-conforming
implementations, then it would be sensible to support extensive
mappings for http and https.  Other protocols would either get
no mappings or only those mappings that are narrowly necessary
to preserve user expectations.  I assume the latter would be
done by inclusion so as to pick up some sort of case-matching
and Asian variable-width characters but little else (i.e.,
wholesale mapping of compatibility characters appears to me to
be inconsistent with the "inclusion" model we are presumably
still operating under).
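
To make that split concrete, here is a rough Python sketch, not
a proposal.  The names map_label and EXTENSIVE_SCHEMES are
hypothetical.  NFKC plus case folding is only a stand-in for
IDNA2003-like mapping (it is not Nameprep), and the narrow set
shown (lowercasing plus fullwidth/halfwidth folding) is an
assumption about what "narrowly necessary" might include.

```python
import unicodedata

# Hypothetical policy: schemes that get extensive mapping.
EXTENSIVE_SCHEMES = {"http", "https"}


def _fold_width(ch):
    """Map a fullwidth/halfwidth character to its ordinary form.

    Only decompositions tagged <wide> or <narrow> are applied;
    every other compatibility character is left untouched.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith(("<wide>", "<narrow>")):
        return "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    return ch


def map_label(label, scheme):
    """Map one label according to the protocol it was found in."""
    if scheme.lower() in EXTENSIVE_SCHEMES:
        # Extensive mapping: all compatibility characters are
        # folded (rough stand-in for IDNA2003-like behavior).
        folded = unicodedata.normalize("NFKC", label).casefold()
        return unicodedata.normalize("NFKC", folded)
    # Narrow mapping, defined by inclusion: width folding and
    # lowercasing only, then canonical (not compatibility) NFC.
    narrowed = "".join(_fold_width(ch) for ch in label)
    return unicodedata.normalize("NFC", narrowed.lower())
```

Under these assumptions the "fi" ligature (U+FB01) becomes "fi"
when the scheme is http but is left unmapped for mailto, while
fullwidth Latin letters are folded in both cases.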

    john


