Unicode position on local mapping

John C Klensin klensin at jck.com
Tue Feb 17 18:40:38 CET 2009



--On Monday, February 16, 2009 11:20 +0100 JFC Morfin
<jefsey at jefsey.com> wrote:

> Mark,
> let say that the French ccTLD decides a position on
> "école.fr" and "ecole.fr" being the same domain name or not
> due to the case folding issue.

Mark (and others),

I'm on travel and will try to respond in more detail to your note
(and some related issues) as soon as I can, but, in the interim,
let me just point out that the comment above is a specific
example of what drove us toward the "no mappings" model --
either what one sees is precisely what one gets or someone is
going to conclude that they need to make different decisions
(presumably by variants or similar techniques) in different
zones.  They need to be given the flexibility to do that or,
if they consider it important enough, they will do it anyway.
If they can figure out no other way, they will invent different
protocols and different DNS trees.  Some of those can be safely
dismissed as nut cases, but others, including some ccTLDs, pose
interoperability problems that are at least as bad, and maybe
far worse, than those you have contemplated.

That type of local decision issue, whether it is made by a
registry establishing variant rules or by national or regional
mandates on how software implementers are required to behave, is
also where the mapping issue comes together with some of the
more specific case-related ones such as Eszett.  If we say
"some Unicode code points are permitted, but, if they are, they
mean themselves and everything else is prohibited", we end up
with a clear and unambiguous situation that gives maximum
flexibility to registries/zones and, where absolutely necessary,
flexibility to implementations.
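
To make the Eszett case concrete, here is a minimal Python sketch.
It assumes that Python's built-in "idna" codec, which implements the
IDNA2003 (Nameprep) mappings, can stand in for a mapping model, and
that the raw "punycode" codec can stand in for "code points mean
themselves".  The helper names are illustrative only and come from
no specification.

```python
# Sketch: a mapping model vs. "code points mean themselves".
# Assumption: Python's built-in "idna" codec (IDNA2003/Nameprep) stands in
# for a mapping model; the raw "punycode" codec applies no mapping at all.

def ace_with_mapping(label: str) -> bytes:
    """Encode one label after IDNA2003-style mappings have been applied."""
    return label.encode("idna")

def ace_without_mapping(label: str) -> bytes:
    """Encode one label exactly as given: ACE prefix plus raw Punycode."""
    if label.isascii():
        return label.encode("ascii")
    return b"xn--" + label.encode("punycode")

def decode_ace(ace: bytes) -> str:
    """Convert an ACE label back to Unicode without any mapping."""
    if ace.startswith(b"xn--"):
        return ace[4:].decode("punycode")
    return ace.decode("ascii")

label = "stra\u00dfe"  # "strasse" spelled with Eszett (U+00DF)

mapped = ace_with_mapping(label)       # Eszett is folded to "ss"
unmapped = ace_without_mapping(label)  # Eszett survives in the ACE form

print(mapped)
print(decode_ace(unmapped) == label)
```

With mapping, the Eszett is gone before the label ever reaches the
DNS; without it, the label that comes back is the one that went in,
and any decision about whether the two spellings should be treated
as equivalent is left to the zone.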

Independent of how it is written up, I find your examples of
someone mapping "a" with an acute accent into one with a grave
accent unpersuasive, partially because that is prohibited by the
current text (because both are PVALID characters) but more
because I think that is just unlikely in practice.  However, variations
on Jefsey's example are important (and, I predict, common) cases
in which registries are going to make their own decisions about
whether strings with accents should match strings without them.
They also illustrate the most difficult part of the problem:
even if we had server-side matching, we would still have
controversies about what should or should not match,
controversies that we would not be able to resolve globally.
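
As a rough illustration of such a registry-level decision, the
Python sketch below compares two labels under a purely hypothetical
zone policy.  The function names and the "accents_matter" switch are
assumptions made for the example; no registry is known to use this
exact rule, and real variant tables are considerably more involved.

```python
import unicodedata

def accent_insensitive_key(label: str) -> str:
    """Hypothetical comparison key: the label with combining marks removed."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def same_under_policy(a: str, b: str, accents_matter: bool) -> bool:
    """Decide whether two labels collide under one zone's (assumed) policy."""
    if accents_matter:
        # What one sees is what one gets: labels match only if identical.
        return a == b
    # The zone treats accented and unaccented forms as variants.
    return accent_insensitive_key(a) == accent_insensitive_key(b)

with_accent = "\u00e9cole"  # "ecole" with an acute accent on the first letter
without_accent = "ecole"

print(same_under_policy(with_accent, without_accent, accents_matter=True))
print(same_under_policy(with_accent, without_accent, accents_matter=False))
```

The same pair of labels gives opposite answers depending on the
switch, which is the point: the protocol cannot pick one answer that
is right for every zone.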

I think we can make progress on this only by seeing it as a
collection of very complex tradeoffs, rather than by saying
"this is bad" without carefully examining all of the relevant
tradeoffs.  Too much mapping is bad, too little mapping is bad,
and the problem is in finding appropriate balances for the many
different contexts in which IDNs appear in the real world.  That
may require that we establish better explanations of what local
mappings are, and are not, appropriate, rather than trying to
define things so that they can be banned.  Or not.

As one example of that, I've recently concluded that there is no
excuse for any mapping (local or otherwise) on the registration
side.  On registration, it is important that the registrant be
sure of exactly what is being registered... and the only thing
that is actually registered, under IDNA2003 as well as IDNA2008,
is whatever comes back from mapping the ACE form back to the
Unicode one.  The lookup situation may be different, and
probably is, but may also be different --in terms of the amount
and types of mappings and other flexibilities that are
permitted-- depending on the type of application involved.
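
The registration point can be sketched in Python as a round-trip
check.  This assumes Python's built-in "idna" codec (IDNA2003
mappings) as the conversion to and from the ACE form; the function
names are hypothetical, and a real registry would apply its own
rules on top.

```python
# Sketch: what is registered is what comes back from the ACE form.
# Assumption: Python's built-in "idna" codec (IDNA2003) does the conversion.

def registered_form(requested: str) -> str:
    """Return the Unicode label recovered from the ACE form of the request."""
    ace = requested.encode("idna")  # mappings (e.g. case folding) happen here
    return ace.decode("idna")

def check_registration(requested: str) -> str:
    """Refuse a request that mapping would silently change."""
    actual = registered_form(requested)
    if actual != requested:
        raise ValueError(
            f"requested {requested!r} but {actual!r} would be registered"
        )
    return actual

# Lower-case request: the round trip returns it unchanged.
print(check_registration("\u00e9cole"))

# Upper-case initial: mapping would register a different string.
try:
    check_registration("\u00c9cole")
except ValueError as err:
    print("rejected:", err)
```

Rejecting the request, rather than mapping it quietly, leaves the
registrant in no doubt about which string ends up in the zone.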

regards,
     john
 







More information about the Idna-update mailing list