Unicode position on local mapping

Mark Davis mark at macchiato.com
Wed Feb 18 02:14:32 CET 2009


I think at least some of the problems we are having are in terms of
communication. There are at least 3 cases that are important to separate:

A) Registration local mapping.

You request a registration of "Å.com", and the registry converts that to
"aa.com" behind your back. They are allowed to do so by:

http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-4.2

B) Registry local mapping.

You request a registration of "å.com <http://xn--5ca.com>" and the registry
gives you both "å.com <http://xn--5ca.com>" and "aa.com" (bundling).

The draft is silent on this aspect.

C) Client local mapping.

The web page you're looking at contains <a href="Å.com">, and your browser
sends you to "aa.com" instead of "å.com <http://xn--5ca.com>", behind your
back.

In the examples above** I'm using the characters:
U+00C5 <http://unicode.org/cldr/utility/character.jsp?a=00C5> ( Å )
LATIN CAPITAL LETTER A WITH RING ABOVE
U+00E5 <http://unicode.org/cldr/utility/character.jsp?a=00E5> ( å )
LATIN SMALL LETTER A WITH RING ABOVE.

Clients are allowed to do the mapping described in (C) by:

http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5.3

===

For (A), I think we are in agreement; this is a bad idea. The registrant
should supply exactly what they want, and if it is not an A-Label, it should
be rejected, so that the user is only able to register what s/he thinks is
being registered.
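
To make the rule in (A) concrete, here is a minimal sketch in Python of a
registry front end that rejects anything not already in A-Label form. The
names is_a_label and register are hypothetical, and the check is only a
Punycode round trip on a single label; it is not the full validation that
the protocol draft requires.

```python
def is_a_label(label: str) -> bool:
    """Rough check: 'xn--' prefix plus Punycode that round-trips.

    Illustration only; this is not a complete IDNA validity test.
    """
    lowered = label.lower()
    if not lowered.isascii() or not lowered.startswith("xn--"):
        return False
    rest = lowered[4:]
    if not rest:
        return False
    try:
        decoded = rest.encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    if decoded.isascii():
        # An all-ASCII label has no business carrying the xn-- prefix.
        return False
    # Re-encoding must give back exactly the ASCII form we were handed.
    return decoded.encode("punycode").decode("ascii") == rest


def register(label: str) -> str:
    """Hypothetical registration entry point: no mapping, reject instead."""
    if not is_a_label(label):
        raise ValueError(f"rejected: {label!r} is not an A-Label")
    return label.lower()


print(register("xn--5ca"))      # prints xn--5ca
print(is_a_label("\u00c5"))     # prints False (U+00C5 is not mapped, just refused)
```

The point of the sketch is that the registry never substitutes a different
label for the one submitted; the only outcomes are "registered as given" or
"rejected".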

For (B), I'm not sure what you think, but for me it is clearly a case where
local mappings make sense, and they are currently being implemented (under a
different name, bundling). So, for example, simplified Chinese characters in
a label can be mapped to traditional, and both labels registered. Labels
differing only by eszett and "ss" can be bundled (or blocked). That's up to
the registry.
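
As a rough illustration of (B), the following Python sketch computes a bundle
from a variant table. The table VARIANTS and the function bundle are
assumptions made up for this example; real variant tables are registry
policy and are usually far larger (and may block rather than bundle).

```python
# Hypothetical variant table; the two entries mirror the examples in this
# message (a-ring vs. "aa", eszett vs. "ss").
VARIANTS = {"\u00e5": "aa", "\u00df": "ss"}


def bundle(label: str) -> set:
    """Return the requested label together with its mapped variant."""
    variant = label
    for source, target in VARIANTS.items():
        variant = variant.replace(source, target)
    # A set, so a label with no variants yields a bundle of one.
    return {label, variant}


print(sorted(bundle("\u00e5")))   # prints ['aa', 'å']
print(sorted(bundle("abc")))      # prints ['abc']
```

Unlike (A), the registrant still gets exactly the label requested; the
mapping only adds labels to the bundle.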

For (C), this is the area that the UTC position is actually addressing:
where some client program implementing IDNA is doing a remapping. My message
to Cary was written entirely in reference to (C). And it is here that the
interoperability and security problems of local mappings surface.

At least in the current draft, we are not at a "no mappings" model - we are
at an "arbitrary conflicting mappings" model, because we allow local
mappings for (C).
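
A small Python sketch of what "arbitrary conflicting mappings" means in
practice: two clients, each applying its own local mapping to the same href,
end up looking up different names. Both client functions and to_lookup_name
are hypothetical; the conversion to ASCII form uses Python's built-in
punycode codec per label and skips all other IDNA processing.

```python
def client_one(domain: str) -> str:
    """Hypothetical client that only lowercases (U+00C5 -> U+00E5)."""
    return domain.lower()


def client_two(domain: str) -> str:
    """Hypothetical client that also maps a-ring to 'aa' as a local rule."""
    return domain.lower().replace("\u00e5", "aa")


def to_lookup_name(domain: str) -> str:
    """Convert each non-ASCII label to 'xn--' + Punycode (sketch only)."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


href = "\u00c5.com"  # the target of <a href="Å.com">
print(to_lookup_name(client_one(href)))  # prints xn--5ca.com
print(to_lookup_name(client_two(href)))  # prints aa.com
```

The same link therefore resolves to two different registrations depending on
which client the user happens to run.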

And the thought of the thousands of user agents (browsers, IMers, emailers,
plus search engines, all varying by version!) sending <a href="Å.com"> to
"aa.com" instead of "å.com <http://xn--5ca.com>" is a nightmare. Before
allowing that nightmare, we really need to hear a compelling case for it!

Mark

** I included the spelled-out characters because you misread my message
above. You said "I find your examples of someone mapping "a" with an acute
accent into one with a grave accent unpersuasive, partially because that is
prohibited by the current text (because both are PVALID characters) ".

What I actually wrote was "Á should not map to à and not á". The character
being mapped in my message is
U+00C1 <http://unicode.org/cldr/utility/character.jsp?a=00C1> ( Á )
LATIN CAPITAL LETTER A WITH ACUTE, which is *not* PVALID. I'm guessing
your emailer mangled it. But Å is a better example anyway, because of its
equivalence to "aa" in some languages.