Unicode position on local mapping

Cary Karp ck at nic.museum
Wed Feb 18 12:38:46 CET 2009


> B) Registry local mapping.
> 
> You request a registration of "å.com <http://xn--5ca.com>" and the
> registry gives you both "å.com <http://xn--5ca.com>" and
> "aa.com" (bundling).

I don't think we've even begun to appreciate the significance of
bundling, and the extent to which it is going to pervade the
internationalized name space. We are, however, in comfortable agreement
about the necessity for the name holder to know exactly what A-label
they have registered. The dialog in which that communication is
embedded can, however, be complex and vary significantly from registry
to registry. This is, and should be, out of scope for the IDNA protocol.

Whatever you might say about that, I hope we do also agree on the need
for a name holder to be unequivocally aware of the U-label to which a
registered A-label decodes. Instances of confounded registrant
expectation on that point are already legion, and there is reason
enough to expect such confusion to be propagated into the discussion
of A-labeled TLDs. I have a rough time seeing how protocol-level
remapping can do anything to clear those murky waters.
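
To make the A-label/U-label relationship concrete, here is a minimal sketch using only the punycode codec from the Python standard library. The helper names `to_a_label` and `to_u_label` are invented for illustration, and the sketch performs no validation, mapping, or normalization of the kind the protocol documents discuss; it only shows the encoding step.

```python
ACE_PREFIX = "xn--"  # prefix that marks an A-label


def to_a_label(u_label: str) -> str:
    """Encode one label to its A-label form (hypothetical helper, no validation)."""
    if u_label.isascii():
        # A plain ASCII label needs no encoding.
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode one A-label back to the U-label it stands for (hypothetical helper)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("å"))        # xn--5ca
print(to_u_label("xn--5ca"))  # å
```

Showing both forms side by side in a registration dialog is one way a registry could keep the name holder aware of what has actually been registered.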

> The draft is silent on this aspect.

> For (B), I'm not sure what you think, but for me it is clearly a case
> where local mappings make sense, and are being currently implemented
> (under a different name, bundling). So, for example, simplified
> Chinese characters in a label can be mapped to traditional, and both
> labels registered. Labels only differing by eszett and "ss" can be
> bundled (or blocked). That's up to the registry.

Many registries go to great lengths to provide such service
transparently to name holders. And, here again, there is extensive
local variation. May I take it that you regard it as reasonable for the
draft not to comment on this? (Keep in mind that ICANN's IDN
Guidelines are intended to address details that require registry
understanding, and which don't fit easily into any other vehicle. Note
also that the intention has long been for those guidelines to be
reframed as a BCP via the IETF, as soon as possible. In fact, the
development of the Guidelines has been on hold for quite a while,
waiting for the protocol revision to be completed.)

> C) Client local mapping.
> 
> The web page you're looking at contains <a href="Å.com">, and your
> browser sends you to "aa.com" instead of "å.com
> <http://xn--5ca.com>", behind your back.
> 
> In the above messages I'm using the characters:
> U+00C5 <http://unicode.org/cldr/utility/character.jsp?a=00C5> ( Å )
> LATIN CAPITAL LETTER A WITH RING ABOVE
> U+00E5 <http://unicode.org/cldr/utility/character.jsp?a=00E5> ( å )
> LATIN SMALL LETTER A WITH RING ABOVE.
> 
> They are allowed to do so by:
> 
> http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5.3

> For (C), this is the area that the UTC position is actually
> addressing; where some client program implementing IDNA is doing a
> remapping. It is all in reference to (C) that my message to Cary is
> written.

And there was Cary, all wound up about every aspect of this _except_
for C :-)
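
For readers who want to see what a client-side remapping in case (C) might look like, the following is a rough sketch. It assumes a client that lowercases and then applies NFC normalization before lookup; the function name `client_map` is hypothetical, and this is not claimed to be the mapping that any draft or browser actually specifies.

```python
import unicodedata


def client_map(label: str) -> str:
    """Hypothetical client-side mapping: lowercase, then normalize to NFC."""
    return unicodedata.normalize("NFC", label.lower())


# U+00C5 (precomposed) and "A" + U+030A (combining ring) both map to U+00E5.
print(client_map("\u00c5") == "\u00e5")   # True
print(client_map("A\u030a") == "\u00e5")  # True
# A mapping of this kind never produces "aa"; that would take a local rule.
print(client_map("\u00c5") == "aa")       # False
```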

> Å is a better example anyway, because of its equivalence to "aa" in
> some languages.

Å is the 27th letter in the Swedish alphabet and, once upon a time, was
equivalent to "aa". However, the current orthographic rule is to remove
the diacritical mark from a character in any context that does not
accept the marked form, rather than replace the character with some
other form, even if there is traditional warrant for it.

The latest edition of "Svenska skrivregler" (2008) specifically cites
"Web addresses" in illustration of a context where decorated characters
can be used, and e-mail addresses as one where they cannot. So for the
purposes of the present discussion, "å" is also equivalent to "a".
This would, of course, be business as usual in the pre-IDN world, but
in a name space where "öresundsbro" and "øresundsbro" have to be
identical, the conditional (and potentially bundleable) relationships
between "å", "a", and "aa" illustrate this new class of headache that
simply is not amenable to solution by global dictate.
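
As an illustration of why such relationships end up in registry policy rather than in the protocol, here is a small sketch of how one registry might enumerate a bundle from a locally defined variant table. The table contents and the function name `bundle` are assumptions made up for this example; another registry could reasonably choose a different table, which is precisely the point.

```python
from itertools import product

# Hypothetical variant table for a single registry; not taken from any
# published policy. Each character maps to the forms treated as equivalent.
VARIANTS = {
    "å": ("å", "a", "aa"),
    "ö": ("ö", "ø"),
    "ø": ("ø", "ö"),
}


def bundle(label: str) -> set:
    """Return every label obtained by substituting variants character by character."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(parts) for parts in product(*choices)}


print(sorted(bundle("å")))
print(bundle("öresundsbro") == bundle("øresundsbro"))  # True
```

Under this particular table a request for "å" yields the three labels "å", "a", and "aa", and the two spellings of "öresundsbro" fall into the same bundle.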

/Cary

