Unicode position on local mapping

Eric Brunner-Williams ebw at abenaki.wabanaki.net
Wed Feb 18 22:09:12 CET 2009


Comments interlinear.

Mark Davis wrote:
> I think at least some of the problems we are having are in terms of 
> communication. There are at least 3 cases that are important to separate:
>
> A) Registration local mapping.
>
> You request a registration of "Å.com", and the registry converts that
> to "aa.com" behind your back. They are allowed to do so by:
>
> http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-4.2

I'm sorry, but I understand 4.2 to be referring to "local implementation
choice", not to registry actions, whether bugs there or policy there,
unless you're discussing a policy of a registry (or registrar, for that
matter) or a bug in the registry (or registrar) system, which amounts to
policy anyway.

> B) Registry local mapping.
>
> You request a registration of "å.com" and the registry gives you both
> "å.com" and "aa.com" (bundling).
>
> The draft is silent on this aspect.

Back in the '03 work I discussed the Abenaki equivalence class of {8, w,
ou, U+0222, U+0223}, and of course the CDNC/JET proposed an SC/TC
equivalence class, but an overlooked subtlety of that proposal was the
proposed inter-registry cooperation across the set of registries then
offering SC or TC.

The industry seems to have invented "bundling" as a generic
non-description of what might be either a persistent zone-local mapping
or a temporary marketing campaign by a registration channel (anyone with
write access to the registry database), that is, a "registry service" or
a "registrar service" in the ICANN gTLD registry and registrar
nomenclatures.

Why should the draft say anything about channel-scoped, registry-scoped,
or multi-registry-scoped equivalence classes, or about the mechanism(s)
used to implement such local scope?

To state the obvious: a registrar, not just a registry, could "bundle"
both "å.example" and "aa.example".

>
> C) Client local mapping.
>
> The web page you're looking at contains <a href="Å.com">, and your
> browser sends you to "aa.com" instead of "å.com", behind your back.
>
> In the above messages** I'm using the characters:
> U+00C5 <http://unicode.org/cldr/utility/character.jsp?a=00C5> ( Å )
> LATIN CAPITAL LETTER A WITH RING ABOVE
> U+00E5 <http://unicode.org/cldr/utility/character.jsp?a=00E5> ( å )
> LATIN SMALL LETTER A WITH RING ABOVE.
>
> They are allowed to do so by:
>
> http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5.3

I'm sorry, but I understand 5.3 to be referring to "preprocessing" (what
it is preprocessing for is left undefined) or to the "user interface",
hence a local implementation choice, and what happens there seems to be
well outside our control.
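To make case (C) concrete, here is a small Python sketch of two clients
resolving the same <a href="Å.com">. The mapping functions and the
Å/å -> "aa" table are hypothetical illustrations, not anything the draft
specifies; to_a_label performs only the Punycode step (using Python's
built-in "punycode" codec) and omits IDNA2008 validity checks.

```python
# Sketch: two clients handle <a href="Å.com"> differently.
# Assumptions (hypothetical, not from the draft): lowercase_mapping stands
# in for the mapping most users would expect (Å -> å); local_aa_mapping is
# an invented client-local table that rewrites Å/å to "aa".


def to_a_label(label: str) -> str:
    """Punycode-encode a non-ASCII label and add the "xn--" prefix.

    Only the Punycode step is shown; IDNA2008 validity checks are omitted.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def lowercase_mapping(label: str) -> str:
    # The "expected" mapping: case-fold only.
    return label.lower()


LOCAL_AA_TABLE = {"Å": "aa", "å": "aa"}  # hypothetical client-local table


def local_aa_mapping(label: str) -> str:
    # Rewrite Å/å to "aa" before lowercasing, behind the user's back.
    return "".join(LOCAL_AA_TABLE.get(ch, ch) for ch in label).lower()


def lookup_name(href_host: str, mapping) -> str:
    """Return the DNS name a client would actually query for href_host."""
    return ".".join(to_a_label(mapping(lbl)) for lbl in href_host.split("."))


if __name__ == "__main__":
    host = "Å.com"
    print(lookup_name(host, lowercase_mapping))  # xn--5ca.com
    print(lookup_name(host, local_aa_mapping))   # aa.com
```

The two clients query different names for the same link, which is the
divergence Mark describes under (C).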
> ===
>
> For (A), I think we are in agreement; this is a bad idea. The
> registrant should supply exactly what they want, and if it is not an
> A-Label, it should be rejected, so that the user is only able to
> register what s/he thinks is being registered.

If the presentation to the registry (or the registrar, for that matter)
is aa.example, how is the registry (or the registrar) to divine that the
registrant's intent was Å.example?

My point is, regardless of whether this is good or bad, how are we
(registry and/or registrar hat on) to know that there is a bug between
us and the would-be registrant's actual intent? And if it is a bug or
the policy of the registry (or the registrar, for that matter), how is
this fundamentally different from a registry (or registrar) that
declines all IDN registration offers, or transforms such offers?

For instance, in the 2002 time frame Microsoft's browser product had a
bug in its IDN code that caused every browser in the CDNC market to send
a sequence of packets to Redmond, then Mountain View, and finally
Reston, resulting in noticeable dollar-denominated overseas tariffs for
the CNNIC ISP market. Resolution was systematically incorrect due to
"(product) local mapping" (a bug in handling the final octet of a
string), and the resolvant (the party attempting to resolve an IDN) was
unable to supply exactly what they wanted.

> For (B), I'm not sure what you think, but for me it is clearly a case
> where local mappings make sense, and they are currently being
> implemented (under a different name, bundling). So, for example,
> simplified Chinese characters in a label can be mapped to traditional,
> and both labels registered. Labels differing only by eszett and "ss"
> can be bundled (or blocked). That's up to the registry.

See above. I favor this, and always have.
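Bundling in the sense of (B) can be sketched the same way: a registry
(or registrar) policy expands a requested label into its variant set and
registers every member. The VARIANTS table and bundle function below are
hypothetical; real registries define their own variant tables, and again
only the Punycode step is shown, without IDNA2008 validity checks.

```python
from itertools import product

# Hypothetical registry variant table: each character maps to the list of
# strings the registry treats as equivalent for bundling purposes.
VARIANTS = {
    "å": ["å", "aa"],
    "ß": ["ß", "ss"],
}


def to_a_label(label: str) -> str:
    """Punycode-encode a non-ASCII label with the "xn--" prefix (no validation)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def bundle(label: str) -> list[str]:
    """Return the sorted A-labels of every variant the policy would bundle."""
    # One list of alternatives per character; unlisted characters map to themselves.
    choices = [VARIANTS.get(ch, [ch]) for ch in label]
    # Every combination of alternatives is a variant label (deduplicated).
    variants = {"".join(parts) for parts in product(*choices)}
    return sorted(to_a_label(v) for v in variants)


if __name__ == "__main__":
    print(bundle("å"))       # ['aa', 'xn--5ca']
    print(bundle("straße"))  # the ß form plus "strasse"
```

The expansion here is one-directional: requesting "aa" does not pull in
"å", which is exactly the kind of choice that stays with the registry.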

> For (C), this is the area the UTC position is actually addressing:
> where some client program implementing IDNA is doing a remapping. My
> message to Cary was written entirely in reference to (C). And it is
> here that the interoperability and security problems of local mappings
> surface.
>
> At least in the current draft, we are not at a "no mappings" model - 
> we are at an "arbitrary conflicting mappings" model, because we allow 
> local mappings for (C).
>
> And the thought of thousands of user agents (browsers, IMers,
> emailers, plus search engines, varying by version!) sending
> <a href="Å.com"> to "aa.com" instead of "å.com" is a nightmare.
> Before allowing that nightmare, we really need to hear a compelling
> case for it!

The example set of potentially incorrect implementations cited here (see
also Microsoft's browser product, circa 2002, supra) is vast.

We can't correct those implementations.

Eric
>
> Mark
>
> ** I included the spelled-out characters because you misread my
> message above. You said "I find your examples of someone mapping "a"
> with an acute accent into one with a grave accent unpersuasive,
> partially because that is prohibited by the current text (because both
> are PVALID characters)".
>
> What I actually wrote was "Á should not map to à and not á". The
> character being mapped in my message is U+00C1
> <http://unicode.org/cldr/utility/character.jsp?a=00C1> ( Á ) LATIN
> CAPITAL LETTER A WITH ACUTE, which is *not* PVALID. I'm guessing your
> emailer mangled it. But Å is a better example anyway, because of its
> equivalence to "aa" in some languages.
> ------------------------------------------------------------------------
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>   