Parsing the issues and finding a middle ground -- another attempt

Mark Davis mark at macchiato.com
Tue Mar 3 20:47:47 CET 2009


We have had a lot of productive discussion lately. Here is my take on your
questions of 6 days ago, with "..." elisions so as to get at the core
questions.

> (i) We ban registration-side mapping in the protocol and
> discourage any local mapping on that side.  ...


I agree.


> (ii) We make it clear (if it isn't already) that, in cases ...  in which
> perceived relationships among label strings are important, it is the
> responsibility of the relevant registry to cope ....


I'm not sure what this means. I'm guessing you mean policies like bundling
and blocking; if so, I agree.


> (iii) We tell folks on the lookup side that, if a label in
> native-character form is invalid under IDNA2008 but valid under
> IDNA2003, they SHOULD apply the IDNA2003 mappings and look the
> thing up.  Note that this implies two tests but only one lookup
> in the DNS. ...


If we made that a MUST, I'd be happy with it. If it is not a MUST, then we
could end up with two kinds of implementations, which will inevitably cause
some interoperability problems.
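
As a rough illustration of the "two tests but only one lookup" rule in
(iii), here is a minimal Python sketch. The names label_for_lookup,
toy_is_valid_2008 and toy_to_alabel are hypothetical. The toy validity
check only rejects labels that change under lowercasing and is not the real
IDNA2008 test. The fallback relies on Python's built-in "idna" codec, which
implements the IDNA2003 ToASCII operation (RFC 3490 with nameprep).

```python
from typing import Callable, Optional


def toy_is_valid_2008(label: str) -> bool:
    # Toy stand-in (assumption): a real check would apply the IDNA2008 rules.
    # Here we only reject labels that change when lowercased.
    return label == label.lower()


def toy_to_alabel(label: str) -> str:
    # Toy stand-in (assumption): Punycode-encode without further validation.
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def label_for_lookup(
    label: str,
    is_valid_2008: Callable[[str], bool] = toy_is_valid_2008,
    to_alabel_2008: Callable[[str], str] = toy_to_alabel,
) -> Optional[str]:
    """Return the single ASCII label to look up, or None if both tests fail."""
    # Test 1: valid under IDNA2008, so use it as is.
    if is_valid_2008(label):
        return to_alabel_2008(label)
    # Test 2: otherwise apply the IDNA2003 mappings via the built-in codec.
    try:
        return label.encode("idna").decode("ascii")
    except UnicodeError:
        return None


if __name__ == "__main__":
    for candidate in ("b\u00fccher", "B\u00fccher"):
        print(repr(candidate), "->", label_for_lookup(candidate))
```

Either way, at most one DNS query is issued per label; only the choice of
which string to query differs.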

Even somewhat better would be to have updated mappings a la TR46.  Some
figures:

   - About 5.5K characters were added after Unicode 3.2
     <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D%5D>.
     (Note also that 5.2 is due out this fall, and will add more.)
   - Of these, 433 have NFKC+CaseFold mappings
     <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D-%5B%5B:isLowercase:%5D-%5B:nfkcqc=n:%5D%5D%5D>.

While a number of these are archaic, some are not. It would be inconsistent
for a language that uses both new and old characters to have some of its
characters mapped and others not. This would especially be the case for
uppercase letters: to illustrate with ASCII, it would be like having "Abc"
map to "abc" but having "Bcd" just fail.
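
To make the inconsistency concrete, here is a small Python sketch. It is
only an approximation: nfkc_casefold and map_if_in_unicode_3_2 are
hypothetical helpers, the first is a plain NFKC+CaseFold and not the exact
IDNA2003 or TR46 mapping table, and the second uses stringprep table A.1
(code points unassigned in Unicode 3.2) as a stand-in for "characters the
2003 mappings know nothing about". The Glagolitic letter U+2C00, added in
Unicode 4.1, is used as the example of a newer uppercase character.

```python
import stringprep
import unicodedata
from typing import Optional


def nfkc_casefold(text: str) -> str:
    # Plain NFKC + case folding using the Unicode data shipped with Python.
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)


def map_if_in_unicode_3_2(text: str) -> Optional[str]:
    # Mapping frozen at the Unicode 3.2 repertoire: anything unassigned
    # in 3.2 (stringprep table A.1) makes the whole label fail.
    if any(stringprep.in_table_a1(ch) for ch in text):
        return None
    return nfkc_casefold(text)


if __name__ == "__main__":
    for sample in ("Abc", "\u2c00"):
        print(repr(sample),
              "frozen at 3.2:", repr(map_if_in_unicode_3_2(sample)),
              "updated:", repr(nfkc_casefold(sample)))
```

With the frozen table the old uppercase letter is mapped while the newer one
fails, even though an updated mapping would lowercase both.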

However, the bottom line is that the main reason for the mappings is
interoperability, so it is far, far more important for us to maintain the
2003 mappings than to extend them to new characters.

> (iv) For the four "changed interpretation" cases, we make it
> clear that the IDNA2008 interpretation is the important one and
> that registries have a lot of responsibility here.   However,
> if an application is in a position to deliver two different
> answers to the user, then it MAY reasonably do both lookups and
> then do whatever with them seems appropriate (obviously, a "did
> you really mean?" dialogue would be one such option).


Agreed as well. That, I think, is the only option I've heard for handling
whatever characters end up in IDNA2008 with changed interpretations
that would help mitigate the security problems.

The specified order of lookup will be important. The "did you mean?" option
could be recommended for user-facing code. That isn't, of course, much use
for a lot of software like search engines, but it could be useful for UIs.
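
A sketch of what "do both lookups" might look like in Python follows. The
text above does not list the four characters; this sketch assumes they are
sharp s, final sigma, ZWNJ and ZWJ. lookup_candidates is a hypothetical
helper, the IDNA2008 form is produced by bare Punycode encoding with no
validity checking, and putting the IDNA2008 candidate first is just one
possible ordering, not a specified one.

```python
from typing import List, Tuple

# Assumption: the four characters whose interpretation changes.
CHANGED_CHARS = {"\u00df", "\u03c2", "\u200c", "\u200d"}


def lookup_candidates(label: str) -> List[Tuple[str, str]]:
    """Return (interpretation, ASCII label) pairs in the order to try them."""
    # IDNA2003 reading via Python's built-in codec (maps or drops the four).
    idna2003 = label.encode("idna").decode("ascii")
    if not any(ch in CHANGED_CHARS for ch in label):
        # Simplification: no changed character, so a single lookup suffices.
        return [("unchanged", idna2003)]
    # IDNA2008 reading keeps the character; encoded here without validation.
    idna2008 = "xn--" + label.encode("punycode").decode("ascii")
    candidates = [("IDNA2008", idna2008)]
    if idna2003 != idna2008:
        candidates.append(("IDNA2003", idna2003))
    return candidates


if __name__ == "__main__":
    for kind, ascii_label in lookup_candidates("stra\u00dfe"):
        print(kind, ascii_label)
```

A UI that gets answers for both candidates could then show a "did you
mean?" prompt; non-interactive software would have to pick by the specified
order alone.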


>
> Does that help?
>
>   john
>
>