Interoperability

Mark Davis mark.davis at icu-project.org
Thu Jul 24 19:38:22 CEST 2008


One thing that I hope we have a chance to discuss in Dublin is
interoperability.

IDNA2008 is actually much more lenient than IDNA2003, because it allows
arbitrary local mappings. Suppose you have any of the following in an email
message, for example.

   1. http://schaffer.de
   2. http://Schaffer.de
   3. http://Schäffer.de
   4. http://Schaeffer.de
   5. http://Schæffer.de
   6. http://Schäffer.de   # using <U+FB00 LATIN SMALL LIGATURE FF> in place of ff
   7. http://Schäf<U+00AD SOFT HYPHEN>fer.de
   8. http://<U+E0065 TAG LATIN SMALL LETTER E><U+E006E TAG LATIN SMALL LETTER N>Schäffer<U+E007F CANCEL TAG>.de

(where <...> is a literal character)

An IDNA2008-conformant implementation could lowercase any of these using a
local mapping -- or not, in which case #2-8 would fail. It could remove the
illegal characters in #6 to #8, or not remove them and have the lookup fail.
It could map the ligature <U+FB00> to ff, or not. It could even decide, for
example, based on locale linguistic mappings using the UI language of the
client, or the language of the email, or the default system language, that
it could map #3 to #2, #4 to #3 or vice versa, or #5 to #3.
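
To make the divergence concrete, here is a minimal Python sketch of four
clients applying different local mappings to the label from #6 before lookup.
The four mapping functions and their names are hypothetical, invented purely
for illustration; none of them comes from the IDNA2008 drafts, and the sketch
does not perform the protocol's validity checks.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def no_mapping(label: str) -> str:
    """Hypothetical client A: hands the label over untouched."""
    return label


def lowercase_only(label: str) -> str:
    """Hypothetical client B: lowercases and does nothing else."""
    return label.lower()


def compat_fold(label: str) -> str:
    """Hypothetical client C: drops soft hyphens, applies NFKC, lowercases."""
    stripped = label.replace(SOFT_HYPHEN, "")
    return unicodedata.normalize("NFKC", stripped).lower()


def german_style(label: str) -> str:
    """Hypothetical client D: like C, then maps a-umlaut to 'ae'."""
    return compat_fold(label).replace("\u00e4", "ae")


# Illustrative set of clients; the names are assumptions, not real products.
CLIENTS = {
    "no mapping": no_mapping,
    "lowercase only": lowercase_only,
    "compatibility fold": compat_fold,
    "German-style": german_style,
}


def what_each_client_looks_up(label: str) -> dict:
    """Return the string each hypothetical client would pass on to lookup."""
    return {name: fn(label) for name, fn in CLIENTS.items()}


if __name__ == "__main__":
    # Label from #6: 'Sch', a-umlaut, U+FB00 ligature, 'er'.
    for name, mapped in what_each_client_looks_up("Sch\u00e4\ufb00er").items():
        print(f"{name:20} {ascii(mapped)}")
```

Each client ends up with a different string for the same input, so the same
link in the same email can fail in one client, reach one site in a second, and
reach a different site in a third.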

Under IDNA2003, in contrast, the mappings for all of these are completely
deterministic (with all but the last being allowed, and the last being
disallowed). While implementations do perform some prefiltering of certain
format characters in some cases, they otherwise tend to follow the rules.
I've become more concerned over time that throwing the doors open to
arbitrary mappings will land us in an interoperability nightmare. See also
the rough draft I had some time ago at
http://docs.google.com/Doc?docid=dfqr8rd5_51c3nrskcx
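
The IDNA2003 side of the comparison can be checked with a short Python sketch.
It assumes Python's standard encodings.idna module, whose nameprep() function
implements the IDNA2003 mapping step; the helper names idna2003_map and
try_map are made up for this example.

```python
from encodings.idna import nameprep


def idna2003_map(host: str) -> str:
    """Apply the IDNA2003 nameprep mapping to each dot-separated label."""
    return ".".join(nameprep(label) for label in host.split("."))


def try_map(host: str):
    """Return the mapped host, or None if nameprep rejects a character."""
    try:
        return idna2003_map(host)
    except UnicodeError:
        return None


# The eight variants from the list above, written with escapes so the
# special characters are unambiguous.
VARIANTS = [
    "schaffer.de",
    "Schaffer.de",
    "Sch\u00e4ffer.de",
    "Schaeffer.de",
    "Sch\u00e6ffer.de",
    "Sch\u00e4\ufb00er.de",          # U+FB00 LATIN SMALL LIGATURE FF
    "Sch\u00e4f\u00adfer.de",        # U+00AD SOFT HYPHEN
    "\U000e0065\U000e006eSch\u00e4ffer\U000e007f.de",  # tag characters
]

if __name__ == "__main__":
    for number, host in enumerate(VARIANTS, start=1):
        print(number, ascii(try_map(host)))
```

Every implementation that follows the same tables should produce the same
result for each variant: #3, #6 and #7 collapse to one name, and #8 is rejected
because tag characters are prohibited.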

Mark