mark.davis at icu-project.org
Thu Jul 24 19:38:22 CEST 2008
One thing that I hope we have a chance to discuss in Dublin is that
IDNA2008 is actually much more lenient than IDNA2003, because it allows
arbitrary local mappings. Suppose, for example, that you have any of the
following in an email message.
6. http://Schäﬀer.de # using <U+FB00 <http://unicode.org/cldr/utility/character.jsp?a=FB00> ( ﬀ ) LATIN SMALL LIGATURE FF>
LATIN SMALL LETTER E><
U+E006E <http://unicode.org/cldr/utility/character.jsp?a=E006E> TAG LATIN
(where <...> is a literal character)
An IDNA2008-conformant implementation could lowercase any of these using a
local mapping -- or not, in which case #2-8 would fail. It could remove the
illegal characters in #6 to #8, or not remove them and have the lookup fail.
It could map the ligature ﬀ to ff, or not. It could even decide, for
example, based on locale-specific linguistic mappings using the UI language
of the client, or the language of the email, or the default system language,
that it could map #3 to #2, #4 to #3 or vice versa, or #5 to #3.
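
To make the divergence concrete, here is a minimal sketch of two clients that pick different local mappings for the same label. Both `client_a` and `client_b` are hypothetical names invented for illustration, and the mappings they apply (lowercasing, NFKC) are just two of the many choices a local mapping could make, not anything the drafts prescribe.

```python
import unicodedata

def client_a(label: str) -> str:
    # Hypothetical client: lowercases, then applies NFKC,
    # which turns the ligature U+FB00 into the two letters "ff".
    return unicodedata.normalize("NFKC", label.lower())

def client_b(label: str) -> str:
    # Hypothetical client: lowercases only and leaves the ligature alone.
    return label.lower()

label = "Sch\u00e4\ufb00er"  # "Schäﬀer" with LATIN SMALL LIGATURE FF

a = client_a(label)
b = client_b(label)

print(ascii(a))  # label that client A would go on to look up
print(ascii(b))  # label that client B would go on to look up
print(a == b)    # the two clients disagree on the same input
```

The same string in the same email thus yields a different lookup, or a failed one, depending on which client the reader happens to use.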
In IDNA2003, in contrast, the mappings for all of these are completely
determinate (with all but the first being allowed, and the last being
disallowed). While implementations do perform some prefiltering of certain
format characters in some cases, apart from that they tend to follow the rules.
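
As a rough illustration of that determinism, Python's built-in `idna` codec implements the IDNA2003 ToASCII operation (RFC 3490 with Nameprep). The variants below are my own illustrative spellings, not the original numbered examples; the point is only that every conformant IDNA2003 implementation maps them to the same ACE form.

```python
# Illustrative spellings of one name (assumed for this sketch):
# ligature form, upper-case form, and plain lower-case form.
variants = [
    "Sch\u00e4\ufb00er.de",   # "Schäﬀer.de" with U+FB00
    "SCH\u00c4FFER.de",       # "SCHÄFFER.de"
    "sch\u00e4ffer.de",       # "schäffer.de"
]

# The stdlib "idna" codec applies Nameprep (case folding, NFKC)
# and then Punycode-encodes each non-ASCII label.
encoded = {v.encode("idna") for v in variants}

for name in sorted(encoded):
    print(name)

print(len(encoded))  # number of distinct ACE forms produced
```

There is no per-client choice here: the mapping table is fixed by the specification, so the set collapses to a single encoded name.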
I've become more concerned over time that throwing the doors open to
arbitrary mappings will land us in an interoperability nightmare. See also
the rough draft I had some time ago at