Eszett and IDNAv2 vs IDNA2008
Erik van der Poel
erikv at google.com
Sun Mar 15 05:28:49 CET 2009
I should have responded to this earlier. Sorry.
On Fri, Mar 13, 2009 at 6:43 AM, Andrew Sullivan <ajs at shinkuro.com> wrote:
> On Thu, Mar 12, 2009 at 09:49:11PM -0700, Erik van der Poel wrote:
>> [such names would] find their way into HTML files, where they would
>> cause interoperability problems, as I have explained so many times.
>> (MSIE7 does not let users click through links containing xn-- names
>> that cannot be the result of an IDNA2003 transformation.)
> What that really means, however, is that MSIE7 is stuck at IDNA2003,
> full stop.
To be fair, I should admit that a large number of MSIE7 users get
automatic updates from Microsoft.
If users' copies of MSIE were automatically updated so that users
could click through xn-- links containing Eszett or newly assigned
characters, I would probably be less concerned about this, though some
users may have unwisely turned off automatic updates. Other apps may
also have problems with such xn-- names, so I'm still uncomfortable.
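To make the MSIE7 behaviour concrete: a check of the kind described above (rejecting xn-- names that "cannot be the result of an IDNA2003 transformation") can be approximated as a round trip. This is only a rough sketch, not what MSIE7 actually does. It assumes Python's standard encodings.idna module, which implements IDNA2003 ToASCII, and the function name is made up for illustration.

```python
import encodings.idna

ACE_PREFIX = "xn--"


def could_come_from_idna2003(a_label: str) -> bool:
    """Return True if a_label survives a round trip through IDNA2003 ToASCII.

    Hypothetical helper: decode the Punycode part, re-encode the result
    with IDNA2003 rules, and compare with the original label.
    """
    lowered = a_label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return True  # plain ASCII label, nothing to check in this sketch
    try:
        # Punycode-decode the part after the prefix.
        unicode_label = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # IDNA2003 ToASCII applies Nameprep first (e.g. Eszett maps to "ss").
        round_tripped = encodings.idna.ToASCII(unicode_label).decode("ascii")
    except UnicodeError:
        return False
    return round_tripped == lowered
```

For example, xn--bcher-kva (bücher) round-trips unchanged, while xn--strae-oqa (straße) does not, because IDNA2003 maps the Eszett to "ss" and produces plain "strasse".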
> We have three choices, therefore:
> 1. Do nothing, and live with IDNA2003 forever. If we thought this
> was an option, we'd not have chartered the new work, I think.
I agree that this is not a real option.
> 2. Adopt a new prefix. This is hardly better than (1), because MSIE7
> users will now see the A-label form all the time. I don't really see
> why that's supposed to be better.
I'm not suggesting a new prefix for /all/ labels. I'm suggesting a new
prefix only for labels containing the problematic characters: Eszett,
Final Sigma, IDNA2003 "map to nothing" characters (including
ZWJ/ZWNJ), the Unicode 5.1 lower-case counterparts of characters that
only had an upper-case in Unicode 3.2, and the new normalized versions
of the five CJK characters that had their normalizations changed after
Also, the intention is to use these in CNAME/DNAME (but I'd like to
use a new prefix just in case they find their way into HTML files).
Note that xd-- labels would not be generated by clients that are
looking up Unicode labels. They would convert Unicode labels to xn--
labels before looking them up.
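As a rough illustration of one reading of this proposal, the sketch below picks a prefix for a label that would appear as a CNAME/DNAME target. The function name and the character set are assumptions made for the example. The set covers only Eszett, Final Sigma and ZWJ/ZWNJ, not the other categories listed above, and no mapping or normalization is applied before encoding.

```python
# Hypothetical and deliberately incomplete set of "problematic" characters.
PROBLEMATIC = frozenset({
    "\u00df",  # LATIN SMALL LETTER SHARP S (Eszett)
    "\u03c2",  # GREEK SMALL LETTER FINAL SIGMA
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
})


def alias_target_label(u_label: str) -> str:
    """Encode a Unicode label for use as a CNAME/DNAME target.

    Labels containing a problematic character get the proposed xd--
    prefix; all other non-ASCII labels keep the usual xn-- prefix.
    """
    if u_label.isascii():
        return u_label  # ASCII labels need no prefix at all
    prefix = "xd--" if PROBLEMATIC.intersection(u_label) else "xn--"
    # Raw Punycode, so the problematic characters are preserved as-is.
    return prefix + u_label.encode("punycode").decode("ascii")
```

With this sketch, "straße" becomes xd--strae-oqa, while "bücher" still becomes xn--bcher-kva. A client looking up a Unicode label would not call this function; it would produce an xn-- label as it does today.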
Also, if we decide that ZWJ/ZWNJ should not be mapped to nothing,
clients would check the contextual rules before converting to A-labels
and looking those up. Even in that case, xn-- labels in CNAME/DNAME
could be used to generate the preferred display name. (Or is there
currently an expectation that apps will typically hide the CNAME/DNAME
from the user?)
> 3. Decide that MSIE7 loses. I think this is the right answer.
I'm a bit surprised that you've come to this conclusion. I would like
to think that we are all in this business for the /users/. It is not
the user's fault that they are using MSIE (though it is their fault if
they turn off automatic updates).
> IDNA-using web sites who want to use IDNA2008 and support MSIE7 will
> basically either have to detect MSIE7 and present links differently to
That's a lot of work for the Web sites.
> detect MSIE7 and warn such users, "Your browser won't be able to
> follow some links on this page,"
Not very friendly.
> or just accept that MSIE7 users get a
> less good experience.
That's hard to accept.
> Users who are interested in a good IDNA2008
> experience will start using another browser,
Many ordinary users stick with one browser. Very few users are aware
of IDNA or browsers that have better support for IDNA.
> and sites who really want
> the IDNA2008 new features (and we keep being assured people _do_ want
> them) will warn their users not to use MSIE7.
Perhaps some sites will warn their users, but it's not very friendly.
> Note that it's exactly problems like MSIE7 deciding not to follow some
> links "for the user's own good" that makes me oppose the earlier
> language we had in the IDNA2008 drafts permitting more or less
> arbitrary client-side mapping: I don't think
> application-by-application decisions of this sort are safe, desirable,
> or even wise. Instead they're a recipe for mistakes, user confusion,
> and bad behaviour.
I have to agree that arbitrary mapping is really bad.