Eszett and IDNAv2 vs IDNA2008

Erik van der Poel erikv at google.com
Sun Mar 15 05:28:49 CET 2009


Andrew,

I should have responded to this earlier. Sorry.

On Fri, Mar 13, 2009 at 6:43 AM, Andrew Sullivan <ajs at shinkuro.com> wrote:
> On Thu, Mar 12, 2009 at 09:49:11PM -0700, Erik van der Poel wrote:
>> [such names would] find their way into HTML files, where they would
>> cause interoperability problems, as I have explained so many times.
>> (MSIE7 does not let users click through links containing xn-- names
>> that cannot be the result of an IDNA2003 transformation.)
>
> What that really means, however, is that MSIE7 is stuck at IDNA2003,
> full stop.

To be fair, I should admit that a large number of MSIE7 users get
automatic updates from Microsoft.

If users' copies of MSIE were automatically updated so that users
could click through xn-- links containing Eszett or newly assigned
characters, I would probably be less concerned about this, though some
users may have unwisely turned off automatic updates. Other apps may
also have problems with such xn-- names, so I'm still uncomfortable.
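
For readers who want to see what "cannot be the result of an IDNA2003
transformation" means in practice, here is a minimal sketch of a
round-trip test: decode the xn-- label, run the result through
IDNA2003 ToASCII, and check whether the same label comes back. This is
only an illustration and makes no claim about how MSIE7 actually
implements its check. It assumes Python's standard encodings.idna
module, which implements IDNA2003; the function name is made up for
this example.

```python
import encodings.idna


def could_come_from_idna2003(a_label: str) -> bool:
    """Return True if IDNA2003 ToASCII can produce this xn-- label."""
    lowered = a_label.lower()
    if not lowered.startswith("xn--"):
        return False
    try:
        # Undo Punycode to recover the Unicode form of the label.
        u_label = lowered[4:].encode("ascii").decode("punycode")
        # Re-encode under IDNA2003 rules (Nameprep, then Punycode).
        round_trip = encodings.idna.ToASCII(u_label).decode("ascii")
    except UnicodeError:
        # Malformed Punycode or a label that Nameprep rejects.
        return False
    return round_trip == lowered


# Build the Eszett label from its Unicode form rather than hard-coding it.
eszett_label = "xn--" + "straße".encode("punycode").decode("ascii")
print(could_come_from_idna2003("xn--bcher-kva"))  # label for "bücher"
print(could_come_from_idna2003(eszett_label))     # label for "straße"
```

The Eszett label fails the test because Nameprep maps the Eszett to
"ss", so IDNA2003 ToASCII yields a plain ASCII label instead of the
xn-- form.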

> We have three choices, therefore:
>
> 1.  Do nothing, and live with IDNA2003 forever.  If we thought this
> was an option, we'd not have chartered the new work, I think.

I agree that this is not a real option.

> 2.  Adopt a new prefix.  This is hardly better than (1), because MSIE7
> users will now see the A-label form all the time.  I don't really see
> why that's supposed to be better.

I'm not suggesting a new prefix for /all/ labels. I'm suggesting a new
prefix only for labels containing the problematic characters: Eszett,
Final Sigma, IDNA2003 "map to nothing" characters (including
ZWJ/ZWNJ), the Unicode 5.1 lower-case counterparts of characters that
had only an upper-case form in Unicode 3.2, and the new normalized
versions of the five CJK characters whose normalizations changed after
Unicode 3.2:

http://www.unicode.org/reports/tr46/#Differences_from_IDNA2003
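
A rough sketch of how a registry-side tool might flag labels in the
first three categories above (Eszett, Final Sigma, and the IDNA2003
"map to nothing" characters). The function name is hypothetical. The
check relies on Python's standard stringprep module, whose table B.1
is the RFC 3454 "commonly mapped to nothing" set. The Unicode 5.1 case
pairs and the five CJK characters are not enumerated in this message,
so the sketch deliberately leaves them out.

```python
import stringprep

ESZETT = "\u00df"       # LATIN SMALL LETTER SHARP S
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA


def wants_new_prefix(u_label: str) -> bool:
    """Return True if the label contains one of the covered characters.

    Covers only Eszett, Final Sigma, and RFC 3454 table B.1; the other
    categories from the list are NOT checked here.
    """
    for ch in u_label:
        if ch in (ESZETT, FINAL_SIGMA):
            return True
        # Table B.1 includes ZWNJ (U+200C) and ZWJ (U+200D).
        if stringprep.in_table_b1(ch):
            return True
    return False


for label in ("straße", "bücher", "a\u200cb"):
    print(repr(label), wants_new_prefix(label))
```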

Also, the intention is to use these labels in CNAME/DNAME records
(but I'd like to use a new prefix just in case they find their way
into HTML files).

Note that labels with the new prefix (say, xd--) would not be
generated by clients that are looking up Unicode labels. Such clients
would convert Unicode labels to xn-- labels before looking them up.
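
To illustrate the client side of this, here is a small sketch of the
Unicode-to-ASCII step performed before lookup. It uses Python's
built-in "idna" codec, which implements IDNA2003 only, so it shows
today's behaviour rather than anything IDNA2008 would specify; the
helper name is an assumption for this example. The point is simply
that the conversion produces either plain ASCII or xn-- labels, never
xd-- ones.

```python
def to_lookup_name(unicode_name: str) -> str:
    """Convert a Unicode domain name to the ASCII form used for lookup.

    Relies on Python's "idna" codec, i.e. IDNA2003 (Nameprep + Punycode).
    """
    return unicode_name.encode("idna").decode("ascii")


for name in ("bücher.example", "straße.example"):
    ascii_name = to_lookup_name(name)
    # The client never emits the hypothetical xd-- prefix on its own.
    assert not any(lbl.startswith("xd--") for lbl in ascii_name.split("."))
    print(name, "->", ascii_name)
```

Under IDNA2003 the Eszett name comes out as an all-ASCII "ss" name,
which is exactly why a label that preserves the Eszett cannot be
reached this way.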

Also, if we decide that ZWJ/ZWNJ should not be mapped to nothing,
clients would check the contextual rules before converting to A-labels
and looking those up. Even in that case, xn-- labels in CNAME/DNAME
could be used to generate the preferred display name. (Or is there
currently an expectation that apps will typically hide the CNAME/DNAME
from the user?)
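
As a sketch of what checking the contextual rules could look like, the
code below tests one condition only: that each ZWJ or ZWNJ directly
follows a character whose canonical combining class is Virama (9).
This is believed to match the virama condition in the IDNA2008 Tables
draft, but treat that as an assumption. The draft's additional
joining-type condition for ZWNJ is not implemented, so this sketch
rejects some labels that the full rules would accept. The function
names are made up for the example.

```python
import unicodedata

ZWNJ = "\u200c"
ZWJ = "\u200d"
VIRAMA_CCC = 9  # canonical combining class shared by virama characters


def joiner_follows_virama(label: str, index: int) -> bool:
    """Return True if the joiner at `index` directly follows a virama."""
    if label[index] not in (ZWNJ, ZWJ):
        raise ValueError("no joiner at this position")
    if index == 0:
        return False  # nothing precedes the joiner
    return unicodedata.combining(label[index - 1]) == VIRAMA_CCC


def joiners_pass_virama_rule(label: str) -> bool:
    """Apply the virama-only check to every joiner in the label."""
    return all(
        joiner_follows_virama(label, i)
        for i, ch in enumerate(label)
        if ch in (ZWNJ, ZWJ)
    )


# Devanagari KA + VIRAMA + ZWJ + SSA, then a Latin string with a stray ZWJ.
print(joiners_pass_virama_rule("\u0915\u094d\u200d\u0937"))
print(joiners_pass_virama_rule("a\u200db"))
```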

> 3.  Decide that MSIE7 loses.  I think this is the right answer.

I'm a bit surprised that you've come to this conclusion. I would like
to think that we are all in this business for the /users/. It is not
the user's fault that they are using MSIE (though it is their fault if
they turn off automatic updates).

> IDNA-using web sites who want to use IDNA2008 and support MSIE7 will
> basically either have to detect MSIE7 and present links differently to
> them,

That's a lot of work for the Web sites.

> detect MSIE7 and warn such users, "Your browser won't be able to
> follow some links on this page,"

Not very friendly.

> or just accept that MSIE7 users get a
> less good experience.

That's hard to accept.

> Users who are interested in a good IDNA2008
> experience will start using another browser,

Many ordinary users stick with one browser. Very few users are aware
of IDNA or browsers that have better support for IDNA.

> and sites who really want
> the IDNA2008 new features (and we keep being assured people _do_ want
> them) will warn their users not to use MSIE7.

Perhaps some sites will warn their users, but it's not very friendly.

> Note that it's exactly problems like MSIE7 deciding not to follow some
> links "for the user's own good" that makes me oppose the earlier
> language we had in the IDNA2008 drafts permitting more or less
> arbitrary client-side mapping: I don't think
> application-by-application decisions of this sort are safe, desirable,
> or even wise.  Instead they're a recipe for mistakes, user confusion,
> and bad behaviour.

I have to agree that arbitrary mapping is really bad.

Erik

