Q2: What mapping function should be used in a revised IDNA2008 specification?

John C Klensin klensin at jck.com
Wed Apr 8 21:17:57 CEST 2009

--On Wednesday, April 08, 2009 07:07 -0700 Erik van der Poel
<erikv at google.com> wrote:

> The Eszett issue is not just a matter of registries putting in
> a few months' worth of work. We would also need to convince
> client implementers to stop mapping Eszett to ss. Although one
> could argue that the number of labels containing Eszett in
> stored files and exchanged text is low (compared to the number
> of words in plain text that contain Eszett), we now have quite
> a lot of installed clients that perform IDNA2003 mappings out
> there. I myself am only involved in a client (Google crawler)
> that holds little sway in this matter. There are a number of
> other clients that hold much more sway, and I am frustrated
> about the lack of consensus among those implementers and other
> members of this WG.


Once again, this is really not much different from transition
issues that registries deal with regularly, especially every
time a new permitted character is introduced.  Certainly anyone
wanting to register a label containing Eszett should be advised
that getting the one containing "ss" instead, and holding onto
it for a while, would be a good idea.   Certainly someone
obtaining a label containing Eszett when one containing "ss"
instead already exists should be aware of the risks and issues
associated with doing so (if the registry hasn't used variant
techniques to prevent that situation).  The client software
packages won't be converted all at once and, even if they were,
users would not install the new versions all at once.  
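
To make the transition issue concrete, here is a minimal sketch of
why the two labels can diverge. It assumes Python's built-in "idna"
codec, which implements the IDNA2003 (RFC 3490) mappings, as a
stand-in for an installed IDNA2003 client. The helper names
idna2003_label and unmapped_label are hypothetical, and the second
helper only illustrates what a client that applies no mapping would
look up; it is not a full IDNA2008 implementation.

```python
def idna2003_label(label: str) -> bytes:
    # Python's built-in "idna" codec applies the IDNA2003 nameprep
    # mappings, which fold Eszett (U+00DF) to "ss" before encoding.
    return label.encode("idna")


def unmapped_label(label: str) -> bytes:
    # Hypothetical no-mapping client: Punycode-encode the label as is
    # and add the ACE prefix. No validity checks are performed here.
    return b"xn--" + label.encode("punycode")


old_client = idna2003_label("straße")
new_client = unmapped_label("straße")

print(old_client)  # b'strasse'
print(new_client)  # b'xn--strae-oqa'
```

The two clients query different names for the same typed string,
which is why holding both registrations during the transition is
the cautious choice.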

But once again, this has been discussed innumerable times and, I
believe, fairly firmly decided against (again) in San Francisco:
if one wants 100% compatibility with IDNA2003, with no changes
in characters or practices, then one needs to stay with IDNA2003
because _any_ move to get to a Unicode 5.0 or 5.1 base is going
to introduce changes in practices and would do so even if the WG
agreed to an "IDNA2003 mappings forever" model.

> Another example of a client developer that is inhibiting
> progress is Firefox, with its refusal to display Unicode
> labels under certain large TLDs such as .com. We really need
> to resolve this issue too, otherwise IDNA is not really going
> to spread.

We can't "decide" that and, if we did, they would ignore us.
Remember that IDNA2003 _requires_ the display of native
character forms (not even necessarily canonical/U-label ones)
unless the application knows that is not possible.  That puts
both IE7 and Firefox (and I think Opera and Safari, but have not
checked either recently) out of conformance.   Their perception
(at least for some of them) is that these measures are needed to
protect their users from bad behavior and that doing so is their
first responsibility, regardless of what the standards say, and
that they are therefore going to push back on non-conformance
with both the standards and what they consider bad behavior.  I
can't say that I blame them.

And, fwiw, one of the hopes for getting rid of symbols,
punctuation, and mapping in IDNA2008 is that it would, in the
long term, reduce some of the concerns of those client vendors
and permit more consistent user-facing behavior.   I don't know
if it would or would not, but, e.g., Mark's opinion that those
fears are largely unreasonable or yours that the browsers should
all be consistent are worth virtually nothing if those vendors are
doing things they believe are necessary to protect their users.

> Both of these issues have been discussed to death, and yet the
> client implementers are not convinced. Sorry if I come across
> as pessimistic.

About all we can do is to produce a specification that is,
itself, as predictable as possible (and do it sooner rather than
later).   If one wants that specification to be as predictable
as possible for registrants, web page authors, and indexing
systems, then one wants to, e.g., put a lot of emphasis on
IDNA2003 compatibility.  If one wants it to be as predictable as
possible for users who are seeing the network via particular
implementations and versions of particular clients, then one
wants to focus on simplification of the character relationships
(probably including minimal or no mapping, leaving that as an
implementation/local issue with immediate conversion to
canonical form) and exclusion of anything that could be remotely
problematic.  Of course, we have to find a balance there, and
the devil is in the details, but, if one wants consistent
browser behavior then one has to have a standard that they feel
eliminates the need for them to make their own
individual/separate rules to protect users.

Just IMO, of course.
