Q2: What mapping function should be used in a revised IDNA2008 specification?

Erik van der Poel erikv at google.com
Wed Apr 8 21:44:54 CEST 2009


On Wed, Apr 8, 2009 at 12:17 PM, John C Klensin <klensin at jck.com> wrote:
> --On Wednesday, April 08, 2009 07:07 -0700 Erik van der Poel
> <erikv at google.com> wrote:
>> The Eszett issue is not just a matter of registries putting in
>> a few months worth of work. We would also need to convince
>> client implementers to stop mapping Eszett to ss. Although one
>> could argue that the number of labels containing Eszett in
>> stored files and exchanged text is low (compared to the number
>> of words in plain text that contain Eszett), we now have quite
>> a lot of installed clients that perform IDNA2003 mappings out
>> there. I myself am only involved in a client (Google crawler)
>> that holds little sway in this matter. There are a number of
>> other clients that hold much more sway, and I am frustrated
>> about the lack of consensus among those implementers and other
>> members of this WG.
>
> Erik,
>
> Once again, this is really not much different from transition
> issues that registries deal with regularly, especially every
> time a new permitted character is introduced.  Certainly anyone
> wanting to register a label containing Eszett should be advised
> that getting the one containing "ss" instead, and holding onto
> it for a while, would be a good idea.   Certainly someone
> obtaining a label containing Eszett when one containing "ss"
> instead already exists should be aware of the risks and issues
> associated with doing so (if the registry hasn't used variant
> techniques to prevent that situation).  The client software
> packages won't be converted all at once and, even if they were,
> users would not install the new versions all at once.
>
> But once again, this has been discussed innumerable times and, I
> believe, fairly firmly decided against (again) in San Francisco:
> if one wants 100% compatibility with IDNA2003, with no changes
> in characters or practices, then one needs to stay with IDNA2003
> because _any_ move to get to a Unicode 5.0 or 5.1 base is going
> to introduce changes in practices and would do so even if the WG
> agreed to an "IDNA2003 mappings forever" model.

I don't think anybody is arguing that we should stay with IDNA2003 and
not upgrade IDNA to Unicode 5.1 (and beyond). The discussion is about
what to do about the problematic characters. Yes, there are merely
four of these characters when you consider mapping issues (Geresh, etc.
are not related to mapping), so if the WG decides not to map these
characters, that's fine. We shall see what the implementations
themselves end up doing. The last time Vint asked for consensus around
these four characters was before the lookup mapping consensus was
declared. It might be a good idea to ask for consensus about when and
where the lookup mapping occurs (i.e. before any lookup, or after
IDNA2008 lookup). It might also be a good idea to ask for consensus
about the precise mappings, including the NFKC tags (<wide> and
<narrow>, etc) and the problematic four.
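
To make the mapping question concrete, here is a minimal sketch of what
an IDNA2003 client does today with the four problematic characters, and
of what the <wide> and <narrow> NFKC tags look like. It assumes Python's
standard library, whose encodings.idna module implements the IDNA2003
Nameprep profile. The helper name idna2003_map and the sample labels are
made up for illustration, and nothing here says what a revised IDNA2008
mapping should be.

```python
import unicodedata
from encodings.idna import nameprep  # stdlib IDNA2003 Nameprep (RFC 3491)


def idna2003_map(label: str) -> str:
    """Hypothetical helper: apply the IDNA2003 (Nameprep) mapping to one label."""
    return nameprep(label)


# Illustrative labels containing the four characters under discussion.
samples = {
    "Eszett": "stra\u00dfe",      # IDNA2003 maps U+00DF to "ss"
    "final sigma": "\u03c2",      # IDNA2003 maps U+03C2 to U+03C3
    "ZWJ": "a\u200db",            # IDNA2003 maps U+200D to nothing
    "ZWNJ": "a\u200cb",           # IDNA2003 maps U+200C to nothing
}
for name, label in samples.items():
    print(f"{name}: {label!r} -> {idna2003_map(label)!r}")

# Compatibility decompositions carrying the <wide> and <narrow> tags:
# FULLWIDTH LATIN CAPITAL LETTER A and HALFWIDTH KATAKANA LETTER KA.
for ch in ("\uff21", "\uff76"):
    tag = unicodedata.decomposition(ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}: {tag} -> NFKC {nfkc!r}")
```

An implementation that stops mapping Eszett would have to return the
first label unchanged, which is exactly the behavior difference between
installed IDNA2003 clients and new ones that worries me.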

>> Another example of a client developer that is inhibiting
>> progress is Firefox, with its refusal to display Unicode
>> labels under certain large TLDs such as .com. We really need
>> to resolve this issue too, otherwise IDNA is not really going
>> to spread.
>
> We can't "decide" that and, if we did, they would ignore us.
> Remember that IDNA2003 _requires_ the display of native
> character forms (not even necessarily canonical/U-label ones)
> unless the application knows that is not possible.  That puts
> both IE7 and Firefox (and I think Opera and Safari, but have not
> checked either recently) out of conformance.   Their perception
> (at least for some of them) is that these measures are needed to
> protect their users from bad behavior and that doing so is their
> first responsibility, regardless of what the standards say, and
> that they are therefore going to push back on non-conformance
> with both the standards and what they consider bad behavior.  I
> can't say that I blame them.
>
> And, fwiw, one of the hopes for getting rid of symbols,
> punctuation, and mapping in IDNA2008 is that it would, in the
> long term, reduce some of the concerns of those client vendors
> and permit more consistent user-facing behavior.   I don't know
> if it would or would not, but, e.g., Mark's opinion that those
> fears are largely unreasonable or yours that the browsers should
> all be consistent are worth virtually nothing if they are doing
> things they believe are necessary to protect their users.
>
>> Both of these issues have been discussed to death, and yet the
>> client implementers are not convinced. Sorry if I come across
>> as pessimistic.
>
> About all we can do is to produce a specification that is,
> itself, as predictable as possible (and do it sooner rather than
> later).   If one wants that specification to be as predictable
> as possible for registrants, web page authors, and indexing
> systems, then one wants to, e.g., put a lot of emphasis on
> IDNA2003 compatibility.  If one wants it to be as predictable as
> possible for users who are seeing the network via particular
> implementations and versions of particular clients, then one
> wants to focus on simplification of the character relationships
> (probably including minimal or no mapping, leaving that as an
> implementation/local issue with immediate conversion to
> canonical form) and exclusion of anything that could be remotely
> problematic.  Of course, we have to find a balance there, and
> the devil is in the details, but, if one wants consistent
> browser behavior then one has to have a standard that they feel
> eliminates the need for them to make their own
> individual/separate rules to protect users.

... and to protect registrants, and to protect the client developers
themselves (from excessive bug reports, tech support calls, etc).

Erik

