Mapping and Variants

Erik van der Poel erikv at google.com
Wed Mar 11 18:15:34 CET 2009


On Tue, Mar 10, 2009 at 2:30 PM, John C Klensin <klensin at jck.com> wrote:
> None of those positions are either "correct" or "incorrect".  I
> don't believe that looping around on them brings the WG much
> closer to a decision, either.  Let me try to make a list of
> possible positions, in no particular order...
>
>        (i) If one believes that getting rid of mapping, or at
>        least getting it far away from IDNA, is important, then
>        the answer is clear: Eszett should be a character, not
>        banned.
>
>        (ii) If one believes that backward-compatibility with
>        IDNA2003 is the most important criterion and should be
>        maintained going forward rather than being treated as a
>        transition issue, then all mapping should be retained,
>        including the Eszett -> "ss" mapping.
>
>        (iii)  If one believes that IDNA2003 compatibility is a
>        transition issue with a focus on reducing mapping as
>        time goes by, then Eszett should be a character although
>        various registries may find it advantageous to ban it at
>        the registry level, resulting in the fallback mappings
>        being applied to occurrences of it in lookup strings
>        and, to all practical intents and purposes, the IDNA2003
>        behavior wrt that character.
>
>        (iv) For completeness, if one believes that Eszett
>        really isn't a character but should have been either
>        left out of Unicode or treated as a compatibility
>        character, not mapped via CaseFold, then the thing
>        should be banned if we don't map and mapped if we do.
>        While I don't believe anyone has taken that position, at
>        least in the last year or so, I think I know where to
>        find people who would take it.

I agree that none of these positions is "correct" or "incorrect", but
surely the point of the WG is to state positions and then weigh the
pros and cons of each approach. We can view these characters on a
scale from "easy to add, given the current situation" to "hard or
impossible to add, given the current situation".

For example, the Latin letter 'a' is very easy to support, because
it's already in DNS, used in email, the Web, and so on. The Han
character 中 is quite easy to support because it is already in IDNA and
has no mapping issues. At the other end of the
scale, the French word aujourd'hui would be hard to support because
the single quote has special meanings in many different protocols and
contexts.
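
As a rough illustration of where characters sit on this scale, here is a
minimal Python sketch. It uses the built-in str.casefold() (full Unicode
case folding) as a stand-in for the case-mapping part of the IDNA2003
mapping step; that substitution is an assumption made for illustration,
since the real mapping also involves normalization and other tables. The
helper name survives_folding is hypothetical.

```python
# Sketch only: str.casefold() approximates the case-folding part of the
# IDNA2003 mapping; it is not the full IDNA2003 (nameprep) algorithm.
samples = {
    "Latin a": "a",
    "Han zhong": "\u4e2d",       # 中
    "Eszett": "\u00df",          # ß
    "Final Sigma": "\u03c2",     # ς
}


def survives_folding(ch: str) -> bool:
    """Return True if case folding leaves the character unchanged."""
    return ch.casefold() == ch


for name, ch in samples.items():
    # Show each character, what it folds to, and whether it survives.
    print(name, repr(ch), "->", repr(ch.casefold()), survives_folding(ch))
```

Under this approximation 'a' and 中 pass through unchanged, while Eszett
folds to "ss" and Final Sigma folds to the ordinary sigma, so a label
containing either cannot be told apart from its folded form after mapping.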

The characters Eszett, Final Sigma, ZWJ and ZWNJ fall somewhere
between "hard" and "easy". We need to weigh the communities'
requirements for these characters against the difficulty of
introducing them. From my point of view, we have plenty of
participation in this WG from people who state how difficult it is to
introduce these, but very little participation from those who can
state the actual needs of the various impacted communities. So there
may be a natural tendency of this WG to gravitate toward approaches
that do not require difficult transitions.

So my question is: Given that this may be our last chance to make
changes of this nature to the IDNA spec, is it OK for the IETF to be
biased against the communities' needs?

Erik

