idna-mapping update

Mon Dec 21 22:09:02 CET 2009

> From: Eric Brunner-Williams <ebw at abenaki.wabanaki.net>
> Michael,
>
> With respect to the example code points, none of these should be
> present in identifiers, as the base claim is that the construction of
> identifiers is harmed by their absence, and subscripts, superscripts,
> ligatures are not required to form identifiers, hence labels.
>
> Absent an example, and those you provided don't rise to the level of
> examples of characters necessary for the formation of identifiers,
> hence labels, the motivation for the rest of your note appears to me
> to reduce to an authority issue.

Eric (and others),
In fact, the actual list of differences is 3807 characters long, as far
as we can categorize exactly the current idna-mappings behavior and
compare it with an IDNA2003-compatible mapping. In my previous message I
showed only a few characters from the beginning of that long list. It is
true that the case for none of them is earth-shattering, as they are
mostly 'compatibility' characters, but 3807 differences start to hurt
when you have already implemented IDNA2003 and want to move to IDNA2008.
A good case can be made for questioning individual mappings, as John and
Martin in particular have done, but 3800+ is a lot to consider.
And yes, to answer Martin, there is a 5.2 version at
http://www.unicode.org/Public/idna/5.2.0/IdnaMappingTable.txt 
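
As an illustration of how such a list of differences can be produced, the
following is a minimal sketch and not the procedure used to obtain the
3807 figure. It assumes Python's standard stringprep module (RFC 3454
tables B.1 and B.2, followed by NFKC) as an approximation of the IDNA2003
mapping step. The second function, simplified_map, is a hypothetical
stand-in (lowercase, then NFC) chosen only to have something to compare
against; it does not implement the idna-mappings draft. The function
names and the sample characters are assumptions made for this example.

```python
import stringprep
import unicodedata


def idna2003_style_map(ch: str) -> str:
    """Approximate the IDNA2003 (Nameprep) mapping step for one character."""
    # RFC 3454 table B.1: characters that are mapped to nothing.
    if stringprep.in_table_b1(ch):
        return ""
    # RFC 3454 table B.2: case folding intended for use with NFKC,
    # followed by NFKC normalization itself.
    return unicodedata.normalize("NFKC", stringprep.map_table_b2(ch))


def simplified_map(ch: str) -> str:
    """Hypothetical stand-in mapping: lowercase, then NFC.

    This is NOT the idna-mappings draft; it only gives the comparison
    a second mapping to run against.
    """
    return unicodedata.normalize("NFC", ch.lower())


def differing(chars):
    """Return the characters on which the two mappings disagree."""
    return [c for c in chars if idna2003_style_map(c) != simplified_map(c)]


# A few compatibility characters of the kind discussed in this thread
# (superscript, subscript, ligature), plus one plain uppercase letter.
sample = ["A", "\u00b2", "\u2082", "\ufb01"]
for c in differing(sample):
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```

Running the same comparison over every assigned code point, with a
faithful implementation of each mapping, is what yields a full list of
differences between the two.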

To clarify, I was not questioning the 'no-mapping' choice of the WG for
protocol elements, with the caveat that this must be applied to the
post-NFC phase. I agree that focusing on the A-label is a good feature
of IDNA2008. I was also not questioning the choice made for eszett and
final sigma in recent weeks. If you read my proposal carefully, you will
see that I proposed leaving those characters unmapped. And no PVALID
characters are touched by that process.

I am more concerned about the various pre-mappings that will need to be
considered when implementing IDNA2008, and about reaching consensus on
them among the various implementers.

Unlike John, I am not convinced that this WG and the Unicode UTC are
working on fundamentally different assumptions, and maybe some work
should be done on TR46 to clarify that. The major difference I see is
that the UTC is much more concerned about migration issues from
IDNA2003, as libraries supporting it will still be deployed for a very
long time. I don't think anyone wants compatibility characters to become
PVALID, but we still need to pre-process them consistently.

My concern was not about the core normative part of IDNA2008, but about
making sure that the optional part (idna-mappings) is subject to a
healthy debate when it becomes input for fora interested in migration,
whether in ICANN, the IETF, the UTC, or other entities.

Michel