mappings-01 and the general procedure

John C Klensin klensin at jck.com
Sun Jul 12 18:48:26 CEST 2009



--On Sunday, July 12, 2009 09:37 -0700 Erik van der Poel
<erikv at google.com> wrote:

> In the mappings-01 draft, the "general procedure" is:
> 
>    1.  All characters are mapped using Unicode Normalization
>        Form C (NFC).
> 
>    2.  Upper case characters are mapped to their lower case
>        equivalents by using the algorithm for mapping Unicode
>        characters.
> 
>    3.  Full-width and half-width characters (those defined with
>        Decomposition Types <wide> and <narrow>) are mapped to
>        their decomposition mappings as shown in the Unicode
>        character database.
> 
> Although mappings-01 clearly states that "an appliction[sp]
> might want to implement" mappings that are more compatible
> with IDNA2003 instead, I wonder whether implementors will
> figure out that the order of the above steps is somewhat
> different from that of IDNA2003, and that some strings would
> be mapped differently.
> 
> For example, let's take the following input string:
> 
> U+FF45 FULLWIDTH LATIN SMALL LETTER E
> U+0301 COMBINING ACUTE ACCENT
> 
> The mappings-01 procedure would map this string to the
> following:
> 
> U+0065 LATIN SMALL LETTER E
> U+0301 COMBINING ACUTE ACCENT
> 
> On the other hand, IDNA2003 would map it to:
> 
> U+00E9 LATIN SMALL LETTER E WITH ACUTE
> 
> This is because mappings-01 has NFC as the first step rather
> than the last.
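The three quoted steps can be sketched in Python with the standard `unicodedata` module to reproduce the result above. This is a minimal illustration, not code from the draft. The function names are hypothetical, `str.lower()` is assumed to stand in for "the algorithm for mapping Unicode characters" in step 2, and the width mapping reads the <wide>/<narrow> tags from `unicodedata.decomposition()`. Python's Unicode tables are also newer than the ones the drafts were written against.

```python
import unicodedata


def width_map(s: str) -> str:
    """Step 3: replace <wide>/<narrow> characters by their decomposition mappings."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0065" for U+FF45
        if decomp.startswith(("<wide>", "<narrow>")):
            # The tag is followed by the hex code point(s) of the mapping.
            out.extend(chr(int(cp, 16)) for cp in decomp.split()[1:])
        else:
            out.append(ch)
    return "".join(out)


def mappings01_general(s: str) -> str:
    """Hypothetical sketch of the mappings-01 "general procedure", in draft order."""
    s = unicodedata.normalize("NFC", s)  # step 1: NFC first
    s = s.lower()                        # step 2: assumed stand-in for Unicode lowercasing
    return width_map(s)                  # step 3: width mapping, with no NFC afterwards


result = mappings01_general("\uff45\u0301")
print([f"U+{ord(c):04X}" for c in result])  # ['U+0065', 'U+0301']
```

NFC leaves U+FF45 U+0301 alone because U+FF45 has only a compatibility decomposition, so the composable pair U+0065 U+0301 appears only after step 3, when no normalization is left to run.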

And that, at least read that way, would violate the provision in
Protocol that strings passed into the procedure be in NFC form.
In other words, NFC needs to be applied after the steps of
mappings-01.  That application would transform U+0065 U+0301
into U+00E9, so the result is the same.
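A minimal sketch of that fix, assuming a straightforward reading of the mappings-01 steps (the function names are hypothetical and `str.lower()` is an assumed stand-in for the lowercasing step): normalize with NFC once more after the three mapping steps. For comparison it calls `encodings.idna.nameprep` from the Python standard library, which implements IDNA2003's Nameprep (mapping, then NFKC). Agreement on this one string is not a claim that the two procedures agree in general.

```python
import unicodedata
from encodings.idna import nameprep  # Python stdlib implementation of IDNA2003 Nameprep


def width_map(s: str) -> str:
    """Replace <wide>/<narrow> characters by their decomposition mappings."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            out.extend(chr(int(cp, 16)) for cp in decomp.split()[1:])
        else:
            out.append(ch)
    return "".join(out)


def mappings01_then_nfc(s: str) -> str:
    """Hypothetical: mappings-01 steps, then the NFC implied by Protocol's input rule."""
    s = unicodedata.normalize("NFC", s)     # mappings-01 step 1
    s = width_map(s.lower())                # steps 2 and 3
    return unicodedata.normalize("NFC", s)  # recomposes U+0065 U+0301 into U+00E9


label = "\uff45\u0301"
print(mappings01_then_nfc(label) == "\u00e9")  # True
print(nameprep(label) == "\u00e9")             # True: IDNA2003 agrees for this input
```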

It probably would not be a bad idea for mappings-02 to note that
Protocol effectively requires another application of NFC...
and/or I can say that in Protocol if I get enough guidance in
the next few hours about what to say there about mapping at all.

In any event, I think this is, at worst, a minor bug or
inconsistency in the text, not a significant incompatibility.

    john

p.s. I'm on travel tomorrow afternoon and evening, so the cutoff
for any new version of Protocol (or Defs) is effectively this
afternoon, not mid-day tomorrow.  If people want such versions,
I need to know what to say.


