idna-mapping update

Sat Dec 19 04:13:49 CET 2009

> From Lisa Dusseault (Dec 1st)
> I don't believe we know what the WG consensus position is around how
> strongly pre-lookup mappings are recommended and in what use cases,
> and how compatible optional pre-lookup mappings are with IDNA2003
> in-protocol mapping.

I'd like to offer new feedback on that statement. The issue some of us have with the current recommendation in idna-mappings [draft-ietf-idnabis-mappings-05] is that it differs substantially from the mapping done in IDNA2003, especially for compatibility mappings beyond the narrow/wide mapping suggested in the current document. The solution proposed here is to reference a single mapping table, which greatly improves the odds that implementers will do the right thing. It also makes it trivial for the draft Unicode TR46 to refer to a common mapping definition, avoiding potential confusion and unnecessary duplication.

Some examples of characters mapped differently by idna-mappings and IDNA2003 (the mappings shown are those of IDNA2003; in idna-mappings the characters stay unmapped):

	00AA ( ª ) => 0061 ( a ) # FEMININE ORDINAL INDICATOR
	00B2 ( ² ) => 0032 ( 2 ) # SUPERSCRIPT TWO
	00B3 ( ³ ) => 0033 ( 3 ) # SUPERSCRIPT THREE
	00B5 ( µ ) => 03BC ( μ ) # MICRO SIGN
	00B9 ( ¹ ) => 0031 ( 1 ) # SUPERSCRIPT ONE
	00BA ( º ) => 006F ( o ) # MASCULINE ORDINAL INDICATOR
	0130 ( İ ) => 0069 0307 ( i̇ ) # LATIN CAPITAL LETTER I WITH DOT ABOVE
	0132 ( Ĳ ) => 0069 006A ( ij ) # LATIN CAPITAL LIGATURE IJ
	013F ( Ŀ ) => 006C 00B7 ( l· ) # LATIN CAPITAL LETTER L WITH MIDDLE DOT
	0140 ( ŀ ) => 006C 00B7 ( l· ) # LATIN SMALL LETTER L WITH MIDDLE DOT
	0149 ( ŉ ) => 02BC 006E ( ʼn ) # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
	017F ( ſ ) => 0073 ( s ) # LATIN SMALL LETTER LONG S
	01C4 ( Ǆ ) => 0064 017E ( dž ) # LATIN CAPITAL LETTER DZ WITH CARON
	01F3 ( ǳ ) => 0064 007A ( dz ) # LATIN SMALL LETTER DZ
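
As a rough cross-check, the IDNA2003 side of the list above can be approximated with Python's standard unicodedata module by applying NFKC, then full case folding, then NFKC again. This is only an approximation of the intended mapping: it does not remove ignorable code points, it uses whatever Unicode version the Python build ships with, and the helper name approx_nfkc_casefold is invented for this illustration.

```python
import unicodedata


def approx_nfkc_casefold(text: str) -> str:
    """Approximate NFKC plus case folding (illustrative helper, not the real table)."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Case folding can leave the string denormalized, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


# (source, expected) pairs copied from the list above.
EXAMPLES = {
    "\u00AA": "\u0061",        # FEMININE ORDINAL INDICATOR
    "\u00B2": "\u0032",        # SUPERSCRIPT TWO
    "\u00B3": "\u0033",        # SUPERSCRIPT THREE
    "\u00B5": "\u03BC",        # MICRO SIGN
    "\u00B9": "\u0031",        # SUPERSCRIPT ONE
    "\u00BA": "\u006F",        # MASCULINE ORDINAL INDICATOR
    "\u0130": "\u0069\u0307",  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\u0132": "\u0069\u006A",  # LATIN CAPITAL LIGATURE IJ
    "\u013F": "\u006C\u00B7",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "\u0140": "\u006C\u00B7",  # LATIN SMALL LETTER L WITH MIDDLE DOT
    "\u0149": "\u02BC\u006E",  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
    "\u017F": "\u0073",        # LATIN SMALL LETTER LONG S
    "\u01C4": "\u0064\u017E",  # LATIN CAPITAL LETTER DZ WITH CARON
    "\u01F3": "\u0064\u007A",  # LATIN SMALL LETTER DZ
}

for source, expected in EXAMPLES.items():
    result = approx_nfkc_casefold(source)
    code_points = " ".join(f"{ord(c):04X}" for c in result)
    verdict = "matches" if result == expected else "differs"
    print(f"{ord(source):04X} => {code_points} ({verdict})")
```

The second normalization pass is there because case folding alone does not guarantee a normalized result; for the characters listed it makes no visible difference, but it keeps the sketch closer to how NFKC_CF is defined.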

By using a mapping table based on the NFKC_CF property already exposed in Unicode (reflecting the IDNA mapping as designed in IDNA2003), modified to improve compatibility with IDNA2008, it is possible to address the concern expressed above. The table is available at http://www.unicode.org/Public/idna/5.1.0/IdnaMappingTable.txt and its construction is explained in Section 7 of the latest TR46 draft at http://www.unicode.org/reports/tr46/

The editing instructions for idna-mappings to reference this new mapping table follow:
<<
In Section 2, replace items 1-4 and all following text by:

========================================================================

1. For each code point in the input string to be used under the IDNA
   protocol, map the code point using the [IDNA Mapping Table], as follows:

   a. Look up the status value for the code point in the table.

   b. If the status is "ignored", remove the code point from the input
      string.

   c. If the status is "mapped", replace the code point in the input string
      by the mapped value in the table. Note that the mapped value
      may consist of more than one code point.

   d. For any other status ("valid", "disallowed", "deviation"), and for
      any code point which is unassigned for the Unicode version of
      the table, leave the code point unchanged in the input string.

2. Normalize the string which results from the mapping in step 1, using
   Unicode Normalization Form C (NFC).

Note that this mapping and normalization of the input string may produce a string which is not valid per [I-D.ietf-idnabis-protocol], because it may contain disallowed or unassigned code points, or may otherwise fail well-formedness conditions specified in that protocol. Such verification is outside the scope of this document.

If the mappings in this document are applied to versions of Unicode later than Unicode 5.1, the corresponding version of the IDNA Mapping Table for those later versions of the Unicode Standard should be used.
====================================
In Section 6 (Normative References), add:

[IDNA Mapping Table], with a reference to
http://www.unicode.org/Public/idna/5.1.0/IdnaMappingTable.txt

=========================
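
As a non-normative illustration (not part of the proposed text), steps 1 and 2 of the proposed Section 2 wording could be sketched in Python roughly as follows. The table excerpt is hand-written and hypothetical: its entries and statuses are assumptions chosen for the example, not data copied from IdnaMappingTable.txt, and the names SAMPLE_TABLE and map_for_lookup are invented here.

```python
import unicodedata

# Hypothetical excerpt: code point -> (status, mapped value).
# Real data would be loaded from the referenced IDNA Mapping Table.
SAMPLE_TABLE = {
    0x0041: ("mapped", "\u0061"),        # A -> a
    0x00AA: ("mapped", "\u0061"),        # feminine ordinal indicator -> a
    0x0132: ("mapped", "\u0069\u006A"),  # IJ ligature -> i j (two code points)
    0x00AD: ("ignored", ""),             # soft hyphen, assumed "ignored"
    0x00DF: ("deviation", ""),           # sharp s, assumed "deviation"
    0x0061: ("valid", ""),               # a
}


def map_for_lookup(text: str, table: dict) -> str:
    """Apply step 1 (table mapping) and step 2 (NFC) to a string."""
    pieces = []
    for ch in text:
        # Code points missing from the table are treated like unassigned
        # ones in step 1d: they are left unchanged.
        status, mapped = table.get(ord(ch), ("unassigned", ""))
        if status == "ignored":
            continue                # step 1b: drop the code point
        if status == "mapped":
            pieces.append(mapped)   # step 1c: may be several code points
        else:
            pieces.append(ch)       # step 1d: valid, disallowed, deviation, ...
    # Step 2: normalize the mapped string to NFC.
    return unicodedata.normalize("NFC", "".join(pieces))


print(map_for_lookup("A\u00AD\u0132a", SAMPLE_TABLE))
print(map_for_lookup("e\u0301\u00DF", SAMPLE_TABLE))
```

The sketch deliberately performs no validity checking: as in the proposed text, a disallowed or unassigned code point simply passes through, and rejecting it is left to the protocol document.
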
Best regards,
Michel