idna-mapping update

Vint Cerf vint at google.com
Sat Dec 19 13:50:20 CET 2009


Another way to think about this, Michel, is that the IDNABIS working
group simply does not make a normative recommendation on mapping. It
has been consistent about no mapping for registration (in other words,
you register only PVALID characters and the registry does not map for
the registrant).

With regard to lookup, there is no consensus within the IDNABIS WG on
either the nature of mappings or even their advisability.

It has been suggested that a better forum in which to deal with
IDNA2003 and IDNA2008 incompatibility is the ICANN IDN Guidelines
Committee. That committee has broader participation than the IDNABIS
working group and may be a better place in which to discuss the TR46
proposal or other proposals. If we adopt Cary Karp's offer, your
observations below would become input to the Guidelines Committee
discussions.

vint


On Dec 18, 2009, at 10:13 PM, Michel SUIGNARD wrote:

>> From Lisa Dusseault (Dec 1st)
>> I don't believe we know what the WG consensus position is around how
>> strongly pre-lookup mappings are recommended and in what use cases,
>> and how compatible optional pre-lookup mappings are with IDNA2003
>> in-protocol mapping.
>
> I'd like to give new feedback on that statement. The issue some of
> us have with the current recommendation in idna-mappings
> [draft-ietf-idnabis-mappings-05] is that it differs vastly from the
> mapping done in IDNA2003, especially for compatibility mappings
> beyond the narrow/wide mapping suggested in the current document.
> The proposed solution references a single mapping table, greatly
> improving the odds that implementers will do the right thing.
> Finally, it makes it trivial for the draft Unicode TR46 to refer to
> a common mapping definition, avoiding potential confusion and
> unnecessary duplication.
>
> Some examples of characters mapped differently in idna-mappings and
> IDNA2003 (in idna-mappings they stay unmapped):
>
> 	00AA ( ª ) => 0061 ( a ) # FEMININE ORDINAL INDICATOR
> 	00B2 ( ² ) => 0032 ( 2 ) # SUPERSCRIPT TWO
> 	00B3 ( ³ ) => 0033 ( 3 ) # SUPERSCRIPT THREE
> 	00B5 ( µ ) => 03BC ( μ ) # MICRO SIGN
> 	00B9 ( ¹ ) => 0031 ( 1 ) # SUPERSCRIPT ONE
> 	00BA ( º ) => 006F ( o ) # MASCULINE ORDINAL INDICATOR
> 	0130 ( İ ) => 0069 0307 ( i̇ ) # LATIN CAPITAL LETTER I WITH DOT ABOVE
> 	0132 ( Ĳ ) => 0069 006A ( ij ) # LATIN CAPITAL LIGATURE IJ
> 	013F ( Ŀ ) => 006C 00B7 ( l· ) # LATIN CAPITAL LETTER L WITH MIDDLE DOT
> 	0140 ( ŀ ) => 006C 00B7 ( l· ) # LATIN SMALL LETTER L WITH MIDDLE DOT
> 	0149 ( ŉ ) => 02BC 006E ( ʼn ) # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
> 	017F ( ſ ) => 0073 ( s ) # LATIN SMALL LETTER LONG S
> 	01C4 ( Ǆ ) => 0064 017E ( dž ) # LATIN CAPITAL LETTER DZ WITH CARON
> 	01F3 ( ǳ ) => 0064 007A ( dz ) # LATIN SMALL LETTER DZ
>
> By using a mapping table based on the NFKC_CF property already
> exposed in Unicode (reflecting IDNA mapping as designed in IDNA2003),
> modified to improve compatibility with IDNA2008, it is possible to
> address the concern expressed above. The table is available at
> http://www.unicode.org/Public/idna/5.1.0/IdnaMappingTable.txt and its
> construction is explained in section 7 of the latest TR46 draft at
> http://www.unicode.org/reports/tr46/
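To illustrate why NFKC_CF reproduces the IDNA2003 behavior for the characters listed above, here is a rough sketch using only Python's standard `unicodedata` module and `str.casefold()`. The helper name `nfkc_casefold_approx` is hypothetical, and the recipe (NFKC, then full case folding, then NFKC again) is an assumed approximation: the real NFKC_CF property is defined by Unicode data files and also removes default-ignorable code points, and the TR46 table further modifies it for IDNA2008 compatibility, so results can differ from the table for other code points.

```python
import unicodedata


def nfkc_casefold_approx(s: str) -> str:
    """Approximate NFKC_Casefold (NFKC_CF) as NFKC -> casefold -> NFKC.

    Assumption: this ignores the removal of default-ignorable code points
    that the real property performs; it is only an illustration.
    """
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", s).casefold())


# Some of the characters listed above, with their IDNA2003-style result.
EXAMPLES = {
    "\u00aa": "a",             # FEMININE ORDINAL INDICATOR
    "\u00b2": "2",             # SUPERSCRIPT TWO
    "\u00b5": "\u03bc",        # MICRO SIGN -> GREEK SMALL LETTER MU
    "\u0130": "i\u0307",       # CAPITAL I WITH DOT ABOVE -> i + COMBINING DOT ABOVE
    "\u0132": "ij",            # LATIN CAPITAL LIGATURE IJ
    "\u013f": "l\u00b7",       # CAPITAL L WITH MIDDLE DOT -> l + MIDDLE DOT
    "\u0149": "\u02bcn",       # N PRECEDED BY APOSTROPHE
    "\u017f": "s",             # LATIN SMALL LETTER LONG S
    "\u01c4": "d\u017e",       # CAPITAL DZ WITH CARON -> d + z with caron
    "\u01f3": "dz",            # LATIN SMALL LETTER DZ
}

if __name__ == "__main__":
    for src, expected in EXAMPLES.items():
        got = nfkc_casefold_approx(src)
        codes = " ".join(f"{ord(c):04X}" for c in got)
        print(f"U+{ord(src):04X} -> {codes}", "ok" if got == expected else "MISMATCH")
```

Running NFKC a second time matters for inputs such as U+01C4, where case folding produces a string that must be recomposed to match the table's 0064 017E.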
>
> The editing instructions for idna-mappings to add a reference to
> this new mapping table follow:
> <<
> In Section 2, replace items 1-4 and all following text by:
>
> ======================================================================
>
> 1. For each code point in the input string to be used under the IDNA
>   protocol, map the code point using the [IDNA Mapping Table], as
>   follows:
>
>   a. Look up the status value for the code point in the table.
>
>   b. If the status is "ignored", remove the code point from the input
>      string.
>
>   c. If the status is "mapped", replace the code point in the input
>      string with the mapped value in the table. Note that the mapped
>      value may consist of more than one code point.
>
>   d. For any other status ("valid", "disallowed", "deviation"), and for
>      any code point that is unassigned in the Unicode version of the
>      table, leave the code point unchanged in the input string.
>
> 2. Normalize the string that results from the mapping in step 1, using
>   Unicode Normalization Form C (NFC).
>
> Note that mapping and normalizing the input string may produce a
> string that is not valid per [I-D.ietf-idnabis-protocol], because it
> may contain disallowed or unassigned code points, or may otherwise
> fail well-formedness conditions specified in that protocol. Such
> verification is outside the scope of this document.
>
> If the mappings in this document are applied to versions of Unicode
> later than Unicode 5.1, the version of the IDNA Mapping Table
> corresponding to that later version of the Unicode Standard should
> be used.
> ====================================
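The mapping procedure in the proposed Section 2 text (steps 1a to 1d, then NFC in step 2) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names `parse_table` and `map_for_lookup` are hypothetical, the "code point(s) ; status ; mapping # comment" line layout is an assumption about the table file, and the sample entries are illustrative rather than copied from IdnaMappingTable.txt.

```python
import unicodedata
from typing import Dict, Iterable, Tuple

# Hypothetical excerpt in an assumed "code point(s) ; status ; mapping # comment"
# layout. The entries are illustrative and NOT copied from IdnaMappingTable.txt.
SAMPLE_TABLE = """\
0041 ; mapped ; 0061 # LATIN CAPITAL LETTER A
0042 ; mapped ; 0062 # LATIN CAPITAL LETTER B
0061..007A ; valid # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA ; mapped ; 0061 # FEMININE ORDINAL INDICATOR
00AD ; ignored # SOFT HYPHEN
00DF ; deviation ; 0073 0073 # LATIN SMALL LETTER SHARP S
0130 ; mapped ; 0069 0307 # LATIN CAPITAL LETTER I WITH DOT ABOVE
"""

# Code point -> (status, mapped value); mapped value is "" unless status is "mapped".
Table = Dict[int, Tuple[str, str]]


def parse_table(lines: Iterable[str]) -> Table:
    """Parse lines in the assumed layout into a code point -> entry dict."""
    table: Table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        first, _, last = fields[0].partition("..")  # single code point or range
        status = fields[1]
        mapped = ""
        if status == "mapped" and len(fields) > 2:
            mapped = "".join(chr(int(h, 16)) for h in fields[2].split())
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            table[cp] = (status, mapped)
    return table


def map_for_lookup(label: str, table: Table) -> str:
    """Apply steps 1a-1d to each code point, then NFC (step 2)."""
    out = []
    for ch in label:
        # 1a. Look up the status; code points absent from the table are
        # treated like unassigned ones.
        status, mapped = table.get(ord(ch), ("unassigned", ""))
        if status == "ignored":
            continue  # 1b. remove the code point
        if status == "mapped":
            out.append(mapped)  # 1c. the replacement may be several code points
        else:
            out.append(ch)  # 1d. valid, disallowed, deviation, unassigned: keep
    return unicodedata.normalize("NFC", "".join(out))  # 2. normalize to NFC
```

With this excerpt, `map_for_lookup("A\u00adB", parse_table(SAMPLE_TABLE.splitlines()))` yields "ab": both capitals are mapped and the soft hyphen is ignored. U+00DF is left unchanged because step 1d keeps "deviation" code points as they are. As the proposed text notes, checking the result against the IDNA2008 protocol rules is a separate step that this sketch does not attempt.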
> In Section 6, Normative References, add:
>
> [IDNA Mapping Table] add reference to
> http://www.unicode.org/Public/idna/5.1.0/IdnaMappingTable.txt
>
> =========================
> Best regards,
> Michel
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update


