I-D Action:draft-ietf-idnabis-mappings-00.txt

Paul Hoffman phoffman at imc.org
Sat May 30 20:19:09 CEST 2009


I earlier criticized the -00 draft for not being technically correct. Here are my proposed changes to fix that. Basically, replace "NFC and some NFKC-like actions but avoid touching the characters that are allowed in IDNA2008 but weren't allowed in IDNA2003" with "protect the characters that are allowed in IDNA2008 but weren't allowed in IDNA2003, then do NFKC".

I propose to change the list in section 3 to:

   1.  U+00DF, U+03C2, and U+200C are mapped to characters that
       (a) are not used in the input, (b) are unaffected by any NFKC
       mapping, and (c) do not affect other characters in an NFKC mapping.

   2.  Capital (upper case) characters are mapped to their small (lower
       case) equivalents. [[anchor2: Need reference to "toLowerCase"]]

   3.  All characters are mapped using Unicode Normalization Form KC
       (NFKC).  [Unicode51]

   4.  Map the three characters from step 1 back to their actual forms.

(If I missed some characters in that first step, they should be added.)
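
For illustration only, the four steps above could be sketched in Python roughly as follows. This is not text for the draft. The names map_label, PROTECT and RESTORE are made up for the example, the placeholders are simply the ones suggested for Appendix A, and str.lower() stands in for the still-unreferenced "toLowerCase". Python's unicodedata module uses whatever Unicode version the interpreter ships with, which is not necessarily Unicode 5.1.

```python
import unicodedata

# Hypothetical placeholder choice, borrowed from the Appendix A proposal.
# Any code points satisfying conditions (a)-(c) of step 1 would do.
PROTECT = {"\u00DF": "\u2585", "\u03C2": "\u2586", "\u200C": "\u2587"}
RESTORE = {placeholder: original for original, placeholder in PROTECT.items()}


def map_label(text, case_map=str.lower):
    """Sketch of the proposed section 3 mapping (an assumption, not the draft)."""
    # Condition (a): the placeholders must not already be used in the input.
    if any(placeholder in text for placeholder in RESTORE):
        raise ValueError("input already contains a placeholder character")
    # Step 1: protect U+00DF, U+03C2 and U+200C.
    protected = text.translate(str.maketrans(PROTECT))
    # Step 2: map upper case to lower case (stand-in for "toLowerCase").
    lowered = case_map(protected)
    # Step 3: apply NFKC to everything that is left.
    normalized = unicodedata.normalize("NFKC", lowered)
    # Step 4: map the placeholders back to the three protected characters.
    return normalized.translate(str.maketrans(RESTORE))
```

The case_map parameter is there only to show why step 1 matters: with a more aggressive mapping such as str.casefold(), an unprotected U+00DF would become "ss", while map_label("Stra\u00DFe", case_map=str.casefold) still returns the label with U+00DF intact.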

The steps in Appendix A could then be changed to:

   1.  Validate that U+2585, U+2586, and U+2587 do not appear in the
       input; if they do, fail immediately.

   2.  Map U+00DF to U+2585, U+03C2 to U+2586, and U+200C to U+2587.

   3.  Map using tables B.1 and B.2 from [RFC3454].

   4.  Normalize using Unicode Normalization Form KC.  [Unicode51]

   5.  Prohibit using tables C.1.2, C.3, C.4, C.5, C.6, C.7, C.8, and
       C.9 from [RFC3454].

   6.  Map U+2585 to U+00DF, U+2586 to U+03C2, and U+2587 to U+200C.

I think the above should be easy for an implementer to understand, and is technically correct. FWIW, the three characters I chose for mapping in Appendix A are nearly-indistinguishable rectangles that no one in their right mind would use in a domain name.
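
As a rough sanity check of the Appendix A steps, here is a minimal Python sketch using the standard library's stringprep module, which exposes the RFC 3454 tables. The function name appendix_a_map and the two constants are hypothetical. The stringprep module works from Unicode 3.2 data, and unicodedata.normalize() uses the interpreter's own Unicode version rather than Unicode 5.1, so treat this as an approximation of the steps and not a conforming implementation.

```python
import stringprep
import unicodedata

# Step 2 / step 6 placeholders, as listed in the Appendix A proposal.
PLACEHOLDERS = {"\u00DF": "\u2585", "\u03C2": "\u2586", "\u200C": "\u2587"}

# Step 5 tables: C.1.2 and C.3 through C.9 from RFC 3454.
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c3, stringprep.in_table_c4,
    stringprep.in_table_c5, stringprep.in_table_c6, stringprep.in_table_c7,
    stringprep.in_table_c8, stringprep.in_table_c9,
)


def appendix_a_map(text):
    """Sketch of the six proposed Appendix A steps (illustrative only)."""
    # Step 1: fail immediately if a placeholder appears in the input.
    if any(p in text for p in PLACEHOLDERS.values()):
        raise ValueError("input contains U+2585, U+2586 or U+2587")
    # Step 2: protect the three characters.
    text = text.translate(str.maketrans(PLACEHOLDERS))
    # Step 3: B.1 maps to nothing, B.2 case-folds.
    text = "".join(
        stringprep.map_table_b2(ch) for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Step 4: normalize with NFKC.
    text = unicodedata.normalize("NFKC", text)
    # Step 5: reject anything in the prohibited tables.
    for ch in text:
        if any(in_table(ch) for in_table in PROHIBITED):
            raise ValueError("prohibited code point U+%04X" % ord(ch))
    # Step 6: restore the three protected characters.
    restore = {v: k for k, v in PLACEHOLDERS.items()}
    return text.translate(str.maketrans(restore))
```

Without steps 2 and 6, table B.2 would turn U+00DF into "ss" and U+03C2 into U+03C3, and table B.1 would delete U+200C. With them, appendix_a_map("Stra\u00DFe") keeps U+00DF and lowercases the rest.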

