Normalization stability non-issue (was: Criteria for exceptional characters)

Martin Duerst duerst at it.aoyama.ac.jp
Sun Dec 17 10:09:55 CET 2006


At 12:26 06/12/17, Mark Davis wrote:

>On 12/16/06, Michael Everson <everson at evertype.com> wrote:
>>At 15:58 -0800 2006-12-16, Mark Davis wrote:

>>>3. There are concerns about the stability of normalization
>>
>>Are they valid? What are they, specifically?
>
>See http://www.unicode.org/reports/tr15/#Versioning

>>>3. Looks like we have a solution (restrict the 
>>>sequences that could change between 3.2 and 5.0;
>>>the Unicode consortium is tightening stability
>>>to disallow further changes)
>>
>>Please create a separate thread to discuss this particular issue.
>
>If and when it requires further discussion.

Done herewith.

Looking at http://www.unicode.org/reports/tr15/#Versioning,
my conclusion is that this is currently essentially a
non-issue:

- Corrigendum #2 (Yod with Hiriq) is Unicode 3.1, so it's
  subsumed in the current IDNA, which is based on Unicode 3.2.
- Corrigendum #3 is for a Korean compatibility ideograph. According
  to the current rules, it will be prohibited anyway. The only
  thing that leaving the error as it was in Unicode 3.2 would
  cause is user confusion; because this character is rarely
  used in Korean, the actual chance of such user confusion
  is low.
- Corrigendum #4 is for five plane 2 compatibility ideographs.
  Again, this does not affect normalized text, and the current
  proposal excludes these characters anyway.
- Corrigendum #5, "Normalization Idempotency", affects only character
  sequences that don't appear in practice. Pre-corrigendum
  implementations of IDNA actually may reject these sequences
  because of the various encoding/decoding steps, but this
  may vary by implementation, and is not relevant in practice.
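
As a rough illustration of the first three points (a sketch, not part
of any specification), the affected code points can be compared under
an old and a current normalization table. The sketch assumes Python,
whose unicodedata module exposes a Unicode 3.2.0 view as
unicodedata.ucd_3_2_0; the code point list is taken from the
corrigenda, and the names AFFECTED, hexes and compare are made up for
this example. Whether the 3.2.0 view reproduces the uncorrected
mappings depends on the Python build, so no output is claimed here.

```python
import unicodedata

# Code points named in Corrigenda #2-#4 (assumption: transcribed from the
# published corrigenda; verify against them before relying on this list).
AFFECTED = {
    "Corrigendum #2": ["\uFB1D"],
    "Corrigendum #3": ["\uF951"],
    "Corrigendum #4": [
        "\U0002F868", "\U0002F874", "\U0002F91F", "\U0002F95F", "\U0002F9BF",
    ],
}


def hexes(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def compare(form="NFC"):
    """Normalize each affected character with the 3.2.0 view and the current data.

    Returns a list of (label, input, old result, new result, differs) tuples.
    """
    rows = []
    for label, chars in AFFECTED.items():
        for ch in chars:
            old = unicodedata.ucd_3_2_0.normalize(form, ch)
            new = unicodedata.normalize(form, ch)
            rows.append((label, hexes(ch), hexes(old), hexes(new), old != new))
    return rows


if __name__ == "__main__":
    for label, src, old, new, differs in compare():
        print(f"{label}: {src} -> 3.2.0 view: {old} | current: {new} | differs: {differs}")
```

In every row the input differs from its normalized form, which is the
sense in which these corrigenda do not affect already-normalized text.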

So my suggestion would be that IDNAbis say something like:
IDNAbis implementations SHOULD use implementations of NFC/NFKC
that integrate Corrigenda #2 through #5 as listed in UAX #15, but
MAY use older implementations of NFC/NFKC.
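
An implementer who wants to know which kind of NFC/NFKC library they
have could probe it with sequences of the shape Corrigendum #5 is
about: a starter, a combining mark, then a character that would
otherwise compose with the first starter. The following is only a
sketch under assumptions: it uses Python's unicodedata.normalize as
the library under test, the two probe sequences are chosen for
illustration, and the names PROBES, is_idempotent and check are
hypothetical.

```python
import unicodedata

# Probe sequences (assumption: chosen to match the pattern described in
# Corrigendum #5; they are illustrative, not an exhaustive test set).
PROBES = [
    "\u0B47\u0300\u0B3E",  # Oriya vowel sign E, combining grave, vowel sign AA
    "\u1100\u0300\u1161",  # Hangul choseong, combining grave, jungseong
]


def is_idempotent(normalize, form, text):
    """True if normalizing an already-normalized string changes nothing."""
    once = normalize(form, text)
    return normalize(form, once) == once


def check(normalize=unicodedata.normalize):
    """Check idempotency of NFC and NFKC on the probe sequences."""
    return {
        form: all(is_idempotent(normalize, form, s) for s in PROBES)
        for form in ("NFC", "NFKC")
    }


if __name__ == "__main__":
    print(check())
```

Any function with the same (form, text) signature can be passed to
check(), so an older normalization library can be probed the same way.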

While I know there are several people who might jump up in
the air and cry "we can't possibly allow for any iota of
indeterminacy in IDNA", a provision like the above is
exactly the right thing in the situation we are facing
(it doesn't matter, so don't unnecessarily restrict
implementations). Rather than being overly nervous,
we have to make sure we know where to be nervous and
where not.

Also, in the long run, we have to acknowledge that with
over 100,000 characters, the chances that there is a
(clerical) mistake for one of them are non-negligible
even with the best procedures and lots of eyes. So
further tightening of stability policies may be helpful
politically, but could hurt otherwise.

Regards,   Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


