Greek Casefolding sigma

Tue Apr 1 08:24:56 CEST 2008

At 03:26 08/04/01, Markus Scherer wrote:
>On Mon, Mar 31, 2008 at 6:03 PM, Mark Davis <mark.davis at icu-project.org> wrote:
>> I don't think the original Punycode mechanism would work, since I think it
>> would be an incompatible change in the result compared to strings encoded
>> under IDNA2003 (especially since, it only allows for 1 bit per character, as
>> you say).
>
>Does IDNA2003 forbid using mixed-case Punycode strings? I thought it
>allowed them.

From http://www.ietf.org/rfc/rfc3490.txt, 2. Terminology:

   In IDNA, equivalence of labels is defined in terms of the ToASCII
   operation, which constructs an ASCII form for a given label, whether
   or not the label was already an ASCII label.  Labels are defined to
   be equivalent if and only if their ASCII forms produced by ToASCII
   match using a case-insensitive ASCII comparison.  ASCII labels
   already have a notion of equivalence: upper case and lower case are
   considered equivalent.  The IDNA notion of equivalence is an
   extension of that older notion.  Equivalent labels in IDNA are
   treated as alternate forms of the same label, just as "foo" and "Foo"
   are treated as alternate forms of the same label.

Also, 3.1 Requirements:

   4) Whenever two labels are compared, they MUST be considered to match
      if and only if they are equivalent, that is, their ASCII forms
      (obtained by applying ToASCII) match using a case-insensitive
      ASCII comparison.  Whenever two names are compared, they MUST be
      considered to match if and only if their corresponding labels
      match, regardless of whether the names use the same forms of label
      separators.
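
As a rough illustration of the comparison rule quoted above, here is a
minimal Python sketch. It assumes the ToASCII function from Python's
standard encodings.idna module (an IDNA2003 implementation) is an
acceptable stand-in for the RFC's ToASCII operation; the helper name
labels_equivalent is made up for this example.

```python
from encodings.idna import ToASCII  # stdlib implementation of RFC 3490 ToASCII


def labels_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two labels as in RFC 3490, 3.1 (4)."""
    # ToASCII returns bytes. bytes.lower() only changes ASCII letters,
    # which gives the case-insensitive ASCII comparison the RFC asks for.
    return ToASCII(a).lower() == ToASCII(b).lower()


print(labels_equivalent("foo", "Foo"))               # True
print(labels_equivalent("bücher", "XN--BCHER-KVA"))  # True
print(labels_equivalent("bücher", "bucher"))         # False
```

The second call shows that an upper-case ACE form still matches, because
the comparison happens on the ASCII forms and ignores ASCII case.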

Also, 4.2 ToUnicode:

   7. Verify that the result of step 6 matches the saved copy from step
      3, using a case-insensitive ASCII comparison.

Also, from 5. ACE prefix:

                               The ToASCII and ToUnicode operations MUST
   recognize the ACE prefix in a case-insensitive manner.

   The ACE prefix for IDNA is "xn--" or any capitalization thereof.
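
A minimal sketch of what case-insensitive prefix recognition could look
like, in Python. The function name has_ace_prefix is an assumption for
illustration only and is not taken from any RFC or library.

```python
ACE_PREFIX = "xn--"


def has_ace_prefix(label: str) -> bool:
    """Hypothetical helper: recognize "xn--" in any capitalization."""
    # Lower-case only the first four characters before comparing, so
    # "XN--", "Xn--" and "xN--" are all accepted.
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX


print(has_ace_prefix("xn--bcher-kva"))  # True
print(has_ace_prefix("XN--BCHER-KVA"))  # True
print(has_ace_prefix("bcher-kva"))      # False
```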

And then we have http://www.ietf.org/rfc/rfc3492.txt,
Appendix A. Mixed-case annotation:

   In order to use Punycode to represent case-insensitive strings,
   higher layers need to case-fold the strings prior to Punycode
   encoding.  The encoded string can use mixed case as an annotation
   telling how to convert the folded string into a mixed-case string for
   display purposes.  Note, however, that mixed-case annotation is not
   used by the ToASCII and ToUnicode operations specified in [IDNA], and
   therefore implementors of IDNA can disregard this appendix.

   Basic code points can use mixed case directly, because the decoder
   copies them verbatim, leaving lowercase code points lowercase, and
   leaving uppercase code points uppercase.  Each non-basic code point
   is represented by a delta, which is represented by a sequence of
   basic code points, the last of which provides the annotation.  If it
   is uppercase, it is a suggestion to map the non-basic code point to
   uppercase (if possible); if it is lowercase, it is a suggestion to
   map the non-basic code point to lowercase (if possible).

   These annotations do not alter the code points returned by decoders;
   the annotations are returned separately, for the caller to use or
   ignore.  Encoders can accept annotations in addition to code points,
   but the annotations do not alter the output, except to influence the
   uppercase/lowercase form of ASCII letters.

   Punycode encoders and decoders need not support these annotations,
   and higher layers need not use them.
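
To make the annotation mechanism concrete, here is a sketch of a Punycode
decoder in Python that returns the case flags separately from the code
points, as the appendix describes. It follows the decoding procedure of
RFC 3492 but leaves out the overflow and range checks, so it is an
illustration and not a conforming implementation. The names
decode_with_flags and apply_annotation are made up for this example.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta, numpoints, first):
    """Bias adaptation from RFC 3492, section 6.1."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def digit_value(ch):
    """Map a basic code point to its digit value, ignoring case."""
    cp = ord(ch)
    if 0x61 <= cp <= 0x7A:   # a-z
        return cp - 0x61
    if 0x41 <= cp <= 0x5A:   # A-Z
        return cp - 0x41
    if 0x30 <= cp <= 0x39:   # 0-9
        return cp - 0x30 + 26
    raise ValueError("not a Punycode digit: %r" % ch)


def decode_with_flags(s):
    """Return (decoded string, list of case flags); no overflow checks."""
    pos = s.rfind("-")
    basic = s[:pos] if pos > 0 else ""
    output = list(basic)                         # copied verbatim
    flags = ["A" <= ch <= "Z" for ch in basic]   # case of basic code points
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    idx = pos + 1 if pos > 0 else 0
    while idx < len(s):
        old_i, w, k = i, 1, BASE
        while True:
            ch = s[idx]
            idx += 1
            d = digit_value(ch)
            i += d * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if d < t:
                break                            # ch is the last digit of the delta
            w *= BASE - t
            k += BASE
        bias = adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        flags.insert(i, "A" <= ch <= "Z")        # annotation from the last digit
        i += 1
    return "".join(output), flags


def apply_annotation(text, flags):
    """Use the flags as a display suggestion; callers may ignore them."""
    return "".join(c.upper() if f else c.lower() for c, f in zip(text, flags))


print(decode_with_flags("bcher-kva")[0])                # bücher
print(decode_with_flags("bcher-kvA")[0])                # bücher
print(apply_annotation(*decode_with_flags("bcher-kvA")))  # bÜcher
```

The first two calls return the same code points, which matches the
statement that annotations do not alter what the decoder returns. Only
the separately returned flags differ, and there is one flag (one bit)
per character.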

Regards,   Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp