Greek Casefolding sigma
Martin Duerst
duerst at it.aoyama.ac.jp
Tue Apr 1 08:24:56 CEST 2008
At 03:26 08/04/01, Markus Scherer wrote:
>On Mon, Mar 31, 2008 at 6:03 PM, Mark Davis <mark.davis at icu-project.org> wrote:
>> I don't think the original Punycode mechanism would work, since I think it
>> would be an incompatible change in the result compared to strings encoded
>under IDNA2003 (especially since it only allows for 1 bit per character, as
>> you say).
>
>Does IDNA2003 forbid using mixed-case Punycode strings? I thought it
>allowed them.
From http://www.ietf.org/rfc/rfc3490.txt, 2. Terminology:
In IDNA, equivalence of labels is defined in terms of the ToASCII
operation, which constructs an ASCII form for a given label, whether
or not the label was already an ASCII label. Labels are defined to
be equivalent if and only if their ASCII forms produced by ToASCII
match using a case-insensitive ASCII comparison. ASCII labels
already have a notion of equivalence: upper case and lower case are
considered equivalent. The IDNA notion of equivalence is an
extension of that older notion. Equivalent labels in IDNA are
treated as alternate forms of the same label, just as "foo" and "Foo"
are treated as alternate forms of the same label.
Also, 3.1 Requirements
4) Whenever two labels are compared, they MUST be considered to match
if and only if they are equivalent, that is, their ASCII forms
(obtained by applying ToASCII) match using a case-insensitive
ASCII comparison. Whenever two names are compared, they MUST be
considered to match if and only if their corresponding labels
match, regardless of whether the names use the same forms of label
separators.
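A minimal sketch of the equivalence rule quoted above. It assumes that CPython's standard-library `encodings.idna` module, which implements IDNA2003 ToASCII including Nameprep, is an acceptable stand-in for a conforming implementation. The function name `labels_equivalent` is hypothetical. ToASCII leaves an all-ASCII label's case untouched, which is why the final comparison has to be case-insensitive. For non-ASCII labels, Nameprep has already case-folded, and that folding includes mapping final sigma (U+03C2) to sigma (U+03C3).

```python
import encodings.idna  # CPython's IDNA2003 codec; ToASCII is a module-level helper


def labels_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: RFC 3490 label equivalence.

    Apply ToASCII to both labels, then compare the results using a
    case-insensitive ASCII comparison.
    """
    # ToASCII returns bytes; bytes.lower() only folds A-Z, i.e. an ASCII-only fold.
    return encodings.idna.ToASCII(a).lower() == encodings.idna.ToASCII(b).lower()


if __name__ == "__main__":
    # All-ASCII labels keep their case in ToASCII; the comparison folds it.
    print(labels_equivalent("Foo", "foo"))
    # Non-ASCII labels are case-folded by Nameprep before Punycode encoding.
    print(labels_equivalent("Bücher", "BÜCHER"))
    # Nameprep (stringprep table B.2) maps final sigma to sigma.
    print(labels_equivalent("ς", "σ"))
```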
Also, 4.2 ToUnicode
7. Verify that the result of step 6 matches the saved copy from step
3, using a case-insensitive ASCII comparison.
Also, from 5. ACE prefix:
The ToASCII and ToUnicode operations MUST
recognize the ACE prefix in a case-insensitive manner.
The ACE prefix for IDNA is "xn--" or any capitalization thereof.
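To show what these rules mean for mixed-case ACE labels, here is a rough sketch of ToUnicode steps 3 through 8 for a label that is already ASCII. It relies on CPython's `encodings.idna.ToASCII` and the built-in `punycode` codec. `to_unicode_checked` is a hypothetical name. Error handling is simplified to "return the input unchanged", which is what ToUnicode does on failure. An all-uppercase label such as XN--BCHER-KVA passes step 7 only because that comparison ignores ASCII case. An exact byte comparison against the re-encoded xn--bcher-kva would reject it.

```python
import encodings.idna

ACE_PREFIX = b"xn--"


def to_unicode_checked(label: bytes) -> str:
    """Hypothetical sketch of RFC 3490 ToUnicode steps 3-8 for an all-ASCII label."""
    original = label.decode("ascii")
    # Step 3: recognize the ACE prefix in any capitalization and save a copy.
    if label[:4].lower() != ACE_PREFIX:
        return original
    saved = label
    try:
        # Steps 4-5: remove the prefix and Punycode-decode the remainder.
        decoded = label[4:].decode("punycode")
        # Step 6: apply ToASCII to the decoded label.
        reencoded = encodings.idna.ToASCII(decoded)
    except UnicodeError:
        return original  # ToUnicode never fails; it returns its input instead
    # Step 7: case-insensitive ASCII comparison with the saved copy.
    if reencoded.lower() != saved.lower():
        return original
    # Step 8: return the result of decoding.
    return decoded


if __name__ == "__main__":
    print(to_unicode_checked(b"xn--bcher-kva"))
    print(to_unicode_checked(b"XN--BCHER-KVA"))  # basic letters keep their case
    print(to_unicode_checked(b"xn--abc-"))       # fails step 7, returned unchanged
```

Python's Punycode decoder copies the basic code points verbatim, so the uppercase form decodes with uppercase ASCII letters but still names the same label.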
And then we have http://www.ietf.org/rfc/rfc3492.txt,
A. Mixed-case annotation
In order to use Punycode to represent case-insensitive strings,
higher layers need to case-fold the strings prior to Punycode
encoding. The encoded string can use mixed case as an annotation
telling how to convert the folded string into a mixed-case string for
display purposes. Note, however, that mixed-case annotation is not
used by the ToASCII and ToUnicode operations specified in [IDNA], and
therefore implementors of IDNA can disregard this appendix.
Basic code points can use mixed case directly, because the decoder
copies them verbatim, leaving lowercase code points lowercase, and
leaving uppercase code points uppercase. Each non-basic code point
is represented by a delta, which is represented by a sequence of
basic code points, the last of which provides the annotation. If it
is uppercase, it is a suggestion to map the non-basic code point to
uppercase (if possible); if it is lowercase, it is a suggestion to
map the non-basic code point to lowercase (if possible).
These annotations do not alter the code points returned by decoders;
the annotations are returned separately, for the caller to use or
ignore. Encoders can accept annotations in addition to code points,
but the annotations do not alter the output, except to influence the
uppercase/lowercase form of ASCII letters.
Punycode encoders and decoders need not support these annotations,
and higher layers need not use them.
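As far as I can tell, CPython's built-in `punycode` codec does not expose these annotations. The following is therefore a small from-scratch decoder, written against the decoding procedure in RFC 3492 section 6.2 and extended to return the Appendix A annotation for each code point. All names (`decode_with_annotations`, `apply_annotations`, `digit_value`) are hypothetical. Overflow checks are omitted because Python integers do not overflow. This is a sketch, not a vetted implementation. Under IDNA, as the appendix says, the flags would simply be ignored.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def digit_value(ch: str) -> int:
    """Digit value of a basic code point; letters are read case-insensitively."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit {ch!r}")


def decode_with_annotations(encoded: str) -> tuple[str, list[bool]]:
    """Decode Punycode, returning (text, per-code-point 'suggest uppercase' flags).

    A basic code point's flag is its own case. A non-basic code point's flag
    is the case of the last digit of the delta that encoded it.
    """
    pos = encoded.rfind("-")
    basic, rest = (encoded[:pos], encoded[pos + 1:]) if pos > 0 else ("", encoded)
    if any(ord(c) >= 0x80 for c in basic):
        raise ValueError("non-basic code point before the delimiter")
    output = list(basic)
    upper = [c.isupper() for c in basic]
    n, i, bias, idx = INITIAL_N, 0, INITIAL_BIAS, 0
    while idx < len(rest):
        oldi, w, k = i, 1, BASE
        while True:  # read one generalized variable-length integer
            if idx >= len(rest):
                raise ValueError("truncated delta")
            last = rest[idx]
            idx += 1
            digit = digit_value(last)
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = adapt(i - oldi, len(output) + 1, oldi == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        upper.insert(i, last.isupper())  # annotation: case of the final digit
        i += 1
    return "".join(output), upper


def apply_annotations(text: str, upper: list[bool]) -> str:
    """Follow each annotation's suggestion (uppercase or lowercase) for display."""
    return "".join(c.upper() if u else c.lower() for c, u in zip(text, upper))
```

For example, "BCHER-KVA" decodes to "BüCHER" with every flag set, so following the suggestions gives "BÜCHER" for display. "bcher-kva" decodes to "bücher" with no flags set. Under IDNA, both are the same label.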
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp