sharp s (Eszett)

Patrik Fältström patrik at frobbit.se
Sat Mar 8 09:51:25 CET 2008


On 7 mar 2008, at 23.01, Mark Davis wrote:

> The main reason for mapping ß to "ss" in IDNA2003 is for case  
> insensitivity.


(Comments are based on version -05 of the tables draft.)

Correct, but let me add some more to this.

...and the main reason it is assigned the DISALLOWED derived property
value in IDNA200x is that it matches categories A and B, and because
it matches category B, it is DISALLOWED. The only way of allowing it is
to assign it a property value explicitly, treating it as an exception.

> 2.1.2.  Category B - Normalization and Casefolding
>
>    B: toNFKC(toCaseFolded(toNFKC(cp))) != cp
>
>    The category is used to group the characters that are not stable
>    under NFKC normalization and casefolding.  In general, these
>    codepoints are not suitable for use for IDN.
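As a rough illustration (not part of the draft), the category B test
can be sketched in Python. This assumes that str.casefold() is an
acceptable stand-in for toCaseFolded and that the Unicode data bundled
with the interpreter is close enough to the version the draft targets.
The function names are hypothetical.

```python
import unicodedata


def to_nfkc(s: str) -> str:
    # toNFKC from the draft, via the standard library.
    return unicodedata.normalize("NFKC", s)


def is_category_b(cp: int) -> bool:
    # B: toNFKC(toCaseFolded(toNFKC(cp))) != cp
    # str.casefold() is assumed here to approximate toCaseFolded.
    ch = chr(cp)
    return to_nfkc(to_nfkc(ch).casefold()) != ch


if __name__ == "__main__":
    # U+00DF LATIN SMALL LETTER SHARP S casefolds to "ss".
    print(is_category_b(0x00DF))
    # U+0061 LATIN SMALL LETTER A is stable under the round trip.
    print(is_category_b(0x0061))
```

Under these assumptions, U+00DF matches category B because casefolding
turns it into "ss", while a plain lowercase letter does not match.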

I do not like exceptions, as they force the IETF to make decisions on a
codepoint-by-codepoint basis. I would rather see the IETF make
decisions based on which properties are OK or not.

That said, we already have some exceptions, but they are rare, and I
want us all to understand what it means to add an exception to
category F. It implies that the IETF creates an exception rule that
"overrides" the decisions made by the Unicode Consortium. That is
something we can do (and we have some suggestions, see below), but
they are still "exceptions".

> 2.2.2.  Category F - Exceptions
>
>    F: cp in {002D, 00B7, 02B9, 0375, 0483, 05F3, 05F4, 3005,
>              3007, 303B, 30FB}
>
>    This category explicitly lists codepoints for which the category
>    cannot be assigned using only the core property values that exist in
>    the Unicode standard.  The values are according to the table below:
>
>    002D; CONTEXTO  # HYPHEN-MINUS
>    00B7; CONTEXTO  # MIDDLE DOT
>    02B9; CONTEXTO  # MODIFIER LETTER PRIME
>    0375; CONTEXTO  # GREEK LOWER NUMERAL SIGN (KERAIA)
>    0483; CONTEXTO  # COMBINING CYRILLIC TITLO
>    05F3; CONTEXTO  # HEBREW PUNCTUATION GERESH
>    05F4; CONTEXTO  # HEBREW PUNCTUATION GERSHAYIM
>    3005; CONTEXTO  # IDEOGRAPHIC ITERATION MARK
>    3007; PVALID    # IDEOGRAPHIC NUMBER ZERO
>    303B; CONTEXTO  # VERTICAL IDEOGRAPHIC ITERATION MARK
>    30FB; CONTEXTO  # KATAKANA MIDDLE DOT
>
>    The characters 02B9, 0375 and 0483 are used in different scripts to
>    indicate that an adjacent letter is being used with a numeric value.
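To make the "override" point concrete, here is a minimal sketch of how
an exception table like the one quoted above might be consulted. It
assumes that the exception lookup takes precedence over whatever value
the property-based rules produce. The names EXCEPTIONS and
derived_property, and the rule_based argument, are hypothetical and
are not taken from the draft.

```python
# Category F table from draft -05, keyed by codepoint.
EXCEPTIONS = {
    0x002D: "CONTEXTO",  # HYPHEN-MINUS
    0x00B7: "CONTEXTO",  # MIDDLE DOT
    0x02B9: "CONTEXTO",  # MODIFIER LETTER PRIME
    0x0375: "CONTEXTO",  # GREEK LOWER NUMERAL SIGN (KERAIA)
    0x0483: "CONTEXTO",  # COMBINING CYRILLIC TITLO
    0x05F3: "CONTEXTO",  # HEBREW PUNCTUATION GERESH
    0x05F4: "CONTEXTO",  # HEBREW PUNCTUATION GERSHAYIM
    0x3005: "CONTEXTO",  # IDEOGRAPHIC ITERATION MARK
    0x3007: "PVALID",    # IDEOGRAPHIC NUMBER ZERO
    0x303B: "CONTEXTO",  # VERTICAL IDEOGRAPHIC ITERATION MARK
    0x30FB: "CONTEXTO",  # KATAKANA MIDDLE DOT
}


def derived_property(cp: int, rule_based: str) -> str:
    # Assumption: an explicit exception wins over the value that the
    # property-based categories (for example category B) would give.
    return EXCEPTIONS.get(cp, rule_based)


if __name__ == "__main__":
    # U+3007 is listed, so the exception value is used.
    print(derived_property(0x3007, "DISALLOWED"))
    # U+00DF is not listed, so the rule-based value stands.
    print(derived_property(0x00DF, "DISALLOWED"))
```

Allowing sharp s would, in this picture, mean adding one more entry to
the table, which is exactly the codepoint-by-codepoint decision
discussed above.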

     Patrik


