Change of the algorithm

Paul Hoffman phoffman at imc.org
Sat Mar 15 23:12:59 CET 2008


At 6:04 PM -0400 3/15/08, Patrik Fältström wrote:
>The rationale is that a codepoint that is in any of the 
>categories B, C and D should be DISALLOWED -- if there is no 
>exception or if it is US-ASCII. Regardless of whether it is part of 
>any of the other categories.

OK, but...

>Maybe I am thinking wrong here, but there are things that are in C 
>and I (i.e. both). My take is that those codepoints should be 
>DISALLOWED. Not CONTEXTO.

The definition of C is:
    C: property(cp) is in {Default_Ignorable_Code_Point, White_Space,
                           Noncharacter_Code_Point}
. . .
    The definition for Default_Ignorable_Code_Point can be found in
    DerivedCoreProperties.txt [1] (and erratum of 2007-January-25 [2])
    and is
    Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
    + Noncharacter_Code_Point + Variation_Selector
    - White_Space - FFF9..FFFB (Annotation Characters)

That means that C contains all of {Cf}, other than white space and 
annotation characters.

The definition of I is:
    I: generalCategory(cp) is in {Cf}

So, putting the check for a character in C before the check for a 
character in I means that the check for the character in I will never 
happen if the character is in {Cf}. So, there is no need to define I, 
and nothing will ever be CONTEXTO. I don't think that is what you 
wanted.
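Below is a minimal Python sketch of the ordering argument. It assumes the rules are tested in order and the first match wins. The names classify, in_category_c, in_category_i and the "OTHER" label are hypothetical. C is only partly modeled: the sketch uses just the Cf + Cc + Cs terms of the Default_Ignorable_Code_Point definition, minus White_Space and FFF9..FFFB. The other terms only add code points, so leaving them out cannot affect which Cf characters get through. White_Space is approximated with str.isspace(), and results depend on the Unicode version behind Python's unicodedata module.

```python
import unicodedata
from typing import List

# Annotation characters subtracted from Default_Ignorable_Code_Point (FFF9..FFFB).
ANNOTATION = range(0xFFF9, 0xFFFC)


def in_category_c(cp: int) -> bool:
    """Partial model of C: the Cf + Cc + Cs terms minus White_Space and the
    annotation characters. Assumption: White_Space is approximated by
    str.isspace(); the remaining terms of C are omitted because they only
    add code points and cannot change the result for Cf."""
    ch = chr(cp)
    if unicodedata.category(ch) not in {"Cf", "Cc", "Cs"}:
        return False
    return not ch.isspace() and cp not in ANNOTATION


def in_category_i(cp: int) -> bool:
    """Category I as quoted: generalCategory(cp) is in {Cf}."""
    return unicodedata.category(chr(cp)) == "Cf"


def classify(cp: int) -> str:
    """Hypothetical ordered rule list: the first matching rule wins."""
    if in_category_c(cp):
        return "DISALLOWED"
    if in_category_i(cp):
        return "CONTEXTO"
    return "OTHER"  # all remaining rules are out of scope for this sketch


def contexto_code_points() -> List[int]:
    """Every code point whose classification ends up as CONTEXTO."""
    return [cp for cp in range(0x110000) if classify(cp) == "CONTEXTO"]


if __name__ == "__main__":
    print(classify(0x200D))  # ZERO WIDTH JOINER (Cf) is caught by C first
    print([f"U+{cp:04X}" for cp in contexto_code_points()])
```

Run as a script, this prints DISALLOWED for U+200D. With the Unicode data in current Python releases, it should list only U+FFF9..U+FFFB. Those are the Cf code points that the definition of C subtracts explicitly, so they are the only ones that can still reach the check for I. Every other Cf character is decided by C first.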

