Reserved general punctuation

Patrik Fältström patrik at frobbit.se
Sat May 3 04:40:50 CEST 2008


On 1 May 2008, at 20:41, Kenneth Whistler wrote:

> Also if you examine the listing carefully, you will see that
> while most of the gc=Cn characters are "reserved", all
> of the noncharacters are also among the list.

Yes, I saw that, and unfortunately that makes the list in
DerivedGeneralCategory.txt incorrect from my point of view. If the
table lists "unassigned" codepoints (the header says
"General_Category=Unassigned"), then those codepoints (for example
U+FFFE) should not be there.

You pointed at Table 2-3 "Types of Code Points" on p. 27 of the
Unicode 5.0 text
(http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf), which clearly
shows that U+FFFE is not an unassigned codepoint, but is gc=Cn.

So, the table in DerivedGeneralCategory.txt shows gc=Cn, which
according to the types of codepoints table is a larger set of
codepoints than unassigned.
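
To make the distinction concrete, here is a rough sketch in Python that
splits gc=Cn into noncharacters and reserved codepoints. It assumes the
standard unicodedata module, whose data follows whatever Unicode
version the interpreter ships with (not necessarily 5.0). The function
names is_noncharacter and classify_cn are made up for illustration.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 noncharacter codepoints.

    These are U+FDD0..U+FDEF plus the last two codepoints of every
    plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify_cn(cp: int) -> str:
    """Classify a codepoint as 'not Cn', 'noncharacter' or 'reserved'.

    The gc lookup relies on the interpreter's bundled Unicode data, so
    which codepoints count as 'reserved' depends on that version.
    """
    if unicodedata.category(chr(cp)) != "Cn":
        return "not Cn"
    return "noncharacter" if is_noncharacter(cp) else "reserved"


if __name__ == "__main__":
    for cp in (0x0041, 0xFDD0, 0xFFFE, 0x10FFFF):
        print(f"U+{cp:04X}: {classify_cn(cp)}")
```

Both "noncharacter" and "reserved" results come from codepoints that
report gc=Cn, which is why a table keyed on gc=Cn alone cannot be read
as a list of unassigned codepoints.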

Now I think I have this under control :-)

Thanks to you Ken! Thanks!

Expect a revised version of the tables document.

    Patrik



