Reserved general punctuation
Kenneth Whistler
kenw at sybase.com
Thu May 1 20:41:09 CEST 2008
Patrik asked:
> > In Unicode, what we've been referring to as "unassigned" (more
> > precisely gc=Cn) means that a code point (from 0 to 10FFFF) is not
> > assigned **to a character**.
>
> In what file of the Unicode distribution can I find every codepoint
> that has gc=Cn?
http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt
Right at the top of that file, in fact.
Also, if you examine the listing carefully, you will see that
while most of the gc=Cn code points are "reserved", all
of the noncharacters are also in the list. For example:
FFEF..FFF8 ; Cn # [10] <reserved-FFEF>..<reserved-FFF8>
but
FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>
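As a rough illustration of reading such lines, here is a minimal Python sketch. It assumes the "first..last ; value # comment" layout shown in the two lines above; the names LINE_RE and parse_ranges are made up for this example, and the sample list holds only those two lines, not the real file.

```python
import re

# A data line is "XXXX ; Value" or "XXXX..YYYY ; Value"; anything after
# the value (the "# ..." comment) is ignored by this pattern.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def parse_ranges(lines, wanted="Cn"):
    """Yield (first, last) code point pairs whose value equals `wanted`."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank line, comment line, or unexpected layout
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        if m.group(3) == wanted:
            yield first, last

# The two example lines quoted above, used as stand-in input.
sample = [
    "FFEF..FFF8 ; Cn # [10] <reserved-FFEF>..<reserved-FFF8>",
    "FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF>",
]
cn_ranges = list(parse_ranges(sample))
cn_count = sum(last - first + 1 for first, last in cn_ranges)
```

Fed the lines of the real DerivedGeneralCategory.txt instead of the sample, the same loop should collect every gc=Cn range, provided the file keeps this layout.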
The place to get the *concise* listing of all the noncharacters
is:
http://www.unicode.org/Public/UNIDATA/PropList.txt
and search down for "Noncharacter_Code_Point".
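The noncharacter set is small and fixed: FDD0..FDEF, plus the last two code points of each of the 17 planes (FFFE, FFFF, 1FFFE, 1FFFF, ... 10FFFE, 10FFFF), 66 in all. A minimal Python sketch of that rule follows; the function name is hypothetical, and the test is computed arithmetically rather than read from PropList.txt.

```python
def is_noncharacter(cp):
    """Return True if `cp` is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # FDD0..FDEF, or the last two code points of any plane (xxFFFE, xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# Enumerate the whole code space once to collect the noncharacters.
noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
```

The resulting list can be compared against the Noncharacter_Code_Point entries in PropList.txt as a sanity check.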
>
> Is that the same as the codepoints that are missing from
> UnicodeData.txt? (I know about the "first", "last" issues...)
Correct. No gc=Cn code points are listed in UnicodeData.txt.
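To make the "first", "last" caveat concrete, here is a minimal Python sketch that collects the code points covered by UnicodeData.txt, expanding each "<..., First>" / "<..., Last>" pair into the full range. Anything in 0..10FFFF that ends up outside the resulting set would be gc=Cn. The function name is made up for this example, and the sample holds three lines in the file's semicolon-separated layout rather than the real file.

```python
def assigned_code_points(lines):
    """Return the set of code points covered by UnicodeData.txt-style lines."""
    assigned = set()
    range_start = None
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        cp = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            range_start = cp  # remember where the range begins
        elif name.endswith(", Last>") and range_start is not None:
            assigned.update(range(range_start, cp + 1))
            range_start = None
        else:
            assigned.add(cp)
    return assigned

# Stand-in input: one ordinary entry and one First/Last pair.
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;",
]
assigned = assigned_code_points(sample)
```

Without the First/Last expansion, every Hangul syllable between AC00 and D7A3 would wrongly look unassigned, which is the issue Patrik alludes to.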
--Ken