How to know what codepoints are unassigned

Patrik Fältström patrik at
Sat May 3 05:22:30 CEST 2008

What confuses me is Table 2-3 on page 27 of Unicode 5.0.0, which for me
uses slightly different terminology than what you use here. According to
that table, some Cn actually have assigned

The table makes the following definitions:

Cn+Cs = Not assigned to abstract character

Cn-Noncharacter = Undesignated (unassigned) code point

Note that the only place "unassigned" appears is clearly NOT for Cn as a
whole, but for a subset of Cn.

In the work I do, I will set the UNASSIGNED derived property to
Cn-Noncharacter (Cn minus the Noncharacter code points). The Noncharacter
code points stay DISALLOWED.
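
A minimal sketch of that derivation in Python, assuming the standard
unicodedata module. Its data reflects whatever Unicode version ships with
the interpreter, not necessarily 5.0.0, and the function names here are
hypothetical, not taken from any IDNA draft.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF plus the last two
    code points of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """Cn minus Noncharacter: undesignated (unassigned) code points.

    gc=Cn alone is not enough, because noncharacters also report Cn.
    """
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)


def classify(cp: int) -> str:
    """Hypothetical helper: label a code point for illustration only."""
    if is_noncharacter(cp):
        return "NONCHARACTER"  # stays DISALLOWED
    if is_unassigned(cp):
        return "UNASSIGNED"
    return "OTHER"  # assigned characters, surrogates, private use, ...
```

For example, classify(0xFFFE) gives "NONCHARACTER" even though its general
category is Cn, while a surrogate such as U+D800 (gc=Cs) is neither.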


On 3 maj 2008, at 05.11, Mark Davis wrote:

> Let me try again. (And this is not your fault -- I've never liked the way
> that this particular terminology is handled, since it leads to
> misunderstandings.)
> Given a code point X:
>   1. *gc=Cn* means that there is no *character* assigned for X. That is,
>   X is not an *assigned character*. Note that the long name for gc=Cn is
>   *General_Category=Unassigned* (see
>   2. *Noncharacter=true* means that X, although *not* assigned as a
>   character, is given a special function. In that sense, it is an
>   *assigned code point*, just not assigned as a character.
> There are some other oddities: for example, surrogate code points (gc=Cs)
> are also not assigned characters, but they are not gc=unassigned. Those,
> however, don't seem to cause people much problem conceptually.
> Functionally, as I've said, noncharacters are best thought of as "super
> private use" characters, and they could have been incorporated into the
> general category under that rubric, but that sense evolved over time.
> This is all water under the bridge, mostly due to history, as the
> architecture grew in unforeseen ways, and stability policies put into
> place a long time ago prevented changes that would have made it
> conceptually simpler.
> Does that help any?
> Mark
> On Fri, May 2, 2008 at 7:31 PM, Patrik Fältström <patrik at> wrote:
>> On 30 apr 2008, at 22.38, Mark Davis wrote:
>>> The code points that are unassigned (gc=Cn) but that should be
>>> DISALLOWED are all and only the Noncharacters.
>> What has confused me all along is that I interpreted what Mark says
>> as if gc=Cn gives the unassigned codepoints.
>> That is not true, which shows that I misunderstood what he wrote here.
>> gc=Cn gives the unassigned codepoints PLUS the Noncharacter ones.
>> So, one can NOT use gc=Cn as a test for unassigned codepoints. It is
>> more complicated than that.
>>   Patrik
> -- 
> Mark
