How to know what codepoints are unassigned

Mark Davis mark.davis at
Sat May 3 05:11:37 CEST 2008

Let me try again. (And this is not your fault; I've never liked the way
that this particular terminology is handled, since it leads to confusion.)

Given a code point X:

   1. *gc=Cn* means that there is no *character* assigned for X. That is,
   X is not an *assigned character*. Note that the long name for gc=Cn is
   *General_Category=Unassigned* (see
   2. *Noncharacter=true* means that X, although *not* assigned as a
   character, is given a special function. In that sense, it is an *assigned
   code point*, just not assigned as a character.
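The two properties above can be illustrated with a minimal Python sketch, which is not part of the original thread. It assumes Python's standard `unicodedata` module as the source of gc values, so results follow whatever Unicode version that Python build bundles. `unicodedata` does not expose the Noncharacter_Code_Point property, so the hypothetical helper `is_noncharacter` hard-codes the 66 noncharacters. The hypothetical `is_reserved` is the test Patrik is asking about below: gc=Cn minus the noncharacters.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point (hard-coded): U+FDD0..U+FDEF plus the last
    two code points (xxFFFE, xxFFFF) of each of the 17 planes, 66 in all."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_gc_unassigned(cp: int) -> bool:
    """gc=Cn (General_Category=Unassigned), per Python's bundled UCD."""
    return unicodedata.category(chr(cp)) == "Cn"

def is_reserved(cp: int) -> bool:
    """gc=Cn with the noncharacters removed: code points with no function yet."""
    return is_gc_unassigned(cp) and not is_noncharacter(cp)

# Usage: one ordinary letter, two noncharacters, one reserved code point.
for cp in (0x0041, 0xFDD0, 0xFFFE, 0xA0000):
    print(f"U+{cp:04X}  Cn={is_gc_unassigned(cp)}  "
          f"nonchar={is_noncharacter(cp)}  reserved={is_reserved(cp)}")
```

Hard-coding the noncharacters is reasonable because they are permanently reserved; `is_reserved`, by contrast, depends on the UCD version, since a code point reserved today may be assigned a character later.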

There are some other oddities: for example, surrogate code points (gc=Cs)
are also not assigned characters, but they are not gc=Unassigned either.
Those, however, don't seem to cause people much trouble conceptually.
Functionally, as I've said, noncharacters are best thought of as "super
private use" characters, and they could have been given a General_Category
value reflecting that, but that understanding only evolved over time.
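The gap between "gc value" and "assigned character" described above can be sketched with the Unicode Standard's "Types of Code Points" table (Graphic, Format, Control, Private-use, Surrogate, Noncharacter, Reserved). This is an illustrative Python sketch, again assuming `unicodedata` for gc; `code_point_type` and `is_assigned_character` are hypothetical names. It shows a surrogate falling under gc=Cs rather than gc=Cn, while still not counting as an assigned character.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus xxFFFE/xxFFFF in each plane (66 in total).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def code_point_type(cp: int) -> str:
    """Map a code point to its basic type via its General_Category."""
    gc = unicodedata.category(chr(cp))
    if gc == "Cn":
        # gc=Cn covers both noncharacters and truly reserved code points.
        return "Noncharacter" if is_noncharacter(cp) else "Reserved"
    if gc == "Cs":
        return "Surrogate"
    if gc == "Co":
        return "Private-use"
    if gc == "Cc":
        return "Control"
    if gc in ("Cf", "Zl", "Zp"):
        return "Format"
    return "Graphic"  # L*, M*, N*, P*, S*, Zs

def is_assigned_character(cp: int) -> bool:
    # Surrogates, noncharacters and reserved code points are not characters.
    return code_point_type(cp) in ("Graphic", "Format", "Control", "Private-use")

# Usage: letter, LF, LINE SEPARATOR, private use, surrogate, noncharacter, reserved.
for cp in (0x0041, 0x000A, 0x2028, 0xE000, 0xD800, 0xFDD0, 0xA0000):
    print(f"U+{cp:04X}", code_point_type(cp), is_assigned_character(cp))
```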

This is all water under the bridge, mostly due to history: the
architecture grew in unforeseen ways, and stability policies put into place
a long time ago prevented changes that would have made it conceptually
cleaner.
Does that help any?


On Fri, May 2, 2008 at 7:31 PM, Patrik Fältström <patrik at> wrote:

> On 30 apr 2008, at 22.38, Mark Davis wrote:
> > The code points that are unassigned (gc=Cn) but that should be
> > DISALLOWED are all and only the Noncharacters.
> >
> What has confused me all the time here is that I interpreted what Mark says
> as meaning that gc=Cn gives the unassigned codepoints.
> That is not true, which shows that I misunderstood what he wrote here.
> gc=Cn gives the unassigned codepoints PLUS the Noncharacter ones.
> So, one can NOT use gc=Cn as a test for unassigned codepoints. It is more
> complicated than that.
>    Patrik
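Patrik's observation that gc=Cn covers the noncharacters as well as the truly unassigned code points can be checked by scanning the whole code space. This is a rough Python sketch, assuming the `unicodedata` module; `partition_cn` is a hypothetical name. The reserved count depends on the Unicode version bundled with the interpreter, while the noncharacter count should always be 66.

```python
import unicodedata
from collections import Counter

def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus xxFFFE/xxFFFF in each plane (66 in total).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def partition_cn() -> Counter:
    """Split every gc=Cn code point into noncharacters vs. reserved ones."""
    counts = Counter()
    for cp in range(0x110000):
        if unicodedata.category(chr(cp)) == "Cn":
            counts["noncharacter" if is_noncharacter(cp) else "reserved"] += 1
    return counts

if __name__ == "__main__":
    counts = partition_cn()
    print(f"Unicode {unicodedata.unidata_version}: {dict(counts)}")
```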


More information about the Idna-update mailing list