How to know what codepoints are unassigned
patrik at frobbit.se
Sat May 3 05:22:30 CEST 2008
What confuses me is Table 2-3 on page 27 of Unicode 5.0.0
(http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf), which, to my
reading, uses slightly different terminology than what you use
here. According to that table, some Cn actually have assigned
The table makes the following definitions:
Cn+Cs = Not assigned to abstract character
Cn-Noncharacter = Undesignated (unassigned) code point
Note that the only place where "unassigned" appears is clearly NOT for Cn,
but for a subset of Cn.
In the work I do, I will set the UNASSIGNED derived property to
Cn-Noncharacter (that is, Cn minus the Noncharacters). The Noncharacters
stay DISALLOWED.
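
A minimal sketch of that derived set, assuming Python's standard
unicodedata module (its data reflects whatever Unicode version the
interpreter ships with, not necessarily 5.0). The function names are
hypothetical and only for illustration. The noncharacters are U+FDD0..U+FDEF
plus the last two code points of every plane:

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacter code points."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def general_category(cp: int) -> str:
    """General_Category of a code point, e.g. 'Cn', 'Cs', 'Lu'."""
    return unicodedata.category(chr(cp))


def is_unassigned(cp: int) -> bool:
    """Illustrative UNASSIGNED test: gc=Cn minus the noncharacters."""
    return general_category(cp) == "Cn" and not is_noncharacter(cp)
```

With this, a noncharacter such as U+FFFE has gc=Cn but is not reported as
unassigned, and a surrogate such as U+D800 is excluded because its category
is Cs, not Cn.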
On 3 May 2008, at 05.11, Mark Davis wrote:
> Let me try again. (And this is not your fault -- I've never liked
> the way that this particular terminology is handled, since it leads to
> Given a code point X:
> 1. *gc=Cn* means that there is no *character* assigned for X. That
> is, X is not an *assigned character*. Note that the long name for
> gc=Cn is *General_Category=Unassigned* (see
> 2. *Noncharacter=true* means that X, although *not* assigned as a
> character, is given a special function. In that sense, it is an
> *assigned code point*, just not assigned as a character.
> There are some other oddities: for example, surrogate code points
> are also not assigned characters, but they are not gc=unassigned.
> Those, however, don't seem to cause people much problem conceptually.
> As I've said, noncharacters are best thought of as "super private use"
> characters, and they could have been incorporated into the general
> under that rubric, but that sense evolved over time.
> This is all water under the bridge, mostly due to history, as the
> architecture grew in unforeseen ways, and stability policies put into
> place a long time ago prevented changes that would have made it
> conceptually
> Does that help any?
> On Fri, May 2, 2008 at 7:31 PM, Patrik Fältström <patrik at frobbit.se>
>> On 30 Apr 2008, at 22.38, Mark Davis wrote:
>>> The code points that are unassigned (gc=Cn) but that should be
>>> DISALLOWED are all and only the Noncharacters.
>> What has confused me all the time here is that I interpreted what
>> Mark says as if gc=Cn gives the unassigned codepoints.
>> That is not true, which shows that I misunderstood what he wrote here.
>> gc=Cn gives the unassigned codepoints PLUS the Noncharacter ones.
>> So, one can NOT use gc=Cn as a test for unassigned codepoints. It
>> is more complicated than that.