How to know what codepoints are unassigned
John C Klensin
klensin at jck.com
Sat May 3 17:50:36 CEST 2008
--On Saturday, 03 May, 2008 09:25 -0400 Vint Cerf
<vint at google.com> wrote:
> theorem 207: everything is more complicated.
>
> :-)
Vint,
While this excursion has been very interesting (at least to me),
I think it actually is pretty simple (and hope that Mark or Ken
will quickly correct me if I'm wrong).
Patrik needs to do only two things:
(1) Move the test for "unassigned" very early in his
rule-application sequence, perhaps first, so that code
points that are not bound to actual characters are not
accidentally picked up by other tests.
(2) Make the test for "unassigned" a simple test for the absence
of an entry in UnicodeData.txt, a table that he needs to look at
anyway for other properties. What we have just learned is that
the "unassigned" test is not a test for the presence or absence
of some other property.
This means that
* Non-character and reserved code points that have
nothing specifically assigned to them are UNASSIGNED.
* Non-character code points that have specific
non-characters assigned to them are DISALLOWED (unless
they are exceptions), but by other rules.
I think that is intuitively correct and matches a common-sense
interpretation of what "unassigned" means.
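
The two steps above can be sketched roughly as follows. This is only
an illustration, not Patrik's actual code: the function names
(load_assigned_ranges, is_unassigned, classify), the shape of the
rule callbacks, the "NO_RULE_MATCHED" placeholder, and the tiny
SAMPLE table are all assumptions made for the example. It assumes
the usual semicolon-separated layout of UnicodeData.txt, in which
large blocks appear as a pair of "<..., First>" / "<..., Last>"
lines rather than one line per code point, so "absence of an entry"
has to take those ranges into account.

```python
from bisect import bisect_right

# Abbreviated sample in UnicodeData.txt layout (illustrative only; a real
# run would read the full file for the Unicode version in use).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0377;GREEK SMALL LETTER PAMPHYLIAN DIGAMMA;Ll;0;L;;;;;N;;;0376;;0376
037A;GREEK YPOGEGRAMMENI;Lm;0;L;<compat> 0020 0345;;;;N;GREEK SPACING IOTA BELOW;;;;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""


def load_assigned_ranges(lines):
    """Return a sorted list of inclusive (first, last) ranges that have an entry."""
    ranges = []
    pending_first = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending_first = cp            # start of a compressed range
        elif name.endswith(", Last>"):
            if pending_first is None:
                raise ValueError("Last> line without a matching First> line")
            ranges.append((pending_first, cp))
            pending_first = None
        else:
            ranges.append((cp, cp))       # ordinary single-code-point entry
    ranges.sort()
    return ranges


def is_unassigned(cp, ranges):
    """True if cp has no entry (and lies in no First/Last range) in the table."""
    i = bisect_right(ranges, (cp, float("inf"))) - 1
    return i < 0 or not (ranges[i][0] <= cp <= ranges[i][1])


def classify(cp, ranges, other_rules=()):
    """Apply the unassigned test first, then any remaining rules in order."""
    if is_unassigned(cp, ranges):
        return "UNASSIGNED"
    for rule in other_rules:              # each rule returns a label or None
        result = rule(cp)
        if result is not None:
            return result
    return "NO_RULE_MATCHED"              # placeholder, not a real category


def hypothetical_nonchar_rule(cp):
    """Stand-in for a later rule that would otherwise catch U+nFFFE/U+nFFFF."""
    return "DISALLOWED" if (cp & 0xFFFE) == 0xFFFE else None


ranges = load_assigned_ranges(SAMPLE.splitlines())
print(classify(0xFFFE, ranges, [hypothetical_nonchar_rule]))
```

Because the unassigned test runs first, a code point with no table
entry never reaches the later rule, which is the point of step (1).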
john