How to know what codepoints are unassigned

John C Klensin klensin at jck.com
Sat May 3 17:50:36 CEST 2008



--On Saturday, 03 May, 2008 09:25 -0400 Vint Cerf
<vint at google.com> wrote:

> theorem 207: everything is more complicated.
> 
> :-)

Vint,

While this excursion has been very interesting (at least to me),
I think it actually is pretty simple (and hope that Mark or Ken
will quickly correct me if I'm wrong).

Patrik needs to do only two things:

	(1) Move the test for "unassigned" very early in his
	rule-application sequence, perhaps first, so that code
	points that are not bound to actual characters are not
	accidentally picked up by other tests.
	
	(2) Make the test for "unassigned" a simple test for the
	absence of an entry in UnicodeData.txt, a table that he
	needs to look at anyway for other properties.  What we
	have just learned is that the "unassigned" test is not a
	test for the presence or absence of some other property.
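
A minimal sketch of those two steps in Python, offered as an
illustration only.  It assumes the semicolon-separated field layout of
UnicodeData.txt, in which large blocks are listed as a pair of
"<..., First>" and "<..., Last>" lines.  The function names, the rule
interface, the "UNDECIDED" placeholder, and the three sample lines are
hypothetical and are not taken from Patrik's actual code.

```python
# Sample lines in the UnicodeData.txt format (illustrative subset only).
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;",
    "4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;",
]


def parse_assigned(lines):
    """Return (first, last) code point ranges that have an entry."""
    ranges = []
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            # Start of a block listed only by its end points.
            range_start = cp
        elif name.endswith(", Last>"):
            ranges.append((range_start, cp))
            range_start = None
        else:
            ranges.append((cp, cp))
    return ranges


def is_unassigned(cp, ranges):
    """Step (2): unassigned means no entry in the table, nothing more."""
    return not any(first <= cp <= last for first, last in ranges)


def classify(cp, ranges, other_rules=()):
    """Step (1): run the unassigned test before any other rule."""
    if is_unassigned(cp, ranges):
        return "UNASSIGNED"
    for rule in other_rules:
        # Each hypothetical rule returns a category string or None.
        result = rule(cp)
        if result is not None:
            return result
    return "UNDECIDED"  # placeholder for "no rule matched"
```

Because the unassigned test runs first, a rule that would otherwise
match a code point with no table entry never gets the chance to do so.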

This means that

	* Non-character and reserved code points that have
	nothing specifically assigned to them are UNASSIGNED.
	
	* Non-character code points that have specific
	non-characters assigned to them are DISALLOWED (unless
	they are exceptions), but by other rules.

I think that is intuitively correct and matches a common-sense
interpretation of what "unassigned" means.

   john


