noncharacters and unassigned

Erik van der Poel erikv at google.com
Fri Feb 8 18:23:03 CET 2008


Patrik,

Regarding the latest tables-04 draft:

http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt

The code points 10FFFD..10FFFF are missing. I can see absolutely no
reason to omit 10FFFD. It is simply a private use code point, just
like 100000..10FFFC.

You may have omitted 10FFFE and 10FFFF because they are not listed in
UnicodeData.txt. They are not listed there because they are
"noncharacters". However, you include other noncharacters, namely,
FDD0..FDEF and *FFFE and *FFFF, where * is a plane number 0..F. But noncharacters are
*not* reserved. They are a kind of super-private use characters that
are not supposed to be interchanged (unlike normal private use, which
may be interchanged).
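A minimal Python sketch of the distinction above. It assumes the code point ranges stated in this message plus the BMP and plane 15 private use ranges (E000..F8FF, F0000..FFFFD). The helper names is_private_use, is_noncharacter and describe are hypothetical and are not taken from the tables draft. The sketch shows that 10FFFD falls in the plane 16 private use range while 10FFFE and 10FFFF are noncharacters. It also shows that the noncharacters total 66, of which FDD0..FDEF plus the *FFFE/*FFFF pairs for planes 0..F account for 64.

```python
def is_private_use(cp: int) -> bool:
    """True if cp lies in one of the three Unicode private use ranges."""
    return (0xE000 <= cp <= 0xF8FF            # BMP Private Use Area
            or 0xF0000 <= cp <= 0xFFFFD       # plane 15 private use
            or 0x100000 <= cp <= 0x10FFFD)    # plane 16 private use


def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xFDD0 <= cp <= 0xFDEF:    # 32 noncharacters in the BMP
        return True
    # The last two code points of each of the 17 planes: *FFFE and *FFFF
    return (cp & 0xFFFE) == 0xFFFE


def describe(cp: int) -> str:
    """Label a code point for this illustration only."""
    if is_noncharacter(cp):
        return "noncharacter"
    if is_private_use(cp):
        return "private use"
    return "other"


NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]

# The tail of plane 16: 10FFFC and 10FFFD are private use,
# 10FFFE and 10FFFF are noncharacters.
for cp in range(0x10FFFC, 0x110000):
    print(f"U+{cp:04X}: {describe(cp)}")

# Noncharacters in planes 0..F (including FDD0..FDEF) vs. all planes
print(sum(1 for cp in NONCHARACTERS if cp <= 0xFFFFF), len(NONCHARACTERS))
```

The bitmask test works because every plane ends in FFFE and FFFF, so clearing the low bit and comparing the low 16 bits catches both code points in all 17 planes at once.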

Unicode has a number of definitions for terms that start with "unassigned":

http://www.unicode.org/glossary/#U

Since their definitions are so confusing, it might be better to use
the term "reserved", which is used a lot in IETF. However, changing
all of the IDNA200X documents to say "reserved" instead of
"unassigned" may be a lot of work, so we could leave it as
"unassigned", as long as we clarify that we mean Unicode's "unassigned
code point" (also called an undesignated or reserved code point).

Also, the noncharacters should be DISALLOWED, not UNASSIGNED.
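To make the last point concrete: a table generator that derives UNASSIGNED from General_Category Cn will sweep the noncharacters in with the reserved code points. This happens because noncharacters are absent from UnicodeData.txt and therefore also come out as Cn. The Python sketch below uses the standard unicodedata module. The function classify_cn, and the use of None for "decided by the other rules", are hypothetical and are not the draft's algorithm. Only the labels DISALLOWED and UNASSIGNED come from this message. The sketch checks for noncharacters before falling back to UNASSIGNED.

```python
import unicodedata

# Labels from the discussion above; everything else is left to other rules.
DISALLOWED = "DISALLOWED"
UNASSIGNED = "UNASSIGNED"


def is_noncharacter(cp: int) -> bool:
    # FDD0..FDEF, plus the last two code points (*FFFE, *FFFF) of every plane
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify_cn(cp: int):
    """Split General_Category Cn into noncharacters and reserved code points.

    Returns None for anything that is not Cn (assigned characters,
    private use, etc.), which this sketch does not try to classify.
    """
    if unicodedata.category(chr(cp)) != "Cn":
        return None
    if is_noncharacter(cp):
        return DISALLOWED   # permanently noncharacters, never to be assigned
    return UNASSIGNED       # reserved for future assignment


for cp in (0x0041, 0x10FFFD, 0xFDD0, 0xFFFE, 0x10FFFF, 0x50000):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), classify_cn(cp))
```

U+50000 is used as the reserved example because planes 4 through 13 contain no assigned characters, so the result should not depend on which Unicode version the local Python ships with.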

Erik


More information about the Idna-update mailing list