<HTML>
<HEAD>
<TITLE>Re: Editorial questions</TITLE>
</HEAD>
<BODY>
<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
On 2009-11-20 22.16, "Harald Alvestrand" &lt;<a href="harald@alvestrand.no">harald@alvestrand.no</a>&gt; wrote:<BR>
<BR>
<FONT COLOR="#0000FF">> So is there a distinction between 10FFFD, 10FFFE and 10FFFF, and does this <BR>
> document need to make that distinction?<BR>
> (I think it's unreasonable to allow any of them in domain names,<BR>
> so they should all be DISALLOWED, but i'm not surprised that there's<BR>
> inconsistencies here.)<BR>
</FONT><BR>
I wouldn't use the term "inconsistencies"... For administrative reasons<BR>
(file data compatibility) these aren't given in UnicodeData.txt. See PropList.txt, which says:<BR>
<BR>
FDD0..FDEF ; Noncharacter_Code_Point # Cn [32] &lt;noncharacter-FDD0&gt;..&lt;noncharacter-FDEF&gt;<BR>
FFFE..FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-FFFE&gt;..&lt;noncharacter-FFFF&gt;<BR>
1FFFE..1FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-1FFFE&gt;..&lt;noncharacter-1FFFF&gt;<BR>
2FFFE..2FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-2FFFE&gt;..&lt;noncharacter-2FFFF&gt;<BR>
3FFFE..3FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-3FFFE&gt;..&lt;noncharacter-3FFFF&gt;<BR>
4FFFE..4FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-4FFFE&gt;..&lt;noncharacter-4FFFF&gt;<BR>
5FFFE..5FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-5FFFE&gt;..&lt;noncharacter-5FFFF&gt;<BR>
6FFFE..6FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-6FFFE&gt;..&lt;noncharacter-6FFFF&gt;<BR>
7FFFE..7FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-7FFFE&gt;..&lt;noncharacter-7FFFF&gt;<BR>
8FFFE..8FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-8FFFE&gt;..&lt;noncharacter-8FFFF&gt;<BR>
9FFFE..9FFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-9FFFE&gt;..&lt;noncharacter-9FFFF&gt;<BR>
AFFFE..AFFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-AFFFE&gt;..&lt;noncharacter-AFFFF&gt;<BR>
BFFFE..BFFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-BFFFE&gt;..&lt;noncharacter-BFFFF&gt;<BR>
CFFFE..CFFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-CFFFE&gt;..&lt;noncharacter-CFFFF&gt;<BR>
DFFFE..DFFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-DFFFE&gt;..&lt;noncharacter-DFFFF&gt;<BR>
EFFFE..EFFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-EFFFE&gt;..&lt;noncharacter-EFFFF&gt;<BR>
FFFFE..FFFFF ; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-FFFFE&gt;..&lt;noncharacter-FFFFF&gt;<BR>
10FFFE..10FFFF; Noncharacter_Code_Point # Cn [2] &lt;noncharacter-10FFFE&gt;..&lt;noncharacter-10FFFF&gt;<BR>
<BR>
All of the *FFFE and *FFFF ones have been "permanently reserved" (a.k.a. non-character code points)<BR>
since the initial synchronisation with ISO/IEC 10646 (1993). I cannot recall the exact reason, but<BR>
for FFFE it had (and still has) to do with the byte-order mark and its representation in UCS-2/UTF-16.<BR>
The FDD0-FDEF ones were reserved later, because more non-character code points were wanted for<BR>
internal processing reasons.<BR>
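<BR>
As a rough illustration of the ranges listed above, here is a minimal Python sketch that tests whether<BR>
a code point is a non-character. The function names are hypothetical, made up for this example; they<BR>
are not taken from PropList.txt or from the tables draft.<BR>
<PRE>
```python
# Sketch of the Noncharacter_Code_Point ranges quoted from PropList.txt.
# Function names are hypothetical, chosen for this example only.

def is_noncharacter(cp):
    """Return True if cp is one of the 66 Unicode non-characters."""
    if cp not in range(0x110000):
        raise ValueError("not a Unicode code point")
    # FDD0..FDEF: the block of 32 code points that was reserved later.
    if cp in range(0xFDD0, 0xFDF0):
        return True
    # The last two code points of each of the 17 planes (xxFFFE, xxFFFF).
    return cp % 0x10000 in (0xFFFE, 0xFFFF)


def all_noncharacters():
    """List every non-character in ascending order."""
    result = list(range(0xFDD0, 0xFDF0))
    for plane in range(17):
        base = plane * 0x10000
        result += [base + 0xFFFE, base + 0xFFFF]
    return result
```
</PRE>
For the three code points in the question, this sketch returns False for 0x10FFFD (a private-use<BR>
code point) and True for both 0x10FFFE and 0x10FFFF.<BR>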
<BR>
<a href="http://tools.ietf.org/id/draft-ietf-idnabis-tables-07.txt">http://tools.ietf.org/id/draft-ietf-idnabis-tables-07.txt</a> lists all of the above non-characters<BR>
as DISALLOWED except for U+10FFFF, which has somehow been left out. Note, though,<BR>
that the title of Appendix B does include U+10FFFF...<BR>
<BR>
/kent k<BR>
</SPAN></FONT>
</BODY>
</HTML>