Editorial questions

Kent Karlsson kent.karlsson14 at comhem.se
Sun Nov 22 11:36:19 CET 2009


On 2009-11-20 22.16, "Harald Alvestrand" <harald at alvestrand.no> wrote:

> So is there a distinction between 10FFFD, 10FFFE and 10FFFF, and does this
> document need to make that distinction?
> (I think it's unreasonable to allow any of them in domain names,
> so they should all be DISALLOWED, but I'm not surprised that there are
> inconsistencies here.)

I wouldn't use the term "inconsistencies"... For administrative reasons
(file data compatibility) these aren't given in UnicodeData.txt. See
PropList.txt, which says:

FDD0..FDEF    ; Noncharacter_Code_Point # Cn  [32] <noncharacter-FDD0>..<noncharacter-FDEF>
FFFE..FFFF    ; Noncharacter_Code_Point # Cn   [2] <noncharacter-FFFE>..<noncharacter-FFFF>
1FFFE..1FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-1FFFE>..<noncharacter-1FFFF>
2FFFE..2FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-2FFFE>..<noncharacter-2FFFF>
3FFFE..3FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-3FFFE>..<noncharacter-3FFFF>
4FFFE..4FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-4FFFE>..<noncharacter-4FFFF>
5FFFE..5FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-5FFFE>..<noncharacter-5FFFF>
6FFFE..6FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-6FFFE>..<noncharacter-6FFFF>
7FFFE..7FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-7FFFE>..<noncharacter-7FFFF>
8FFFE..8FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-8FFFE>..<noncharacter-8FFFF>
9FFFE..9FFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-9FFFE>..<noncharacter-9FFFF>
AFFFE..AFFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-AFFFE>..<noncharacter-AFFFF>
BFFFE..BFFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-BFFFE>..<noncharacter-BFFFF>
CFFFE..CFFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-CFFFE>..<noncharacter-CFFFF>
DFFFE..DFFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-DFFFE>..<noncharacter-DFFFF>
EFFFE..EFFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-EFFFE>..<noncharacter-EFFFF>
FFFFE..FFFFF  ; Noncharacter_Code_Point # Cn   [2] <noncharacter-FFFFE>..<noncharacter-FFFFF>
10FFFE..10FFFF; Noncharacter_Code_Point # Cn   [2] <noncharacter-10FFFE>..<noncharacter-10FFFF>

All of the *FFFE and *FFFF ones have been "permanently reserved" (a.k.a.
non-character code points) since the initial synchronisation with
ISO/IEC 10646 (1993). I cannot recall the exact reason, but for FFFE it
had (and still has) to do with the byte-order mark and its representation
in UCS-2/UTF-16 (FFFE is what FEFF looks like with its two bytes swapped).
The FDD0-FDEF ones were reserved later, because more non-character code
points were wanted for internal processing reasons.
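
For anyone who wants to check a table against the PropList.txt ranges
quoted above, here is a minimal sketch in Python. The names
is_noncharacter and NONCHARACTERS are made up for this illustration; they
do not come from any draft or library. It treats 10FFFD, 10FFFE and 10FFFF
exactly as PropList.txt does: the first is an ordinary code point, the
other two are noncharacters.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp has the Noncharacter_Code_Point property.

    Hypothetical helper for illustration; the ranges are the ones
    listed in PropList.txt.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # The contiguous block of 32 noncharacters FDD0..FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes:
    # FFFE..FFFF, 1FFFE..1FFFF, ..., 10FFFE..10FFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate every noncharacter in the code space (assumed name).
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]
```

The list has 32 + 2 * 17 = 66 entries, which matches the bracketed counts
in the PropList.txt excerpt.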

http://tools.ietf.org/id/draft-ietf-idnabis-tables-07.txt covers all of
the above non-characters as DISALLOWED except for U+10FFFF, which has
somehow been missed. Note, though, that the title of Appendix B does
include U+10FFFF...

    /kent k
