How to know what codepoints are unassigned
Paul Hoffman
phoffman at imc.org
Sun May 4 04:04:28 CEST 2008
At 2:35 AM +0200 5/4/08, Frank Ellermann wrote:
>John C Klensin wrote:
>
>> Non-character code points that have specific
>> non-characters assigned to them are DISALLOWED
>> (unless they are exceptions), but by other rules.
>
>I'm not sure what you are up to, but if it's about
>the 2048 surrogates and the 66 non-characters you
>can simply hardwire them, AFAIK they are not going
>to change, no additions, no subtractions, forever.
>
>RFC 3987 covers these concepts in <ucschar>. The
>non-characters consist of two ??FFFE + ??FFFF per
>plane and 32 u+FDD0 .. u+FDEF for 66 = 2 * 17 + 32.
>(The 32 are nice to swap out C0 or C1 temporarily).
While it seems very likely that all of these will remain non-characters
forever, other non-characters could be added in the future. If we
identify unassigned codepoints first and then additionally prohibit
non-characters, we don't need the logic you have listed here, do we?
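
For concreteness, here is a rough Python sketch of the two approaches
being compared. It is only an illustration under stated assumptions,
not anything taken from the drafts: the function names and the
"UNASSIGNED" / "DISALLOWED" / "OTHER" labels are made up for this
example, and "unassigned" is approximated by General_Category Cn as
reported by whatever Unicode version the local Python ships with.

```python
import unicodedata

# Hardwired ranges, as described in the quoted message.
# All function names below are hypothetical, chosen for this sketch.

def is_surrogate(cp: int) -> bool:
    """True for the 2048 surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

ALL_CODEPOINTS = range(0x110000)  # 17 planes, U+0000..U+10FFFF

# The arithmetic from the quoted message: 2 * 17 + 32 = 66.
noncharacter_count = sum(is_noncharacter(cp) for cp in ALL_CODEPOINTS)
surrogate_count = sum(is_surrogate(cp) for cp in ALL_CODEPOINTS)

def classify(cp: int) -> str:
    """Illustrative ordering only: pull out non-characters and surrogates,
    then treat the remaining General_Category Cn code points as unassigned.
    The labels are assumptions, not text from any specification."""
    if is_noncharacter(cp) or is_surrogate(cp):
        return "DISALLOWED"
    if unicodedata.category(chr(cp)) == "Cn":
        return "UNASSIGNED"
    return "OTHER"

if __name__ == "__main__":
    print(noncharacter_count, surrogate_count)
    print(classify(0xFFFE), classify(0xD800), classify(0x0041))
```

One thing the sketch makes visible: the Unicode character database
reports General_Category Cn for the non-characters as well, so a test
based on the general category alone does not tell "unassigned" apart
from "non-character". Whichever step comes first, the non-character
test has to come from somewhere, either a hardwired list like the one
above or the Noncharacter_Code_Point property, which Python's standard
unicodedata module does not expose.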