How to know what codepoints are unassigned

Paul Hoffman phoffman at imc.org
Sun May 4 04:04:28 CEST 2008


At 2:35 AM +0200 5/4/08, Frank Ellermann wrote:
>John C Klensin wrote:
>
>>  Non-character code points that have specific
>>  non-characters assigned to them are DISALLOWED
>>  (unless they are exceptions), but by other rules.
>
>I'm not sure what you are up to, but if it's about
>the 2048 surrogates and the 66 non-characters you
>can simply hardwire them, AFAIK they are not going
>to change, no additions, no subtractions, forever.
>
>RFC 3987 covers these concepts in <ucschar>.  The
>non-characters consist of two (??FFFE + ??FFFF) per
>plane and 32 in U+FDD0 .. U+FDEF, for 66 = 2 * 17 + 32.
>(The 32 are nice to swap out C0 or C1 temporarily).

While it seems very likely that all of these will remain non-characters
forever, other non-characters could be added in the future. If we
first identify the unassigned codepoints and then additionally
prohibit non-characters, we don't need the hardwired logic you have
listed here, do we?
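
A minimal Python sketch of that ordering, assuming the standard
unicodedata module as the source of assignment data. That module does
not expose the Noncharacter_Code_Point property, so the non-character
test below is hardwired in the way Frank describes. The function names
and labels are hypothetical, and the result for any given codepoint
depends on the Unicode version bundled with the Python in use.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Hardwired test (assumption: the set never changes):
    U+FDD0..U+FDEF plus the last two codepoints of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Label a codepoint; the label strings are illustrative only."""
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate"
    if category == "Cn":
        # General category Cn covers both unassigned codepoints and
        # non-characters, so the non-character test runs second.
        return "noncharacter" if is_noncharacter(cp) else "unassigned"
    return "assigned"


all_cps = range(0x110000)

# 66 = 2 * 17 + 32
noncharacter_count = sum(1 for cp in all_cps if is_noncharacter(cp))

# 2048 surrogates, U+D800..U+DFFF
surrogate_count = sum(1 for cp in all_cps if classify(cp) == "surrogate")
```

If a property table that lists Noncharacter_Code_Point is available,
is_noncharacter() could be replaced by a lookup in that table, and the
rest of the sketch would stay the same.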

