How to know what codepoints are unassigned
Frank Ellermann
hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com
Sun May 4 05:01:43 CEST 2008
Paul Hoffman wrote:
>> AFAIK they are not going to change, no additions,
>> no subtractions, forever.
[...]
> While it seems very likely that all these will be
> non-characters forever, other non-characters could
> be added in the future.
Here's how table F.1 in TUS 5.0 puts it:
Applicable versions
| Unicode 3.1+
Constraints
| The Noncharacter_Code_Point property is an immutable
| code point property, which means that its property
| values for all Unicode code points will never change.
Once a non-character, forever a non-character. Once
not a non-character, forever not a non-character. That
is what "immutable" means for a binary property that
covers all code points, or is there a trick to add
more non-characters?
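
To make the "immutable" point concrete, here is a minimal Python sketch of the non-character set as a fixed predicate. The names is_noncharacter and NONCHARACTERS are made up for illustration and are not from TUS. The ranges are the usual ones: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True for code points with Noncharacter_Code_Point=True."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    # 32 code points in the BMP: U+FDD0..U+FDEF
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+nFFFE and U+nFFFF
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate the whole code space once; the list never needs updating
# if the property really is immutable.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]

if __name__ == "__main__":
    print(len(NONCHARACTERS))  # 32 + 2 * 17 = 66
```

Since the property is defined for every code point and its values never change, a hard-coded predicate like this would stay valid across Unicode versions from 3.1 on.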
The magic word "immutable" is also associated with the
Pattern_Syntax and Pattern_Whitespace properties since
version 4.1 in appendix F (encoding stability policies
for TUS).
Obviously you couldn't use Unicode 4.1 for IDNA-2003.
> If we use the process of identifying unassigned
> codepoints first, then additionally prohibiting
> noncharacters, we don't need to use the logic you
> have listed here, do we?
Dunno. If you somehow get the 2048 surrogates + 66
non-characters + private use + tag characters as
DISALLOWED, it is a start. It could be a nice
plausibility check to design the algorithm in a way
that outputs UNASSIGNED *last* (not first), with the
check "if any UNASSIGNED isn't unassigned, throw a
fatal error".
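
A rough Python sketch of that ordering, under several assumptions: all function names are hypothetical, "unassigned" is taken to mean General_Category Cn in whatever Unicode version the local unicodedata module ships, and the tag characters are assumed to be U+E0001 and U+E0020..U+E007F. It is not the algorithm from any draft, only an illustration of "UNASSIGNED last, with a fatal check".

```python
import unicodedata


def is_surrogate(cp):
    return 0xD800 <= cp <= 0xDFFF            # 2048 code points


def is_noncharacter(cp):
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE  # 66


def is_private_use(cp):
    return unicodedata.category(chr(cp)) == "Co"


def is_tag(cp):
    # Assumed range for the tag characters (see lead-in).
    return cp == 0xE0001 or 0xE0020 <= cp <= 0xE007F


DISALLOWED_RULES = (is_surrogate, is_noncharacter, is_private_use, is_tag)


def classify(cp, other_rules=()):
    """Apply DISALLOWED rules, then caller-supplied (label, predicate)
    rules, and emit UNASSIGNED only as the final fall-through."""
    for rule in DISALLOWED_RULES:
        if rule(cp):
            return "DISALLOWED"
    for label, rule in other_rules:
        if rule(cp):
            return label
    # Plausibility check: whatever falls through must really be
    # unassigned, otherwise an earlier rule is missing.
    if unicodedata.category(chr(cp)) != "Cn":
        raise RuntimeError("U+%04X fell through but is assigned" % cp)
    return "UNASSIGNED"


if __name__ == "__main__":
    print(classify(0xD800))   # DISALLOWED
    print(classify(0x50000))  # UNASSIGNED, as long as plane 5 stays empty
```

With an incomplete rule set, classify(ord("a")) raises instead of quietly reporting an assigned letter as UNASSIGNED, which is the whole point of putting that case last.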
Frank