Kenneth Whistler kenw at
Sat Feb 3 03:37:36 CET 2007

Since the issue came up today, I have gone ahead and drafted
what a property file for an IDN_Never property would look
like, with a representative (and conservative) first cut
at its content.


That has the same format as:

which I explained earlier.

As conservative criteria for what to absolutely, positively
guarantee is in the never, never, ever category, I have
started with the following (where cp is a code point):

1. cp != NFKC(cp)
2. cp has Pattern_Syntax property
3. cp has Pattern_White_Space property
4. cp has White_Space property
5. cp has Variation_Selector property
6. cp has Noncharacter_Code_Point property
7. cp has General_Category=Cf (Unicode format controls)
8. cp has General_Category=Cc (ISO controls)
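Criteria 1-8 can be tested one code point at a time. The sketch below assumes a few things. Python's unicodedata covers NFKC and General_Category. The binary properties in criteria 2-6 are not exposed there, so they are read from data in the PropList.txt line layout ("XXXX..YYYY ; Name # comment"). The helper names are hypothetical. Python's tables follow the interpreter's own Unicode version, so results will not exactly match a derivation made against an older version.

```python
import unicodedata

# Criteria 2-6: binary properties that unicodedata does not expose,
# so they are read from PropList.txt-style data instead (assumption).
PROPLIST_CRITERIA = (
    "Pattern_Syntax",           # 2
    "Pattern_White_Space",      # 3
    "White_Space",              # 4
    "Variation_Selector",       # 5
    "Noncharacter_Code_Point",  # 6
)

def parse_proplist(text):
    """Parse 'XXXX..YYYY ; Name # comment' lines into {Name: set of code points}."""
    props = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        field, name = (part.strip() for part in data.split(";"))
        first, _, last = field.partition("..")  # single code point or a range
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        props.setdefault(name, set()).update(range(lo, hi + 1))
    return props

def meets_never_criteria(cp, props):
    """True if code point cp satisfies any of criteria 1-8."""
    ch = chr(cp)
    if unicodedata.normalize("NFKC", ch) != ch:              # 1. cp != NFKC(cp)
        return True
    if any(cp in props.get(name, ()) for name in PROPLIST_CRITERIA):  # 2-6
        return True
    return unicodedata.category(ch) in ("Cf", "Cc")          # 7-8
```

With a full PropList.txt loaded, the candidate set is every code point in range(0x110000) for which meets_never_criteria returns True. The exceptions are handled separately.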

(There is considerable overlap for some of those properties,
so not all of them may be required -- some may be redundant
for the purposes of this derivation. I just haven't done
the detailed analysis on this first cut yet.)
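The redundancy analysis could work like this. Compute each criterion's set of code points separately. Then drop any criterion whose set is already covered by the criteria still kept, which leaves the overall union unchanged. This is a minimal sketch with a hypothetical function name. It assumes the per-criterion sets have already been computed.

```python
def prune_redundant(criteria):
    """Drop criteria one at a time while the union of all criteria is unchanged.

    criteria: dict mapping a criterion name to its set of code points.
    Returns the names that were kept. The result depends on iteration order:
    a criterion is dropped only if the criteria still kept at that point
    already cover it, so two criteria covering each other are not both dropped.
    """
    kept = dict(criteria)
    for name in list(criteria):
        others = set().union(*(s for n, s in kept.items() if n != name))
        if kept[name] <= others:
            del kept[name]  # fully covered by the rest; the union stays the same
    return list(kept)
```

Because the result depends on order, a different ordering can produce a different (but equally valid) minimal set of criteria.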

Then the following three exceptions are pulled from the
resulting set:

1. cp = U+002D HYPHEN-MINUS (a Pattern_Syntax character)
2. cp = U+200C ZERO WIDTH NON-JOINER (gc=Cf)
3. cp = U+200D ZERO WIDTH JOINER     (gc=Cf)
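The next sketch removes the three exceptions and collapses what remains into runs of consecutive code points for listing. The function names are hypothetical. The XXXX..YYYY notation follows the usual Unicode Character Database convention. No particular layout is assumed for the draft file itself.

```python
# The three code points pulled back out of the derived set.
EXCEPTIONS = frozenset({
    0x002D,  # HYPHEN-MINUS (Pattern_Syntax)
    0x200C,  # ZERO WIDTH NON-JOINER (gc=Cf)
    0x200D,  # ZERO WIDTH JOINER (gc=Cf)
})

def apply_exceptions(derived):
    """Return the derived code points, minus the exceptions, in sorted order."""
    return sorted(set(derived) - EXCEPTIONS)

def to_ranges(cps):
    """Collapse a sorted list of code points into (first, last) runs."""
    runs = []
    for cp in cps:
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current run
        else:
            runs.append([cp, cp])     # start a new run
    return [tuple(r) for r in runs]

def format_range(first, last):
    """Render a run in UCD-style hex notation."""
    return f"{first:04X}" if first == last else f"{first:04X}..{last:04X}"
```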

The listing is not quite complete yet, because my utility
only processed Planes 0, 1, 2, and 14, and there are also
noncharacter code points on the other planes. Also, I
think all user-defined (private-use) characters must be given
IDN_Never=True, and I haven't done that yet.
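Both missing pieces can be enumerated directly across all 17 planes. There are 66 noncharacters: U+FDD0..U+FDEF, plus the last two code points of every plane. The private-use ranges are U+E000..U+F8FF, U+F0000..U+FFFFD and U+100000..U+10FFFD. This is a minimal sketch with hypothetical function names, plus a cross-check against the interpreter's unicodedata tables.

```python
import unicodedata

def noncharacters():
    """All 66 noncharacter code points: U+FDD0..U+FDEF plus xxFFFE/xxFFFF on each plane."""
    cps = set(range(0xFDD0, 0xFDF0))
    for plane in range(17):
        base = plane << 16
        cps.update((base + 0xFFFE, base + 0xFFFF))
    return cps

def private_use():
    """All private-use (user-defined, gc=Co) code points."""
    return (set(range(0xE000, 0xF900))         # BMP Private Use Area
            | set(range(0xF0000, 0xFFFFE))     # Plane 15
            | set(range(0x100000, 0x10FFFE)))  # Plane 16

def check_against_ucd():
    """Noncharacters should be unassigned (Cn); private-use code points should be Co."""
    return (all(unicodedata.category(chr(cp)) == "Cn" for cp in noncharacters())
            and all(unicodedata.category(chr(cp)) == "Co" for cp in private_use()))
```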

But if you check this list, it should be clear in general
what kinds of characters constitute what I earlier
designated as the ones that *nobody* wants to include
in IDNs and for which a stability guarantee would be easy
to stand by.

