IDNNever.txt

Kenneth Whistler kenw at sybase.com
Sat Feb 3 03:37:36 CET 2007


Since the issue came up today, I have gone ahead and drafted
what a property file for an IDN_Never property would look
like, with a representative (and conservative) first cut
at its content.

See:

http://www.unicode.org/~whistler/IDNNever.txt

That has the same format as:

http://www.unicode.org/~whistler/IDNPermitted.txt

which I explained earlier.

As conservative criteria for what we can absolutely, positively
guarantee belongs in the never, never, ever category, I have
started with:

1. cp != NFKC(cp)
2. cp has Pattern_Syntax property
3. cp has Pattern_White_Space property
4. cp has White_Space property
5. cp has Variation_Selector property
6. cp has Noncharacter_Code_Point property
7. cp has General_Category=Cf (Unicode format controls)
8. cp has General_Category=Cc (ISO controls)

(There is considerable overlap for some of those properties,
so not all of them may be required -- some may be redundant
for the purposes of this derivation. I just haven't done
the detailed analysis on this first cut yet.)

Then the following three exceptions are pulled from the
list:

1. cp = U+002D HYPHEN-MINUS (a Pattern_Syntax character)
2. cp = U+200C ZERO WIDTH NON-JOINER (gc=Cf)
3. cp = U+200D ZERO WIDTH JOINER     (gc=Cf)
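
The derivation above can be sketched in Python roughly as follows.
This is only an illustration under stated assumptions, not the
utility that produced IDNNever.txt. The function names and the
proplist_hits parameter are hypothetical. Python's unicodedata
module does not expose Pattern_Syntax, Pattern_White_Space,
White_Space, or Variation_Selector, so criteria 2-5 are assumed to
come from a caller-supplied set (for instance, one built from
PropList.txt). The results also depend on the Unicode version
bundled with the Python interpreter.

```python
import unicodedata

# The three exceptions named in the message.
EXCEPTIONS = frozenset({0x002D, 0x200C, 0x200D})


def is_noncharacter(cp):
    """Noncharacter_Code_Point: U+FDD0..U+FDEF plus the last two
    code points of every plane (U+nFFFE and U+nFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def idn_never(cp, proplist_hits=frozenset()):
    """Return True if cp meets any of the eight criteria and is not
    one of the three exceptions.

    proplist_hits is a hypothetical, caller-supplied set of code
    points that have Pattern_Syntax, Pattern_White_Space, White_Space,
    or Variation_Selector (criteria 2-5).
    """
    if cp in EXCEPTIONS:
        return False
    ch = chr(cp)
    if unicodedata.normalize("NFKC", ch) != ch:      # criterion 1
        return True
    if cp in proplist_hits:                          # criteria 2-5
        return True
    if is_noncharacter(cp):                          # criterion 6
        return True
    return unicodedata.category(ch) in ("Cf", "Cc")  # criteria 7-8
```

For example, idn_never(0x212B) is true because U+212B ANGSTROM SIGN
normalizes to U+00C5 under NFKC, while idn_never(0x200D) is false
because ZERO WIDTH JOINER is one of the exceptions even though it
has gc=Cf.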

The listing is not quite complete yet, because my utility
only processed Planes 0, 1, 2, and 14, and there are also
noncharacter code points on the other planes. Also, I
think all user-defined characters must be given IDN_Never=True,
and I haven't done that yet.
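
To make the gap concrete, here is a small Python sketch that
enumerates the noncharacter code points on all 17 planes and shows
how many fall outside Planes 0, 1, 2, and 14. The helper names are
hypothetical. It also assumes that "user-defined characters" means
private-use code points (General_Category=Co); that reading is an
assumption, not something the message states.

```python
import unicodedata

PROCESSED_PLANES = {0, 1, 2, 14}  # planes the message says were covered


def noncharacters():
    """List every Noncharacter_Code_Point: U+FDD0..U+FDEF plus
    U+nFFFE and U+nFFFF for each of the 17 planes."""
    cps = list(range(0xFDD0, 0xFDF0))
    for plane in range(17):
        base = plane << 16
        cps.extend((base | 0xFFFE, base | 0xFFFF))
    return cps


def missing_noncharacters():
    """Noncharacters on planes that were not processed."""
    return [cp for cp in noncharacters()
            if (cp >> 16) not in PROCESSED_PLANES]


def is_private_use(cp):
    """Assumed meaning of 'user-defined': General_Category=Co."""
    return unicodedata.category(chr(cp)) == "Co"
```

There are 66 noncharacters in total, so 26 of them (two on each of
the 13 unprocessed planes) would still need to be added to the
listing.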

But if you check this list, it should be clear in general
which kinds of characters make up the set I earlier
described as the ones that *nobody* wants to include
in IDNs, and for which a stability guarantee would be easy
to stand by.

--Ken



