standard, stable and unambiguous references to Unicode

Erik van der Poel erikv at google.com
Sat Feb 9 03:32:05 CET 2008


Patrik and Mark,

I'm reading tables-04 now. I noticed a few things that could be
improved, in terms of standard, stable and unambiguous references to
Unicode. This is important since IDNA200X is supposed to evolve with
Unicode. We need to be able to generate the pvalid/disallowed/etc
table every time Unicode releases a new version. So here are a few
suggestions and questions:

Standard. IDNA200X should use the standard names of Unicode properties
and processes, and Unicode should try not to change those names. For
example, tables-04 refers to NFKC(...) while Unicode calls that
toNFKC(...):

http://www.unicode.org/reports/tr15/#Notation

There is another function called isNFKC(...), so it would be nice to
get the right one (toNFKC).
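
To illustrate the difference between the two functions, here is a minimal Python sketch. It assumes Python's standard unicodedata module as a stand-in for the Unicode definitions; the names to_nfkc and is_nfkc are hypothetical helpers chosen to mirror toNFKC and isNFKC, and the results follow whichever Unicode version the Python build ships with, not necessarily the version a given IDNA200X table targets.

```python
import unicodedata


def to_nfkc(s: str) -> str:
    """Rough analogue of toNFKC(x): return the NFKC-normalized string."""
    return unicodedata.normalize("NFKC", s)


def is_nfkc(s: str) -> bool:
    """Rough analogue of isNFKC(x): true if s is already in NFKC form."""
    return to_nfkc(s) == s


if __name__ == "__main__":
    ligature = "\uFB01"  # LATIN SMALL LIGATURE FI
    # to_nfkc transforms the string; is_nfkc only tests it.
    print(repr(to_nfkc(ligature)), is_nfkc(ligature))
```

The first function returns a string and the second returns a boolean, which is why a bare "NFKC(...)" in a specification leaves room for doubt.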

Stable. IDNA200X should use stable references to Unicode documents,
and Unicode should make sure those references keep working. For
example, the normalization spec mentioned above could be referenced
using the stable URI:

http://www.unicode.org/reports/tr15/

Unambiguous. IDNA200X should use unambiguous names, and Unicode should
offer them. For example, tables-04 refers to casefold(...). Unicode
has something called Case_Folding(c) that only applies to single
characters:

http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf

Unicode also has something called toCasefolding(x) for strings of
characters on page 125 of the above chapter, labelled R4. However, the
paragraph above R4 says that it comes in a simple and a full variant.
IDNA200X needs the string function (not the single character
function) in the "NFKC(casefold(NFKC(cp))) != cp" construct. I believe
IDNA200X also needs the full variant, not the simple variant. But
Unicode does not appear to have an unambiguous name for the full
variant of the function that works on strings. (Or, if R4 *is* the
full variant, the paragraph above it needs tweaking.) In the meantime,
IDNA200X can disambiguate it by explicitly saying that
toCasefolding(...) refers to the full variant.
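
As a rough sketch of the construct under the full-variant reading, the following Python assumes that str.casefold() (which performs full case folding) stands in for toCasefolding, and that unicodedata.normalize stands in for toNFKC. The helper names are hypothetical, and the Unicode version is whatever the Python build provides.

```python
import unicodedata


def to_nfkc(s: str) -> str:
    # Assumed stand-in for toNFKC(x).
    return unicodedata.normalize("NFKC", s)


def to_casefold_full(s: str) -> str:
    # Assumed stand-in for the full variant of toCasefolding(x):
    # Python's str.casefold() applies full case folding to a string.
    return s.casefold()


def is_unstable(cp: int) -> bool:
    """Evaluate toNFKC(toCasefolding(toNFKC(cp))) != cp for one code point."""
    c = chr(cp)
    return to_nfkc(to_casefold_full(to_nfkc(c))) != c


if __name__ == "__main__":
    for cp in (0x0041, 0x0061, 0x00DF, 0xFB01):
        print(f"U+{cp:04X}", is_unstable(cp))
```

U+00DF (LATIN SMALL LETTER SHARP S) shows why the choice of variant matters: full case folding maps it to "ss", so the test above flags it, whereas simple case folding leaves it unchanged and the same test would not.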

Yes, this is just nit-picking, but at least we have gotten to the
point where we're just tweaking the IDNA200X drafts! We're nearly
done. :-)

Erik

