IDN_Never and IDN_Always

Kenneth Whistler kenw at sybase.com
Sat Dec 22 02:36:05 CET 2007


Patrik,

To perhaps allay some of the concerns you raised on the 16th
about property stability, I have run the table derivation
suggested in the contribution "Table Derivation", posted
earlier today by Mark and me, against the public data files
for every version of Unicode from 3.2 through 5.0 (plus the
beta data files currently public for the as-yet-unreleased
Unicode 5.1), and have posted the resulting derived property
files to my public directory on the Unicode server:

http://www.unicode.org/~whistler/idna/IDN_Always-3.2.0.txt
http://www.unicode.org/~whistler/idna/IDN_Always-4.0.0.txt
http://www.unicode.org/~whistler/idna/IDN_Always-4.0.1.txt
http://www.unicode.org/~whistler/idna/IDN_Always-4.1.0.txt
http://www.unicode.org/~whistler/idna/IDN_Always-5.0.0.txt
http://www.unicode.org/~whistler/idna/IDN_Always-5.1.0.txt

http://www.unicode.org/~whistler/idna/IDN_Never-3.2.0.txt
http://www.unicode.org/~whistler/idna/IDN_Never-4.0.0.txt
http://www.unicode.org/~whistler/idna/IDN_Never-4.0.1.txt
http://www.unicode.org/~whistler/idna/IDN_Never-4.1.0.txt
http://www.unicode.org/~whistler/idna/IDN_Never-5.0.0.txt
http://www.unicode.org/~whistler/idna/IDN_Never-5.1.0.txt

Those are all diffable plain text files, in the same general
format as used for other Unicode property files.
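For anyone who wants to check these files mechanically, here
is a minimal Python sketch of a reader for them. It assumes
each data line follows the usual Unicode property file layout:
a single code point or a "start..end" range, a semicolon, a
value, and an optional "#" comment. The function name
parse_property_file is hypothetical, not part of any Unicode
tooling.

```python
import re

# Assumed data line layout: "0041..005A ; Value # comment" or "00B7 ; Value".
_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*([^#\s]+)"
)


def parse_property_file(text: str) -> dict[str, set[int]]:
    """Map each value in the second field to the set of code points it covers.

    Hypothetical helper; blank and comment-only lines are ignored.
    """
    result: dict[str, set[int]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # drop any trailing comment
        match = _LINE.match(line)
        if not match:
            continue  # blank or comment-only line
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        result.setdefault(match.group(3), set()).update(range(start, end + 1))
    return result
```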

If you examine them carefully, you will find that they
exhibit exactly the kind of cross-version stability that you
and others on the idna-update list have been concerned
about, namely:

  1. Once a character is classed as NEVER, it does not
     change that status in any later version of Unicode.
     
  2. Once a character is classed as ALWAYS, it does not
     change that status in any later version of Unicode.
     
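The two conditions above amount to saying that, for each
class, the set of code points in one version is a subset of
the set in every later version. Because subset inclusion is
transitive, checking each adjacent pair of versions is enough.
A minimal Python sketch of that check follows; the function
name and the input shape (an oldest-to-newest list of version
labels paired with code point sets, e.g. as read from the
IDN_Never-*.txt or IDN_Always-*.txt files) are assumptions
for illustration.

```python
def find_stability_violations(
    versions: list[tuple[str, set[int]]],
) -> list[tuple[str, str, int]]:
    """Return (older, newer, code point) for every code point that is in
    the class in one version but missing from it in the next version.

    Hypothetical helper; `versions` must be ordered oldest to newest.
    Adjacent pairs suffice because subset inclusion is transitive.
    """
    violations = []
    for (old_v, old_set), (new_v, new_set) in zip(versions, versions[1:]):
        for cp in sorted(old_set - new_set):
            violations.append((old_v, new_v, cp))
    return violations
```

Run it once over the NEVER sets and once over the ALWAYS sets;
an empty result for both means the classes are stable in the
sense described above.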
To obtain this result when updating from Unicode 5.0
to Unicode 5.1, no adjustment at all is needed to the
table derivation that Mark and I suggested.

To make this work *retroactively* from Unicode 5.0 all
the way back to Unicode 3.2, a small number of characters
must be added to the exception_NEVER_list that Mark and I
discussed, because of a few casing changes introduced in
Unicode 4.0 and a few other stray category changes. That
list is very small and affects only unimportant characters;
I'd be glad to share the exact details if you are interested.
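One way to picture the role of that exception list: an
explicit list that takes precedence over the rule-based
derivation pins a character to NEVER even when a later casing
or category change would alter what the general rule computes
for it. The sketch below assumes that precedence (NEVER
exceptions first, then ALWAYS exceptions, then a base rule);
derive_class and the toy rules are hypothetical stand-ins,
not the actual derivation in the "Table Derivation"
contribution.

```python
from typing import Callable


def derive_class(
    cp: int,
    base_rule: Callable[[int], str],
    exception_never: frozenset[int],
    exception_always: frozenset[int],
) -> str:
    """Hypothetical classifier: explicit exception lists override base_rule.

    The precedence order (NEVER list, then ALWAYS list) is an assumption.
    """
    if cp in exception_never:
        return "NEVER"
    if cp in exception_always:
        return "ALWAYS"
    return base_rule(cp)


# Toy rules: code point 5 is NEVER under the "old" rule but would
# become ALWAYS under the "new" one; pinning it keeps it NEVER.
def old_rule(cp: int) -> str:
    return "NEVER" if cp == 5 else "ALWAYS"


def new_rule(cp: int) -> str:
    return "ALWAYS"


pinned = frozenset({5})
print(derive_class(5, old_rule, pinned, frozenset()))  # NEVER
print(derive_class(5, new_rule, pinned, frozenset()))  # NEVER
```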

The point, however, is that by starting from a carefully
planned table definition, it is possible not only to
maintain complete backwards compatibility for the NEVER and
ALWAYS classes for IDN from Unicode 5.0 onward, but even to
guarantee retroactive backwards compatibility for those
classes all the way back to Unicode 3.2.

While we will obviously still want to discuss the details,
and may differ in our opinions about exactly which scripts
belong in the Historic_Scripts category, or exactly which
small set of characters (such as MIDDLE DOT) should be in
the exception_ALWAYS_list, and so forth, I think the files
posted above serve as an existence proof that the derivation
of this table can be defined in a way that guarantees the
property used by IDNA remains backwards compatible between
versions of Unicode.

Regards,

--Ken



More information about the Idna-update mailing list