The "exceptions" discussion

Fri Jun 22 22:28:21 CEST 2007

There are two ways to look at the discussion of exceptions.
One is that we can derive a set of rules and tables from
existing properties and then notice that, because writing
systems are messy and inconsistent (and, through no one's
fault, Unicode sometimes is too), we need an exception list.
The other is that the need for an exception list is an
indication that "derive... from existing properties" is the
wrong model.  That view would say that there are some
idiosyncratic characteristics of the characters that are
appropriate for IDN use that don't fall out from existing
Unicode properties simply because those properties were
designed for different purposes and around different
characteristics.  It is worth noting that some variation on the
theme of "characters appropriate for Internet host and domain
names" has been with us since hostnames were first defined:
stated in property terminology, we have members of the ASCII
"letter" class, members of the ASCII "digit" class (originally
with a contextual/positional rule), and an exception with
hyphen -- either an exception in spite of the fact that it is a
symbol or an exception to its not being either a letter or a
digit.  As with many other things with IDNs, i18n doesn't
introduce new issues; it just makes old ones more dramatic or
clear.
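
To make the "classes plus one exception" shape of the traditional
hostname rule concrete, here is a rough sketch in Python.  The
names (is_ldh_label, digit_may_lead, EXCEPTIONS) are mine and purely
illustrative, and the positional rules are written from memory of
the usual reading of the hostname specs (hyphen neither first nor
last; originally no leading digit, later relaxed), so treat it as
an illustration rather than a statement of the standard.

```python
import string

# Two "property-like" classes and one hand-listed exception.
LETTERS = frozenset(string.ascii_letters)
DIGITS = frozenset(string.digits)
EXCEPTIONS = frozenset("-")  # allowed although neither letter nor digit


def is_ldh_label(label: str, digit_may_lead: bool = True) -> bool:
    """Check one label against the letter/digit/hyphen rule.

    digit_may_lead=False approximates the original positional rule
    for digits (assumption: first character had to be a letter).
    """
    if not label:
        return False
    allowed = LETTERS | DIGITS | EXCEPTIONS
    if any(ch not in allowed for ch in label):
        return False
    # Contextual rule for the exception: hyphen may not lead or trail.
    if label[0] in EXCEPTIONS or label[-1] in EXCEPTIONS:
        return False
    # Contextual rule for digits, in its older, stricter form.
    if not digit_may_lead and label[0] in DIGITS:
        return False
    return True
```

Note that the hyphen cannot be expressed as membership in either
class; it has to be listed by hand, which is exactly the
"exception list" situation described above.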

For most purposes, the difference between "exception list" and
"new property" is conceptual rather than practical. To the
extent that is true, we should go with the former just because
it is the path of least resistance.  But there is a real issue
hidden behind the conversation between Patrik and Michael about
"hope".  The issue is that, in practice, IDN implementations will
often have no idea what version of Unicode they are running on
(see discussion in RFC 4690) so we need to be _very_ robust
against Unicode version changes that might create problems.
Ken's review of past changes and projection of future ones and
his and Mark's model for two separate "always" and "never"
properties are a source of considerable comfort in that regard
(at least for me), but, after the breakdown of the previous
IETF-UTC agreement, even that work will inevitably be received
with some suspicion.  A new property that libraries can
implement so that it clearly returns "not defined here" may
have some slight advantages in that regard, even if most of
those advantages are psychological.
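
As a sketch of what "return 'not defined here' in a clear way"
could look like in a library, here is a small Python illustration.
Everything named here (IdnStatus, idn_status, the ALWAYS and NEVER
tables and their contents) is hypothetical and invented for the
example; nothing of the sort has been specified.  It also assumes
that "unassigned in the Unicode version this library was built
with" can be approximated by General_Category Cn, which lumps in
noncharacters as well.

```python
import unicodedata
from enum import Enum


class IdnStatus(Enum):
    ALWAYS = "always"
    NEVER = "never"
    UNSPECIFIED = "unspecified"            # assigned, but on neither list
    NOT_DEFINED_HERE = "not defined here"  # unknown to this Unicode version


# Hypothetical placeholder tables; real ones would be far larger.
ALWAYS = frozenset("abcdefghijklmnopqrstuvwxyz0123456789")
NEVER = frozenset(" !\"#$%")


def idn_status(ch: str) -> IdnStatus:
    """Classify one character, refusing to guess about code points
    that the local Unicode tables do not define."""
    if unicodedata.category(ch) == "Cn":
        # The local tables know nothing about this code point, so
        # say so rather than treating it as allowed or forbidden.
        return IdnStatus.NOT_DEFINED_HERE
    if ch in ALWAYS:
        return IdnStatus.ALWAYS
    if ch in NEVER:
        return IdnStatus.NEVER
    return IdnStatus.UNSPECIFIED


# The version the answers are relative to, where the library exposes it.
print("Unicode tables in use:", unicodedata.unidata_version)
```

The point of the fourth value is only that a caller on an old
Unicode version gets an explicit "I don't know" instead of an
answer that silently changes after an upgrade.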

I don't have any suggestions here, but I think it is useful if
we all try to understand the issue.

Part of this, however, is the question of how things get onto
the "Always" list and what that means.  I'll address that in a
separate note.

    john