Rule H (was: Re: New version,
draft-faltstrom-idnabis-tables-02.txt, available)
John C Klensin
klensin at jck.com
Wed Jun 13 00:01:34 CEST 2007
--On Tuesday, June 12, 2007 18:56 +0200 JFC Morfin
<jefsey at jefsey.com> wrote:
> At 17:31 12/06/2007, Paul Hoffman wrote:
>> At 3:53 PM +0200 6/12/07, JFC Morfin wrote:
>>> IDNA made a distinction between countries on the ASCII TLD
>>> basis.
>>
>> This is not true, and I believe you are quite aware that it
>> is not true.
>
> Dear Paul,
> May be my Franglish logic was confusing. Anyway, what is of
> concern today is the way Rule H is perceived when reading
> IDNAbis. A blunt list makes a difference from an acceptable
> description of the conditions for a script to be accepted,
> even if the resulting list is the same.
Jefsey,
Now we are getting to the point.
I intensely dislike having Rule H. I think that dislike is
shared by Patrik, Harald, Cary, Tina and others. I also don't
think we have so far explained it, and the reasons for it, very
well, and I'd appreciate the help of others in coming up with a
better explanation. But we have concluded, sadly and painfully,
that it is necessary, at least for the short term.
You (and others) may reasonably disagree, but please read
Harald's recent explanation, the one that follows, or both
(I hope they are consistent and complementary) and then try to
help us with this rather than stirring up more FUD.
What we have discovered is that there is controversy about the
optimal (or even adequate) way to handle many scripts,
especially when the same script is used as all or part of the
writing system for different languages. Sometimes the problem
involves differences in opinion about presentation forms.
Sometimes one must know the language in order to sort out
presentation issues correctly (and the DNS does not provide
for transmission of language information). Sometimes, although
they are primarily unusual edge cases, there are even questions
about whether the codings and rules present in Unicode 5.0 are
sufficient to handle some particular writing system adequately,
whether some of the characters of a script are associated with
the correct set of properties or not, and so on. In each case,
those uncertainties are opportunities for user confusion,
astonishment, or disappointment and sometimes for not being able
to write the words of some languages --should one want to use
those words in DNS labels -- in a consistent and correct fashion.
Incidentally, the part of the IDNA200x model that is most
different from the earlier one is another consequence of the
problem outlined above: Unicode provides compatibility mappings
and case mappings that are reasonable but that may not be precisely
correct (or "as would be expected") in all cases. IDNA2003
applies those mappings as part of the protocol. IDNA200x
treats them as localization issues -- to be applied as desired
as part of the localization process, but not to appear "on the
wire" or as part of references to be used in interchange,
thereby lowering the risks of incorrect or unpredictable
behavior.
These distinctions are also important because we have discovered
cases in which, in order to make it possible to express more than
a few words in the writing systems of some languages, characters
must be permitted that, while not problems in those particular
scripts (and usages more generally), would be problematic if used
in other contexts. So, for example, while IDNA2003 prohibited
zero-width breaking and non-breaking characters entirely, we now
have special rules that permit those characters only in
contexts in which they are helpful (or necessary) rather than
potentially harmful.
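
To make the idea of a context-dependent rule concrete, here is a
minimal sketch. The specific rule is an assumption chosen for
illustration (a zero-width joiner or non-joiner is accepted only
immediately after a virama, i.e. a character with canonical
combining class 9); it is not taken from the tables draft, and the
function name is hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # canonical combining class shared by virama characters


def joiners_in_allowed_context(label: str) -> bool:
    """Return True if every ZWJ/ZWNJ in `label` directly follows a virama.

    Illustrative assumption only: the real contextual rules may be
    broader or narrower than this single condition.
    """
    for i, ch in enumerate(label):
        if ch in (ZWJ, ZWNJ):
            # A joiner at the start of the label has no preceding context.
            if i == 0:
                return False
            if unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True
```

Under this assumed rule, a joiner following DEVANAGARI SIGN VIRAMA
(U+094D) is accepted, while the same joiner between two Latin
letters is rejected.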
In practical terms, Rule H has to be understood in conjunction
with the implications of its categories for registration and
lookup. For lookup, the "permitted", "maybe yes", and "maybe
no" categories are all equivalent. A process looking up a
string need only verify that none of the prohibited characters
are present and then rely on the trust that strings that
should not have been registered will not be found.
By contrast, classification as "maybe yes" or "maybe no"
implies that entities registering strings should refrain from
registrations dependent on those scripts until they are
confident that issues associated with them are resolved or that
there are no issues.
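
The asymmetry between lookup and registration can be sketched as
follows. This is only an illustration under stated assumptions: the
four category names come from the discussion above, but the toy
table, the function names, the `resolved` parameter, and the choice
to treat unlisted characters as prohibited are all hypothetical.

```python
from enum import Enum


class Category(Enum):
    PERMITTED = "permitted"
    MAYBE_YES = "maybe yes"
    MAYBE_NO = "maybe no"
    PROHIBITED = "prohibited"


# Hypothetical toy table; the real property table is keyed by code point.
TOY_TABLE = {
    "a": Category.PERMITTED,
    "b": Category.MAYBE_YES,
    "c": Category.MAYBE_NO,
    "!": Category.PROHIBITED,
}


def classify(ch, table=TOY_TABLE):
    # Assumption of this sketch: unlisted characters count as prohibited.
    return table.get(ch, Category.PROHIBITED)


def ok_for_lookup(label, table=TOY_TABLE):
    """Lookup only rejects labels containing prohibited characters."""
    return all(classify(ch, table) is not Category.PROHIBITED for ch in label)


def ok_for_registration(label, resolved=frozenset(), table=TOY_TABLE):
    """Registration also holds back "maybe" characters unless the
    registry considers their issues resolved (the `resolved` set)."""
    for ch in label:
        cat = classify(ch, table)
        if cat is Category.PERMITTED:
            continue
        if cat in (Category.MAYBE_YES, Category.MAYBE_NO) and ch in resolved:
            continue
        return False
    return True
```

With this table, "abc" passes the lookup check but not the
registration check, unless the registry passes both "b" and "c" in
`resolved`.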
Now, in practical terms, the IETF can't dictate policies to
registries or to ICANN (or to anyone else who thinks they have a
concern in this area) and (I hope) would not want to try. My
personal opinion is that these categories will work out as
follows on the registry side:
* A ccTLD will make its own decisions about what
scripts, used to write languages of importance in that
country, should be treated as being "permitted",
rather than "maybe yes". They will follow the
tables about other, less familiar, scripts or perhaps
not register those at all, regardless of where they fall
in the property table. I would hope that they --or
others in their countries-- would participate in the
effort needed to move their scripts (and produce
script-specific rules as needed) into the
"permitted" category, but, if they don't care
about the use of the script elsewhere in the world or in
gTLDs enough to do that, perhaps no one should care
about their opinions.
I note that, while our understanding has improved in the
last three or four years, we have always known that safe
and successful deployment of IDNs was going to be easier
for a ccTLD that could make "this script, or its use
by that language, is more important than those other
ones" decisions than for gTLDs that are presumably
required to be equally fair to everyone.
* gTLDs will be held to registrations from the
"permitted" category only and will hence be strongly
motivated to work with appropriate language authorities
to come up with definitions that work well globally.
But those are just my personal opinions. The more important
thing is that we figure out, together, how to make this system
work well --as a foundation for globally-accessible and usable
references -- for the Internet.
Your notes, and those of several others on this list and
otherwise, have raised another issue having to do with the
relationship of "language" to all of this. I'll address that in
a separate note.
john