Consensus Call Tranche 8 Summary - Addendum

John C Klensin klensin at jck.com
Thu Oct 23 19:26:39 CEST 2008



--On Wednesday, 22 October, 2008 19:50 +0200 JFC Morfin
<jefsey at jefsey.com> wrote:

> At 19:01 22/10/2008, John C Klensin wrote:
>> But I don't think getting those recommendations tied up with
>> the protocol (or the IDNA Standard more generally) is wise
>> for a whole series of reasons, including the ones that I gave
> 
> If I understand your response, this is not what I suggested. I
> just said that we should help as much as it is possible in
> making the thinking  globally and locally similar. Even if
> HISTORIC is empty (it can be decided as such at protocol
> level) it will help protocol consistency to define it at
> protocol level, so it can be more easily understood and
> correctly used at zone level.

I think you didn't understand what I was saying, or perhaps I
was not as explicit about part of it as I should have been.

The larger and more diverse a target community is, the harder it
becomes to make a change if one gets something wrong or if
global circumstances or average conditions do not align with
local ones.  That observation has at least two corollaries in
our situation, one about Unicode and one about IDNs.

	* Whether or not it was wise to use CaseFold in IDNs,
	that operation necessarily transformed Eszett into "ss"
	because Eszett was presumed to not have an upper case
	equivalent.  Now that Unicode has been extended to
	recognize an upper-case Eszett, one has to assume that,
	if the CaseFold operation were being defined today, at
	least serious consideration would need to be given to
	having toUpperCase(U+00DF) -> U+1E9E,
	toLowerCase(U+1E9E) -> U+00DF, and (presumably
	necessarily) 
	   toCaseFold(U+00DF) = toCaseFold(U+1E9E) = U+00DF
	But, because the CaseFold operations are used globally
	and in many applications, trying to change things to
	work that way now is, at best, an incredibly complex
	problem.  At worst, it is unthinkable. 
	
	* One of the things we have run across repeatedly, and
	on which you have commented, is that there are several
	scripts that are used to write different languages and
	sometimes used to write them in slightly different ways.
	One could largely eliminate the problem by defining
	"script" differently and doing no unification (even
	between, e.g.,
	Latin-as-used-in-French-as-written-in-France and
	Latin-as-used-in-French-as-written-in-Canada), but that
	would cause a host of other problems. In Unicode, the
	consequences of those relationships include the
	requirement for rendering engines that are more or less
	language-specific, but, because we don't have language
	information available for isolated IDNs and because
	there is no requirement that IDN labels be well-formed
	words, depending on rendering engines does not work as
	well as it does for bodies of general text.  For many of
	those issues, there may be quite satisfactory local
	solutions that make assumptions based on the local
	environment.  But, if we try to make a global
	requirement or condition, we just make a mess.
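
To make the Eszett case concrete, here is a minimal sketch of how one
could inspect what a deployed case-mapping implementation reports for
the two characters. It assumes a Python 3 interpreter whose Unicode
tables include U+1E9E; the helper name `describe` is purely
illustrative and is not part of Unicode, IDNA, or any library.

```python
import unicodedata

ESZETT = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_ESZETT = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def describe(ch):
    """Collect the case mappings the interpreter reports for one character.

    Hypothetical helper for illustration only.
    """
    return {
        "name": unicodedata.name(ch),
        "upper": ch.upper(),
        "lower": ch.lower(),
        "casefold": ch.casefold(),
    }


small = describe(ESZETT)
capital = describe(CAPITAL_ESZETT)

# Both characters still fold to "ss" rather than to U+00DF.
both_fold_to_ss = small["casefold"] == capital["casefold"] == "ss"

# Lowercasing the capital form does yield U+00DF ...
lower_of_capital_is_small = capital["lower"] == ESZETT

# ... but uppercasing U+00DF yields "SS", not U+1E9E.
upper_of_small_is_capital = small["upper"] == CAPITAL_ESZETT

for info in (small, capital):
    print(info)
print(both_fold_to_ss, lower_of_capital_is_small, upper_of_small_is_capital)
```

The point of the sketch is only that the folding to "ss" remains in
place even though an upper-case Eszett now exists; it says nothing
about what IDNA itself should do.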

This is one of the key reasons why, with IDNA2008, we have been
trying to keep the protocol as limited as possible (e.g., no
mappings) and to impose rules about specific characters only if
not doing so would cause harm, rather than disallowing
characters on the basis of preference or simple confusability.
We are allowing archaic scripts, not because we expect a pocket
of users of one of those scripts to suddenly emerge from hiding and
invade Europe, but because their characters are clearly
"letters" and there is no evidence that allowing them is harmful
(that distinction is the reason I don't believe that the
position "if we disallow Jamo, we have to reopen the question of
historic scripts" describes a meaningful relationship).

Having that principle doesn't mean that it is not controversial:
reasonable people may disagree about what is harmful (or harmful
enough) to justify disallowing in a DNS context just as they can
disagree about what is sufficiently archaic to be inappropriate
for registration in domains that are expected to deal with only
modern languages.

That, in turn, is why I personally consider "don't register
labels in any script with which you are not familiar or that is
not actively used in the community and environment you are
trying to serve" to be much better guidance for registries than
anything that is related to abstractions like "historic".  Thai
may be as much of a problem in some parts of the world as
Classic Egyptian Hieroglyphics would be, perhaps even more so.
That is not a good reason to deprecate either at a protocol
level, although it suggests that different registries might well
adopt different rules.

Having said that, I must confess some nervousness about at least
some of the "historic" scripts (specifically those for which
complete decodings are unknown or uncertain).  But it isn't
nervousness at the level we have been discussing, it is
nervousness that we don't know quite enough about some of them
and that a discovery that could be made in the future might
change whatever we believe today about correct relationships
among characters, what is a character and what is not,
normalization rules, and so on.  I have never liked making
decisions that cannot be undone unless I'm confident that full
and accurate information is available.  I also recognize that
doing so is sometimes necessary, but I prefer to minimize it.

Others may believe that a "don't use historic scripts" rule is
actually more useful guidance than some variation on "don't
register what you don't know".  But one of the other advantages
of keeping that issue separate from the protocol ones is
that we don't have to try to settle that disagreement or
difference in perspective.  Both groups would, I hope, agree
about the lack of harm, and that, from a protocol (and table
rules) standpoint, is where, IMO, the WG should stop.

regards,
    john




