baking into the protocol

Thu Dec 21 15:41:08 CET 2006

--On Thursday, 21 December, 2006 19:08 +0900 Martin Duerst
<duerst at it.aoyama.ac.jp> wrote:

> At 05:44 06/12/21, John C Klensin wrote:
> 
> [removing most of the post, because I generally agree]
> 
>> There is another problem with prohibiting script-mixing at the
>> protocol (IDNA) level and that is that the common,
>> on-the-street, perception of "the script we use" is different
>> from the Unicode definitions of "script".  No one is wrong
>> here, but, if JDNC concludes that Romaji is a necessity and
>> must be available in mixed names with Kanji and Kana, I don't
>> think we are in a position to say "no" (although we can
>> _advise_ that this isn't a good idea). 
> 
> And why _should_ we advise that it isn't a good idea?
> The confusion potential between Latin and Kanji/Kana is
> virtually nil.

I was not proposing that we do so, merely trying to identify the
contrast.

>> Similar examples arise with mixtures
>> of Cyrillic and Roman characters in Russia, even though we are
>> agreed that is one of the more dangerous cases of mixed-script
>> labels (the fact that some strings in Cyrillic can be confused
>> with names in Latin characters even when they are purely
>> Cyrillic is one of the arguments why prohibiting mixed scripts
>> isn't nearly as powerful a tool as is often argued).
> 
> Yes. The amount of danger coming from script mixtures depends
> extremely strongly on the scripts involved. That's why any kind
> of general solution, even in the form of a recommendation,
> is probably a bad idea.

We are in complete agreement, I think.   The only recommendation
I would recommend (sic) would be that registries study the
scripts that they permit to be mixed very carefully and make
policies that they believe reflect an appropriate balance for
their user and registrant populations.  I think we might
rationally couple that advice with a warning that characters
that might look very different to experienced users of the
scripts involved might look similar enough to be confused by
those who have less experience.

We do need to keep in mind that advice of that sort is likely to
be almost useless to most operators of TLDs with global scope.
Their problems with the implicit requirement of treating all
languages and scripts equitably are, as always, much more severe
than the problems faced by a ccTLD that can decide on support
for a relatively smaller number of scripts or writing systems.

That conclusion has two corollaries that I think we need to
understand.  The first is that trying to build a "no mixed
scripts in a label" rule into the protocol is probably an idea
that isn't going anywhere (and shouldn't).  The second is that
any rule in application software that treats mixed-script labels
as inherently dangerous should be extremely localized, probably
examining the user's preferred languages or scripts and the
particular combinations of scripts being mixed, lest one
generate warnings about many safe situations and thereby cause
users to discount warning information that really is important.
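
To make the second corollary concrete, here is a rough sketch of
what such a localized check might look like.  Everything in it is
an assumption made for illustration: the names RISKY_PAIRS,
scripts_in and should_warn are hypothetical, the table of risky
pairs is invented, and the script of a character is only
approximated from the first word of its Unicode character name
(Python's standard library does not expose the Unicode Script
property).  It is not a proposal for any real application.

```python
import unicodedata

# Hypothetical policy table: script pairs this sketch treats as risky
# when mixed in one label. The entries are illustrative assumptions.
RISKY_PAIRS = {
    frozenset({"LATIN", "CYRILLIC"}),
    frozenset({"LATIN", "GREEK"}),
}


def scripts_in(label):
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name (LATIN,
    CYRILLIC, CJK, HIRAGANA, ...) stands in for the script. Digits and
    hyphens are ignored because they are not letters.
    """
    found = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def should_warn(label, user_scripts):
    """Decide whether to warn, given the scripts the user prefers."""
    scripts = scripts_in(label)
    if len(scripts) < 2:
        # Single-script label: a mixed-script rule has nothing to say.
        return False
    if any(pair <= scripts for pair in RISKY_PAIRS):
        # A combination the policy table singles out as risky.
        return True
    # Otherwise warn only if the mix includes a script outside the
    # user's preferred set.
    return not scripts <= set(user_scripts)


japanese_user = {"CJK", "HIRAGANA", "KATAKANA", "LATIN"}
latin_user = {"LATIN"}

kanji_plus_latin = "\u65e5\u672c\u8a9eabc"   # Kanji followed by Latin "abc"
latin_plus_cyrillic = "p\u0430ypal"          # second letter is Cyrillic "a"
pure_cyrillic = "\u0441\u043e\u0441\u043e"   # all Cyrillic, resembles Latin "coco"

print(should_warn(kanji_plus_latin, japanese_user))
print(should_warn(latin_plus_cyrillic, latin_user))
print(should_warn(pure_cyrillic, latin_user))
```

Note that the last label, which is purely Cyrillic, draws no
warning at all under this sketch even though it resembles a Latin
string.  That is the same limitation of mixed-script rules
mentioned earlier in this thread.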

    john