Mapping and Variants

Tue Mar 10 06:18:38 CET 2009

+1 on Mark's message concerning confusability.
I also think that script mixing within a label should be a client
application decision, not dictated by protocol. For many scripts, mixing
with ASCII Latin is in fact innocuous and desirable (take Japanese and
Romaji, for example). In my days at Microsoft, when I was helping to
expose IDN in IE7, we went from a fairly restrictive model to a much
more open one for script mixing: it clearly banned the problematic cases
(such as mixing Greek, Cyrillic and Latin), but allowed, for example,
most of the Asian scripts to be mixed with Latin, and obviously allowed
the mixed-script scenarios required for Japanese and Korean.

Finally, the script property as exposed by Unicode cannot be used to
determine a 'single' script without some careful analysis. Values such
as 'Common' and 'Inherited' have to be allowed alongside most other
script values. At the same time, 'Common' often means 'shared' by at
least two scripts; it does not mean that every 'Common' character should
be mixable with every script.
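
To make the 'Common'/'Inherited' point concrete, here is a minimal Python
sketch of the kind of client-side check described above. Everything in it
is an assumption for illustration only: the tiny script table stands in
for Unicode's Scripts.txt data, the _SHARED_BY map loosely mimics the
Script_Extensions property, and ALLOWED_MIXES is a hypothetical policy,
not the actual IE7 rules.

```python
from typing import Optional

# Hypothetical, deliberately tiny stand-in for Unicode's Scripts.txt (assumption).
_SCRIPT_RANGES = [
    (0x002D, 0x002D, "Common"),     # HYPHEN-MINUS
    (0x0030, 0x0039, "Common"),     # ASCII digits
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0300, 0x036F, "Inherited"),  # combining diacritical marks
    (0x03B1, 0x03C9, "Greek"),      # lowercase alpha..omega
    (0x0430, 0x044F, "Cyrillic"),   # lowercase a..ya
    (0x3041, 0x3096, "Hiragana"),
    (0x30A1, 0x30FA, "Katakana"),
    (0x30FC, 0x30FC, "Common"),     # prolonged sound mark, shared by the kana
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]

# 'Common' characters that really belong to only a few scripts, in the spirit
# of Unicode's Script_Extensions property (hypothetical one-entry excerpt).
_SHARED_BY = {0x30FC: frozenset({"Hiragana", "Katakana"})}

# Hypothetical client policy: a label passes if its scripts fit in one set.
ALLOWED_MIXES = [
    frozenset({"Latin", "Han", "Hiragana", "Katakana"}),  # Japanese, incl. Romaji
    frozenset({"Latin", "Han", "Hangul"}),                 # Korean
    frozenset({"Greek"}),
    frozenset({"Cyrillic"}),
]


def script_of(cp: int) -> Optional[str]:
    """Return the script name from the toy table, or None if not covered."""
    for lo, hi, name in _SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return None


def label_allowed(label: str) -> bool:
    scripts = set()
    shared = []
    for ch in label:
        cp = ord(ch)
        sc = script_of(cp)
        if sc is None:
            return False  # unknown to the toy table: reject conservatively
        if sc == "Inherited":
            continue  # ignored here; a real check would attach it to its base
        if sc == "Common":
            if cp in _SHARED_BY:
                shared.append(_SHARED_BY[cp])
            continue  # plain 'Common' (digits, hyphen) goes with anything
        scripts.add(sc)
    # A shared 'Common' character must appear next to a script that owns it.
    for owners in shared:
        if not owners & scripts:
            return False
    # Greek/Cyrillic/Latin mixes fit in no allowed set and are rejected.
    return any(scripts <= mix for mix in ALLOWED_MIXES)
```

With these toy tables, "tokyo" followed by Katakana "タワー" passes, a Latin
label containing a Cyrillic "а" fails, and so does a Latin label ending in
the prolonged sound mark "ー", which is 'Common' but only really belongs
with the kana. Even this small example needs special cases, which is the
point: the tables and the policy have to evolve over time.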

In other words, it is way too complicated to be enshrined in a protocol
where stability is a feature. It is better handled by registry policies
and client application awareness. And it needs to be adjusted as new
threats emerge, while respecting the real need for multi-script labels
where there is no potential for harm.

Michel