Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

John C Klensin klensin at jck.com
Fri Dec 22 13:59:04 CET 2006


Can I please ask that those participating in this discussion
take a few hours to remind themselves that concepts such as
script-mixing arise in the context of the DNS, with its operational
design and constraints, as well as in the context of the specifics of IDNA.
In particular, we all need to remember that the DNS is an
administrative hierarchy with a good deal of inherent (and
well-designed) independence at each hierarchical level (at least
below the second).   This note is intended partially as a
self-test: if any of the terminology or concepts below are
unclear, that should probably be taken as a warning sign that one
does not know nearly well enough what one is talking about.

As one example, trying to apply a "no mixed script" rule, at the
protocol level, across all of the labels of an FQDN is simply
infeasible, even if it were wise.  To do so would require rather
fundamental changes in the way DNS name-resolution works,
including script recognition in the resolvers and during DNAME
synthesis, which is exactly what IDNA was designed to
avoid.  Even imposing such a rule at registration time would
require registries to impose specific naming rules on
subdomains, require those subdomains to propagate those
rules to _their_ subdomains, and presuppose some practical
way to enforce such rules.

Even were such a rule possible, it would violate the basic
concept of administrative hierarchy, requiring a global
enterprise to maintain a separate top-level DNS tree for each
language group in which it wants to do business.  A company engaged in
typographic design could not, for example, maintain subdomains
that reflected relevant scripts within a single domain tree
structure.
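A minimal Python sketch of the difference between a rule checked label by label and one checked across a whole FQDN, using an invented name (hypothetical, not a real registration) for the kind of typography company just described. The script test is a crude assumption: it takes the first word of each letter's Unicode character name as its script, standing in for the real Unicode Script property, which Python's standard library does not expose. The last lines show what a resolver actually handles: the ASCII-compatible encoding, produced here by Python's built-in "idna" codec (which implements IDNA2003, RFC 3490), in which every letter is Latin.

```python
import unicodedata

def letter_scripts(text: str) -> set[str]:
    """Guess the scripts of the letters in `text`.

    Assumption: the first word of a letter's Unicode character name
    ("LATIN", "CYRILLIC", ...) stands in for the real Script property.
    Digits, hyphens, and dots are skipped as script-neutral.
    """
    return {
        unicodedata.name(ch).split()[0]
        for ch in text
        if unicodedata.category(ch).startswith("L")
    }

def single_script(text: str) -> bool:
    return len(letter_scripts(text)) <= 1

# Hypothetical name: a Cyrillic-script subdomain under a Latin-script domain.
fqdn = "кириллица.typefoundry.example"

per_label = all(single_script(label) for label in fqdn.split("."))
whole_name = single_script(fqdn)
print(per_label)   # True: each label, taken alone, uses one script
print(whole_name)  # False: the name as a whole mixes Cyrillic and Latin

# The form that travels in DNS queries is ASCII-compatible.
ace = fqdn.encode("idna").decode("ascii")
print(ace.startswith("xn--"))  # True
print(single_script(ace))      # True: a resolver sees only ASCII letters
```

A per-label rule is something an individual registry could apply to its own zone; the whole-name rule would have to be coordinated across every level of the hierarchy, and a resolver could not even see the scripts without decoding each label.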

It would also require a top-level domain, and a relatively
complete set of second-level domains, for each script, writing
system, or language (whatever one is trying to prohibit from
being mixed).  That domain tree would presumably need to be
administered by a recognized consensus authority on the script.
For most scripts, there is no such authority... unless, of
course, administration of all TLDs (or all non-ASCII TLDs) were
to be turned over to the Unicode Consortium, which sometimes
appears to claim authoritative expertise in all scripts.

One can theorize all one likes about some faceted name system
that might superficially resemble the DNS.  But pretending that
it is the DNS, or ignoring the real DNS in its favor, doesn't
get us anywhere except into more confusion.

This is, of course, not the only example.  We see practical
examples of script mixing all over the place, in a world that
has become increasingly homogenized.   One can debate the
cultural implications of forming a company name by prefixing
"e-" or "e" (from "e-commerce") to a string in a non-Western
alphabet, but it happens today, and there is at least as strong
a case for permitting those company names in the DNS
as there is for any other string that is not linguistically and
orthographically a "word" in some language.   The DNS, and even
the IDNA protocol, are not the place to try to resolve those
cross-cultural issues: trying to do so leads to madness and to
even more delays in getting IDNs deployed in practical terms.
A registry might still decide to prohibit such things, or it
might not, but, given independence of administrative hierarchies
as well as the practicalities and differences of real-world
situations, we simply won't see uniformity of those policies
throughout the DNS tree.

If a rule against script-mixing would really accomplish a great
deal in terms of preventing spoofing, the boundaries implied
above might be worth reexamination, even though making changes
would have very high costs.  But, as long as one can, e.g.,
write strings entirely in Cyrillic that look like obvious and
common ASCII strings -- consider determining whether "com" is a
string in Latin or Cyrillic characters by visual inspection of
those three characters alone -- rules against script-mixing are
only an incomplete solution to the more general problem of users
tending to see what they expect to see when presented with a
string of glyphs and being subject to some deceptions as a
result.
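The "com" case can be checked mechanically. A minimal Python sketch, using the same crude name-prefix heuristic as an assumed stand-in for the Unicode Script property: three Cyrillic capitals (ES, O, EM) are different code points from Latin "COM" and pass any single-script test, yet most fonts draw them identically. The sketch uses uppercase because the uppercase forms are the clearest case of visual identity.

```python
import unicodedata

def letter_scripts(text: str) -> set[str]:
    # Crude stand-in for the Unicode Script property: the first word of
    # each letter's character name, e.g. "LATIN" or "CYRILLIC".
    return {
        unicodedata.name(ch).split()[0]
        for ch in text
        if unicodedata.category(ch).startswith("L")
    }

latin = "COM"
# CYRILLIC CAPITAL LETTER ES, O, EM: rendered like "COM" in most fonts.
cyrillic = "\u0421\u041e\u041c"

print(latin == cyrillic)          # False: different code points
print(letter_scripts(latin))      # {'LATIN'}
print(letter_scripts(cyrillic))   # {'CYRILLIC'}
# Each string is single-script, so a no-mixing rule rejects neither.
print([f"U+{ord(ch):04X}" for ch in cyrillic])  # ['U+0421', 'U+041E', 'U+041C']
```

Neither string mixes scripts, so a script-mixing rule offers no protection here; only a comparison against what the user expected to see would catch the substitution.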

best holiday and new year's wishes to all.
    john
