Languages, words, names and the DNS (was: Re: New
version, draft-faltstrom-idnabis-tables-02.txt, available)
John C Klensin
klensin at jck.com
Wed Jun 13 00:08:28 CEST 2007
Hi.
We have a long history, in talking about the DNS, of using terms
such as "DNS Name" or "Fully-qualified Domain Name" to refer to
DNS labels and complete rooted strings. These are terms of
convenience, but they can also cause a great deal of confusion
when used carelessly, out of context, or without an
understanding of the DNS. All of these possibilities get much
worse when one moves outside the DNS environment assumed by the
basic DNS RFCs and the applications that implement it (and often
impose their own syntax rules) and begins to talk about IDNs
and, with IDNs, notions of languages, words, and specific
scripts.
So, to review both the problem and the opportunities...
We use the term "name" or "domain name" in conjunction with the
DNS because such a string identifies or names something. There
is no implication of a language, or words in that language,
anywhere in the system, even in the traditional ASCII-only,
letter-digit-hyphen-only, uses of the DNS. Traditional labels
are not in English (or "English words"), they are just strings
of ASCII characters that conform to the LDH rule. Labels that
could not possibly be English words are quite common.
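As a rough illustration of how little the LDH rule says about language, here is a minimal Python sketch of a label check. The function name is made up for this note, and the rule encoded is the usual hostname convention as I understand it (ASCII letters, digits, and hyphens, no leading or trailing hyphen, at most 63 octets); it is not taken from any particular implementation.

```python
import re

# Hypothetical helper, for illustration only.
# One leading letter/digit, then optionally up to 61 LDH characters
# followed by a final letter/digit: 1 to 63 characters in total.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label):
    """Return True if the string looks like a traditional LDH label."""
    return _LDH_LABEL.fullmatch(label) is not None


# A string that is no word in any language is a perfectly good label.
print(is_ldh_label("bcd56fg"))   # True
print(is_ldh_label("-bcd56fg"))  # False: leading hyphen
```

Nothing in the check consults a dictionary or a language tag; it only looks at which characters appear and where.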
This has many important implications for IDNs, including:
(i) While an important goal for IDNA and the IDNAbis effort has
been to permit many words of most languages to be registered and
resolved as DNS labels should someone want to do so, there has
never been a requirement that labels be restricted to strings
that are orthographically correct in any language. The common
restrictions on mixed-script labels are not inherent in either
IDNA or any revision proposal that has been presented so far:
they are merely one place at which it appears that the tradeoff
between risk to users and the reasonable requirements of those
registering strings can be easily resolved.
(ii) IDNA has no mechanism for carrying information about the
language associated with a particular label. This is a good
thing because the "language" associated with "bcd56fg" or, for
that matter, "бвг56жз" (apologies to those who can't
receive UTF-8), is, at best, rather arbitrary. As long as we
can all remember that these are just strings in particular
scripts (which we can usually, but not always, identify), that
isn't a problem. But, if we need, or think we need, language
information to compare or present domain label strings properly,
we are in trouble because we can only guess at that information
and hope that context helps us out.
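To make the script-versus-language distinction concrete, here is a small Python sketch that guesses which scripts appear in a label. It is an assumption-laden approximation: the Python standard library does not expose the Unicode Script property, so the hypothetical helper below uses the first word of each character's Unicode name as a crude stand-in.

```python
import unicodedata


def scripts_in_label(label):
    """Guess the scripts used in a label (hypothetical helper).

    Digits and hyphens are skipped because they are shared across
    scripts. The first word of the Unicode character name is used as
    an approximation of the script; it is not the real Script property.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


print(scripts_in_label("bcd56fg"))   # {'LATIN'}
print(scripts_in_label("бвг56жз"))   # {'CYRILLIC'}
# A Latin-looking label containing CYRILLIC SMALL LETTER A (U+0430):
print(sorted(scripts_in_label("p\u0430ypal")))  # ['CYRILLIC', 'LATIN']
```

Even when this works, the answer is a script and never a language: "LATIN" does not distinguish English from French, and nothing in the label itself can.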
To be specific about this, if we have to distinguish between
English and French, between Arabic and Urdu, between Kanji and
Traditional Chinese, and so on, in order to make things appear
to the user in a way that is culturally reasonable, we are in
trouble if we expect the DNS and IDNA to help. They cannot help
and never will be able to. The "maybe yes" list and Rule H
(see previous note) are, from this point of view, needed
precisely to be sure that appropriate balances have been reached
among the many writing systems that may use the same script as
well as among similar-looking characters in different scripts
(there are other reasons too, see that other note).
This is the point at which it may also be useful to remind
people that, while the goal of the IDNA effort is to make the
DNS as useful as possible for expressing and remembering
identifiers in a variety of cultural and linguistic
environments, the DNS cannot meet all of the Internet's naming,
identification, and navigational needs. In particular, as
suggested above and elsewhere, if one really needs linguistic or
cultural information to make good use of a name, then that name
should be handled in a way that carries that information with it
and makes it available and reliable. We would all benefit from
understanding where the boundaries of usability, functionality,
and information requirements between the DNS and these other
systems (present or potential) are and then designing solutions
that match them appropriately... solutions that, in many cases,
will utilize the DNS but not be part of it.
john