Languages, words, names and the DNS (was: Re: New version, draft-faltstrom-idnabis-tables-02.txt, available)

John C Klensin klensin at jck.com
Wed Jun 13 00:08:28 CEST 2007


Hi.

We have a long history, in talking about the DNS, of using terms 
such as "DNS Name" or "Fully-qualified Domain Name" to refer to 
DNS labels and complete rooted strings.  These are terms of 
convenience, but they can also cause a great deal of confusion 
when used carelessly, out of context, or without an 
understanding of the DNS.   All of these possibilities get much 
worse when one moves outside the DNS environment assumed by the 
basic DNS RFCs and the applications that implement it (and often 
impose their own syntax rules) and begins to talk about IDNs 
and, with IDNs, notions of languages, words, and specific 
scripts.

So, to review both the problem and the opportunities...

We use the term "name" or "domain name" in conjunction with the 
DNS because such a string identifies or names something.   There 
is no implication of a language, or words in that language, 
anywhere in the system, even in the traditional ASCII-only, 
letter-digit-hyphen-only, uses of the DNS.  Traditional labels 
are not in English (or "English words"), they are just strings 
of ASCII characters that conform to the LDH rule.  Labels that 
could not possibly be English words are quite common.
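
To make the point concrete, here is a minimal sketch of an LDH check in Python.  The helper name `is_ldh_label` is hypothetical, and the sketch assumes the usual host-name reading of the rule: ASCII letters, digits, and hyphen only, 1 to 63 characters, no leading or trailing hyphen.  Note that nothing in it consults a dictionary or knows about any language.

```python
import re

# Assumed reading of the LDH rule: ASCII letters, digits, hyphen;
# 1 to 63 characters; first and last character not a hyphen.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_label(label: str) -> bool:
    """Return True if the string conforms to the (assumed) LDH rule."""
    return _LDH.fullmatch(label) is not None

# "bcd56fg" is not a word in any language, yet it is a valid label.
print(is_ldh_label("bcd56fg"))
# A leading hyphen breaks the rule, whatever the "word" might mean.
print(is_ldh_label("-example"))
```

The check accepts or rejects strings purely on character class and position, which is all the traditional rule ever asked for.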

This has many important implications for IDNs, including:

(i) While an important goal for IDNA and the IDNAbis effort has 
been to permit many words of most languages to be registered and 
resolved as DNS labels should someone want to do so, there has 
never been a requirement that labels be restricted to strings 
that are orthographically correct in any language.  The common 
restrictions on mixed-script labels are not inherent in either 
IDNA or any revision proposal that has been presented so far: they 
are merely one place at which it appears that the tradeoff 
between risk to users and the reasonable requirements of those 
registering strings can be easily resolved.

(ii) IDNA has no mechanism for carrying information about the 
language associated with a particular label.  This is a good 
thing because the "language" associated with "bcd56fg" or, for 
that matter, "бвг56жз" (apologies to those who can't 
receive UTF-8), is, at best, rather arbitrary.   As long as we 
can all remember that these are just strings in particular 
scripts (which we can usually, but not always, identify), that 
isn't a problem.  But, if we need, or think we need, language 
information to compare or present domain label strings properly, 
we are in trouble because we can only guess at that information 
and hope that context helps us out.
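
As a rough illustration of the difference between script and language, the Python sketch below guesses at the script of each letter in a label from the first word of its Unicode character name.  This is a heuristic chosen for brevity, not the Unicode Script property (which Python's standard `unicodedata` module does not expose), and the helper name `script_hints` is hypothetical.

```python
import unicodedata

def script_hints(label: str) -> set:
    """Collect a crude script hint for each letter in the label.

    Heuristic only: uses the first word of the Unicode character
    name (e.g. "LATIN", "CYRILLIC", "CJK"), not the real Script
    property.  Digits and hyphens are skipped.
    """
    hints = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            hints.add(name.split()[0])
    return hints

print(script_hints("bcd56fg"))
print(script_hints("бвг56жз"))
```

Even when the script hint is unambiguous, nothing in the string says whether a Latin label is meant as English or French, or whether a CJK label is meant as Japanese or Chinese; that information is simply not there to extract.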

To be specific about this, if we have to distinguish between 
English and French, between Arabic and Urdu, between Kanji and 
Traditional Chinese, and so on, in order to make things appear 
to the user in a way that is culturally reasonable, we are in 
trouble if we expect the DNS and IDNA to help.  They cannot and 
never will be able to help.  The "maybe yes" list and Rule H 
(see previous note) are, from this point of view, needed 
precisely to be sure that appropriate balances have been reached 
among the many writing systems that may use the same script as 
well as among similar-looking characters in different scripts 
(there are other reasons too; see that other note).

This is the point at which it may also be useful to remind 
people that, while the goal of the IDNA effort is to make the 
DNS as useful as possible for expressing and remembering 
identifiers in a variety of cultural and linguistic 
environments, the DNS cannot meet all of the Internet's naming, 
identification, and navigational needs.   In particular, as 
suggested above and elsewhere, if one really needs linguistic or 
cultural information to make good use of a name, then that name 
should be handled in a way that carries that information with it 
and makes it available and reliable.  We would all benefit from 
understanding where the boundaries of usability, functionality, 
and information requirements between the DNS and these other 
systems (present or potential) are and then designing solutions 
that match them appropriately... solutions that, in many cases, 
will utilize the DNS but not be part of it.

     john


