IDN and language

Bruce Lilly blilly at
Tue Jan 4 15:38:54 CET 2005

> Re: draft-phillips-langtags-08, process, specifications, "stability",  and extensions
>  Date: 2005-01-01 19:56
>  From: "Doug Ewell" <dewell at>
>  To: ietf-languages at
> Bruce Lilly <blilly at erols dot com> wrote:

> > Domain names and
> > language tags are different types of names, used for
> > different purposes, and with different scope (largely
> > non-overlapping, though one might legitimately ask how
> > one is supposed to determine the language of an
> > "internationalized" domain name...)
> One is not.  Domain names are strings of characters; only incidentally
> do they spell out one or more words in one or more languages.  I doubt
> whether the names "Google," "Yahoo," and "AltaVista" can be pinned down
> as belonging to one specific language.

I was referring specifically to internationalized domain names
(IDN, RFCs 3490, 3491, 3492, 3743) where the on-the-wire
domain name continues to be of traditional form (ANSI X3.4
letters, digits, and hyphen (with restrictions on combinations
and placement)), but where a certain class of names (those
beginning with "xn--") are "internationalized" and might be
presented to users in a different form (which can include
non-ASCII characters).  That came about because of the
tendency to associate a domain name (tag) with a natural
language "name" or legally-registered name (trademark, etc.).
Whether one considers such associations logical or
irrational, that is what has happened.  So one could have
a domain name (beginning with xn--) that is presented by
an application as "Nestlé.com".  Now certainly some names,
such as your examples, Kodak, Häagen-Dazs, etc. have no
language (because they are made-up strings of characters),
but others do have a specific language.  In skimming through
the RFCs mentioned above, it appears that there is now some
provision for language tagging (which was not present in
earlier versions of IDN).  However, I have not thoroughly
reviewed those recent additions; therefore it should be
clear that I have not reviewed the impact of the proposed
draft changes on IDN or vice versa.  Such a review should
take place (ideally before the deadline for the New Last
Call on draft-phillips-langtags-08 (tomorrow!)), but I'm
not the person to do so as I have only slight interest in
IDN (I'm one of those who considers associating a tag
with natural language and/or legally registered names to
be irrational).  One potential issue is that domain names
are case-insensitive, and whether lower-case accented
characters map to/compare with unaccented upper-case
letters may be a function of language (or culture, or
political fiat).
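
To make the two points above concrete (the "xn--" wire form versus
the presented form, and the case/accent comparison problem), here is
a rough Python sketch. It assumes the standard library's "idna" codec,
which implements the RFC 3490 conversions; the helper names to_wire,
to_display and strip_accents are made up for illustration and come
from no RFC or draft. Accent stripping is shown only as one possible
comparison policy, not as something IDN specifies.

```python
import unicodedata


def to_wire(name: str) -> str:
    """Presentation form -> on-the-wire (ASCII-compatible) form."""
    return name.encode("idna").decode("ascii")


def to_display(wire: str) -> str:
    """On-the-wire form -> form an application might present to users."""
    return wire.encode("ascii").decode("idna")


def strip_accents(text: str) -> str:
    """Hypothetical policy: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


wire = to_wire("Nestlé.com")
print(wire)              # ASCII only; the first label begins with "xn--"
print(to_display(wire))  # nestlé.com (nameprep has lower-cased the "N")

# Plain case folding does not equate accented and unaccented letters...
print("é".casefold() == "e".casefold())       # False
# ...but a policy that strips accents first does.
print(strip_accents("É").casefold() == "e")   # True

# Case mapping itself can depend on language: with the default
# (language-neutral) mappings, Turkish dotless "ı" does not survive a
# round trip through upper case.
print("ı".upper().lower() == "ı")             # False
```

Which of these comparisons is "right" is exactly the question that
depends on language, culture, or fiat; the sketch only shows that the
answers differ.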

I would add that there is apparently some discussion of
wreaking similar havoc on local-parts, which appear in
message-identifiers and email mailbox identifiers (STD 11).
That too should be evaluated w.r.t. specification of
language and the proposed changes.
