WG Review: Internationalized Domain Names in Applications (Revised) (idnabis)

John C Klensin klensin at jck.com
Thu Apr 10 15:41:22 CEST 2008

--On Thursday, 10 April, 2008 19:16 +0900 Martin Duerst
<duerst at it.aoyama.ac.jp> wrote:

> Not speaking for Eric at all, but it took a very, very long
> time for the two sides on the debate to understand what was
> the best way forward. One side (the draft authors and others)
> was convinced that equivalence between simplified and
> traditional forms of Chinese characters in IDNs was essential,
> the other side (Ken, me, and some others) knew that it was
> impossible because the mapping was not 1-to-1 and way more
> complex and language dependent than case mapping.

And, of course, even had the many-to-many relationships not
existed, the "language dependency" you refer to was that
treating Simplified and Traditional Chinese as equivalent in the
protocol would presumably have "simplified" Japanese Kanji and
Korean Hanja, a disaster for users of those languages.

> The solution we arrived at in the end was bundling, but that
> may have taken 2 years or more.

That solution of identifying variants and treating them as a
group (those who have not been following IDN work closely should
have a look at RFC 3743 and perhaps at RFC 4290) was, IMO (and
dating from long before I got involved with the effort), a
significant conceptual breakthrough.  It included important
insights into
ways in which registries can handle various sorts of character
similarities and confusion.  Prior to its development, there had
been discussions of registries permitting some characters and
character-combinations in labels and prohibiting others (see the
IESG Statement on IDNs and the early version of the ICANN
Guidelines for examples), but not of grouping or bundling labels.
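
For those who have not seen the idea, a rough illustration may
help.  This is only a toy sketch in Python, not the RFC 3743
procedure: the three-character variant table, the bundle()
function, and the Registry class are all hypothetical, and real
registry tables are language-specific and far larger.  It shows
a registry reserving a whole group of variant labels at once,
including a case where one Simplified character corresponds to
two Traditional ones.

```python
from itertools import product

# Hypothetical toy variant table, not any registry's real table.
VARIANTS = {
    "\u56fd": {"\u570b"},            # Simplified guo <-> Traditional guo
    "\u570b": {"\u56fd"},
    "\u53d1": {"\u767c", "\u9aee"},  # one Simplified, two Traditional
    "\u767c": {"\u53d1"},
    "\u9aee": {"\u53d1"},
}


def bundle(label):
    """Return every label obtained by keeping each character or
    replacing it with one of its listed variants."""
    choices = [sorted({ch} | VARIANTS.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


class Registry:
    """Toy registry that reserves a label's whole bundle together."""

    def __init__(self):
        self._owner = {}  # label -> registrant

    def register(self, label, registrant):
        labels = bundle(label)
        # Refuse if any label in the bundle belongs to someone else.
        if any(self._owner.get(l, registrant) != registrant
               for l in labels):
            return False
        for l in labels:
            self._owner[l] = registrant
        return True
```

With this table, a two-character label built from U+56FD and
U+53D1 expands to six labels, and a second registrant asking for
any of them is refused.  Nothing in the sketch requires the
mapping to be 1-to-1, which is the point of handling this at the
registry rather than in the protocol.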

But neither you nor Eric answered Harald's "what lessons did you
draw from it" question.

My list, FWIW,...

	(i) There are no "magic bullet" solutions in which
	simple specifications in the protocol or tables, or
	depending on registries, will address all issues and
	solve all problems with IDNs.  We need to have an
	approach in which we recognize that lists of permitted
	characters, protocol mechanisms, registry restrictions
	and behavior, applications implementers and
	implementations, and even user education and
	responsibility have important roles.  Those
	relationships should be recognized explicitly, we should
	make decisions carefully about what functions belong
	where, and we should consider "make the work of the
	other components easier" as an explicit design goal for
	each component.
	(ii) Mappings in the protocol are bad news. They
	increase vulnerability to Unicode changes, increase the
	potential for user confusion about what is going on, and
	--most important in this particular context-- invite
	debates about why one should have one set of mappings
	and not another.
	(iii) IDNA must deal with scripts and, to a limited
	extent, labels, but not languages.  If we have a script
	that is used, in different ways, by more than one
	language, then any decisions or mechanisms in the
	protocol or supporting tables themselves must be
	compatible with all of the relevant languages or at
	least represent a good balance among them.  Anything
	else needs to be dealt with elsewhere.

None of these are completely new, but the experience of the last
several years, including the JET experience, has brought them
into much clearer focus and changed how I, at least, would rank
the tradeoffs.

