WG Review: Internationalized Domain Names in
Applications (Revised) (idnabis)
John C Klensin
klensin at jck.com
Thu Apr 10 15:41:22 CEST 2008
--On Thursday, 10 April, 2008 19:16 +0900 Martin Duerst
<duerst at it.aoyama.ac.jp> wrote:
> Not speaking for Eric at all, but it took a very, very long
> time for the two sides on the debate to understand what was
> the best way forward. One side (the draft authors and others)
> was convinced that equivalence between simplified and
> traditional forms of Chinese characters in IDNs was essential,
> the other side (Ken, me, and some others) knew that it was
> impossible because the mapping was not 1-to-1 and way more
> complex and language dependent than case mapping.
And, of course, even had the many-to-many relationships not
existed, the "language dependency" you refer to was that
treating Simplified and Traditional Chinese as equivalent in the
protocol would presumably have "simplified" Japanese Kanji and
Korean Hanja, a disaster for users of those languages.
> The solution we arrived at in the end was bundling, but that
> may have taken 2 years or more.
That solution of identifying variants and treating them as a
group (those who have not been following IDN work closely should
have a look at RFC 3743 and perhaps at RFC 4290) was, IMO (and
from long before I got involved with the effort), a significant
conceptual breakthrough. It included important insights into
ways in which registries can handle various sorts of character
similarities and confusion. Prior to its development, there had
been discussions of registries permitting some characters and
character-combinations in labels and prohibiting others (see the
IESG Statement on IDNs and the early version of the ICANN
Guidelines for examples), but not of grouping or bundling labels.
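
As a rough illustration of the grouping idea, the sketch below expands a
label into the set of labels a registry might treat as one bundle, and
refuses any registration that collides with an existing bundle. The
variant table, the `bundle` function, and the `Registry` class are all
hypothetical and heavily simplified; RFC 3743 defines a richer model
(language variant tables, preferred variants, reserved labels) that this
does not attempt to reproduce.

```python
from itertools import product

# Hypothetical, tiny variant table (illustration only, not taken from any
# real registry table). Each character maps to the full group of characters
# treated as its variants, including itself.
VARIANTS = {
    "发": {"发", "發", "髮"},
    "發": {"发", "發", "髮"},
    "髮": {"发", "發", "髮"},
    "国": {"国", "國"},
    "國": {"国", "國"},
}


def bundle(label):
    """Return every label obtainable by swapping characters for variants."""
    choices = [sorted(VARIANTS.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


class Registry:
    """Toy registry that allocates a whole bundle to a single registrant."""

    def __init__(self):
        self._owner = {}  # label -> registrant

    def register(self, label, registrant):
        labels = bundle(label)
        # Refuse the request if any label in the bundle is already held.
        if any(candidate in self._owner for candidate in labels):
            return False
        for candidate in labels:
            self._owner[candidate] = registrant
        return True


if __name__ == "__main__":
    registry = Registry()
    print(len(bundle("发国")))                # size of the bundle
    print(registry.register("发国", "alice"))  # first registrant gets the bundle
    print(registry.register("髮國", "bob"))    # variant label is already bundled
```

Note that nothing here maps one label onto another: every label in the
bundle stays distinct, and the registry decides what happens to the group.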
But neither you nor Eric answered Harald's "what lessons did you
draw from it" question.
My list, FWIW,...
(i) There are no "magic bullet" solutions in which
simple specifications in the protocol or tables, or
reliance on registries, will address all issues and
solve all problems with IDNs. We need to have an
approach in which we recognize that lists of permitted
characters, protocol mechanisms, registry restrictions
and behavior, applications implementers and
implementations, and even user education and
responsibility have important roles. Those
relationships should be recognized explicitly, we should
make decisions carefully about what functions belong
where, and we should consider "make the work of the
other components easier" as an explicit design goal for
each component.
(ii) Mappings in the protocol are bad news. They
increase vulnerability to Unicode changes, increase the
potential for user confusion about what is going on, and
--most important in this particular context-- invite
debates about why one should have one set of mappings
and not another.
(iii) IDNA must deal with scripts and, to a limited
extent, labels, but not languages. If we have a script
that is used, in different ways, by more than one
language, then any decisions or mechanisms in the
protocol or supporting tables themselves must be
compatible with all of the relevant languages or at
least represent a good balance among them. Anything
else needs to be dealt with elsewhere.
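
To make point (ii) a little more concrete, here is a minimal sketch of
why a mapping applied in the protocol can leave users unsure what is
going on: once two distinct characters fold to the same one, the label
the user typed cannot be recovered from the mapped form. The two-entry
table and the function names are hypothetical, chosen only for
illustration, and do not come from any IDNA table.

```python
# Hypothetical protocol-level mapping (illustration only): two distinct
# Traditional characters fold to the same Simplified character.
TRAD_TO_SIMP = {"發": "发", "髮": "发"}


def map_label(label):
    """Apply the hypothetical mapping character by character."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in label)


def preimages(mapped, candidates):
    """List the candidate labels that all collapse to the same mapped form."""
    return sorted(c for c in candidates if map_label(c) == mapped)


if __name__ == "__main__":
    # Both inputs produce the same mapped label, so the mapping is lossy.
    print(map_label("發") == map_label("髮"))
    # More than one original label sits behind the single mapped form.
    print(preimages("发", ["發", "髮", "发", "国"]))
```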
None of these are completely new, but the experience of the last
several years, including the JET experience, has brought them
into much clearer focus and changed how I, at least, would rank
the tradeoffs.
john