idna-bis and Eszett
John C Klensin
klensin at jck.com
Tue Nov 27 11:36:08 CET 2007
(I've changed the subject line because the Sharp S / Eszett
character has been mangled well beyond recognition as it has
moved back and forth among mail systems -- something that should
be a warning to all of us.)
--On Tuesday, 27 November, 2007 13:04 +0900 Martin Duerst
<duerst at it.aoyama.ac.jp> wrote:
>>> Yes, that is a problem with the IRI spec.
> What I'm surprised at is the lack of understanding and
> responsibility when proposing making potentially wide-reaching
> changes to a spec. What idnabis does is to change the rules
> for non-ASCII domain names. Up to now, a sharp s in a domain
> name was mapped to 'ss'. With idnabis, such a sharp s is
> simply 'a user interface issue'.
I'm not inclined to worry about whether this is a "problem with
the IRI spec" or elsewhere. But we clearly have a problem, or
perhaps several interconnected ones. I believe that the
following statements are all true:
(1) By using NFKC at both registration and lookup time, IDNA2003
permits a large number of mappings to occur. If end users have
become dependent on those mappings for export and interchange,
they are, to a greater or lesser extent, in trouble.
(2) In addition to the NFKC mappings, there are a few mappings
that are more or less specific to IDNA, including the one for
Eszett. They raise all of the issues of (1), but have the further
disadvantage (for compatibility purposes) of being fairly easy
to type on keyboards designed for the countries/languages that
use those characters (the characters mapped out by NFKC are
typically harder to type).
(3) These mappings have been a source of user confusion and of some
confusion in systems trying to use IDNs as if they were
ordinary domain names. For example, one cannot get a character
that is mapped out back from a reverse lookup. While that
raises no issues if, as specified, one compares only the ACE
forms, users who attempt a visual comparison will be in more or
less trouble, depending on how well they understand the writing
system involved.
(4) The larger registry operators who are handling IDNs are
increasingly refusing to accept registrations in raw form,
permitting only the ACE form or ToUnicode(ToASCII(string)) to be
registered. As far as they are concerned, there is no such
thing as a label containing Eszett, only labels containing the
corresponding "ss".
(5) There have been some user complaints and confusion about
IDNs losing information when mapped to and from the ACE form.
(6) There have been some complaints that Eszett cannot
actually be stored in an IDN, i.e., preserved in conversions to
and from the ACE form.
(7) It appears that the standard orthographic rules about
whether it is appropriate or desirable to replace Eszett with
"ss" vary among German-speaking countries, so there is less
guidance from common practice than might appear at first glance.
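To make points (2) through (6) concrete, here is a minimal sketch using Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 ToASCII/ToUnicode with nameprep). The helper name `idna2003_round_trip` is ours, for illustration only. It shows that Eszett is mapped to "ss" before conversion, so ToUnicode(ToASCII(name)) cannot give it back, in the same way that upper case and NFKC-mapped full-width letters are lost.

```python
# Sketch: IDNA2003 ToASCII/ToUnicode round trip via Python's built-in "idna"
# codec (RFC 3490 + nameprep). The helper name is illustrative only.

def idna2003_round_trip(name: str) -> tuple[bytes, str]:
    """Return (ToASCII(name), ToUnicode(ToASCII(name))) for a domain name."""
    # nameprep: case folding (which maps U+00DF to "ss") and NFKC,
    # then Punycode only if non-ASCII characters remain
    ace = name.encode("idna")
    # the reverse conversion can only reproduce what survived nameprep
    return ace, ace.decode("idna")

examples = [
    "straße.de",           # Eszett, a mapping of kind (2)
    "Straße.de",           # upper case is folded as well
    "ｅｘａｍｐｌｅ.com",   # full-width letters, an NFKC mapping of kind (1)
]

if __name__ == "__main__":
    for name in examples:
        ace, back = idna2003_round_trip(name)
        print(f"{name!r} -> {ace!r} -> {back!r}  (preserved: {back == name})")
```

All three names end up as plain ASCII labels, and nothing in the ACE form records that the user typed "ß". A registry that accepts only ToUnicode(ToASCII(string)) therefore sees "strasse", never "straße".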
> Independent of whether mapping in idna2003 was a good idea
> or not, what the above change does is to just leave some
> domain names floating in the air.
Well, it is not "independent", because the mappings have turned
out to be problematic. And, strictly speaking, no domain names
are up in the air, only external presentation forms of domain
names. But, semantic technicalities aside, this situation
represents a real and significant tradeoff and nothing is cast
in stone. If it is better to map Eszett -> "ss" in the
protocol than to reject it (at the protocol level), that can
certainly be done. In making that decision, we do need to
understand that there is a slippery slope between the mappings
of (2) above and those of (1), and between either of those and
alternate label separators, which actually introduced a
conceptual bug into IDNA.
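The map-or-reject choice can be sketched in a few lines. Here `prepare_name` and its `policy` argument are hypothetical, not part of any IDNA specification or library; the function only splits a name into labels and then either maps Eszett to "ss" or rejects it, leaving out nameprep and Punycode entirely. The separator set follows RFC 3490, section 3.1, which requires U+002E, U+3002, U+FF0E and U+FF61 all to be recognized as dots.

```python
import re

# RFC 3490, section 3.1: all four code points must be treated as label separators.
IDNA2003_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")
ESZETT = "\u00df"

def prepare_name(name: str, policy: str) -> list[str]:
    """Split a name into labels and apply a hypothetical Eszett policy.

    policy="map":    IDNA2003-like, replace U+00DF with "ss".
    policy="reject": refuse any label containing U+00DF.
    Nameprep, length checks and Punycode are deliberately omitted.
    """
    if policy not in ("map", "reject"):
        raise ValueError(f"unknown policy {policy!r}")
    labels = []
    for label in IDNA2003_DOTS.split(name):
        if ESZETT in label:
            if policy == "reject":
                raise ValueError(f"label {label!r} contains U+00DF")
            label = label.replace(ESZETT, "ss")
        labels.append(label)
    return labels

if __name__ == "__main__":
    print(prepare_name("straße\u3002de", "map"))     # ideographic full stop as separator
    print(prepare_name("example\uff0ecom", "reject"))  # full-width full stop as separator
    try:
        prepare_name("straße.de", "reject")
    except ValueError as err:
        print("rejected:", err)
```

With either policy, the returned labels no longer show which of the four separators the user typed, and under the mapping policy they no longer show whether "ss" came from "ß". Rejecting keeps the result reversible at the cost of refusing input that is easy to type.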
They are calling my plane; more later.