Concerns about the "szett" exception

Mon Oct 26 13:15:45 CET 2009

All,

As you probably know, I'm working for the Austrian Domain Name Registry
(nic.at). I've recently prepared a presentation to our board regarding
the changes to expect from IDNAbis deployment, and I've been asked by
our board to voice our concerns about the "szett" (U+00DF) exception in
the current document set. I understand that the documents have
progressed very far, and that we should have voiced our concerns earlier
- however, I think that the information below is still valuable to the
group.

Obviously, the DNS is an extremely important identity and naming system
that is crucial to the operation of nearly all internet applications.
Therefore, any changes to that structure are delicate operations. This
is important for the creation of new portions of namespace, but
particularly important when the semantics of a namespace (portion) are
changed. The introduction of IDNA2003 was an extension of the namespace,
at least from the application perspective (technically, it was changing
the definition of an awkward-enough portion of the namespace, namely
labels with "xn--"). 

Changing the semantics of a certain namespace is *really bad*, and I
agree with what Marcos said a long time ago: "Breaking backwards
compatibility is to my eyes the big stigma of IDNA2008".

I understand and welcome the introduction of rigid rules in IDNAbis as
the primary mechanism to identify codepoint classification and protocol
validity. Independence from a certain Unicode revision ensures a stable
specification, and should create few "surprises" (essentially, it shifts
responsibility of character classification from the IETF to Unicode). I
also understand and welcome the 1:1 relation on the protocol level
between A-label and U-label. 
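
A minimal sketch of that 1:1 relation, assuming Python's standard
"punycode" codec as a stand-in for the encoding step. It performs none
of the IDNA2008 validity checks, and the helper names to_a_label and
to_u_label are hypothetical, chosen only for this illustration:

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Encode a U-label as an A-label (illustration only, no validity checks)."""
    if u_label.isascii():
        # Pure-ASCII labels are not Punycode-encoded.
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its U-label (illustration only)."""
    if not a_label.startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

a = to_a_label("straße")
print(a)              # xn--strae-oqa
print(to_u_label(a))  # straße
```

Because no mapping step is involved, the conversion round-trips: the
U-label that comes back is exactly the one that went in.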

However, the introduction of *exceptions* that work around those rigid
rules, and particularly changing the semantics of a part of a deployed,
used namespace is *really really bad* - particularly if the exception
concerns such a "weird" character as the "szett" (Unicode folding-wise).
Such changes generally have the potential to change the resolved
destination for a certain domain name, which in turn creates *major*
security issues, and hurts interoperability badly, because unlike the
introduction of IDNA2003, where a label would either work or not, those
exceptions now create a situation where such a label would resolve to
either destination A (old application) or destination B (new
application).
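
To make the A/B split concrete, here is a rough sketch in Python. It
assumes that Python's built-in "idna" codec (which implements IDNA2003,
RFC 3490 with Nameprep) is representative of an "old" application, and
it approximates a "new" application by the bare Punycode step, without
any IDNA2008 validity checks. The function names are hypothetical:

```python
def lookup_label_idna2003(label: str) -> str:
    # Python's built-in "idna" codec applies Nameprep, which maps
    # U+00DF to "ss" before the label is looked up.
    return label.encode("idna").decode("ascii")

def lookup_label_idna2008(label: str) -> str:
    # Assumption: the label is already a valid U-label, so only the
    # Punycode encoding is shown. U+00DF is kept as a distinct character.
    return "xn--" + label.encode("punycode").decode("ascii")

label = "straße"
old_query = lookup_label_idna2003(label)
new_query = lookup_label_idna2008(label)
print(old_query)  # strasse
print(new_query)  # xn--strae-oqa
```

The same user input yields two different labels in the DNS, and unless
the zone administrator takes measures, nothing guarantees that both are
delegated to the same registrant.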

I understand that the Rationale document proposes sensible approaches in
Section 7.2 - however, I think the Security Considerations could discuss
the problems more explicitly, rather than just referring to the
Rationale document (which is informational anyway). I think that the
sentence

   "...a few characters that were mapped to others in the earlier
   version; zone administrators should be aware of the problems that
   might raise and take appropriate measures"

in the Definitions document could easily be overlooked by implementors.

Another issue makes it even harder for zone administrators to deal with
the problem: Actually *encouraging* application developers to create
their own fancy mapping definitions, beyond the mappings that were
included in IDNA2003, allows for even more "variations", and is bound to
hurt interoperability badly. One example of this is the Unicode TR46,
particularly the proposal of "dual lookups" and "trusted registries" for
"Deviations", which I believe to be a really really bad idea - but what
are the other options?

Shifting the responsibility for mapping to application developers, and
thereby allowing a myriad of mapping options, seems risky to me,
particularly for the Exception codepoints for which protocol definitions
have changed between the two versions. From my point of view, it makes
such codepoints unusable - the "mapping du jour" of application X could
be entirely different from that of application Y.

The Mapping draft says that it's "unusual" for the IETF to discuss user
input processing steps - but on the other hand, Section 2.1 of RFC 3761
(the ENUM base specification) clearly provides normative text about how
user input should be prepared for a protocol (and I'm sure there are
many other examples). So it seems the IETF *is* concerned about how user
input is mapped to protocol elements.

To sum up, we would have preferred the "szett" (U+00DF) to be kept
"DISALLOWED", and to have the IETF describe the mapping procedures as
more than just "Informational" (the contents of the mapping document
itself are perfectly fine). We also hope that the IETF liaises with
application developers, particularly browser vendors, to establish one
single "de facto" mapping procedure, so that at least the szett does not
become a moving target.

Thanks,

Alex Mayrhofer