Parsing the issues and finding a middle ground -- another attempt

John C Klensin klensin at jck.com
Thu Feb 26 16:04:23 CET 2009


Hi.

Probably like some others in the WG, I've been lying awake
nights trying to figure out a way forward in this situation.
In the last few days, I've had the opportunity to talk with
several people who operate, or are close to, registries in
parts of the world where they see IDNs as really
critical.  They want three things, not necessarily in this
order and sometimes stated in other ways:

(1) DNS-based identifiers that are absolutely unambiguous and
predictable, even to people who are not deeply familiar with
the script in question nor with the specifics of Unicode (or
other CCS) design decisions and the details of their
implementation.   To them, that translates into treating only
those things as equal that are visually and bit-string
identical.  From that point of view, any equivalence on any
other basis is an issue of semantics and bindings to be decided
locally and hence is a matter for registry or registrant
action, informed by local policies that may differ with
different domains/zones.

(2) They want things to be as predictable (unsurprising to
users) as possible given the expectations of non-specialists
who read particular languages and the scripts in which they are
written, expectations that are also informed for some (but not
all) users by experience with the DNS.  In that regard, some
would go fully as far as Jefsey has suggested, with matching
rules dependent on language, locale, individual user
expectations, and context.  Those who understand the difference
between matching rules and mapping strings into other strings
in ways that lose information, and those who understand the
implausibility of localized mapping rules in a global DNS where
any valid label string can appear anywhere in the tree, know
how impossible that is, but it doesn't prevent wishing.  And
they certainly believe that the fact that some things are
impossible should not create a philosophical bias against
dealing with the more plausible cases.

(3) They want this settled.  For some that is more important
than what conclusions we reach: other things that are important
to them are stuck waiting for it, whether those things are
entangled with ICANN policy-making, with efforts to formulate
local policies that will be stable over time, with decisions
about what labels to permit that would be different with a
Unicode 5.1 based system than with a Unicode 3.2 one, or with
the development of marketing and related strategies.  Everyone
I talked with is willing to deal with some incompatibility
between IDNA2003 and whatever-comes-now -- even in what labels
are considered valid and in how the two systems interpret the
same label -- as long as this is the last time there are
incompatible changes and as long as we don't drag this out
much longer ("too long already" is a popular comment).  Against
that backdrop, they want no more incompatibility with IDNA2003
than necessary, but they consider the first two goals much
more important than strict compatibility... and they understand
that some of the compatibility problems are theirs to solve...
as long as we give them the tools.

Most of them recognize the importance of both (1) and (2) and
understand that they are contradictory and require, at least,
balancing tradeoffs.  In that regard, they are doing better
than this WG sometimes does, in which people seem to be arguing
for one position or the other, treating the other one as
insignificant or irrelevant.   If I've contributed to that
style of discussion, I apologize: It was never my intent to do
so and I've seen the tradeoffs all along.

They also understand that trying to find the right balance is
hard and are willing to cut us some slack on schedules because
of it.   But they don't see much progress (other than going
around in circles) and that concerns them.   Those who have
been following our work also seem to have no patience at all
for our having procedural arguments as a substitute for
addressing the real questions.  I was asked more than once if
the IETF had gotten so paralyzed by this issue that it was time
to move it to a different forum (and I was told that ITU had
volunteered).

Where does this take us?  I tried to propose a "lower case
mappings only" model a few weeks ago, on the theory that it was
the one that was needed to simulate the matching behavior of
the DNS, to avoid a situation in which the addition of one
character to a string could change it from "case-insensitive
matching" to "case-sensitive matching and possibly invalid",
and because, in Unicode terms, it depended only on the
well-understood (although, as Jefsey has pointed out, not
universally accepted) lower-case procedure and not on the more
subtle and less-generally-understood case folding and
compatibility character relationships.  As far as I can tell,
the proposal died a swift but painful death, mostly on the
principle that, if the Latin/Greek/Cyrillic folks were going to
get lower case mapping, then there were all sorts of mappings
that others would like (or insist on).
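
As a rough illustration of the distinction drawn above between
plain lower-casing and the broader case-folding and
compatibility relationships, here is a minimal Python sketch.
The function names are made up for this example, and Python's
built-in str.lower(), str.casefold(), and NFKC normalization
(which track a current Unicode version) are only stand-ins for
whatever tables a real proposal would pin down.

```python
import unicodedata


def lower_case_only(label: str) -> str:
    """Hypothetical 'lower case mappings only' step."""
    return label.lower()


def fold_and_compat(label: str) -> str:
    """Rough stand-in for case folding plus compatibility mapping."""
    return unicodedata.normalize("NFKC", label.casefold())


samples = ["Example", "stra\u00dfe", "\u03c2", "\uff21"]
for s in samples:
    # Show where the two approaches agree and where they diverge.
    print(repr(s), repr(lower_case_only(s)), repr(fold_and_compat(s)))
```

Lower-casing leaves sharp s (U+00DF) and final sigma (U+03C2)
alone, while the folding variant rewrites them to "ss" and to
medial sigma, and also collapses the fullwidth "A" (U+FF21) to
an ASCII "a".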

So, in the context of the above and in the hope that it will
provide a foundation for moving forward, let me try out another
suggestion (necessarily less specific than the lower-case one;
there are details that would have to be sorted out here).

(i) We ban registration-side mapping in the protocol and
discourage any local mapping on that side.  There is really no
need for it and having a registrant be absolutely clear about
what is going into the DNS, how the native character form will
appear when converted from the A-label, etc., seems important.
It is also consistent with the current practices of a large
number of registries who handle IDNs (see Pat Kane's recent
note for an example of a specific procedure).  Based on my
understanding of discussions on the list, I modified the latest
posted versions of Protocol and Rationale to reflect this
restriction: all of the local mapping text has been
removed and even the "get it into Unicode" text has been
eliminated.  Of course that could be changed back if the WG
reaches some other conclusion.

(ii) We make it clear (if it isn't already) that, in cases where
either changes in the protocol or the nature of things (e.g.,
Traditional-Simplified Chinese relationships) create a
situation in which perceived relationships among label strings
are important, it is the responsibility of the relevant
registry to cope by making a policy they consider appropriate,
enforcing it, and taking responsibility for it.   We can suggest,
and have suggested, some alternatives, but, for reasons already
discussed on the list, should not try to go much further.

(iii) We tell folks on the lookup side that, if a label in
native-character form is invalid under IDNA2008 but valid under
IDNA2003, they SHOULD apply the IDNA2003 mappings and look the
thing up.  Note that this implies two tests but only one lookup
in the DNS.  I'm not happy about this suggestion for a long
list of reasons, but perhaps it gives a basis for moving
forward.  Note that this does not suggest revisiting Stringprep
and creating any new mappings.  And it clearly doesn't help
with the "changed interpretation" cases.

(iv) For the four "changed interpretation" cases, we make it
clear that the IDNA2008 interpretation is the important one and
that registries have a lot of responsibility here.   However,
if an application is in a position to deliver two different
answers to the user, then it MAY reasonably do both lookups and
then do whatever seems appropriate with them (obviously, a "did
you really mean?" dialogue would be one such option).
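
The lookup-side behavior in (iii) and (iv) could be sketched
roughly as follows, in Python.  This is an illustration under
stated assumptions, not proposed protocol text: the IDNA2008
validity test is a caller-supplied predicate (toy_valid below
is a toy stand-in), labels_to_look_up and idna2003_map are
hypothetical names, the four "changed interpretation"
characters are assumed to be sharp s, final sigma, ZWNJ, and
ZWJ, and the standard library's Nameprep function is used as an
approximation of "valid under IDNA2003" (it does not cover
every IDNA2003 check).

```python
from encodings.idna import nameprep  # stdlib IDNA2003 (Nameprep) mapping

# Assumed set of "changed interpretation" characters:
# sharp s, final sigma, ZWNJ, ZWJ.
CHANGED = frozenset("\u00df\u03c2\u200c\u200d")


def idna2003_map(label):
    """Return the IDNA2003-mapped label, or None if Nameprep rejects it."""
    try:
        return nameprep(label)
    except UnicodeError:
        return None


def labels_to_look_up(label, is_valid_idna2008):
    """Return the native-character labels a lookup application might try.

    is_valid_idna2008 is a caller-supplied predicate (hypothetical here).
    """
    if not is_valid_idna2008(label):
        # (iii): two tests, but at most one DNS lookup.
        mapped = idna2003_map(label)
        return [mapped] if mapped else []

    candidates = [label]  # (iv): the IDNA2008 interpretation comes first
    if CHANGED & set(label):
        old = idna2003_map(label)
        if old and old != label:
            candidates.append(old)  # optional second lookup (the MAY case)
    return candidates


def toy_valid(label):
    # Toy stand-in only: lower-case letters, digits, and hyphen.
    return all(c.islower() or c.isdigit() or c == "-" for c in label)


print(labels_to_look_up("B\u00fccher", toy_valid))
print(labels_to_look_up("stra\u00dfe", toy_valid))
```

With the toy predicate, a label that is already valid yields a
single candidate, an upper-case label is mapped once and then
looked up, and a label containing sharp s yields two candidates
that an application could present in a "did you really mean?"
dialogue.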

Does that help?

   john
 


