Parsing the issues and finding a middle ground -- another attempt

Patrik Fältström patrik at frobbit.se
Thu Feb 26 23:39:40 CET 2009


On 26 feb 2009, at 16.04, John C Klensin wrote:

> Hi.
>
> Probably like some others in the WG, I've been lying awake
> nights trying to figure out a way forward in this situation.
> In the last few days, I've had the opportunity to talk with
> several people who operate, or are close to, registries who
> operate in parts of the world where they see IDNs as really
> critical.  They want three things, not necessarily in this
> order and sometimes stated in other ways:
>
> (1) DNS-based identifiers that are absolutely unambiguous and
> predictable, even to people who are not deeply familiar with
> the script in question nor with the specifics of Unicode (or
> other CCS) design decisions and the details of their
> implementation.   To them, that translates into treating only
> those things as equal that are visually and bit-string
> identical.  From that point of view, any equivalence on any
> other basis is an issue of semantics and bindings to be decided
> locally and hence is a matter for registry or registrant
> action, informed by local policies that may differ with
> different domains/zones.
>
> (2) They want things to be as predictable (unsurprising to
> users) as possible given the expectations of non-specialists
> who read particular languages and the scripts in which they are
> written, expectations that are also informed for some (but not
> all) users by experience with the DNS.  In that regard, some
> would go fully as far as Jefsey has suggested, with matching
> rules dependent on language, locale, individual user
> expectations, and context.  Those who understand the difference
> between matching rules and mapping strings into other strings
> in ways that lose information, and those who understand the
> implausibility of localized mapping rules in a global DNS where
> any valid label string can appear anywhere in the tree, know
> how impossible that is, but it doesn't prevent wishing.

This last sentence above is key. There IS a difference between:

- Matching two strings in a specific context, and deciding using the  
rules of that context whether the strings are "the same" or not. Note  
the quotes.

- Converting two strings to some neutral string, using
context-dependent rules, and then comparing the neutral strings and
saying whether they are "the same" or not. Here the comparison itself
is made without any contextual rules at all, and with an acceptance
that the transformation could be lossy.

- Converting two strings to some neutral string without
context-dependent rules, and then comparing as in the previous case.

Now, IDNA2003 was doing the last of these three. On top of that, it
did not have terminology that differentiated between the codepoints
that were mapped and then stored in the DNS and the ones that could be
stored in the DNS as they are. The difference between these two
categories is of course that the ones that can be stored can be mapped
back again, while the first category cannot.
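
A minimal sketch of that third approach, assuming Python's standard
library nameprep (an implementation of the IDNA2003 mapping step) is
an acceptable stand-in. The helper name same_after_mapping is made up
for illustration only:

```python
from encodings.idna import nameprep  # stdlib IDNA2003 (Nameprep) mapping


def same_after_mapping(a, b):
    """Hypothetical helper: compare two labels after context-free mapping."""
    return nameprep(a) == nameprep(b)


original = "Stra\u00dfe"        # "Straße": contains codepoints that get mapped
stored = nameprep(original)     # the form that would end up in the DNS

print(stored)                                      # strasse
print(same_after_mapping(original, "STRASSE"))     # True
# The stored form is stable: mapping it again changes nothing.
print(nameprep(stored) == stored)                  # True
# The original is not recoverable: "strasse" does not say whether the
# user typed "Straße", "STRASSE" or "strasse".
```

The mapping is lossy in exactly the sense described above: the stored
string maps to itself, while the mapped-away codepoints are gone.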

IDNA2008 is very strict, and closer to the second of the three. The
mapping rules are not written down, but can be described. We have
suggested they have to be defined by registries, because that way we
will get context-dependent rules that in reality are based on a
combination of language and TLD (whatever that means).

The first is to some degree what people THINK can be done, but, with a  
global system like DNS, "just foggetaboutit".

> And
> they certainly believe that the fact that some things are
> impossible should not create a philosophical bias against
> dealing with the more plausible cases.
>
> (3) They want this settled.  For some that is more important
> than what conclusions we reach: other things that are important
> to them are stuck waiting for it, whether those things are
> entangled with ICANN policy-making, with efforts to formulate
> local policies that will be stable over time, with decisions
> about what labels to permit that would be different with a
> Unicode 5.1 based system than with a Unicode 3.2 one, or with
> the development of marketing and related strategies.  Everyone
> I talked with is willing to deal with some incompatibility,
> -- even to what labels are considered valid and with different
> interpretations in the two systems -- between IDNA2003 and
> whatever-comes-now -- as long as this is the last time there are
> incompatible changes and as long as we don't drag this out
> much longer ("too long already" is a popular comment).  Against
> that backdrop, they want no more incompatibility with IDNA2003
> than necessary, but they consider the first two goals much
> more important than strict compatibility... and they understand
> that some of the compatibility problems are theirs to solve...
> as long as we give them the tools.
>
> Most of them recognize the importance of both (1) and (2) and
> understand that they are contradictory and require, at least,
> balancing tradeoffs.  In that regard, they are doing better
> than this WG sometimes does, in which people seem to be arguing
> for one position or the other, treating the other one as
> insignificant or irrelevant.   If I've contributed to that
> style of discussion, I apologize: It was never my intent to do
> so and I've seen the tradeoffs all along.
>
> They also understand that trying to find the right balance is
> hard and are willing to cut us some slack on schedules because
> of it.   But they don't see much progress (other than going
> around in circles) and that concerns them.   Those who have
> been following our work also seem to have no patience at all
> for our having procedural arguments as a substitute for
> addressing the real questions.  I was asked more than once if
> the IETF had gotten so paralyzed by this issue that it was time
> to move it to a different forum (and I was told that ITU had
> volunteered).
>
> Where does this take us?  I tried to propose a "lower case
> mappings only" model a few weeks ago, on the theory that it was
> the one that was needed to simulate the matching behavior of
> the DNS, to avoid a situation in which the addition of one
> character to a string could change it from "case-insensitive
> matching" to "case-sensitive matching and possibly invalid",
> and because, in Unicode terms, it depended only on the
> well-understood (although, as Jefsey has pointed out, not
> universally accepted) lower-case procedure and not on the more
> subtle and less-generally-understood case folding and
> compatibility character relationships.  As far as I can tell,
> the proposal died a swift but painful death, mostly on the
> principle that, if the Latin/Greek/Cyrillic folks were going to
> get lower case mapping, then there were all sorts of mappings
> that others would like (or insist on).
>
> So, in the context of the above and in the hope that it will
> provide a foundation for moving forward, let me try out another
> suggestion (necessarily less specific than the lower-case one;
> there are details that would have to be sorted out here).

Ok...here we go! ;-)

> (i) We ban registration-side mapping in the protocol and
> discourage any local mapping on that side.  There is really no
> need for it and having a registrant be absolutely clear about
> what is going into the DNS, how the native character form will
> appear when converted from the A-label, etc., seems important.
> It is also consistent with the current practices of a large
> number of registries who handle IDNs (see Pat Kane's recent
> note for an example of a specific procedure).  Based on my
> understanding of discussions on the list, I modified the latest
> versions of Protocol and Rationale to reflect this restriction in
> the posted versions: all of the local mapping text has been
> removed and even the "get it into Unicode" text has been
> eliminated.  Of course that could be changed back if the WG
> reaches some other conclusion.
>
> (ii) We make it clear (if it isn't already) that, in cases where
> either changes in the  protocol or the nature of things (e.g.,
> Traditional-Simplified Chinese relationships) creates a
> situation in which perceived relationships among label strings
> are important, it is the responsibility of the relevant
> registry to cope by making a policy they consider appropriate,
> enforcing it, and taking responsibility for it.   We can, and
> have, suggested some alternatives, but, for reasons already
> discussed on the list, should not try to go much further.
>
> (iii) We tell folks on the lookup side that, if a label in
> native-character form is invalid under IDNA2008 but valid under
> IDNA2003, they SHOULD apply the IDNA2003 mappings and look the
> thing up.  Note that this implies two tests but only one lookup
> in the DNS.  I'm not happy about this suggestion for a long
> list of reasons, but perhaps it gives a basis for moving
> forward.  Note that this does not suggest revisiting Stringprep
> and creating any new mappings.  And it clearly doesn't help
> with the "changed interpretation" cases.

Hmm...ok...
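
The lookup-side rule in (iii), two validity tests but only one DNS
lookup, could be sketched roughly as below. This is an illustration
under loud assumptions: toy_is_valid_idna2008 is a crude, hypothetical
stand-in (NFC and lower-case only), not the real IDNA2008 rules, and
all function names are invented here. Only nameprep comes from the
Python standard library.

```python
import unicodedata
from encodings.idna import nameprep  # stdlib IDNA2003 (Nameprep) mapping


def toy_is_valid_idna2008(label):
    """Hypothetical stand-in for test 1: NOT the real IDNA2008 rules."""
    return (label == unicodedata.normalize("NFC", label)
            and label == label.lower())


def idna2003_mapped(label):
    """Test 2: return the IDNA2003-mapped label, or None if it is rejected."""
    try:
        return nameprep(label)
    except UnicodeError:
        return None


def label_to_look_up(label, is_valid_2008=toy_is_valid_idna2008):
    """Pick the single string to look up in the DNS for a U-label."""
    if is_valid_2008(label):
        return label                  # valid as-is: no mapping applied
    mapped = idna2003_mapped(label)   # invalid under 2008: try 2003 mapping
    if mapped is None:
        raise ValueError("label is invalid under both rule sets")
    return mapped


print(label_to_look_up("B\u00fccher"))   # mapped to lower case
print(label_to_look_up("stra\u00dfe"))   # passes test 1, left untouched
```

Note that the second call shows the limitation mentioned in (iii): a
label that passes the first test is never mapped, so the "changed
interpretation" cases are not helped by this rule.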

> (iv) For the four "changed interpretation" cases, we make it
> clear that the IDNA2008 interpretation is the important one and
> that registries have a lot of responsibility here.   However,
> if an application is in a position to deliver two different
> answers to the user, then it MAY reasonably do both lookups and
> then do whatever with them seems appropriate (obviously, a "did
> you really mean?" dialogue would be one such option).
>
> Does that help?
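
For (iv), an application that can show two answers might compute both
candidate labels before doing its lookups. The sketch below assumes
the four cases are the characters usually cited in this discussion
(sharp s, final sigma, ZWNJ, ZWJ) and that the input label is already
valid under IDNA2008. The names CHANGED and candidate_labels are
hypothetical.

```python
from encodings.idna import nameprep  # stdlib IDNA2003 (Nameprep) mapping

# Assumption: the four "changed interpretation" characters.
CHANGED = {"\u00df", "\u03c2", "\u200c", "\u200d"}  # ß, ς, ZWNJ, ZWJ


def candidate_labels(label):
    """Return the IDNA2008 reading first, then the IDNA2003 one if it differs."""
    candidates = [label]              # the IDNA2008 interpretation comes first
    if any(ch in CHANGED for ch in label):
        old_reading = nameprep(label)  # what IDNA2003 would have looked up
        if old_reading != label:
            candidates.append(old_reading)
    return candidates


print(candidate_labels("stra\u00dfe"))   # two candidates, so two lookups
print(candidate_labels("example"))       # one candidate, so one lookup
```

What to do with two answers (for instance a "did you really mean?"
dialogue) is left to the application, as the text says.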

It clearly creates backward compatibility, but it also clearly MIGHT
create the need for either a full mapping category in tables, or some
additions to the exceptions list.

The category "MAPPING" is to be defined.

Sure, it works.

Doing the diff might be hard, but doable.

    Patrik
