Parsing the issues and finding a middle ground -- another attempt

Erik van der Poel erikv at google.com
Fri Feb 27 22:08:02 CET 2009


Hi Vint,

Thank you for your patience.

On Thu, Feb 26, 2009 at 6:32 PM, Vint Cerf <vint at google.com> wrote:
> if we reject Eszett and final sigma as PVALID, then the present situation in
> which they are mapped means that their use will fail under IDNA2008 -
> because they only worked as a consequence of mapping under IDNA2003.

No, they would only fail under IDNA2008 if the pre-processor did not
map them. (The pre-processor spec is outside the current list of
IDNA2008 drafts.)
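
To illustrate the IDNA2003 mappings in question, here is a small sketch
using Python's standard library, whose "idna" codec implements RFC 3490
(IDNA2003). It is only a stand-in for whatever pre-processor ends up
being specified; the sample labels are made up for the example.

```python
import encodings.idna

# Eszett (U+00DF) is mapped to "ss" by IDNA2003 Nameprep, so the
# resulting label is plain ASCII and needs no xn-- prefix.
eszett_label = "stra\u00dfe".encode("idna")

# Final Sigma (U+03C2) is mapped to Normal Sigma (U+03C3) by Nameprep.
sigma_mapped = encodings.idna.nameprep("\u03bf\u03b4\u03bf\u03c2")

print(eszett_label)
print(sigma_mapped == "\u03bf\u03b4\u03bf\u03c3")
```

A pre-processor that applied the same two mappings before the IDNA2008
steps would keep such names resolving as they do today.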

> If we
> allow them as PVALID and let the registries include both formerly mapped and
> unmapped forms, at least I think we end up with something that can
> accommodate both usages except that the occurrence under IDNA2008 would be
> through direct use of both forms with punycoding of each.

I believe Vaggelis has been explaining that the .gr registry folks are
not entirely happy with the DNAME half-solution. If we make Final
Sigma PVALID (and refrain from mapping it to Normal Sigma), the .gr
folks will have to add even more DNAMEs.

Vaggelis has said that a PVALID Final Sigma does have its
"advantages", and I believe one of them would be the ability to
display the Final Sigma to users. However, as I have explained, you
can get the display advantage via http://<domain-name>.gr/idndisp.txt
without leaving Final Sigma unmapped.

If the .gr folks decide that IDNA2003 has failed under .gr, they may
also decide to experiment with the following:

(1) MSIE plug-in for the URL bar (similar to the old IDNA2003 plug-ins)
(2) Firefox extension or modification for the URL bar
(3) for the keyboard only, map letters with tonos to letters without tonos
(4) continue to map final sigma to normal sigma
(5) after mapping, convert to Punycode, prepend xn--, and perform DNS lookup
(6) make the MSIE/Firefox additions fetch http://<name>.gr/idndisp.txt
(7) before display, convert the display form to A-labels to make sure
they match the originals (for security reasons)
(8) if the local experiments show good results, try to get MSIE and
Firefox to adopt the .gr mappings in keyboard-related code
(9) provide mapping tools to the community, for HTML authoring, etc
(10) encourage HTML authors to use the xn-- form, so that DNAMEs are unnecessary
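
Steps (3), (4), (5) and (7) above could be sketched roughly as follows.
This is only an illustration in Python: the function names are
hypothetical, the tonos is removed by stripping the combining acute
accent after canonical decomposition (an assumption about how such a
mapping might be implemented), and no IDNA validity checks are done.

```python
import unicodedata

FINAL_SIGMA = "\u03c2"
NORMAL_SIGMA = "\u03c3"
COMBINING_ACUTE = "\u0301"  # what the tonos becomes after NFD


def map_gr_label(label: str) -> str:
    """Hypothetical .gr mapping: lowercase, drop tonos, fold final sigma."""
    lowered = label.lower()
    decomposed = unicodedata.normalize("NFD", lowered)
    without_tonos = decomposed.replace(COMBINING_ACUTE, "")
    recomposed = unicodedata.normalize("NFC", without_tonos)
    return recomposed.replace(FINAL_SIGMA, NORMAL_SIGMA)


def to_a_label(label: str) -> str:
    """Step (5): after mapping, convert to Punycode and prepend xn--."""
    mapped = map_gr_label(label)
    if mapped.isascii():
        return mapped  # plain ASCII labels are left as they are
    return "xn--" + mapped.encode("punycode").decode("ascii")


def display_form_matches(display_form: str, a_label: str) -> bool:
    """Step (7): accept a display form only if it maps back to the
    A-label that was actually looked up."""
    return to_a_label(display_form) == a_label.lower()


if __name__ == "__main__":
    typed = "\u03bf\u03b4\u03cc\u03c2"  # with tonos and final sigma
    looked_up = to_a_label(typed)
    print(looked_up)
    print(display_form_matches(typed, looked_up))
```

The check in display_form_matches is what keeps a fetched idndisp.txt
from showing the user a name other than the one that was resolved.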

One of the dangers of this approach is so-called balkanization (or
fragmentation) of the Internet, especially if many ccTLDs and 2LDs
start experimenting with and demanding their own mappings.

However, the xn-- labels will continue to work in other parts of the
world, so there's no real fragmentation there, other than the
relatively minor display issue (since tonos-less letters look similar
to the same letters with tonos).

Internationalization and localization often start out as local
programs or modifications that eventually get adopted by software in
other parts of the world. For example, local engineers shoe-horned
bidi support into several programs, and eventually e.g. MSIE and
Firefox built their own bidi support.

It is important to refrain from performing the .gr mappings to domain
names found in hrefs in HTML. Otherwise, locally authored HTML
documents will not work in other parts of the world (unless there are
DNAMEs for those domain names, which would defeat the goal of
eventually getting rid of DNAMEs).

> That line of reasoning suggests that we should accept the earlier consensus
> to include these characters.

Vaggelis, you said that the results of your meeting are pending. Would
you be able to report the results in the near future?

Thanks,

Erik

PS I haven't said anything about Eszett in this email. I don't know
whether the German registry will use DNAME for Eszett and whether they
are happy with DNAME.

> On Feb 26, 2009, at 5:02 PM, Erik van der Poel wrote:
>> I guess that depends on what would be dropped on the floor. At this
>> point, some may wish to drop Eszett, Final Sigma, ZWJ and ZWNJ so that
>> we can at least get IDNA200X out the door with the most recent Unicode
>> version. Do we need another consensus call on these characters?
>>
>> Responding to John's email, I think it might be best for IDNA200X to
>> avoid any MUSTs/MUST NOTs related to mapping. The mapping issues
>> should probably be left to another document. This means that that
>> other document decides whether to do any mapping for characters beyond
>> Unicode 3.2.
>>
>> Erik
>>
>> On Thu, Feb 26, 2009 at 1:46 PM, Leslie Daigle <leslie at thinkingcat.com>
>> wrote:
>>>
>>> Speaking very clearly as a lurker here -- I would like to say, "yes,
>>> please".
>>>
>>> Sound engineering progress is desperately needed.
>>>
>>> I realize that this would mean a number of things the WG would like to
>>> solve will hit the editing room floor.
>>>
>>> I suspect that proceeding this way may mean that IDNs will either become
>>> only locally usable (where "local" is really "locale" -- you/your
>>> software understand the nuances of mapping), or the market forces of the
>>> world will quickly gravitate towards "best practices" for IDN selection
>>> that work in a relatively locale-independent fashion.
>>>
>>> While that may not be ideal from a global Internet perspective, we can't
>>> engineer to "suspicions", and in either case it is important to have a
>>> solid basis to fall back on so that registrants, registries and
>>> registrars are very clear on what specific entry is being made in the
>>> global DNS infrastructure.
>>>
>>> And that seems to be what you are describing.
>>>
>>> Leslie.
>>>
>>> John C Klensin wrote:
>>>>
>>>> Hi.
>>>>
>>>> Probably like some others in the WG, I've been lying awake
>>>> nights trying to figure out a way forward in this situation.
>>>> In the last few days, I've had the opportunity to talk with
>>>> several people who operate, or are close to, registries who
>>>> operate in parts of the world where they see IDNs as really
>>>> critical.  They want three things, not necessarily in this
>>>> order and sometimes stated in other ways:
>>>>
>>>> (1) DNS-based identifiers that are absolutely unambiguous and
>>>> predictable, even to people who are not deeply familiar with
>>>> the script in question nor with the specifics of Unicode (or
>>>> other CCS) design decisions and the details of their
>>>> implementation.   To them, that translates into treating only
>>>> those things as equal that are visually and bit-string
>>>> identical.  From that point of view, any equivalence on any
>>>> other basis is an issue of semantics and bindings to be decided
>>>> locally and hence is a matter for registry or registrant
>>>> action, informed by local policies that may differ with
>>>> different domains/zones.
>>>>
>>>> (2) They want things to be as predictable (unsurprising to
>>>> users) as possible given the expectations of non-specialists
>>>> who read particular languages and the scripts in which they are
>>>> written, expectations that are also informed for some (but not
>>>> all) users by experience with the DNS.  In that regard, some
>>>> would go fully as far as Jefsey has suggested, with matching
>>>> rules dependent on language, locale, individual user
>>>> expectations, and context.  Those who understand the difference
>>>> between matching rules and mapping strings into other strings
>>>> in ways that lose information, and those who understand the
>>>> implausibility of localized mapping rules in a global DNS where
>>>> any valid label string can appear anywhere in the tree, know
>>>> how impossible that is, but it doesn't prevent wishing.  And
>>>> they certainly believe that the fact that some things are
>>>> impossible should not create a philosophical bias against
>>>> dealing with the more plausible cases.
>>>>
>>>> (3) They want this settled.  For some that is more important
>>>> than what conclusions we reach: other things that are important
>>>> to them are stuck waiting for it, whether those things are
>>>> entangled with ICANN policy-making, with efforts to formulate
>>>> local policies that will be stable over time, with decisions
>>>> about what labels to permit that would be different with a
>>>> Unicode 5.1 based system than with a Unicode 3.2 one, or with
>>>> the development of marketing and related strategies.  Everyone
>>>> I talked with is willing to deal with some incompatibility,
>>>> -- even to what labels are considered valid and with different
>>>> interpretations in the two systems -- between IDNA2003 and
>>>> whatever-comes-now -- as long as this is the last time there are
>>>> incompatible changes and as long as we don't drag this out
>>>> much longer ("too long already" is a popular comment).  Against
>>>> that backdrop, they want no more incompatibility with IDNA2003
>>>> than necessary, but they consider the first two goals much
>>>> more important than strict compatibility... and they understand
>>>> that some of the compatibility problems are theirs to solve...
>>>> as long as we give them the tools.
>>>>
>>>> Most of them recognize the importance of both (1) and (2) and
>>>> understand that they are contradictory and require, at least,
>>>> balancing tradeoffs.  In that regard, they are doing better
>>>> than this WG sometimes does, in which people seem to be arguing
>>>> for one position or the other, treating the other one as
>>>> insignificant or irrelevant.   If I've contributed to that
>>>> style of discussion, I apologize: It was never my intent to do
>>>> so and I've seen the tradeoffs all along.
>>>>
>>>> They also understand that trying to find the right balance is
>>>> hard and are willing to cut us some slack on schedules because
>>>> of it.   But they don't see much progress (other than going
>>>> around in circles) and that concerns them.   Those who have
>>>> been following our work also seem to have no patience at all
>>>> for our having procedural arguments as a substitute for
>>>> addressing the real questions.  I was asked more than once if
>>>> the IETF had gotten so paralyzed by this issue that it was time
>>>> to move it to a different forum (and I was told that ITU had
>>>> volunteered).
>>>>
>>>> Where does this take us?  I tried to propose a "lower case
>>>> mappings only" model a few weeks ago, on the theory that it was
>>>> the one that was needed to simulate the matching behavior of
>>>> the DNS, to avoid a situation in which the addition of one
>>>> character to a string could change it from "case-insensitive
>>>> matching" to "case-sensitive matching and possibly invalid",
>>>> and because, in Unicode terms, it depended only on the
>>>> well-understood (although, as Jefsey has pointed out, not
>>>> universally accepted) lower-case procedure and not on the more
>>>> subtle and less-generally-understood case folding and
>>>> compatibility character relationships.  As far as I can tell,
>>>> the proposal died a swift but painful death, mostly on the
>>>> principle that, if the Latin/Greek/Cyrillic folks were going to
>>>> get lower case mapping, then there were all sorts of mappings
>>>> that others would like (or insist on).
>>>>
>>>> So, in the context of the above and in the hope that it will
>>>> provide a foundation for moving forward, let me try out another
>>>> suggestion (necessarily less specific than the lower-case one;
>>>> there are details that would have to be sorted out here).
>>>>
>>>> (i) We ban registration-side mapping in the protocol and
>>>> discourage any local mapping on that side.  There is really no
>>>> need for it and having a registrant be absolutely clear about
>>>> what is going into the DNS, how the native character form will
>>>> appear when converted from the A-label, etc., seems important.
>>>> It is also consistent with the current practices of a large
>>>> number of registries who handle IDNs (see Pat Kane's recent
>>>> note for an example of a specific procedure).  Based on my
>>>> understanding of discussions on the list, I modified the latest
>>>> versions of Protocol and Rationale to reflect this restriction in
>>>> the posted versions: all of the local mapping text has been
>>>> removed and even the "get it into Unicode" text has been
>>>> eliminated.  Of course that could be changed back if the WG
>>>> reaches some other conclusion.
>>>>
>>>> (ii) We make it clear (if it isn't already) that, in cases where
>>>> either changes in the protocol or the nature of things (e.g.,
>>>> Traditional-Simplified Chinese relationships) creates a
>>>> situation in which perceived relationships among label strings
>>>> are important, it is the responsibility of the relevant
>>>> registry to cope by making a policy they consider appropriate,
>>>> enforcing it, and taking responsibility for it.   We can, and
>>>> have, suggested some alternatives, but, for reasons already
>>>> discussed on the list, should not try to go much further.
>>>>
>>>> (iii) We tell folks on the lookup side that, if a label in
>>>> native-character form is invalid under IDNA2008 but valid under
>>>> IDNA2003, they SHOULD apply the IDNA2003 mappings and look the
>>>> thing up.  Note that this implies two tests but only one lookup
>>>> in the DNS.  I'm not happy about this suggestion for a long
>>>> list of reasons, but perhaps it gives a basis for moving
>>>> forward.  Note that this does not suggest revisiting Stringprep
>>>> and creating any new mappings.  And it clearly doesn't help
>>>> with the "changed interpretation" cases.
>>>>
>>>> (iv) For the four "changed interpretation" cases, we make it
>>>> clear that the IDNA2008 interpretation is the important one and
>>>> that registries have a lot of responsibility here.   However,
>>>> if an application is in a position to deliver two different
>>>> answers to the user, then it MAY reasonably do both lookups and
>>>> then do whatever with them seems appropriate (obviously, a "did
>>>> you really mean?" dialogue would be one such option).
>>>>
>>>> Does that help?
>>>>
>>>>   john
>>>>
>>>>
>>>> _______________________________________________
>>>> Idna-update mailing list
>>>> Idna-update at alvestrand.no
>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>
>>> --
>>>
>>> -------------------------------------------------------------------
>>> "Reality:
>>>     Yours to discover."
>>>                                -- ThinkingCat
>>> Leslie Daigle
>>> leslie at thinkingcat.com
>>> -------------------------------------------------------------------