Parsing the issues and finding a middle ground -- another attempt

Mark Davis mark at macchiato.com
Wed Mar 4 00:35:40 CET 2009


I believe that

a) it needs to be a MUST, despite the charter issue. IDNA2003 had a MUST,
and this simply continues that.
b) we should not mention the local mappings. Clearly, if someone wanted a UI
mapping when entering a URL into an address bar, there isn't anything we
can do to stop that, but there is no point in emphasizing something that
would just be an interoperability problem.

More comments below.

Mark


On Tue, Mar 3, 2009 at 12:27, John C Klensin <klensin at jck.com> wrote:

> Mark,
>
> Thanks for the clarity of these comments.  I'm glad we are
> converging.   Text on which we've agreed is elided below, but
> anyone who disagrees with Mark's conclusions in those areas
> should speak up quickly.
>
> --On Tuesday, March 03, 2009 11:47 -0800 Mark Davis
> <mark at macchiato.com> wrote:
>
> > We have had a lot of productive discussion lately. Here is my
> > take on your questions of 6 days ago, with "..." elisions so
> > as to get at the core questions.
> >...
>
> >> (ii) We make it clear (if it isn't already) that, in cases
> >> ...  in which perceived relationships among label strings are
> >> important, it is the responsibility of the relevant registry
> >> to cope ....
>
> > I'm not sure what this means. I'm guessing you mean policies
> > like bundling and blocking; if so, I agree.
>
> Yes, that is what I meant, but the word "like" is key -- many
> registry operators are smart folks with clear ideas about what
> should be done for the situations they face.  I don't think we
> should try to constrain them to particular solutions and don't
> think they would pay much attention to us if we did.


ok

>
>
> >> (iii) We tell folks on the lookup side that, if a label in
> >> native-character form is invalid under IDNA2008 but valid
> >> under IDNA2003, they SHOULD apply the IDNA2003 mappings and
> >> look the thing up.  Note that this implies two tests but only
> >> one lookup in the DNS. ...
> >
> > If we made that a MUST, I'd be happy with it. If it is not a
> > MUST, then we can always have two kinds of implementations,
> > which will inevitably cause some interoperability problems.
>
> I said "SHOULD" because, in IETF-speak, MUST implies that there
> are no exceptions and, in particular, no cases in which an
> implementation (or application specification) might reasonably
> insist on a "no mapping" approach for absolute precision about
> what is being done.  I can think of several cases where that
> might be appropriate.   One of them might be email addresses,
> where there is already a tradition of "if you don't specify
> exactly what you intend, the message isn't going to go through"
> (but I stress "might" -- that decision is not under the control
> of this WG).


Any such cases would just be interoperability problems. There is no
particular need for us to create such cases. IDNA2003 had a MUST for
mappings, and this just continues that MUST -- on the client side. (It
inverts that MUST for registration, which we are in agreement on.)
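
A minimal sketch of the lookup-side rule in (iii), two validity tests followed by a single lookup, may make the proposal concrete. It assumes Python's standard-library `encodings.idna.nameprep` as the IDNA2003 mapping step. The IDNA2008 validity check is a hypothetical stand-in (a label counts as valid only if the IDNA2003 mapping leaves it unchanged), not the real IDNA2008 rules, and all function names are illustrative.

```python
from encodings.idna import nameprep  # stdlib IDNA2003 (Nameprep) mapping step


def valid_idna2003(label):
    """True if the stdlib nameprep accepts the label (IDNA2003 side)."""
    try:
        nameprep(label)
        return True
    except UnicodeError:
        return False


def valid_idna2008_standin(label):
    """HYPOTHETICAL stand-in, not the real IDNA2008 rules: treat a label as
    IDNA2008-valid only if the IDNA2003 mapping leaves it unchanged."""
    return valid_idna2003(label) and nameprep(label) == label


def label_to_look_up(label, valid_2008=valid_idna2008_standin):
    """Two validity tests, at most one label handed on to the DNS lookup."""
    if valid_2008(label):        # test 1: acceptable as written
        return label
    if valid_idna2003(label):    # test 2: acceptable only under IDNA2003
        return nameprep(label)   # apply the IDNA2003 mappings, then look up
    raise ValueError(f"label {label!r} is not valid under either test")


# "B\u00fccher" is "Bücher"; the uppercase B fails the stand-in IDNA2008 test.
print(label_to_look_up("B\u00fccher"))
print(label_to_look_up("b\u00fccher"))
```

Whether the second branch is a MUST or a SHOULD is exactly the point under discussion; the sketch only shows that either way a single label reaches the DNS.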


>
> If we could figure out how to say it and it made you and others
> more comfortable, I'd be comfortable with a requirement that, if
> the IDNA2008 lookup fails, one either apply the IDNA2003
> mappings or no mappings at all.


I don't think it is necessary to have that option.

>
>
> Of course, if we specify any mappings at all, even
> transitionally, the WG is going to have to wrestle with the
> charter limitations, whether negotiations with the IESG are
> required, and, if we go that far, whether we are willing to
> consider a reset that would consider Paul's proposal (or Adam's)
> on an equal footing with the IDNA2008 work.


Understandably. But the more we've looked at the interoperability issues,
the more serious they appear. So I think we need to bite the bullet. And
once we add the IDNA2003 mappings, I think the whole package is then
sufficiently attractive that as a working group we would end up settling on
it.


>
>
> > Even somewhat better would be to have updated mappings a la
> > TR46.  Some figures:
> >
> >    - There are about 5.5K characters added after Unicode 3.2
> >      <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D%5D>.
> >      (Note also that 5.2 is due out this fall, and will add more).
> >    - Of these 433 have NFKC+CaseFold mappings
> >      <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D-%5B%5B:isLowercase:%5D-%5B:nfkcqc=n:%5D%5D%5D>.
> >
> > While a number of these are archaic, some are not. It would be
> > inconsistent for a language using new and old characters for
> > some characters to be mapped and others not. This would
> > especially be the case for uppercases: illustrating this with
> > ASCII, for "Abc" to map to "abc", but for "Bcd" to just fail.
> >
> > However, bottom line, the main reasons for the mappings are
> > interoperability, so it is far, far more important for us to
> > maintain the 2003 mappings than to extend them to new
> > characters.
>
> While I can see making some accommodations to transition (i.e.,
> I'm sympathetic to your "bottom line"), part of the starting
> point for this work was a good deal of concern that the
> compatibility and CaseFold mappings of IDNA2003 were sources of
> confusion and, for some circumstances, not even right*.


I agree that they were sources of confusion on the registration end.

As to the "not even right"; that is open to interpretation. As we know,
there will always be different positions whenever we have a common mapping
over all of Unicode -- there is no way to match conflicts among languages,
for example.



> I
> think that we at least need to balance the two sets of concerns
> --and focusing on transition interoperability is one such
> possible balance-- but that we can't reasonably blow off the
> other one.
>
> I will be posting a separate note about a possible way to handle
> more extensive mappings and perhaps even these transitional/
> compatibility ones, at the IRI -> URI boundary level as soon as
> I have time, but want to try to get the current documents
> together first.
>
> >> (iv) For the four "changed interpretation" cases, we make it
> >> clear that the IDNA2008 interpretation is the important one
> >> and that registries have a lot of responsibility here.
> >> However, if an application is in a position to deliver two
> >> different answers to the user, then it MAY reasonably do both
> >> lookups and then do whatever with them seems appropriate
> >> (obviously, a "did you really mean?" dialogue would be one
> >> such option).
> >
> > Agreed as well. That, I think, is the only option I've heard
> > for handling whatever characters end up in IDNA 2008 with
> > changed interpretations that would help mitigate the security
> > problems.
> >
> > The specified order of lookup will be important.
>
> Yes.  That is an old and familiar issue with the DNS and "DNS
> search".  I think that we have to specify IDNA2008 lookup as
> primary or we risk propagating old problems.   I hope you and
> others agree with that.


I have no problem with that.
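
A small sketch of the lookup order for a changed-interpretation label, with the IDNA2008 reading first and the IDNA2003 reading second. The source does not name the four characters; sharp s (U+00DF) is used here as an assumed example, since IDNA2003 maps it to "ss". The stdlib `encodings.idna.nameprep` supplies the IDNA2003 reading, and `candidate_lookups` is an illustrative name only.

```python
from encodings.idna import nameprep  # stdlib IDNA2003 (Nameprep) mapping step


def candidate_lookups(label):
    """Return the labels to try, in order: IDNA2008 reading first.

    ASSUMPTION: the label as written is taken to be the IDNA2008
    interpretation; the IDNA2003 interpretation is its nameprep mapping.
    """
    primary = label             # IDNA2008 reading (looked up first)
    legacy = nameprep(label)    # IDNA2003 reading (optional second lookup)
    if legacy == primary:
        return [primary]        # interpretations agree: one lookup only
    return [primary, legacy]


# "stra\u00dfe" is "straße"; the two readings differ, so both are listed.
print(candidate_lookups("stra\u00dfe"))
print(candidate_lookups("strasse"))
```

What an application does with the second answer (a "did you really mean?" dialogue, or nothing at all) is left open, as in the discussion above.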


>
>
> > The "did you mean" option could be recommended for user-facing
> > code. That isn't, of course, much use for a lot of software
> > like search engines, but for UIs could be useful.
>
> Well, actually, the very nature of most search engines as seen
> by the user is that they report lists of results which might
> match a given query.   Returning results that match both
> interpretations of a label, if they are different, is no
> different (again, from a user point of view) than returning
> different results for different spellings or different
> homograph-definitions, of a search string.  Of course, any of
> those options complicates the indexing and ranking processes,
> and some search/indexing engines may not consider building and
> retaining the relevant information to be worth the trouble.  But
> I assume the market would then sort out the importance of doing
> so.


When crawling, both alternatives can be followed. For other processing, that
is not an option.


>
>
> Conversely, while I agree that it would be useful for some UIs,
> it would be a big mistake for others.  Again, I believe that
> sorting out which is which is a matter for the marketplace, not
> standards that take one position or the other.


Agreed.

>
>
> best,
>    john
>
>