Parsing the issues and finding a middle ground -- another attempt

Tue Mar 3 21:27:11 CET 2009

Mark,

Thanks for the clarity of these comments.  I'm glad we are
converging.   Text on which we've agreed is elided below, but
anyone who disagrees with Mark's conclusions in those areas
should speak up quickly.

--On Tuesday, March 03, 2009 11:47 -0800 Mark Davis
<mark at macchiato.com> wrote:

> We have had a lot of productive discussion lately. Here is my
> take on your questions of 6 days ago, with "..." elisions so
> as to get at the core questions.
>...

>> (ii) We make it clear (if it isn't already) that, in cases
>> ...  in which perceived relationships among label strings are
>> important, it is the responsibility of the relevant registry
>> to cope ....

> I'm not sure what this means. I'm guessing you mean policies
> like bundling and blocking; if so, I agree.

Yes, that is what I meant, but the word "like" is key -- many
registry operators are smart folks with clear ideas about what
should be done for the situations they face.  I don't think we
should try to constrain them to particular solutions and don't
think they would pay much attention to us if we did.

>> (iii) We tell folks on the lookup side that, if a label in
>> native-character form is invalid under IDNA2008 but valid
>> under IDNA2003, they SHOULD apply the IDNA2003 mappings and
>> look the thing up.  Note that this implies two tests but only
>> one lookup in the DNS. ...
> 
> If we made that a MUST, I'd be happy with it. If it is not a
> MUST, then we can always have two kinds of implementations,
> which will inevitably cause some interoperability problems.

I said "SHOULD" because, in IETF-speak, MUST implies that there
are no exceptions and, in particular, no cases in which an
implementation (or application specification) might reasonably
insist on a "no mapping" approach for absolute precision about
what is being done.  I can think of several cases where that
might be appropriate.   One of them might be email addresses,
where there is already a tradition of "if you don't specify
exactly what you intend, the message isn't going to go through"
(but I stress "might" -- that decision is not under the control
of this WG).  

If we could figure out how to say it and it made you and others
more comfortable, I'd be comfortable with a requirement that, if
the IDNA2008 lookup fails, one either apply the IDNA2003
mappings or no mappings at all.
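
To make the "two tests but only one lookup" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `choose_lookup_label`, its parameters, and the toy validity tests are all hypothetical, and none of them comes from the IDNA drafts. The `allow_mapping` flag stands for the choice between applying the IDNA2003 mappings and applying no mappings at all.

```python
from typing import Callable, Optional


def choose_lookup_label(
    label: str,
    valid_2008: Callable[[str], bool],
    valid_2003: Callable[[str], bool],
    map_2003: Callable[[str], str],
    allow_mapping: bool = True,
) -> Optional[str]:
    """Return the single label to look up in the DNS, or None to reject."""
    # Test 1: a label that is valid under IDNA2008 is looked up unchanged.
    if valid_2008(label):
        return label
    # Test 2: otherwise fall back to the IDNA2003 mappings, unless the
    # caller has chosen the strict "no mapping" behaviour.
    if allow_mapping and valid_2003(label):
        return map_2003(label)
    return None


# Toy stand-ins, NOT the real rules: "valid under IDNA2008" means
# all-lowercase letters, "valid under IDNA2003" means letters only.
def toy_valid_2008(s: str) -> bool:
    return s.isalpha() and s.islower()


def toy_valid_2003(s: str) -> bool:
    return s.isalpha()


print(choose_lookup_label("Abc", toy_valid_2008, toy_valid_2003, str.lower))
print(choose_lookup_label("Abc", toy_valid_2008, toy_valid_2003, str.lower,
                          allow_mapping=False))
```

Either way the caller ends up with at most one name to send to the DNS; the only thing the flag changes is whether an IDNA2008-invalid label is mapped or rejected.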

Of course, if we specify any mappings at all, even
transitionally, the WG is going to have to wrestle with the
charter limitations, whether negotiations with the IESG are
required, and, if we go that far, whether we are willing to
consider a reset that would consider Paul's proposal (or Adam's)
on an equal footing with the IDNA2008 work.

> Even somewhat better would be to have updated mappings a la
> TR46.  Some figures:
> 
>    - There are about 5.5K characters added after Unicode 3.2
>      <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D%5D>.
>      (Note also that 5.2 is due out this fall, and will add
>      more).
>    - Of these 433 have NFKC+CaseFold mappings
>      <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D-%5B%5B:isLowercase:%5D-%5B:nfkcqc=n:%5D%5D%5D>.
> 
> While a number of these are archaic, some are not. It would be
> inconsistent for a language using new and old characters for
> some characters to be mapped and others not. This would
> especially be the case for uppercases: illustrating this with
> ASCII, for "Abc" to map to "abc", but for "Bcd" to just fail.
> 
> However, bottom line, the main reasons for the mappings are
> interoperability, so it is far, far more important for us to
> maintain the 2003 mappings than to extend them to new
> characters.

While I can see making some accommodations to transition (i.e.,
I'm sympathetic to your "bottom line"), part of the starting
point for this work was a good deal of concern that the
compatibility and CaseFold mappings of IDNA2003 were sources of
confusion and, for some circumstances, not even right*.   I
think that we at least need to balance the two sets of concerns
--and focusing on transition interoperability is one such
possible balance-- but that we can't reasonably blow off the
other one.
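
For readers who want to see what an NFKC+CaseFold style of mapping does in practice, here is a rough sketch using Python's standard `unicodedata` module. It is an approximation for illustration only: the helper name `nfkc_casefold` is hypothetical, and the result follows whatever Unicode version the Python build ships with, not the Unicode 3.2 tables that IDNA2003 is tied to.

```python
import unicodedata


def nfkc_casefold(label: str) -> str:
    """Approximate a compatibility-plus-case-folding mapping of a label."""
    # Compatibility-normalize first, then case-fold, then normalize again
    # because case folding can produce sequences that are not in NFKC.
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


for sample in ("Abc", "\uff21\uff22\uff23", "stra\u00dfe"):
    print(repr(sample), "->", repr(nfkc_casefold(sample)))
```

The first sample is the ASCII uppercase case from the quoted text, the second uses fullwidth letters to show the compatibility part, and the third shows a fold that changes the length of the string.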

I will be posting a separate note about a possible way to handle
more extensive mappings, and perhaps even these transitional/
compatibility ones, at the IRI -> URI boundary level as soon as
I have time, but I want to try to get the current documents
together first.

>> (iv) For the four "changed interpretation" cases, we make it
>> clear that the IDNA2008 interpretation is the important one
>> and that registries have a lot of responsibility here.
>> However, if an application is in a position to deliver two
>> different answers to the user, then it MAY reasonably do both
>> lookups and then do whatever with them seems appropriate
>> (obviously, a "did you really mean?" dialogue would be one
>> such option).
> 
> Agreed as well. That, I think, is the only option I've heard
> for handling whatever characters end up in IDNA 2008 with
> changed interpretations that would help mitigate the security
> problems.
> 
> The specified order of lookup will be important.

Yes.  That is an old and familiar issue with the DNS and "DNS
search".  I think that we have to specify IDNA2008 lookup as
primary or we risk propagating old problems.   I hope you and
others agree with that.
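
As a sketch of what "IDNA2008 lookup as primary" could look like for an application that is able to show two answers, consider the following Python fragment. Everything in it is an assumption made for illustration: the function `lookup_both`, the two interpretation callables, the dictionary standing in for the DNS, and the use of "ß" as an example of a character whose interpretation changes.

```python
from typing import Callable, List, Optional, Tuple


def lookup_both(
    name: str,
    as_2008: Callable[[str], str],
    as_2003: Callable[[str], str],
    resolve: Callable[[str], Optional[str]],
) -> List[Tuple[str, Optional[str]]]:
    """Return (name, answer) pairs with the IDNA2008 reading always first."""
    primary = as_2008(name)
    results = [(primary, resolve(primary))]
    # Only do the second lookup when the two readings really differ.
    secondary = as_2003(name)
    if secondary != primary:
        results.append((secondary, resolve(secondary)))
    return results


# A dictionary stands in for the DNS; the addresses are documentation values.
toy_zone = {"stra\u00dfe.example": "192.0.2.1", "strasse.example": "192.0.2.2"}

answers = lookup_both(
    "stra\u00dfe.example",
    as_2008=lambda s: s,                          # assumed: kept as-is
    as_2003=lambda s: s.replace("\u00df", "ss"),  # assumed: mapped to "ss"
    resolve=toy_zone.get,
)
print(answers)
```

What the application then does with a two-element result (a "did you really mean?" dialogue, showing both, or using only the first) is left to the caller, which is the point of the MAY.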

> The "did you mean?" option could be recommended for
> user-facing code. That
> isn't, of course, much use for a lot of software like search
> engines, but for UIs could be useful.

Well, actually, the very nature of most search engines as seen
by the user is that they report lists of results which might
match a given query.   Returning results that match both
interpretations of a label, if they are different, is no
different (again, from a user point of view) than returning
different results for different spellings, or different
homograph-definitions, of a search string.  Of course, any of
those options complicates the indexing and ranking processes,
and some search/indexing engines may not consider building and
retaining the relevant information to be worth the trouble.  But
I assume the market would then sort out the importance of doing
so.

Conversely, while I agree that it would be useful for some UIs,
it would be a big mistake for others.  Again, I believe that
sorting out which is which is a matter for the marketplace, not
standards that take one position or the other.

best,
   john