I believe that <br><br>a) it needs to be a MUST, despite the charter issue. IDNA2003 had a MUST, and this simply continues that.<br>b) we should not mention the local mappings. Clearly, if someone wants a UI mapping when entering a URL into an address bar, there isn't anything we can do to stop that, but there is no point in emphasizing something that would just be an interoperability problem.<br>
<br>More comments below.<br><br clear="all">Mark<br>
<br><br><div class="gmail_quote">On Tue, Mar 3, 2009 at 12:27, John C Klensin <span dir="ltr"><<a href="mailto:klensin@jck.com">klensin@jck.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Mark,<br>
<br>
Thanks for the clarity of these comments. I'm glad we are<br>
converging.  Text on which we've agreed is elided below, but<br>
anyone who disagrees with Mark's conclusions in those areas<br>
should speak up quickly.<br>
<br>
--On Tuesday, March 03, 2009 11:47 -0800 Mark Davis<br>
<div class="im"><<a href="mailto:mark@macchiato.com">mark@macchiato.com</a>> wrote:<br>
<br>
> We have had a lot of productive discussion lately. Here is my<br>
> take on your questions of 6 days ago, with "..." elisions so<br>
> as to get at the core questions.<br>
</div>>...<br>
<div class="im"><br>
>> (ii) We make it clear (if it isn't already) that, in cases<br>
>> ... in which perceived relationships among label strings are<br>
>> important, it is the responsibility of the relevant registry<br>
>> to cope ....<br>
<br>
> I'm not sure what this means. I'm guessing you mean policies<br>
> like bundling and blocking; if so, I agree.<br>
<br>
</div>Yes, that is what I meant, but the word "like" is key -- many<br>
registry operators are smart folks with clear ideas about what<br>
should be done for the situations they face. I don't think we<br>
should try to constrain them to particular solutions and don't<br>
think they would pay much attention to us if we did.</blockquote><div><br>ok <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<div class="im"><br>
>> (iii) We tell folks on the lookup side that, if a label in<br>
>> native-character form is invalid under IDNA2008 but valid<br>
>> under IDNA2003, they SHOULD apply the IDNA2003 mappings and<br>
>> look the thing up. Note that this implies two tests but only<br>
>> one lookup in the DNS. ...<br>
><br>
> If we made that a MUST, I'd be happy with it. If it is not a<br>
> MUST, then we can always have two kinds of implementations,<br>
> which will inevitably cause some interoperability problems.<br>
<br>
</div>I said "SHOULD" because, in IETF-speak, MUST implies that there<br>
are no exceptions and, in particular, no cases in which an<br>
implementation (or application specification) might reasonably<br>
insist on a "no mapping" approach for absolute precision about<br>
what is being done. I can think of several cases where that<br>
might be appropriate. One of them might be email addresses,<br>
where there is already a tradition of "if you don't specify<br>
exactly what you intend, the message isn't going to go through"<br>
(but I stress "might" -- that decision is not under the control<br>
of this WG).</blockquote><div><br>Any such cases would just be interoperability problems. There is no particular need for us to create such cases. IDNA2003 had a MUST for mappings, and this just continues that MUST -- on the client side. (It inverts that MUST for registration, which we are in agreement on.)<br>
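The "two tests but only one lookup" rule from point (iii) can be sketched roughly as follows. This is only an illustration under stated assumptions: `looks_valid_idna2008` is a hypothetical toy stand-in for the real IDNA2008 validity test (it only checks that a label is already lowercase and NFKC-stable), and `label_for_lookup` is a name made up here. The IDNA2003 mapping step uses `encodings.idna.nameprep` from the Python standard library.

```python
import unicodedata
from encodings import idna as idna2003  # stdlib IDNA2003 (RFC 3490) support


def looks_valid_idna2008(label: str) -> bool:
    """Hypothetical stand-in for the IDNA2008 validity test.

    Assumption: a label counts as "valid" here only if it needs no mapping,
    i.e. it is unchanged by NFKC normalization plus lowercasing. The real
    IDNA2008 rules are far more detailed.
    """
    return label == unicodedata.normalize("NFKC", label).lower()


def label_for_lookup(label, is_valid_2008=looks_valid_idna2008):
    """Return the single label to look up in the DNS, or None if neither test passes."""
    # Test 1: valid under IDNA2008 as typed, so use it unchanged.
    if is_valid_2008(label):
        return label
    # Test 2: valid under IDNA2003, so apply the IDNA2003 (nameprep) mappings.
    try:
        return idna2003.nameprep(label)
    except UnicodeError:
        # Prohibited under IDNA2003 as well: nothing to look up.
        return None


if __name__ == "__main__":
    for candidate in ["abc", "Abc", "stra\u00dfe", "A\ue000"]:
        print(repr(candidate), "->", repr(label_for_lookup(candidate)))
```

Either way only one name reaches the DNS. With a MUST, every client takes the second branch for a label such as "Abc"; with a SHOULD, some clients could stop after the first test and fail.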
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
If we could figure out how to say it and it made you and others<br>
more comfortable, I'd be comfortable with a requirement that, if<br>
the IDNA2008 lookup fails, one either apply the IDNA2003<br>
mappings or no mappings at all.</blockquote><div><br>I don't think it is necessary to have that option. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
<br>
Of course, if we specify any mappings at all, even<br>
transitionally, the WG is going to have to wrestle with the<br>
charter limitations, whether negotiations with the IESG are<br>
required, and, if we go that far, whether we are willing to<br>
consider a reset that would consider Paul's proposal (or Adam's)<br>
on an equal footing with the IDNA2008 work.</blockquote><div><br>Understandably. But the more we've looked at the interoperability issues, the more serious they appear. So I think we need to bite the bullet. And once we add the IDNA2003 mappings, I think the whole package is then sufficiently attractive that as a working group we would end up settling on it.<br>
<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<div class="im"><br>
> Even somewhat better would be to have updated mappings a la<br>
> TR46. Some figures:<br>
><br>
> - There are about 5.5K characters added after Unicode<br>
</div>> 3.2<<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%" target="_blank">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%</a><br>
> 5B:age=5.1:%5D-%5B:age=3.2:%5D%5D>. (Note also that 5.2 is<br>
<div class="im">> due out this fall, and will add more). - Of these 433 have<br>
> NFKC+CaseFold<br>
</div>> mappings<<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a" target="_blank">http://unicode.org/cldr/utility/list-unicodeset.jsp?a</a><br>
> =%5B%5B:age=5.1:%5D-%5B:age=3.2:%5D-%5B%5B:isLowercase:%5D-%5B<br>
> :nfkcqc=n:%5D%5D%5D> .<br>
<div class="im">><br>
> While a number of these are archaic, some are not. It would be<br>
> inconsistent for a language using new and old characters for<br>
> some characters to be mapped and others not. This would<br>
> especially be the case for uppercases: illustrating this with<br>
> ASCII, for "Abc" to map to "abc", but for "Bcd" to just fail.<br>
><br>
> However, bottom line, the main reasons for the mappings are<br>
> interoperability, so it is far, far more important for us to<br>
> maintain the 2003 mappings than to extend them to new<br>
> characters.<br>
<br>
</div>While I can see making some accommodations to transition (i.e.,<br>
I'm sympathetic to your "bottom line"), part of the starting<br>
point for this work was a good deal of concern that the<br>
compatibility and CaseFold mappings of IDNA2003 were sources of<br>
confusion and, for some circumstances, not even right*. </blockquote><div><br>I agree that they were sources of confusion on the registration end.<br><br>As to the "not even right": that is open to interpretation. As we know, there will always be different positions whenever we have a common mapping over all of Unicode -- there is no way to reconcile conflicts among languages, for example.<br>
<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I<br>
think that we at least need to balance the two sets of concerns<br>
--and focusing on transition interoperability is one such<br>
possible balance-- but that we can't reasonably blow off the<br>
other one.<br>
<br>
I will be posting a separate note about a possible way to handle<br>
more extensive mappings and perhaps even these transitional/<br>
compatibility ones, at the IRI -> URI boundary level as soon as<br>
I have time, but want to try to get the current documents<br>
together first.<br>
<div class="im"><br>
>> (iv) For the four "changed interpretation" cases, we make it<br>
>> clear that the IDNA2008 interpretation is the important one<br>
>> and that registries have a lot of responsibility here.<br>
>> However, if an application is in a position to deliver two<br>
>> different answers to the user, then it MAY reasonably do both<br>
>> lookups and then do whatever with them seems appropriate<br>
>> (obviously, a "did you really mean?" dialogue would be one<br>
>> such option).<br>
><br>
> Agreed as well. That, I think, is the only option I've heard<br>
> for handling whatever characters end up in IDNA 2008 with<br>
> changed interpretations that would help mitigate the security<br>
> problems.<br>
><br>
> The specified order of lookup will be important.<br>
<br>
</div>Yes. That is an old and familiar issue with the DNS and "DNS<br>
search". I think that we have to specify IDNA2008 lookup as<br>
primary or we risk propagating old problems. I hope you and<br>
others agree with that.</blockquote><div><br>I have no problem with that.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<div class="im"><br>
> The "did you<br>
> mean" option could be recommended for user-facing code. That<br>
> isn't, of course, much use for a lot of software like search<br>
> engines, but for UIs could be useful.<br>
<br>
</div>Well, actually, the very nature of most search engines as seen<br>
by the user is that they report lists of results which might<br>
match a given query. Returning results that match both<br>
interpretations of a label, if they are different, is no<br>
different (again, from a user point of view) than returning<br>
different results for different spellings or different<br>
homograph-definitions, of a search string.  Of course, any of<br>
those options complicates the indexing and ranking processes,<br>
and some search/indexing engines may not consider building and<br>
retaining the relevant information to be worth the trouble. But<br>
I assume the market would then sort out the importance of doing<br>
so.</blockquote><div><br>When crawling, both alternatives can be followed. For other processing, that is not an option.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
<br>
Conversely, while I agree that it would be useful for some UIs,<br>
it would be a big mistake for others. Again, I believe that<br>
sorting out which is which is a matter for the marketplace, not<br>
standards that take one position or the other.</blockquote><div><br>Agreed. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
best,<br>
<font color="#888888"> john<br>
<br>
</font></blockquote></div><br>
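For the "changed interpretation" cases in point (iv), the agreed ordering (IDNA2008 interpretation first, IDNA2003 interpretation second) might look roughly like the sketch below. It is illustrative only: `to_a_label` and `lookup_candidates` are hypothetical names, the input is assumed to have already passed an IDNA2008 validity check, and the IDNA2003 reading comes from the standard library's `encodings.idna.ToASCII`.

```python
from encodings import idna as idna2003  # stdlib IDNA2003 (RFC 3490) support


def to_a_label(u_label: str) -> str:
    """Encode a label with no mapping applied (the IDNA2008-style reading).

    Assumption: the label has already been validated; ASCII labels pass through.
    """
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def lookup_candidates(label: str) -> list:
    """Return the names to try, in order: IDNA2008 reading first, then IDNA2003."""
    candidates = [to_a_label(label)]
    try:
        # ToASCII applies the IDNA2003 mappings (e.g. sharp s -> "ss") first.
        legacy = idna2003.ToASCII(label).decode("ascii")
    except UnicodeError:
        legacy = None
    # Only add the IDNA2003 reading when it really differs.
    if legacy is not None and legacy not in candidates:
        candidates.append(legacy)
    return candidates


if __name__ == "__main__":
    for candidate in ["abc", "b\u00fccher", "stra\u00dfe"]:
        print(repr(candidate), "->", lookup_candidates(candidate))
```

A crawler could follow every entry in the returned list, while a UI could use a second entry to drive a "did you really mean?" prompt. Code that needs a single answer would take only the first entry.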