The paragraph in question originated in Harald's document:<br><br>> Bidi-5.<br>> One particular example of the last case is if a program chooses to<br>> examine the last character (in network order) of a string in order to
<br>> determine its directionality, rather than its first; if it finds an<br>><br>> NSM character and tries to display the string as if it was a left-to-<br>> right string, the resulting display may be interesting, but not
<br>> useful.<br><br>I was speaking loosely when I said URL; I should have said IRI. I was working at the same level as Harald's original text, that is, the Unicode character level, not the Punycode level. So please read "IRI" wherever I wrote "URL". Sorry for the confusion.
<br><br>Harald's text must have been referring to IRIs too, since NSMs don't occur in URLs.<br><br>Much of what you wrote was directed at something I didn't mean, so I'll skip over that. There are a few parts I'll comment on below.
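A minimal sketch of the failure described in Harald's Bidi-5 paragraph, using only Python's standard `unicodedata` module. It assumes a simplified rule in the spirit of rules P2/P3 of the Unicode Bidirectional Algorithm: the direction comes from the first strong character (bidi class L, R, or AL), ignoring embeddings and isolates. `last_char_direction` is a hypothetical version of the flawed "look at the last character" heuristic. The sample string is a fully vowelled Arabic word whose final character is a combining vowel mark (bidi class NSM).

```python
import unicodedata
from typing import Optional

# Strong bidi classes and the paragraph direction each one implies.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def first_strong_direction(s: str) -> Optional[str]:
    """Direction of the first strong character, or None if there is none."""
    for ch in s:
        direction = STRONG.get(unicodedata.bidirectional(ch))
        if direction is not None:
            return direction
    return None


def last_char_direction(s: str) -> Optional[str]:
    """Hypothetical flawed heuristic: look only at the last character."""
    if not s:
        return None
    # A trailing NSM is not strong, so this yields no answer; a caller that
    # then falls back to left-to-right will misdisplay an RTL string.
    return STRONG.get(unicodedata.bidirectional(s[-1]))


# Arabic "kataba" fully vowelled: each letter (AL) is followed by FATHA (NSM).
word = "\u0643\u064e\u062a\u064e\u0628\u064e"
print(unicodedata.bidirectional(word[-1]))  # NSM
print(first_strong_direction(word))  # rtl
print(last_char_direction(word))  # None
```

The string is plainly right-to-left, but the last-character check sees only the NSM and has nothing to go on.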
<br><br><br><div class="gmail_quote">On Jan 13, 2008 11:20 AM, John C Klensin <<a href="mailto:klensin@jck.com">klensin@jck.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Mark,<br><br></blockquote><div>...<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Certainly, as you, Martin, Erik, and others have pointed out in
<br>various ways, there are many places in which strings appear that<br>look like URLs and don't conform to URL rules. It may be<br>perfectly reasonable in some contexts to have a string that<br>looks like a URL but that contains non-ASCII characters. But,
<br>unless it is an IRI in a context in which IRIs are permitted,<br>one gets from such a string to a URL via exactly the sort of<br>preprocessing that we've been discussing as "user agent"<br>functionality in the IDNAbis context.
</blockquote><div><br>I don't think it's as simple as calling it a "UI" context. The term "preprocessing step" (which you use below) is clearer. For more, see below.<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
It is also possible that I misunderstand what you mean by<br>"assume". Neither an implementation of IDNA2003 nor an<br>implementation of IDNA200X is conformant with the intent of<br>those specifications if it "assumes" any of these things and
<br>then goes off and behaves as if they are true. In both cases,<br>implementations are expected to test the strings they intend to<br>pass (or intend others to pass) to the DNS so that<br>non-conforming strings will fail. In IDNA2003, most of the
<br>testing is built into ToASCII and the operations surrounding it.<br>In IDNA200X, much of the testing is more explicit. But neither<br>assumes things that it doesn't verify.</blockquote><div><br>I think we may agree on this. Part of my confusion came from reading Harald's original text as presuming an implementation that made a false assumption, namely that IDNAs were necessarily IDNAbis, so that a change would cause a problem for some implementation.
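To illustrate John's point that in IDNA2003 most of the testing is built into ToASCII: Python's built-in `idna` codec implements IDNA2003 (RFC 3490 ToASCII with Nameprep), and it raises `UnicodeError` instead of passing a non-conforming label on. This is a minimal sketch; `to_ascii_or_none` is a hypothetical helper name, and the codec does not apply the STD3 ASCII rules, so it checks less than a full lookup implementation might.

```python
from typing import Optional


def to_ascii_or_none(host: str) -> Optional[str]:
    """Return the IDNA2003 ACE form of host, or None if ToASCII rejects it."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        return None


print(to_ascii_or_none("bücher.de"))  # xn--bcher-kva.de
# U+E000 is a private-use character, which Nameprep prohibits.
print(to_ascii_or_none("b\ue000cher.de"))  # None
# Labels longer than 63 characters are rejected as well.
print(to_ascii_or_none("a" * 64 + ".de"))  # None
```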
<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Clearly, there is at least other issue. It arises for names<br>that are valid under IDNA200X but not obviously valid under
<br>IDNA2003. An IDNA2003 lookup implementation will reject some of<br>them as invalid (some or most of those that merely contain<br>codepoints that are unassigned in Unicode 3.2 but assigned in<br>later versions may slip through). In the long term, the only
<br>way to make all of the newly-available characters and strings<br>available to IDN-using applications is for implementations of<br>those applications to upgrade. That would be true of any update<br>to IDNA that moves beyond Unicode
3.2, especially since<br>registration of strings that contain codepoints that are<br>unassigned at registration time is, fairly obviously, the worst<br>of bad practices.</blockquote><div><br>I foresee an indefinitely long period in which many programs, such as browsers and email clients, will need to handle both IDNA2003 and IDNAbis.
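John's remark that codepoints unassigned in Unicode 3.2 but assigned later may slip through an IDNA2003 lookup can be checked with Python's standard `unicodedata` module, which exposes the Unicode 3.2 database as `unicodedata.ucd_3_2_0` alongside the current one. A minimal sketch; `newer_than_3_2` is a hypothetical helper, and it only reports which codepoints are affected, not whether any particular IDNA2003 implementation rejects them.

```python
import unicodedata
from typing import List


def newer_than_3_2(label: str) -> List[str]:
    """Codepoints unassigned (Cn) in Unicode 3.2 but assigned in the current database."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.ucd_3_2_0.category(ch) == "Cn"
        and unicodedata.category(ch) != "Cn"
    ]


# U+1E9E LATIN CAPITAL LETTER SHARP S was added in Unicode 5.1.
print(newer_than_3_2("STRA\u1e9eE"))  # ['U+1E9E']
print(newer_than_3_2("bücher"))  # []
```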
<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Now I'm being a little pedantic here, for which I apologize, but<br>I think the point is important. If any of the majority of the
<br>cases you list above, what the strings occur in is not a URL,<br>but something that must be transformed into a URL.</blockquote><div> </div><div>agreed<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
...</blockquote><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Now I'm going to make two assumptions with which you may<br>
disagree. The first is that the IDNA200X model is sufficiently<br>different from the IDNA2003 one that few, if any, applications<br>are going to switch (or be able or inclined to switch) from<br>IDNA2003 to IDNA200X by a completely automatic process without
<br>anyone thinking about it or noticing. </blockquote><div><br>I would disagree a bit with the first one. Many programs (such as those at my company, Google) will need to handle both for an indefinite period. I expect that what we will probably do is:
<br><ul><li>Check whether it works under IDNA2003. If so, fine.</li><li>Otherwise, check whether it works under IDNAbis. If so, fine.</li><li>Otherwise, check whether the major browsers accept it anyway. If so, we'll need to accept it as well.</li></ul>Take a look at the following table,
<font size="2">for example:<br><br></font><table id="table1" style="border-collapse: collapse;" border="1">
        <tbody><tr>
                <th align="left"><font size="2"> </font></th>
                <th align="left"><font size="2">Link</font></th>
                <th align="left"><font size="2">Firefox</font></th>
                <th align="left"><font size="2">IE7</font></th>
        </tr>
        <tr>
                <td><font size="2">0</font></td>
                <td><font size="2"><a href="<a href="http://b%C3%BCcher.de/">http://bücher.de</a>"></font></td>
                <td><font size="2">works</font></td>
                <td><font size="2">works</font></td>
        </tr>
        <tr>
                <td><font size="2">1</font></td>
                <td><font size="2"><a href="<a href="http://b%C3%BCcher.de/">http://Bücher.de</a>"></font></td>
                <td><font size="2">works</font></td>
                <td><font size="2">works</font></td>
        </tr>
        <tr>
                <td><font size="2">2</font></td>
                <td><font size="2"><a href="<a href="http://b%C3%BCcher.de/">http://xn--bcher-kva.de</a>"></font></td>
                <td><font size="2">works</font></td>
                <td><font size="2">works</font></td>
        </tr>
        <tr>
                <td><font size="2">3</font></td>
                <td><font size="2"><a href="<a href="http://b%c3%bccher.de/">http://B%C3%BCcher.de</a>"></font></td>
                <td><font size="2">doesn't</font></td>
                <td><font size="2">doesn't</font></td>
        </tr>
</tbody></table>
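The three-step strategy described earlier can be sketched as a chain of converters tried in order, applied here to the hosts from the table. This is a minimal sketch under stated assumptions: IDNA2003 is Python's built-in `idna` codec; the standard library has no IDNAbis implementation, so `idnabis_to_ascii` and `browser_compat_to_ascii` are hypothetical placeholders that a real program would have to supply; and the host strings are taken from the links by hand.

```python
from typing import Callable, List, Optional

# A converter returns an ACE host or raises ValueError (UnicodeError is a subclass).
Converter = Callable[[str], str]


def idna2003_to_ascii(host: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 ToASCII with Nameprep.
    return host.encode("idna").decode("ascii")


def idnabis_to_ascii(host: str) -> str:
    # Hypothetical placeholder: no IDNAbis implementation in the standard library.
    raise ValueError("IDNAbis not available in this sketch")


def browser_compat_to_ascii(host: str) -> str:
    # Hypothetical placeholder for "whatever the major browsers accept".
    raise ValueError("browser-compatibility rules not modelled in this sketch")


def preprocess_host(host: str, steps: List[Converter]) -> Optional[str]:
    """Try each converter in order and return the first successful result."""
    for step in steps:
        try:
            return step(host)
        except ValueError:
            continue
    return None


STEPS = [idna2003_to_ascii, idnabis_to_ascii, browser_compat_to_ascii]
for host in ["bücher.de", "Bücher.de", "xn--bcher-kva.de"]:
    print(host, "->", preprocess_host(host, STEPS))  # each -> xn--bcher-kva.de
```

Rows 0, 1, and 2 all converge on `xn--bcher-kva.de`, because Nameprep case-folds "B" to "b". Row 3's percent-encoded host is all ASCII, so the codec passes it through unchanged; this sketch makes no attempt to percent-decode it.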
<font size="2"><br></font><p><font size="2">Because Firefox and IE7 both accept (0), (1), and (2), I can't see any way around Google's handling them also. This is into the indefinite future, even though #0 and #1 are not in Punycode. And this is not a U
</font>I issue; these are in the HTML page. That's why "preprocessing" is a better phrase than "UI".<br></p><p>The more of the web and net's infrastructure that accepts these variations, the more that other programs need to accommodate them, so that they interwork with one another.
</p>What I really don't want to see is an IDNAbis that fails to gain traction because of this (I'm thinking back to XML 1.1, which failed to gain traction because of a rather small incompatibility with XML 1.0).<br>
<p><br></p></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">...</blockquote><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><br>The second assumption is that any implementation that now<br>depends upon, or offers to users, the input flexibilities of<br>IDNA2003 (some applications of IDNA2003 do not) would be stupid<br>to implement IDNA200X in a way that simply drops those
<br>flexibilities. Whether it should quietly retain them or<br>produce more or less subtle warnings to users about the<br>conversions is a local design matter (and programs that<br>communicate with users obviously have choices that are not
<br>available to ones that do not). It appears to me that we are already<br>heading in the direction of applications (and, if that approach<br>isn't stopped for other reasons, "smart domain name servers")<br>making decisions about some things being safer than others and
<br>conditioning their actions on those decisions.</blockquote><div><br>I think we're in agreement here. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><br>The rationale document doesn't cover that situation nearly well<br>enough at -05, but there is a new section and extensive text<br>about it in the working version of -06. I don't think anything<br>there will come as a surprise, since all of the issues have been
<br>discussed on this list and much of the text is derived from<br>discussions on the list.</blockquote><div><br>Good.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d"><br>> What I'm saying is that essentially all of the incompatible<br>> differences between 2003 and the current bis are potential<br>> problems for some implementation, and once we get done with
<br>> bis, we will need to list them all. So just calling out #5 is<br>> insufficient.<br><br></div>While our perspective on these "incompatible differences" is<br>quite different, I hope that the new text in issues-06 will
<br>address many of your concerns. </blockquote><div><br>Looking forward to it.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
But it is also true that many of<br>those differences are differences in how and when IDNA is<br>applied that are simply not defined by the original protocol or<br>are differences that are important only if applicability<br>
principles or guidelines about the use of the original protocol<br>were violated. If adjustments in those areas are impossible,<br>then we are in very difficult waters indeed.</blockquote><div><br>Yes, I think we may need to be pragmatic about the changes that we introduce, because of the established conventions...
<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br><br>>...<br><br>best,<br><font color="#888888"> john<br></font><div>
<div></div><div class="Wj3C7c"><br>_______________________________________________<br>Idna-update mailing list<br><a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br><a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">
http://www.alvestrand.no/mailman/listinfo/idna-update</a><br></div></div></blockquote></div><br><br clear="all"><br>-- <br>Mark