There is a working copy at <a href="http://unicode.org/draft/reports/tr46/tr46.html">http://unicode.org/draft/reports/tr46/tr46.html</a> with some fixes made as a result of your comments. Some responses also below.<br><br clear="all">

Mark<br>

<br><br><div class="gmail_quote">On Wed, Oct 7, 2009 at 16:42, Mark Davis ☕ <span dir="ltr">&lt;<a href="mailto:mark@macchiato.com">mark@macchiato.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Martin,<br><br>I updated the utility to show the difference between IDNA2003, 2008, and TR46:<br><br><a href="http://unicode.org/cldr/utility/idna.jsp" target="_blank">http://unicode.org/cldr/utility/idna.jsp</a><br><br>Will follow on with responses to your comments, and Vint&#39;s request for a flow diagram.<br>


<br clear="all">Mark<br>

<br><br><div class="gmail_quote">On Mon, Oct 5, 2009 at 04:01, &quot;Martin J. Dürst&quot; <span dir="ltr">&lt;<a href="mailto:duerst@it.aoyama.ac.jp" target="_blank">duerst@it.aoyama.ac.jp</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div>On 2009/10/05 17:03, Mark Davis ☕ wrote:<br>

&gt; If you have some particular suggestions regarding items in the document, you<br>

&gt; can submit them directly via the<br></div>

&gt; *[Feedback&lt;<a href="http://www.unicode.org/reports/tr46/#Feedback" target="_blank">http://www.unicode.org/reports/tr46/#Feedback</a>&gt;<div><br>

&gt; ]* link at the top. If you also want discussion of the topics, then also on<br>

&gt; one of the Unicode mailing lists. And if you have any other suggestions for<br>

&gt; how to bridge the compatibility gaps between IDNA2003 implementations and<br>

&gt; IDNA2008 implementations, those suggestions would be welcome. We had a<br>

&gt; number of people from the browser communities at the meetings, and these<br>

&gt; were the best we could come up with as yet.<br>

&gt; *<br>

&gt; *Mark<br>

<br></div>

Hello Mark,<br>

<br>

Just some short comments here on this list while rushing through that document. Please forward these wherever appropriate.<br>

<br>

1.3.1, Deviations, says &quot;There are a few situations where the strict application of IDNA2008 will *always* result in the resolution of IDNs to different IP addresses than in IDNA2003.&quot;<br>

The *always* is of course wrong. The document itself later says &quot;Unless the &quot;DE&quot; registry bundles&quot;, which is, as far as we know, more or less what they are going to do. The other possibility is of course that the owner of the domain name in question makes sure they get both variants. For the business example that you show, that&#39;s not a problem at all, if the business knows about the issue.<br>

</blockquote></div></blockquote><div><br>I added here &quot;, unless the registry or registrant takes special action&quot;, and dropped &quot;always&quot;. Does that work?<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

In 1.3.2, Example 3. &quot;Map <a href="http://xn--bb-eka.at" target="_blank">http://ÖBB.at</a> to <a href="http://phishing.com" target="_blank">http://phishing.com</a>&quot;, is completely weird. If any browser or similar device wants to spoof its users, it has always been able to do this, even without the IETF&#39;s or the Unicode Consortium&#39;s permission. But such a browser would be very quickly out of business, for obvious reasons.<br>

</blockquote></div></blockquote><div><br>That is given as an extreme case of what is possible under the spec. While I agree that that is unlikely, since conformant implementations of IDNA2008 have complete freedom, it is not unlikely that we would see an array of more subtle interoperability problems resulting. <br>

<br>That being said, I agree that it is probably best to just remove that line.<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Again in 1.3.2, it says &quot;but adds validity constraints from IDNA2008&quot;, but then gives &quot;http://√.com&quot; as an okay example (currently in use, although for domain speculation only), which I&#39;d assume is prohibited in IDNA2008 based on the LDH-equivalence rules.<br>

</blockquote></div></blockquote><div><br>It should be: .. adds bidi validity constraints...<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

(Btw, I&#39;d suggest you remove the links from (most of) your examples, because you shouldn&#39;t at the same time claim that there is potential for phishing and make it easy to happen. Another issue is that some of these links don&#39;t actually resolve, but they look like they should. (e.g. <a href="http://I" target="_blank">http://I</a>♥NY.com))<br>

</blockquote></div></blockquote><div><br>Agreed. That is already a TODO mentioned at the top of the document.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

I don&#39;t really like the idea of Compatible Preprocessing (section 1.4) at all. Bypassing IDNA2008 lookup by converting to punycode separately is really going too far. I thought the intent of the document was to use either IDNA2003 or IDNA2008, not to simulate IDNA2003 on top of IDNA2008 at all costs by an additional layer.<br>

</blockquote></div></blockquote><div><br>By applying 1.4, you get nearly the same effect as &quot;try IDNA2008 then try IDNA2003&quot;. That allows browsers and other clients (including us at Google) to have a single processing step, without having to maintain two different implementations.<br>

<br>The draft earlier had a &quot;hybrid&quot; option, whereby the characters were limited to those accepted by IDNA2008. To my surprise, at the last meeting the consensus was to drop that.<br><br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Section 3, Preprocessing: &quot;(For more about the parts of a URL, including the domain name, see [RFC3987]).&quot; I don&#39;t know why RFC 3987 is relevant here. It may be misunderstood in that the processing is applied to the whole IRI/URI. Also, RFC 3987 doesn&#39;t actually define &quot;domain name&quot;, nor does it say which parts (of which it mentions several) of an IRI are domain names.<br>

</blockquote></div></blockquote><div><br>Thanks, that is not a good reference. What would you recommend? <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

In Section 3, Preprocessing, things such as &quot;URI/IRI %-escapes like %2e for U+002E (.) FULL STOP.&quot;</blockquote></div></blockquote><div> </div><div>That is only in an illustrative section, and not required. It is describing what is actually done:  the browsers accept %xx escapes. However, %2e is probably not a good example here: changed to <br>

... %C3%A0 for U+00E0 ( à ) LATIN SMALL LETTER A WITH GRAVE. <br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote">

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">or &quot;U+2488 ( ⒈ ) DIGIT ONE FULL STOP&quot; </blockquote></div></blockquote><div> <br>This is a tricky case. It has been included for compatibility. Both FF and IE support this behavior, where the domain name is separated into labels <i>after</i> normalization, while Safari and Chrome separate <i>before.</i> And FF, Safari, and Chrome all interpret google%2ecom as &quot;<a href="http://google.com">google.com</a>&quot; (haven&#39;t checked IE).<br>
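<br>To make the two orders concrete, here is a minimal Python sketch. It assumes plain NFKC stands in for the normalization step (the draft's actual mapping does more), and the helper names are hypothetical, not taken from the draft or from any browser:<br>
<pre>
```python
import unicodedata
from urllib.parse import unquote


def labels_split_before(domain_name):
    # Split on literal full stops first, then normalize each label.
    # A character whose decomposition contains a dot stays inside one label.
    return [unicodedata.normalize("NFKC", label)
            for label in domain_name.split(".")]


def labels_split_after(domain_name):
    # Normalize the whole name first, then split on full stops.
    # Dots produced by normalization become label separators.
    return unicodedata.normalize("NFKC", domain_name).split(".")


name = "example\u2488com"  # U+2488 DIGIT ONE FULL STOP between the two parts

print(labels_split_before(name))
print(labels_split_after(name))

# Percent-escapes are decoded before any of this happens.
print(unquote("google%2ecom"))
print(unquote("%C3%A0"))
```
</pre>
With split-before, the U+2488 name is a single label that contains a dot after normalization; with split-after, it becomes two labels.<br>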

<br>I put in a review note, and we can run it by the browser representatives. (For my part, I think it would be cleaner to break into labels at full

stops (normal &amp; fullwidth, ideographic), and not at all characters

that have . in their decomposition.)<br><br><div style="margin-left: 40px;">[Review note: this behavior allows characters whose decompositions contain a dot and other characters. It is included because it represents the predominant browser behavior (both FF and IE). Similarly, current browsers interpret &quot;google%2Ecom&quot; as &quot;<a href="http://google.com">google.com</a>&quot;. Is there good reason to change this behavior?]<br>

</div><br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

are very similar in my eyes to overlong UTF-8 sequences and such, and therefore have a high potential for security problems. </blockquote></div></blockquote><div><br>There are issues with this kind of parsing, although in an analysis of

URLs that actually spoof other sites, these characters did not show up. <br></div><div><br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">People who have to use %2e or U+2488 shoot themselves in their foot, and should feel it sooner rather than later. I cannot understand why this document claims to &quot;avoid... security problems&quot; (in the abstract) and then promotes this kind of stuff. As an aside, RFC 3986 recommends %2E in preference to %2e.<br>

</blockquote></div></blockquote><div><br>As above, it is reflecting current browser behavior. I can run it by the browser folk.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Clicking on <a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:toNFKC=/" target="_blank">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:toNFKC=/</a>\./:], I get some kind of Java exception report at<br>


<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:toNFKC=/%5C./:%5D" target="_blank">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:toNFKC=/%5C./:%5D</a><br></blockquote></div></blockquote><div>

<br>Sorry, there was a recent change in the libraries that caused that to malfunction. It should work now, as should the <a href="http://unicode.org/cldr/utility/idna.jsp">http://unicode.org/cldr/utility/idna.jsp</a> <br>

</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Some of the steps in Section 3 are completely cryptic. As an example, what does &quot;trusted source&quot; in &quot;If any label is in Punycode, and does not come from a trusted source&quot; mean? </blockquote></div></blockquote>

<div> </div><div>Sorry, that was a remnant from an earlier version. Removed. Also added a review note:<br><br><div style="margin-left: 40px;">[Review note: this could be rewritten for clarity as a step 3a: &quot;Convert any Punycode labels back to Unicode&quot;, with explanation of what a Punycode label is, and aborting with an error if such conversion fails.]<br>

</div>  <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

What does &quot;validity criteria&quot; in &quot;Abort with error if the label does not comply with the validity criteria&quot; mean? (a pointer to section 5 would help)<br></blockquote></div></blockquote><div><br>There was one in the main line #4, but added another reference for clarity. <br>

</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Also, in Step 3, you split, which means that in Step 5, you have several strings, but you only return one string ?!<br></blockquote></div></blockquote><div><br>Changed to &quot;the domain_name resulting from Step 2&quot;. The splitting into labels is only to apply validity checks.<br>
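<br>Roughly, the intended shape is the following Python sketch. The mapping is only an approximation of NFKC_CaseFold, and is_valid_label is a hypothetical placeholder for the Section 5 validity criteria, so treat this as an illustration rather than the specified algorithm:<br>
<pre>
```python
import unicodedata


def map_domain(domain_name):
    # Approximation of the mapping step: NFKC, case folding, NFKC again.
    folded = unicodedata.normalize("NFKC", domain_name).casefold()
    return unicodedata.normalize("NFKC", folded)


def is_valid_label(label):
    # Hypothetical stand-in for the real validity criteria:
    # non-empty, and no leading or trailing hyphen.
    return bool(label) and not label.startswith("-") and not label.endswith("-")


def preprocess(domain_name):
    mapped = map_domain(domain_name)
    # The split exists only so that each label can be checked.
    for label in mapped.split("."):
        if not is_valid_label(label):
            raise ValueError(f"invalid label: {label!r}")
    # The result is the whole mapped domain name, not the list of labels.
    return mapped


print(preprocess("Example.COM"))
```
</pre>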

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

As for mapping tables, what seems to happen is that e.g. a &quot;ß&quot; is mapped to &quot;ss&quot; for lookup, but not for display. This seems to be really a bad combination: You pretend that the browser (or whatever) distinguishes between &quot;ss&quot; and &quot;ß&quot;, but redirect to &quot;ss&quot;. From a point of view of a search engine, that may be the right thing to do, but assuming that .de allows separate registrations in those cases where there is a real difference between &quot;ss&quot; and &quot;ß&quot;, which I hope they will, the above will be the worst of both worlds. If my name is Straßen, and I own <a href="http://strassen.de" target="_blank">straßen.de</a>, and somebody else owns <a href="http://strassen.de" target="_blank">strassen.de</a> because his/her name is Strassen, then we both want to be able to make sure people get to the right place, at least once IDNA2008 is deployed.<br>

</blockquote></div></blockquote><div><br>This is a real issue. There was a long discussion at the UTC, and this was felt to be a way out of the even worse problem of indeterminacy of labels containing ß, final sigma, and especially joiners -- with different browsers, or different versions of browsers, going to different IP addresses with the same domain name. The problem is that for years to come, a huge number of browsers will not support 2008 (look at how many people are still using IE6). If we could wave a magic wand and change all implementations at once to use the new scheme, it might be feasible; otherwise...<br>
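<br>For reference, the IDNA2003 side of this can be observed with Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with Nameprep). This sketch only shows what existing IDNA2003 clients do with these characters; the sample names are illustrative, and it says nothing about IDNA2008 behavior:<br>
<pre>
```python
def to_ascii_2003(domain_name):
    # Python's built-in "idna" codec applies the IDNA2003 ToASCII operation.
    return domain_name.encode("idna").decode("ascii")


# Eszett is mapped to "ss" before lookup.
eszett = to_ascii_2003("stra\u00dfen.de")

# Final sigma and medial sigma end up as the same label.
final_sigma = to_ascii_2003("\u03b2\u03cc\u03bb\u03bf\u03c2.gr")
plain_sigma = to_ascii_2003("\u03b2\u03cc\u03bb\u03bf\u03c3.gr")

# ZERO WIDTH NON-JOINER is deleted.
with_zwnj = to_ascii_2003("ab\u200ccd.de")

print(eszett)
print(final_sigma == plain_sigma)
print(with_zwnj)
```
</pre>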

<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Section 5: &quot;[Review Note: Once IDNA2008 is final, the exact specifications can be substituted for the last two bullets, making the above self-contained.]&quot;: Doing textual substitution in these cases is a really bad idea. Please keep the pointers.<br>

</blockquote></div></blockquote><div><br>Thanks. I noticed that #5 really shouldn&#39;t be there. As a normative change, I can only add a review note for that.<br><br>For the review note, I modified to be <br><br><div style="margin-left: 40px;">

[Review Note: A previous review note suggested that once IDNA2008 is final, the exact specification be substituted for the last bullet. However, it would probably be best to retain the pointer. It does raise another issue, of whether the BIDI spec should be part of the validity test or not: IDNA2008 doesn&#39;t require it in clients.] <br>

</div><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

5.1: &quot;Remove block description characters&quot; -&gt; &quot;Remove ideographic description characters&quot; or &quot;Remove ideographic description block&quot;<br></blockquote></div></blockquote><div><br>done <br>

<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

5.1: I don&#39;t understand how &quot;+ [\u002D]&quot; can add back all valid ASCII.<br></blockquote></div></blockquote><div> </div><div>Typo, added review note. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

5.2: This description doesn&#39;t help at all. What I think a reader would want to know here is how these two sets differ, not some regexp notation of yet another set.<br></blockquote></div></blockquote><div><br>Good point. I added an editorial note to add a table of example differences. <br>

</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Section 8, Tactics: The title should change, maybe &quot;Background&quot; might work. </blockquote></div></blockquote><div><br>Good point; that title was left over from an earlier version.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">It is completely unbelievable that the Unicode consortium would claim, in one of their TRs (in particular one that seems to be headed for &quot;Technical Standard&quot;), that the difference between &quot;ss&quot; and &quot;ß&quot; is essentially a display issue. Overall, this section just repeats stuff in the other sections.<br>

</blockquote></div></blockquote><div><br>That is unfortunate wording. The conclusion wasn&#39;t really that this difference was only display, but that the most important feature of the difference was the display. Modified.<br>

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Section 9, first question: The entries in the table need an explanation.<br></blockquote></div></blockquote><div><br>Added some text. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Section 9, advantages of IDNA2008: Yes, please keep that, it helps a reader getting a balanced overview.<br>

<br>

Section 9, disadvantages of IDNA2008, you say &quot;More fragile in that future Unicode versions require a manual step to avoid instabilities&quot;. I don&#39;t understand that.<br></blockquote></div></blockquote><div><br>

If Unicode version X changes properties in such a way as to add or remove characters from PVALID, it requires a manual step to retain the previous status. That step could have been avoided in the formulation, but wasn&#39;t.<br>

<br>Added a note.<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Section 9, bidi label hopping: quotes on no or both sides.<br></blockquote></div></blockquote><div><br>good.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Section 9, &quot;Are the &quot;local&quot; mappings just a UI issue?&quot;: This seems to imply that &quot;<a href="http://xn--trkye-kva78a.com" target="_blank">http://türkıye.com</a>&quot; and &quot;<a href="http://xn--trkiye-3ya.com" target="_blank">http://türkiye.com</a>&quot; are different under IDNA2008. In my view, this would be great. </blockquote>

</div></blockquote><div><br>&quot;<a href="http://xn--trkye-kva78a.com/" target="_blank">http://türkıye.com</a>&quot; and &quot;<a href="http://xn--trkiye-3ya.com/" target="_blank">http://türkiye.com</a>&quot; are different under *both* IDNA2008 and IDNA2003. The problem is with &lt;a href=&quot;<a href="http://xn--trkiye-3ya.COM">TÜRKIYE.COM</a>&quot;&gt;...&lt;/a&gt;. You really don&#39;t want one browser going to <a href="http://xn--trkye-kva78a.com/" target="_blank">http://türkıye.com</a> and a different browser going to <a href="http://xn--trkiye-3ya.com/" target="_blank">http://türkiye.com</a>.<br>

<br>See<br><a href="http://unicode.org/cldr/utility/idna.jsp?a=t%C3%BCrk">http://unicode.org/cldr/utility/idna.jsp?a=türk</a>ı<a href="http://ye.com">ye.com</a>+<a href="http://xn--trkiye-3ya.com">türkiye.com</a>+<a href="http://xn--trkiye-3ya.COM">TÜRKIYE.COM</a><br>
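<br>A small Python sketch of why the uppercase link is the problem. It uses the built-in "idna" codec (IDNA2003) for the ASCII form; turkish_lower is a hypothetical locale-tailored lowercasing written only for this illustration, not something either specification defines:<br>
<pre>
```python
def to_ascii_2003(domain_name):
    # Python's built-in "idna" codec applies the IDNA2003 ToASCII operation.
    return domain_name.encode("idna").decode("ascii")


def turkish_lower(text):
    # Hypothetical Turkish-tailored lowercasing: I pairs with dotless i
    # (U+0131), and dotted capital I (U+0130) pairs with i.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


link = "T\u00dcRKIYE.COM"

# Locale-independent lowercasing, as in the IDNA2003 mapping.
default_target = to_ascii_2003(link.lower())

# What a client applying a local (Turkish) mapping would look up instead.
turkish_target = to_ascii_2003(turkish_lower(link))

print(default_target)
print(turkish_target)
print(default_target == turkish_target)
```
</pre>
The two clients resolve different names from the same href, which is the outcome to avoid.<br>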

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Can somebody confirm/deny? (the i/ı issue is in my view almost the only justification for having custom mappings). [If denied, you have to remove the examples.]<br>


<br>

Also, the answers of the type &quot;Bob clicks on the link, and goes to a bad site.&quot; should be changed to &quot;Bob clicks on the link, and doesn&#39;t find any site, or goes to a wrong (and potentially malicious) site.&quot;<br>

</blockquote></div></blockquote><div><br>That is reasonable.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote">

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Also, isn&#39;t the idea of IDNA2008 to get people to only use lower case, among other things? And shouldn&#39;t browsers lowercase domain names in their address field, too (as they already do with ASCII-only ones)?<br></blockquote>

</div></blockquote><div><br>IDNA2008 references a mapping that lowercases, but it is optional. The browsers do lowercase (actually, they show the transformed, NFKC-CaseFolded version) in the address bar.<br><br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Also, the relationship between &quot;It is generally understood at the W3C that all attributes that take URLs should take full IRIs, not punycoded-URIs, so for example SVG, MathML, XLink, XML, etc, all take IRIs now, as does HTML5.&quot; and its main point isn&#39;t clear to me.<br>

</blockquote></div></blockquote><div><br>It isn&#39;t actually clear to me either why it is here. It was due to someone&#39;s previous comment, but it looks out of place. Added a note.<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

<br>

The whole document needs careful editing/proofreading before publication (e.g. map map in Section 9).<br></blockquote></div></blockquote><div><br>Yes. I sent an earlier message about the process; the goal was to get the content out for review, and follow on later with editing.<br>

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

Section 9, &quot;why does IDNA2003 map map final sigma (ς) to sigma (σ), map eszett (ß) to &quot;ss&quot;, and delete ZWJ/ZWNJ?&quot;: This is trying to beautify things after the fact. What happened when these were decided upon was that the IETF was looking for a table (they didn&#39;t want to create their own, because in that specific WG, that would have opened all the doors for weird script-specific requests), and the Unicode Consortium had a table (what&#39;s now in NFKC_CaseFold), and so that was taken. </blockquote>

</div></blockquote><div><br>That was supposed to be conveyed by the phrasing &quot;following the Unicode Standard&quot;. Rereading it, I can see how it would strike you as it did. I don&#39;t think that your interpretation is exactly right either, because we did have tables that simply normalized, and the working group could have chosen them.<br>

 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

There is absolutely no need for domain names to be fully case-insensitive with transitivity and round-trips. </blockquote></div></blockquote><div><br>Whether or not transitivity is a requirement is a matter of some dispute. Round-tripping isn&#39;t mentioned and wasn&#39;t a goal. Moreover, round-tripping with casing is impossible anyway: &quot;McGowan&quot;, once transformed by casemapping, cannot be restored.<br>
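<br>A tiny Python illustration of that last point, using only the standard string methods; the list of attempted restorations is just a sample, not an exhaustive argument:<br>
<pre>
```python
original = "McGowan"

# Case folding discards the information about which letters were capitals.
folded = original.casefold()

# None of the generic case mappings can recover the original spelling.
candidates = {folded.upper(), folded.capitalize(), folded.title()}

print(folded)
print(sorted(candidates))
print(original in candidates)
```
</pre>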

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

In some sense, ß and ς are indeed anomalous, but they are full parts of the orthographies of the respective languages, and at least the former is distinguishing, in particular for names.<br></blockquote></div></blockquote>

<div><br>Nobody is claiming that they are not full parts of the orthographies. However, for neither IDNA2003 nor IDNA2008 is it claimed that <b>all</b> parts of every language&#39;s orthographies, and all the distinctions therein are representable in domain names. There are trivial examples even in English, like &quot;can&#39;t&quot; vs &quot;cant&quot;, which cannot be represented.<br>

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

&quot;The rough consensus among the working group&quot;: Which WG?<br></blockquote></div></blockquote><div>IETF IDNA <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Overall, my impression is that this document isn&#39;t yet ready for approval.<br>

<br>

[Overall, my feeling is that some of the text in this document (not all of it, of course) must feel quite a bit similar (to IETF people) to some of the text in (earlier versions? I haven&#39;t had time to read any recent versions) of the Rationale document or some earlier documents, in particular in some draft stages, on the IETF side, that the Unicode side didn&#39;t like.]<br>

</blockquote></div></blockquote><div><br>I think it does reflect some concerns that the Unicode Consortium had. It also represents discussion with browser vendors as to what is feasible. Over time, once IDNA2008 is fully deployed in registries, the compatibility &quot;shim&quot; provided by this specification would hopefully become unnecessary.<br>

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<br>

<br>

Regards,   Martin.<div><div></div><div><br>

<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Mon, Oct 5, 2009 at 00:43, Harald Alvestrand&lt;<a href="mailto:harald@alvestrand.no" target="_blank">harald@alvestrand.no</a>&gt;wrote:<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Mark Davis ☕ wrote:<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


The IESG may encounter implementation tactics for dealing with the old<br>

</blockquote>

and new specifications that are controversial.<br>

<br>

One set of implementation tactics is UTS#46 Unicode IDNA Compatible<br>

Preprocessing&lt;<a href="http://www.unicode.org/reports/tr46/" target="_blank">http://www.unicode.org/reports/tr46/</a>&gt;  (in draft).<br>

<br>

</blockquote>

Yes, that&#39;s one of the controversial ones.<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

The UTC will be considering that for approval at its upcoming meeting, so<br>

people with concerns may want to discuss and submit them to the UTC.<br>

<br>

Mark<br>

</blockquote></blockquote></blockquote>

<br></div></div><font color="#888888">

-- <br>

#-# Martin J. Dürst, Professor, Aoyama Gakuin University<br>

#-# <a href="http://www.sw.it.aoyama.ac.jp" target="_blank">http://www.sw.it.aoyama.ac.jp</a>   mailto:<a href="mailto:duerst@it.aoyama.ac.jp" target="_blank">duerst@it.aoyama.ac.jp</a><br>

</font></blockquote></div><br>

</blockquote></div><br>