Formal submission of our documents to AD

Mark Davis ☕ mark at macchiato.com
Tue Oct 6 06:10:08 CEST 2009


Good comments, as always; thanks. I won't have time to respond in the next
couple of days, but will try to shortly thereafter.

A quick note on the process: it is not the same as the IETF process. The
document, as you note, is nowhere near ready for publication. Editorial
changes can be incorporated into the public draft after approval by the
editorial committee. Substantive issues can be noted in the draft for review,
but it takes a UTC decision to change them one way or another. If the draft is
approved at the UTC, even at that point it would not necessarily be in final
form - editorial work can continue before release, again subject to approval
by the editorial committee. Or the UTC could decide to have another cycle, or
take other action.

Mark


On Mon, Oct 5, 2009 at 04:01, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:

> On 2009/10/05 17:03, Mark Davis ☕ wrote:
> > If you have some particular suggestions regarding items in the document,
> > you can submit them directly via the [Feedback]
> > <http://www.unicode.org/reports/tr46/#Feedback> link at the top. If you
> > also want discussion of the topics, then also post to one of the Unicode
> > mailing lists. And if you have any other suggestions for how to bridge the
> > compatibility gaps between IDNA2003 implementations and IDNA2008
> > implementations, those suggestions would be welcome. We had a number of
> > people from the browser communities at the meetings, and these were the
> > best we could come up with as yet.
> >
> > Mark
>
> Hello Mark,
>
> Just some short comments here on this list while rushing through that
> document. Please forward these wherever appropriate.
>
> 1.3.1, Deviations, says "There are a few situations where the strict
> application of IDNA2008 will *always* result in the resolution of IDNs to
> different IP addresses than in IDNA2003."
> The *always* is of course wrong. The document itself later says "Unless the
> "DE" registry bundles", which is, as far as we know, more or less what they
> are going to do. The other possibility is of course that the owner of the
> domain name in question makes sure they get both variants. For the business
> example that you show, that's not a problem at all, if the business knows
> about the issue.
>
> In 1.3.2, Example 3. "Map http://ÖBB.at <http://xn--bb-eka.at> to http:/
> phishing.com", is completely weird. If any browser or similar device wants
> to spoof their users, they have always been able to do this, even without
> the IETF's or the Unicode Consortium's permission. But such a browser would
> be very quickly out of business, for obvious reasons.
>
> Again in 1.3.2, it says "but adds validity constraints from IDNA2008", but
> then gives "http://√.com" as an okay example (currently in use, although
> for domain speculation only), which I'd assume is prohibited in IDNA2008
> based on the LDH-equivalence rules.
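
A minimal sketch of the kind of check behind that assumption, using only Python's standard library. It assumes that, for these characters, the IDNA2008 rule reduces to "symbols are not letters or digits"; the `is_letter_or_digit` helper is hypothetical and ignores the exception and context rules. For comparison, Python's built-in "idna" codec implements IDNA2003 and accepts the symbol.

```python
import unicodedata

# General categories that IDNA2008 (RFC 5892) counts as letters or digits.
# Assumption: a simplified check that ignores exceptions and context rules.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

def is_letter_or_digit(ch):
    """Hypothetical helper: True if ch falls in a letter/digit category."""
    return unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES

# U+221A SQUARE ROOT, U+2665 BLACK HEART SUIT, U+00F6 o with diaeresis
for ch in ("\u221a", "\u2665", "\u00f6"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_letter_or_digit(ch))

# Python's built-in "idna" codec implements IDNA2003, which accepts the symbol.
ace = "\u221a.com".encode("idna")
print(ace.startswith(b"xn--"))
```

Both symbols fall outside the letter/digit categories, while "ö" is inside them.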
>
> (Btw, I'd suggest you remove the links from (most of) your examples,
> because you shouldn't at the same time claim that there is potential for
> phishing and make it easy to happen. Another issue is that some of these
> links don't actually resolve, but they look like they should (e.g.
> http://I♥NY.com).)
>
> I don't really like the idea of Compatible Preprocessing (section 1.4) at
> all. Bypassing IDNA2008 lookup by converting to punycode separately is
> really going too far. I thought the intent of the document was to use either
> IDNA2003 or IDNA2008, not to simulate IDNA2003 on top of IDNA2008 at all
> costs by an additional layer.
>
> Section 3, Preprocessing: "(For more about the parts of a URL, including
> the domain name, see [RFC3987])." I don't know why RFC 3987 is relevant
> here. It may be misunderstood in that the processing is applied to the whole
> IRI/URI. Also, RFC 3987 doesn't actually define "domain name", nor does it
> say which parts (of which it mentions several) of an IRI are domain names.
>
> In Section 3, Preprocessing, things such as "URI/IRI %-escapes like %2e for
> U+002E (.) FULL STOP." or "U+2488 ( ⒈ ) DIGIT ONE FULL STOP" are very
> similar in my eyes to overlong UTF-8 sequences and such, and therefore have
> a high potential for security problems. People who have to use %2e or U+2488
> shoot themselves in the foot, and should feel it sooner rather than later.
> I cannot understand why this document claims to "avoid... security problems"
> (in the abstract) and then promotes this kind of stuff. As an aside, RFC
> 3986 recommends %2E in preference to %2e.
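
To make the concern concrete, here is a small sketch, assuming a preprocessing step that undoes %-escapes and then applies NFKC before splitting into labels. The two function names are hypothetical; the actual steps in the draft may differ. It shows how the number of labels changes depending on whether the split happens before or after such preprocessing.

```python
import unicodedata
from urllib.parse import unquote

def naive_labels(host):
    # Split on the literal ASCII full stop only, with no preprocessing.
    return host.split(".")

def preprocessed_labels(host):
    # Hypothetical preprocessing: undo %-escapes, apply NFKC, then split.
    return unicodedata.normalize("NFKC", unquote(host)).split(".")

# "%2e" is an escaped full stop; U+2488 is DIGIT ONE FULL STOP.
for host in ("example%2ecom", "example\u2488com"):
    print(naive_labels(host), preprocessed_labels(host))
```

In both cases a string that looks like a single label to a naive splitter becomes two labels after preprocessing.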
>
> Clicking on
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:toNFKC=/\./:], I
> get some kind of Java exception report at
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:toNFKC=/%5C./:%5D
>
> Some of the steps in Section 3 are completely cryptic. As an example, what
> does "trusted source" in "If any label is in Punycode, and does not come
> from a trusted source" mean? What does "validity criteria" in "Abort with
> error if the label does not comply with the validity criteria" mean? (a
> pointer to section 5 would help)
>
> Also, in Step 3, you split, which means that in Step 5, you have several
> strings, but you only return one string ?!
>
> As for mapping tables, what seems to happen is that e.g. a "ß" is mapped to
> "ss" for lookup, but not for display. This seems to be really a bad
> combination: You pretend that the browser (or whatever) distinguishes
> between "ss" and "ß", but redirect to "ss". From a point of view of a search
> engine, that may be the right thing to do, but assuming that .de allows
> separate registrations in those cases where there is a real difference
> between "ss" and "ß", which I hope they will, the above will be the worst of
> both worlds. If my name is Straßen, and I own straßen.de<http://strassen.de>,
> and somebody else owns strassen.de because his/her name is Strassen,
> then we both want to be able to make sure people get to the right place, at
> least once IDNA2008 is deployed.
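
The IDNA2003 side of this can be checked with Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490). This is only a sketch of the lookup mapping and says nothing about what a given browser displays.

```python
# Python's built-in "idna" codec implements IDNA2003 (RFC 3490),
# so the eszett is mapped to "ss" before lookup.
for name in ("straßen.de", "strassen.de"):
    print(name, "->", name.encode("idna"))

# Both spellings end up as the same lookup string.
same_lookup = "straßen.de".encode("idna") == "strassen.de".encode("idna")
print(same_lookup)
```

Under that mapping the two owners in the example cannot be told apart at lookup time.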
>
> Section 5: "[Review Note: Once IDNA2008 is final, the exact specifications
> can be substituted for the last two bullets, making the above
> self-contained.]": Doing textual substitution in these cases is a really bad
> idea. Please keep the pointers.
>
> 5.1: "Remove block description characters" -> "Remove ideographic
> description characters" or "Remove ideographic description block"
>
> 5.1: I don't understand how "+ [\u002D]" can add back all valid ASCII.
>
> 5.2: This description doesn't help at all. What I think a reader would want
> to know here is how these two sets differ, not some regexp notation of yet
> another set.
>
> Section 8, Tactics: The title should change, maybe "Background" might work.
> It is completely unbelievable that the Unicode consortium would claim, in
> one of their TRs (in particular one that seems to be headed for "Technical
> Standard"), that the difference between "ss" and "ß" is essentially a
> display issue. Overall, this section just repeats stuff in the other
> sections.
>
> Section 9, first question: The entries in the table need an explanation.
>
> Section 9, advantages of IDNA2008: Yes, please keep that, it helps a reader
> getting a balanced overview.
>
> Section 9, disadvantages of IDNA2008, you say "More fragile in that future
> Unicode versions require a manual step to avoid instabilities". I don't
> understand that.
>
> Section 9, bidi label hopping: quotes on no or both sides.
>
> Section 9, "Are the "local" mappings just a UI issue?": This seems to imply
> that "http://türkıye.com <http://xn--trkye-kva78a.com>" and "
> http://türkiye.com <http://xn--trkiye-3ya.com>" are different under
> IDNA2008. In my view, this would be great. Can somebody confirm/deny? (the
> i/ı issue is in my view almost the only justification for having custom
> mappings). [If denied, you have to remove the examples.]
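
As a sanity check on the two links above, a minimal sketch using Python's "punycode" codec (RFC 3492) directly. The `to_ace` helper is hypothetical and applies no mapping or validity checks, so it shows only that the two spellings have distinct ACE forms, not how IDNA2003 or IDNA2008 would treat them.

```python
dotless = "t\u00fcrk\u0131ye"  # türkıye, with U+0131 LATIN SMALL LETTER DOTLESS I
dotted = "t\u00fcrkiye"        # türkiye, with ASCII i

def to_ace(label):
    # Raw Punycode plus the ACE prefix; no mapping or validity checks.
    return "xn--" + label.encode("punycode").decode("ascii")

print(to_ace(dotless))
print(to_ace(dotted))
```

The results match the xn-- forms in the quoted links.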
>
> Also, the answers of the type "Bob clicks on the link, and goes to a bad
> site." should be changed to "Bob clicks on the link, and doesn't find any
> site, or goes to a wrong (and potentially malicious) site."
>
> Also, isn't the idea of IDNA2008 to get people to only use lower case,
> among other things? And shouldn't browsers lowercase domain names in their
> address field, too (as they already do with ASCII-only ones)?
>
> Also, the relationship between "It is generally understood at the W3C that
> all attributes that take URLs should take full IRIs, not punycoded-URIs, so
> for example SVG, MathML, XLink, XML, etc, all take IRIs now, as does HTML5."
> and its main point isn't clear to me.
>
>
> The whole document needs careful editing/proofreading before publication
> (e.g. map map in Section 9).
>
> Section 9, "why does IDNA2003 map map final sigma (ς) to sigma (σ), map
> eszett (ß) to "ss", and delete ZWJ/ZWNJ?": This is trying to beautify things
> after the fact. What happened when these were decided upon was that the IETF
> was looking for a table (they didn't want to create their own, because in
> that specific WG, that would have opened all the doors for weird
> script-specific requests), and the Unicode Consortium had a table (what's
> now in NFKC_CaseFold), and so that was taken. There is absolutely no need
> for domain names to be fully case-insensitive with transitivity and
> round-trips. In some sense, ß and ς are indeed anomalous, but they are full
> parts of the orthographies of the respective languages, and at least the
> former is distinguishing, in particular for names.
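
The three IDNA2003 mappings named in that question can be observed with Python's built-in "idna" codec, which implements IDNA2003. This is a sketch of the behavior only; the `checks` dictionary is an illustrative name.

```python
# Each entry compares what the IDNA2003 codec produces for two inputs.
checks = {
    # Final sigma (U+03C2) is mapped to sigma (U+03C3).
    "final sigma -> sigma": "\u03c2".encode("idna") == "\u03c3".encode("idna"),
    # Eszett (U+00DF) is mapped to "ss".
    "eszett -> ss": "\u00df".encode("idna") == b"ss",
    # ZWJ (U+200D) and ZWNJ (U+200C) are deleted.
    "ZWJ deleted": "a\u200db".encode("idna") == b"ab",
    "ZWNJ deleted": "a\u200cb".encode("idna") == b"ab",
}

for name, ok in checks.items():
    print(name, ok)
```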
>
> "The rough consensus among the working group": Which WG?
>
> Overall, my impression is that this document isn't yet ready for approval.
>
> [Overall, my feeling is that some of the text in this document (not all of
> it, of course) must feel quite a bit similar (to IETF people) to some of the
> text in (earlier versions? I haven't had time to read any recent versions)
> of the Rationale document or some earlier documents, in particular in some
> draft stages, on the IETF side, that the Unicode side didn't like.]
>
>
> Regards,   Martin.
>
>
>
>  On Mon, Oct 5, 2009 at 00:43, Harald Alvestrand <harald at alvestrand.no> wrote:
>>
>>  Mark Davis ☕ wrote:
>>>
>>>>  The IESG may encounter implementation tactics for dealing with the old
>>>> and new specifications that are controversial.
>>>>
>>>> One set of implementation tactics is UTS#46 Unicode IDNA Compatible
>>>> Preprocessing <http://www.unicode.org/reports/tr46/> (in draft).
>>>
>>>  Yes, that's one of the controversial ones.
>>>
>>>>  The UTC will be considering that for approval at its upcoming meeting,
>>>> so people with concerns may want to discuss and submit them to the UTC.
>>>>
>>>> Mark
>>>
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
>

More information about the Idna-update mailing list