TR46 (was Re: Formal submission of our documents to AD)

Mark Davis ☕ mark at macchiato.com
Thu Oct 8 01:42:29 CEST 2009


Martin,

I updated the utility to show the difference between IDNA2003, 2008, and
TR46:

http://unicode.org/cldr/utility/idna.jsp

Will follow on with responses to your comments, and Vint's request for a
flow diagram.

Mark


On Mon, Oct 5, 2009 at 04:01, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:

> On 2009/10/05 17:03, Mark Davis ☕ wrote:
> > If you have some particular suggestions regarding items in the document,
> > you can submit them directly via the [Feedback
> > <http://www.unicode.org/reports/tr46/#Feedback>] link at the top. If you
> > also want discussion of the topics, then post them on one of the Unicode
> > mailing lists as well. And if you have any other suggestions for how to
> > bridge the compatibility gaps between IDNA2003 implementations and
> > IDNA2008 implementations, those suggestions would be welcome. We had a
> > number of people from the browser communities at the meetings, and these
> > were the best we could come up with so far.
> >
> > Mark
>
> Hello Mark,
>
> Just some short comments here on this list while rushing through that
> document. Please forward these wherever appropriate.
>
> 1.3.1, Deviations, says "There are a few situations where the strict
> application of IDNA2008 will *always* result in the resolution of IDNs to
> different IP addresses than in IDNA2003."
> The *always* is of course wrong. The document itself later says "Unless the
> "DE" registry bundles", which is, as far as we know, more or less what they
> are going to do. The other possibility is of course that the owner of the
> domain name in question makes sure they get both variants. For the business
> example that you show, that's not a problem at all, if the business knows
> about the issue.
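A minimal sketch of what "both variants" means in practice, assuming Python 3's built-in `idna` codec as a stand-in for IDNA2003 and a hypothetical `idna2008_style_lookup` that only lowercases, NFC-normalizes, and Punycode-encodes each label, with none of the IDNA2008 validity checks. When the two lookup strings differ, the name is one where the registry would have to bundle, or the owner register, both forms.

```python
import unicodedata


def idna2003_lookup(domain: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 (RFC 3490 with
    # nameprep), which maps ß to "ss", ς to σ, and deletes ZWJ/ZWNJ.
    return domain.encode("idna").decode("ascii")


def idna2008_style_lookup(domain: str) -> str:
    # HYPOTHETICAL simplification of IDNA2008 lookup: lowercase, NFC, then
    # Punycode each non-ASCII label as-is. No validity checks are done.
    labels = []
    for label in unicodedata.normalize("NFC", domain.lower()).split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def deviates(domain: str) -> bool:
    # True when the two approaches resolve to different host names, i.e. when
    # both variants would need to be bundled or held by the same owner.
    return idna2003_lookup(domain) != idna2008_style_lookup(domain)


for name in ("straße.de", "bücher.de"):
    print(name, idna2003_lookup(name), idna2008_style_lookup(name), deviates(name))
```

For `straße.de` the IDNA2003 path yields `strasse.de` while the simplified IDNA2008-style path yields an ACE label beginning `xn--strae-`; for `bücher.de` both paths agree.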
>
> In 1.3.2, Example 3, "Map http://ÖBB.at to http://phishing.com", is
> completely weird. If any browser or similar device wants to spoof its
> users, it has always been able to do this, even without the IETF's or the
> Unicode Consortium's permission. But such a browser would be very quickly
> out of business, for obvious reasons.
>
> Again in 1.3.2, it says "but adds validity constraints from IDNA2008", but
> then gives "http://√.com" as an okay example (currently in use, although
> for domain speculation only), which I'd assume is prohibited in IDNA2008
> based on the LDH-equivalence rules.
>
> (Btw, I'd suggest you remove the links from (most of) your examples,
> because you shouldn't at the same time claim that there is potential for
> phishing and make it easy to happen. Another issue is that some of these
> links don't actually resolve, although they look like they should (e.g.
> http://I♥NY.com).)
>
> I don't really like the idea of Compatible Preprocessing (section 1.4) at
> all. Bypassing IDNA2008 lookup by converting to punycode separately is
> really going too far. I thought the intent of the document was to use either
> IDNA2003 or IDNA2008, not to simulate IDNA2003 on top of IDNA2008 at all
> costs by an additional layer.
>
> Section 3, Preprocessing: "(For more about the parts of a URL, including
> the domain name, see [RFC3987])." I don't know why RFC 3987 is relevant
> here. It may be misunderstood in that the processing is applied to the whole
> IRI/URI. Also, RFC 3987 doesn't actually define "domain name", nor does it
> say which parts (of which it mentions several) of an IRI are domain names.
>
> In Section 3, Preprocessing, things such as "URI/IRI %-escapes like %2e for
> U+002E (.) FULL STOP." or "U+2488 ( ⒈ ) DIGIT ONE FULL STOP" are very
> similar in my eyes to overlong UTF-8 sequences and such, and therefore have
> a high potential for security problems. People who use %2e or U+2488
> shoot themselves in the foot, and should feel it sooner rather than later.
> I cannot understand why this document claims to "avoid... security problems"
> (in the abstract) and then promotes this kind of stuff. As an aside, RFC
> 3986 recommends %2E in preference to %2e.
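To make the concern concrete: both mappings can silently turn one label into two. A minimal sketch using only Python's standard library (`unicodedata.normalize` and `urllib.parse.unquote`); the host strings are made up for illustration.

```python
import unicodedata
from urllib.parse import unquote


def label_count(host: str) -> int:
    # Labels are counted as the pieces between U+002E FULL STOP characters.
    return len(host.split("."))


raw = "example\u2488com"   # U+2488 DIGIT ONE FULL STOP inside a single label
escaped = "example%2ecom"  # a URI/IRI %-escape for U+002E FULL STOP

nfkc = unicodedata.normalize("NFKC", raw)  # U+2488 decomposes to "1."
decoded = unquote(escaped)                 # "%2e" (or "%2E") decodes to "."

print(label_count(raw), label_count(nfkc))         # 1 2
print(label_count(escaped), label_count(decoded))  # 1 2
```

In both cases the string a user sees as one label ends up as two labels after preprocessing, which is the overlong-UTF-8-like effect described above.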
>
> Clicking on
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:toNFKC=/\./:], I
> get some kind of Java exception report at
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:toNFKC=/%5C./:%5D
>
> Some of the steps in Section 3 are completely cryptic. As an example, what
> does "trusted source" in "If any label is in Punycode, and does not come
> from a trusted source" mean? What does "validity criteria" in "Abort with
> error if the label does not comply with the validity criteria" mean? (a
> pointer to section 5 would help)
>
> Also, in Step 3 you split the input, which means that in Step 5 you have
> several strings, but you only return one string?!
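One possible reading of the steps, sketched in Python under stated assumptions: the labels produced by the split are processed independently and then joined back with U+002E, so a single string comes out. `process_domain` is a hypothetical name, and `encodings.idna.ToASCII` (IDNA2003, RFC 3490) merely stands in for whatever per-label processing the draft intends.

```python
from encodings.idna import ToASCII

# Label separators recognized by IDNA2003 (RFC 3490, section 3.1) in addition
# to U+002E FULL STOP.
OTHER_SEPARATORS = ("\u3002", "\uff0e", "\uff61")


def process_domain(domain: str) -> str:
    # HYPOTHETICAL outline, not the draft's algorithm:
    # 1. map alternative full stops to U+002E,
    # 2. split into labels,
    # 3. process each label on its own (here: IDNA2003 ToASCII),
    # 4. join the results, so exactly one string is returned.
    # Empty labels (e.g. a trailing root dot) make ToASCII raise UnicodeError.
    for sep in OTHER_SEPARATORS:
        domain = domain.replace(sep, ".")
    labels = domain.split(".")
    return ".".join(ToASCII(label).decode("ascii") for label in labels)


print(process_domain("bücher。de"))  # xn--bcher-kva.de
```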
>
> As for mapping tables, what seems to happen is that e.g. a "ß" is mapped to
> "ss" for lookup, but not for display. This seems to be a really bad
> combination: you pretend that the browser (or whatever) distinguishes
> between "ss" and "ß", but redirect to "ss". From the point of view of a
> search engine, that may be the right thing to do, but assuming that .de
> allows separate registrations in those cases where there is a real
> difference between "ss" and "ß", which I hope it will, the above will be
> the worst of both worlds. If my name is Straßen, and I own straßen.de, and
> somebody else owns strassen.de because his/her name is Strassen, then we
> both want to be able to make sure people get to the right place, at least
> once IDNA2008 is deployed.
>
> Section 5: "[Review Note: Once IDNA2008 is final, the exact specifications
> can be substituted for the last two bullets, making the above
> self-contained.]": Doing textual substitution in these cases is a really bad
> idea. Please keep the pointers.
>
> 5.1: "Remove block description characters" -> "Remove ideographic
> description characters" or "Remove ideographic description block"
>
> 5.1: I don't understand how "+ [\u002D]" can add back all valid ASCII.
>
> 5.2: This description doesn't help at all. What I think a reader would want
> to know here is how these two sets differ, not some regexp notation of yet
> another set.
>
> Section 8, Tactics: The title should change, maybe "Background" might work.
> It is completely unbelievable that the Unicode consortium would claim, in
> one of their TRs (in particular one that seems to be headed for "Technical
> Standard"), that the difference between "ss" and "ß" is essentially a
> display issue. Overall, this section just repeats stuff in the other
> sections.
>
> Section 9, first question: The entries in the table need an explanation.
>
> Section 9, advantages of IDNA2008: Yes, please keep that, it helps a reader
> getting a balanced overview.
>
> Section 9, disadvantages of IDNA2008, you say "More fragile in that future
> Unicode versions require a manual step to avoid instabilities". I don't
> understand that.
>
> Section 9, bidi label hopping: put quotes on both sides or on neither.
>
> Section 9, "Are the "local" mappings just a UI issue?": This seems to imply
> that "http://türkıye.com" and "http://türkiye.com" are different under
> IDNA2008. In my view, this would be great. Can somebody confirm/deny? (The
> i/ı issue is in my view almost the only justification for having custom
> mappings.) [If denied, you have to remove the examples.]
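Under IDNA2003 at least, the two labels already map to different ACE strings, because nameprep leaves U+0131 LATIN SMALL LETTER DOTLESS I unchanged. A sketch using Python's built-in `idna` codec, which implements IDNA2003 only, so it does not by itself settle the IDNA2008 question:

```python
def ace(domain: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 (nameprep), not IDNA2008.
    return domain.encode("idna").decode("ascii")


dotless = "türkıye.com"  # U+0131 LATIN SMALL LETTER DOTLESS I
dotted = "türkiye.com"   # U+0069 LATIN SMALL LETTER I

print(ace(dotless) != ace(dotted))  # True: the two labels stay distinct

# Default (non-Turkish) case mapping sends uppercase I to dotted i, so a user
# typing "TÜRKIYE.COM" reaches the dotted-i name; this is where a
# locale-specific ("local") mapping would make a difference.
print("TÜRKIYE.COM".lower())  # türkiye.com
```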
>
> Also, answers of the type "Bob clicks on the link, and goes to a bad
> site." should be changed to "Bob clicks on the link, and doesn't find any
> site, or goes to a wrong (and potentially malicious) site."
>
> Also, isn't the idea of IDNA2008 to get people to use only lower case,
> among other things? And shouldn't browsers lowercase domain names in their
> address field, too (as they already do with ASCII-only ones)?
>
> Also, the relationship between "It is generally understood at the W3C that
> all attributes that take URLs should take full IRIs, not punycoded-URIs, so
> for example SVG, MathML, XLink, XML, etc, all take IRIs now, as does HTML5."
> and the document's main point isn't clear to me.
>
>
> The whole document needs careful editing/proofreading before publication
> (e.g. "map map" in Section 9).
>
> Section 9, "why does IDNA2003 map map final sigma (ς) to sigma (σ), map
> eszett (ß) to "ss", and delete ZWJ/ZWNJ?": This is trying to beautify things
> after the fact. What happened when these were decided upon was that the IETF
> was looking for a table (they didn't want to create their own, because in
> that specific WG, that would have opened all the doors for weird
> script-specific requests), and the Unicode Consortium had a table (what's
> now in NFKC_CaseFold), and so that was taken. There is absolutely no need
> for domain names to be fully case-insensitive with transitivity and
> round-trips. In some sense, ß and ς are indeed anomalous, but they are full
> parts of the orthographies of the respective languages, and at least the
> former is distinguishing, in particular for names.
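The three IDNA2003 behaviours in question can be observed with Python's built-in `idna` codec (IDNA2003 ToASCII with nameprep) and `str.casefold()` (Unicode full case folding, one ingredient of NFKC_CaseFold; Python has no NFKC_CaseFold function of its own). A small sketch; the labels are arbitrary examples:

```python
def idna2003(label: str) -> str:
    # Python's "idna" codec implements IDNA2003 ToASCII (nameprep + Punycode).
    return label.encode("idna").decode("ascii")


# Final sigma U+03C2 is case-folded to sigma U+03C3, so these collide.
print(idna2003("λογος") == idna2003("λογοσ"))  # True
print("ς".casefold() == "σ")                   # True

# Eszett is folded to "ss".
print(idna2003("straße"))                      # strasse
print("ß".casefold())                          # ss

# ZWJ (U+200D) is in nameprep's map-to-nothing table, so it is deleted.
print(idna2003("a\u200db"))                    # ab
```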
>
> "The rough consensus among the working group": Which WG?
>
> Overall, my impression is that this document isn't yet ready for approval.
>
> [Overall, my feeling is that some of the text in this document (not all of
> it, of course) must feel, to IETF people, quite similar to text on the IETF
> side that the Unicode side didn't like, such as earlier versions of the
> Rationale document or other documents at some of their draft stages. (I
> haven't had time to read any recent versions.)]
>
>
> Regards,   Martin.
>
>
>
> On Mon, Oct 5, 2009 at 00:43, Harald Alvestrand <harald at alvestrand.no> wrote:
>
>> Mark Davis ☕ wrote:
>>
>>> The IESG may encounter implementation tactics for dealing with the old
>>> and new specifications that are controversial.
>>>
>>> One set of implementation tactics is UTS#46 Unicode IDNA Compatible
>>> Preprocessing <http://www.unicode.org/reports/tr46/> (in draft).
>>
>> Yes, that's one of the controversial ones.
>>
>>> The UTC will be considering that for approval at its upcoming meeting, so
>>> people with concerns may want to discuss and submit them to the UTC.
>>>
>>> Mark
> --
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
>


More information about the Idna-update mailing list