Making progress on the mapping question
vint at google.com
Tue Mar 31 15:17:14 CEST 2009
We are not going to revisit this for the Nth time. The WG long ago
concluded to drop these symbols from IDNA2008 and nothing has changed.
1818 Library Street, Suite 400
Reston, VA 20190
vint at google.com
On Mar 30, 2009, at 4:47 PM, John C Klensin wrote:
> While the consensus has certainly been rough and a few people
> have regularly expressed doubts as to whether it is worth the
> effort, my sense is that we agreed long ago on an
> inclusion-based approach based on extending the
> letter-digit-hyphen model rather than including arbitrary
> symbols and punctuation because they are there. Insofar as the
> TLD registries are concerned, this is one of the areas in which
> many of them are explicitly aware that those who have registered
> these strings are going to have to put in some transition effort
> and probably take some heat. It is clearly up to Vint but,
> speaking for myself only, I don't know how many more times we
> should need to go through that discussion.
> To review some of the general concern about symbols, consider
> your example in context with
> (i♡blogging.blogspot.com) and
> (i❤blogging.blogspot.com). I don't know how the typical
> user remembers which of these to use or distinguishes among
> them, nor how registration lookup databases ("whois" etc.) refer
> to them, etc. It might be possible to sort some of those issues
> out, but it would require considerable effort and probably a
> good deal of user education.
> I don't underestimate the support call issue if these more or
> less suddenly stop working, but I suggest we have been through
> it before. Users who shifted from IE6+plugin to IE7 saw some
> IDNs disappear into ACE form because of the languages they had
> enabled. Users who were successfully using early Firefox
> implementations of IDNA saw native-character strings disappear
> with upgrades when those domain names were rooted in TLDs that
> Firefox didn't think took adequate precautions to avoid phishing
> and confusing strings. There has never been a recommendation to
> use symbols in domain name labels: the IESG note, the various
> versions of ICANN guidelines, and other sources of advice have
> all focused on the characters that are used to write words in
> languages, and these symbols are not used that way.
> It seems to me that you will need an FAQ or knowledge base
> article that says, in essence:
> Q: My domain name which contains hearts, spades, or
> other symbols, stopped working in the most recent update
> of the browser. Why?
> A: Use of these characters in domain labels was never
> recommended, although some systems permitted them to be
> registered and used. A new standard has been adopted
> that explicitly restricts identifiers to the letters and
> digits used to write the languages of the world and we
> have changed the implementation to conform to that standard.
> Note, too, that this is not really a mapping issue. U+2665 is
> not changed by Stringprep. I just got the details wrong when I
> sketched out that appendix in Protocol as "if IDNA2008 doesn't
> produce a resolvable name, use IDNA2003".
> --On Monday, March 30, 2009 19:35 +0000 "Shawn Steele"
> <Shawn.Steele at microsoft.com> wrote:
>> -----Original Message-----
>> From: idna-update-bounces at alvestrand.no
>> [mailto:idna-update-bounces at alvestrand.no] On Behalf Of John C Klensin
>> Sent: Monday, March 30, 2009 8:20 AM
>> To: Vint Cerf; idna-update at alvestrand.no
>> Subject: Re: Making progress on the mapping question
>>> (c) The above would imply that we apply _all_ IDNA2003
>>> mappings and lookups if the IDNA2008 lookup of (1) fails. I
>>> do not believe that is actually our intent. Consider the
>>> string "┌┐└┘" (U+250C U+2510 U+2514 U+2518). An
>>> IDNA2008 conversion fails completely because all four
>>> characters are DISALLOWED. If we then apply the IDNA2003
>>> mappings and Punycode conversion, we get "xn--lwhimq", which
>>> could be looked up. An example that is graphically even
>>> more interesting would be "□□□" (U+25A1 U+25A1 U+25A1),
>>> which looks suspiciously like the "no available font/graphic"
>>> indicator in many systems. It, too, is DISALLOWED by IDNA2008
>>> but, for IDNA2003, is successfully converted by ToASCII to
>> I think this scenario is where some thought is
>> needed. What will it cost? Is the cost worth the benefit?
>> Consider http://xn--iblogging-0g3f.blogspot.com/
>> ("http://i♥blogging.blogspot.com"), which works right now.
>> My problem is that if this stops working, then: A) Microsoft
>> is going to get a bug because Internet Explorer is "broken." B)
>> Firefox, etc. are going to get bugs because they're "broken."
>> C) Several ISPs are going to get bugs because their DNS is
>> "broken." D) Blogspot's going to get a bug because the blog's
>> going to need a new name, and, AFAICT, there's no way to
>> easily change the blog's name.
>> And this is the BEST CASE!
>> Lots of these types of things (blogs, photo sharing accounts,
>> message boards, etc) have dashboards that are like
>> http://i♥blogging.blogspot.com/dashboard.html, in which case
>> the user can't even get to their settings. Many of them make
>> you look at your profile before contacting customer service.
>> (Eg: you have to be logged in to a valid account before
>> they'll spend customer support time on you). In that case you
>> won't even be able to file a bug if your browser won't let you
>> go to the URL because it's illegal punycode2008.
>> This isn't completely hypothetical. I'm not sure about
>> blogspot, but other applications already permit registration
>> of illegal xn-- names, so it's pretty simple to see how badly
>> behaved they get in the event of a hypothetical change.
>> I could concede that maybe symbols are really unnecessary, and
>> useless, overly cute, or whatever, but the customer support
>> tail for this change is going to be inordinately expensive.
>> So I suspect that clients will want to continue to provide
>> IDNA2003 support even if the registrars prohibit new
>> registrations. I also don't think that it is reasonable to
>> expect that IDNA2003 registrations can easily be prohibited at
>> all levels of the DNS.
>> I'm NOT objecting to specific, targeted changes, it's the
>> huge changes that disconcert me. I don't want to pick on each
>> case individually, but for ZWJ/ZWNJ, I could see needing to
>> make a fix, because:
>> * Some languages have issues with code point(s) that just
>> don't work for that language in IDNA2003. For ZWJ/ZWNJ, you
>> can't correctly display strings without them, so they are
>> required for some languages.
>> * For ZWJ/ZWNJ, we're a bit "lucky" because the mappings were
>> dropped. This means that 2008 resolution will go somewhere
>> else, but at least dropping them by hand would get you to the
>> old URL. (so it won't just disappear like i♥blogging would).
>> Just because a few specific changes are needed to mappings to
>> support certain languages doesn't mean that a larger set of
>> changes is a good idea.
>> As always in the IETF, I am representing my own concerns :)
>> Microsoft, Windows Live, IE, etc. haven't decided what kind of
>> fallback will be used for IDNA2008 to IDNA2003. Obviously
>> that'll depend on the mappings that end up being in 2008 and
>> how they break existing users. "I" would like the RFC for
>> IDNA2008 to be clear enough that client apps (whether
>> Microsoft or not) won't feel like they need to add an extra
>> lookup for backwards compatibility. (Or even worse, "fix" the
>> mappings themselves).
>> - Shawn
>> P.S: I'm not trying to pick on Blogspot or Google.
>> "Microsoft" products have similar (or worse) behavior, but I
>> didn't want this to derail into a "fix your bug" discussion.
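
The ACE strings quoted in this thread ("xn--iblogging-0g3f", "xn--lwhimq") can be reproduced with a short Python sketch, which may help when checking which existing labels the inclusion-based approach would reject. This is only an illustration under stated assumptions: `ace_form` and `disallowed_code_points` are hypothetical helper names, and the category test is a rough stand-in for the IDNA2008 letter-digit-hyphen rule, not the actual derivation in the drafts.

```python
import unicodedata

# Assumption: a rough stand-in for the IDNA2008 inclusion rule. Only letters,
# combining marks, and decimal digits pass. The real derivation has further
# rules (contextual code points, exceptions, stability checks) not modeled here.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def disallowed_code_points(label: str) -> list[str]:
    """Return the code points that this simplified inclusion test rejects."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if ch != "-" and unicodedata.category(ch) not in LETTER_DIGIT_CATEGORIES
    ]


def ace_form(label: str) -> str:
    """Punycode-encode a non-ASCII label and add the ACE prefix.

    No mapping or validity checking is applied, so this mirrors only the
    final encoding step for labels that the mapping step leaves unchanged.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


for label in ("i♥blogging", "┌┐└┘", "□□□"):
    print(label, ace_form(label), disallowed_code_points(label))
```

All three labels encode without complaint, which is the behavior the fallback discussion is concerned with, while the simplified inclusion test flags every symbol in them (general category So).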