Making progress on the mapping question

John C Klensin klensin at jck.com
Mon Mar 30 22:47:10 CEST 2009


Shawn,

While the consensus has certainly been rough and a few people
have regularly expressed doubts as to whether it is worth the
effort, my sense is that we agreed long ago on an
inclusion-based approach that extends the
letter-digit-hyphen model rather than including arbitrary
symbols and punctuation because they are there. Insofar as the
TLD registries are concerned, this is one of the areas in which
many of them are explicitly aware that those who have registered
these strings are going to have to put in some transition effort
and probably take some heat.   It is clearly up to Vint but,
speaking for myself only, I don't know how many more times we
should need to go through that discussion.

To review some of the general concerns about symbols, consider
your example alongside
http://xn--iblogging-vf3f.blogspot.com/
(i♡blogging.blogspot.com) and
http://xn--iblogging-vj5f.blogspot.com/
(i❤blogging.blogspot.com).    I don't know how the typical
user remembers which of these to use or distinguishes among
them, nor how registration lookup databases ("whois", etc.)
refer to them.  It might be possible to sort some of those issues
out, but it would require considerable effort and probably a
good deal of user education.
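For anyone who wants to see the distinctness problem concretely: Python's built-in "idna" codec implements IDNA2003 ToASCII (Nameprep, then Punycode, then the "xn--" prefix).  The sketch below assumes a stock CPython 3 interpreter; the dictionary keys are only descriptive labels chosen for this illustration.  It converts the three visually similar heart variants of the host name and shows that they become three unrelated DNS names.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA2003
# (RFC 3490 ToASCII = Nameprep + Punycode + "xn--" prefix).
# The dictionary keys are descriptive labels invented for this example.
HEARTS = {
    "U+2661 WHITE HEART SUIT": "\u2661",
    "U+2665 BLACK HEART SUIT": "\u2665",
    "U+2764 HEAVY BLACK HEART": "\u2764",
}

ace_forms = {}
for name, heart in HEARTS.items():
    host = "i" + heart + "blogging.blogspot.com"
    # str.encode("idna") applies ToASCII to each dot-separated label.
    ace_forms[name] = host.encode("idna").decode("ascii")
    print(name, "->", ace_forms[name])

# Nearly identical glyphs, yet three distinct names in the DNS.
assert len(set(ace_forms.values())) == 3
```

Run as-is, this should yield xn--iblogging-vf3f, xn--iblogging-0g3f, and xn--iblogging-vj5f (each followed by .blogspot.com), matching the ACE forms cited in this thread.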

I don't underestimate the support call issue if these more or
less suddenly stop working, but I suggest we have been through
it before.  Users who shifted from IE6+plugin to IE7 saw some
IDNs disappear into ACE form because of the languages they had
enabled.  Users who were successfully using early Firefox
implementations of IDNA saw native-character strings disappear
with upgrades when those domain names were rooted in TLDs that
Firefox didn't think took adequate precautions to avoid phishing
and confusing strings.  There has never been a recommendation to
use symbols in domain name labels: the IESG note, the various
versions of ICANN guidelines, and other sources of advice have
all focused on the characters that are used to write words in
languages, and these symbols are not used that way.   

It seems to me that you will need an FAQ or knowledge base
article that says, in essence:

	Q: My domain name, which contains hearts, spades, or
	other symbols, stopped working in the most recent update
	of the browser.  Why?
	
	A: Use of these characters in domain labels was never
	recommended, although some systems permitted them to be
	registered and used.  A new standard has been adopted
	that explicitly restricts identifiers to the letters and
	digits used to write the languages of the world and we
	have changed the implementation to conform to that
	standard.

Note, too, that this is not really a mapping issue.  U+2665 is
not changed by Stringprep.  I just got the details wrong when I
sketched out that appendix in Protocol as "if IDNA2008 doesn't
produce a resolvable name, use IDNA2003".  
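The difference between a mapping and a plain conversion can be checked with the same Python IDNA2003 codec.  This is a minimal sketch assuming CPython's encodings.idna module, whose nameprep() helper is an implementation detail rather than a documented public API.  It shows that Nameprep leaves U+2665 and the box-drawing and white-square strings from the quoted message untouched, that ZWJ/ZWNJ are mapped to nothing, and that an IDNA2003 fallback would still turn the IDNA2008-DISALLOWED symbol strings into ACE labels that could be looked up.

```python
# Sketch only: relies on CPython's encodings.idna module, whose nameprep()
# helper is an implementation detail rather than a documented public API.
from encodings.idna import nameprep

HEART_LABEL = "i\u2665blogging"        # i + BLACK HEART SUIT + blogging
BOXES = "\u250c\u2510\u2514\u2518"     # four box-drawing corners
SQUARES = "\u25a1" * 3                 # three WHITE SQUAREs
ZWNJ, ZWJ = "\u200c", "\u200d"         # ZERO WIDTH NON-JOINER / JOINER

# Nameprep leaves these symbols exactly as they were: no mapping occurs.
for label in (HEART_LABEL, BOXES, SQUARES):
    assert nameprep(label) == label

# ZWNJ and ZWJ, by contrast, are in Stringprep table B.1 and are mapped
# to nothing, so IDNA2003 silently deletes them before Punycode.
assert nameprep("a" + ZWNJ + "b" + ZWJ + "c") == "abc"

# Because nothing was mapped away, an IDNA2003 fallback simply Punycodes
# the symbol strings into ACE labels (keys are ASCII names for printing).
fallback = {
    "BOXES": BOXES.encode("idna").decode("ascii"),
    "SQUARES": SQUARES.encode("idna").decode("ascii"),
}
for name, ace in fallback.items():
    print(name, "->", ace)
```

The printed ACE labels should be xn--lwhimq and xn--u0haa, the same values given in the quoted message.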

    john


--On Monday, March 30, 2009 19:35 +0000 "Shawn Steele"
<Shawn.Steele at microsoft.com> wrote:

> -----Original Message-----
> From: idna-update-bounces at alvestrand.no
> [mailto:idna-update-bounces at alvestrand.no] On Behalf Of John C
> Klensin
> Sent: Monday, March 30, 2009 8:20 AM
> To: Vint Cerf; idna-update at alvestrand.no
> Subject: Re: Making progress on the mapping question
> 
>> (c) The above would imply that we apply _all_ IDNA2003
>> mappings and lookups if the IDNA2008 lookup of (1) fails.  I
>> do not believe that is actually our intent.   Consider the
>> string "┌┐└┘" (U+250C U+2510 U+2514 U+2518).  An
>> IDNA2008 conversion fails completely because all four
>> characters are DISALLOWED.  If we then apply the IDNA2003
>> mappings and Punycode conversion, we get "xn--lwhimq", which
>> could be looked up.    An example that is graphically even
>> more interesting would be "□□□" (U+25A1 U+25A1 U+25A1),
>> which looks suspiciously like the "no available font/graphic"
>> indicator in many systems. It, too, is DISALLOWED by IDNA2008
>> but, for IDNA2003, is successfully converted by ToASCII to
>> "xn--u0haa".
> 
> I think this scenario is where some thought is
> needed.  What will it cost?  Is the cost worth the benefit?
> 
> Consider http://xn--iblogging-0g3f.blogspot.com/
> ("http://i♥blogging.blogspot.com"), which works right now.
> My problem is that if this stops working, then: A) Microsoft
> is going to get a bug because Internet Explorer is "broken." B)
> Firefox, etc. are going to get bugs because they're "broken."
> C) Several ISPs are going to get bugs because their DNS is
> "broken." D) Blogspot's going to get a bug because the blog's
> going to need a new name, and, AFAICT, there's no way to
> easily change the blog's name.
> 
> And this is the BEST CASE!
> 
> Lots of these types of things (blogs, photo sharing accounts,
> message boards, etc) have dashboards that are like
> http://i♥blogging.blogspot.com/dashboard.html, in which case
> the user can't even get to their settings.  Many of them make
> you look at your profile before contacting customer service.
> (E.g., you have to be logged in to a valid account before
> they'll spend customer support time on you).  In that case you
> won't even be able to file a bug if your browser won't let you
> go to the URL because it's illegal punycode2008.
> 
> This isn't completely hypothetical.  I'm not sure about
> blogspot, but other applications already permit registration
> of illegal xn-- names, so it's pretty simple to see how badly
> behaved they get in the event of a hypothetical change.
> 
> I could concede that maybe symbols are really unnecessary,
> useless, overly cute, or whatever, but the customer support
> tail for this change is going to be inordinately expensive.
> So I suspect that clients will want to continue to provide
> IDNA2003 support even if the registrars prohibit new
> registrations.  I also don't think that it is reasonable to
> expect that IDNA2003 registrations can easily be prohibited at
> all levels of the DNS.
> 
> I'm NOT objecting to specific, targeted changes; it's the
> huge changes that disconcert me.  I don't want to pick on each
> case individually, but for ZWJ/ZWNJ, I could see needing to
> make a fix, because:
> 
> * Some languages have issues with code point(s) that just
> don't work for that language in IDNA2003.  For ZWJ/ZWNJ, you
> can't correctly display strings without them, so they are
> required for some languages.
> 
> * For ZWJ/ZWNJ, we're a bit "lucky" because the mappings were
> dropped.  This means that 2008 resolution will go somewhere
> else, but at least dropping them by hand would get you to the
> old URL.  (so it won't just disappear like i♥blogging would).
> 
> Just because a few specific changes are needed to mappings to
> support certain languages doesn't mean that a larger set of
> changes is a good idea.
> 
> As always in the IETF, I am representing my own concerns :)
> Microsoft, Windows Live, IE, etc. haven't decided what kind of
> fallback will be used for IDNA2008 to IDNA2003.  Obviously
> that'll depend on the mappings that end up being in 2008 and
> how they break existing users.  "I" would like the RFC for
> IDNA2008 to be clear enough that client apps (whether
> Microsoft or not) won't feel like they need to add an extra
> lookup for backwards compatibility.  (Or even worse, "fix" the
> mappings themselves).
> 
> - Shawn
> 
> P.S:  I'm not trying to pick on Blogspot or Google.
> "Microsoft" products have similar (or worse) behavior, but I
> didn't want this to derail into a "fix your bug" discussion.