Stop me if I've misunderstood...

Shawn Steele Shawn.Steele at microsoft.com
Thu Jul 9 23:13:02 CEST 2009


> I don't think IDNA2008, with or without the most recent
> proposals, changes that property.  The main thing IDNA2008 does
> that is different from IDNA2003 is to strongly discourage any
> string that requires mapping from those adverts.

That's not gonna happen.  Burger King isn't going to write "haveityourway.com" on the side of the bus, it's gonna be "HaveItYourWay.com".  Sure, mapping in ASCII is free, but there's a need for mapping in non-ASCII contexts as well.  Specifying or recommending something we know is going to be ignored is bad: A) it encourages people to interpret the standard however they see fit, and B) developers can't rely on the wording because they know it'll be ignored.
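To make the ASCII vs. non-ASCII point concrete, here's a rough Python sketch. It assumes Python's built-in `idna` codec, which implements IDNA2003 (nameprep + punycode), not IDNA2008. ASCII names pass through untouched because DNS already matches ASCII case-insensitively; a non-ASCII label only reaches the same A-label as its lowercase spelling because nameprep does the mapping first.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA2003 (RFC 3490),
# including the nameprep mapping step. It is not an IDNA2008 implementation.

def to_ascii_2003(name: str) -> bytes:
    """Convert a domain name to its ASCII (A-label) form using IDNA2003."""
    return name.encode("idna")

# ASCII: the codec leaves case alone; DNS itself compares ASCII case-insensitively.
print(to_ascii_2003("HaveItYourWay.com"))  # b'HaveItYourWay.com'

# Non-ASCII: nameprep lowercases before punycode, so both spellings
# end up with the same A-label.
upper = to_ascii_2003("BÜCHER.de")
lower = to_ascii_2003("bücher.de")
print(upper == lower)  # True

# Skipping the mapping step (raw punycode of the uppercase label) gives a different label.
raw = b"xn--" + "BÜCHER".encode("punycode")
print(raw == lower.split(b".")[0])  # False
```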

I'm not saying that the U-label form shouldn't be encouraged in the bowels of the system; that'd clearly be good.  I am saying that anything potentially user-facing shouldn't have this recommendation.  Especially if "marketing" is going to have a voice ;-)

> Shawn's discussion of the characteristics of "as little as
> possible" seem right to me.  Certainly I would not recommend
> implementing the portion of a browser into which users type
> characters they think they see on buses without case mapping

The problem's that they won't think about it before blogging either, or putting it in an href :)

> The recipe for chaos lies in having multiple different URIs (and
> parent domain names) that don't compare equal on a string
> basis but that do map to the same domain name

Hmm ;-)  Let's see: microsoft.com, Microsoft.com, MICROSOFT.COM, MicroSoft.Com, MiCrOsOfT.CoM...  I'm not gonna list all 512 case variations of "microsoft" that don't compare equal in binary form, yet resolve to the same domain name.  The same is true for IDNA2003 strings.  Despite the mapping step, they can be consistently compared correctly.  (And they still could be, 'cept for the 4 breaking changes in IDNAbis.)
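The arithmetic behind "512": "microsoft" has nine letters, each of which can be upper or lower case, so there are 2^9 = 512 spellings. A small illustrative Python sketch (`case_variants` is a hypothetical helper) enumerates them and shows they're all distinct as strings but collapse to one name once ASCII case is folded, which is what DNS matching does.

```python
from itertools import product

def case_variants(label: str):
    """Yield every upper/lower-case spelling of a label.

    Assumes the label is all letters; digits or hyphens would yield duplicates.
    """
    choices = [(c.lower(), c.upper()) for c in label]
    for combo in product(*choices):
        yield "".join(combo)

variants = list(case_variants("microsoft"))
print(len(variants))                       # 512, i.e. 2**9 for nine letters

# A binary comparison sees 512 different strings...
print(len(set(variants)))                  # 512
# ...but folding ASCII case, as DNS does, leaves a single name.
print(len({v.lower() for v in variants}))  # 1
```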

The problem isn't having different binary forms that compare the same, the problem is not having consistent mappings for that comparison.  If everyone uses the same comparison rules, like ASCII DNS, then it's just a simple function call to check equality.  If you do it a lot, like in a database, then just store the canonical form.
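A minimal sketch of "one comparison function, store the canonical form", assuming IDNA2003 semantics via Python's built-in `idna` codec plus ASCII lowercasing. `canonicalize`, `same_domain`, `register`, and the in-memory dict are hypothetical stand-ins for whatever a real application or database would use.

```python
# Assumes IDNA2003 mapping (Python's built-in "idna" codec) + ASCII case folding.

def canonicalize(name: str) -> str:
    """Map a user-facing domain name to a single comparable form."""
    return name.encode("idna").decode("ascii").lower()

def same_domain(a: str, b: str) -> bool:
    """Equality check: just compare canonical forms."""
    return canonicalize(a) == canonicalize(b)

# Stand-in for a database table: key on the canonical form,
# keep whatever form the user typed for display.
sites: dict[str, str] = {}

def register(display_name: str) -> None:
    sites.setdefault(canonicalize(display_name), display_name)

register("HaveItYourWay.com")
print(same_domain("HaveItYourWay.com", "haveityourway.com"))  # True
print(canonicalize("BÜCHER.de") in sites)                    # False
register("Bücher.de")
print(canonicalize("BÜCHER.de") in sites)                    # True
```

Lookups then never need to re-run the mapping over stored rows; only the incoming query gets canonicalized.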

-- Eszett was also discussed.

I'd like to point out that the current draft approach is a bit hypocritical.  The draft says "Mapping is discouraged: your preferred display name doesn't matter, just use aaa.com instead of AAA.com".  It also says "ss changes: your display name matters enough that we'll break the ß <-> ss behavior."

ß has nothing to do with being able to look up fußball.de (which works fine in 2003), only with the display form.  In fact it'd be bad for fußball.de to go somewhere besides the same server as fussball.de.  The DNS form is xn--fuball-xxxx.de, so that doesn't help display of the ß.
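A rough Python sketch of the lookup side. The IDNA2003 half uses Python's built-in `idna` codec, where nameprep maps ß to "ss", so fußball.de and fussball.de land on the same name. The "IDNA2008-style" half is a hypothetical, simplified stand-in that just keeps ß and punycodes the label; it skips all of IDNA2008's validity rules and is not a real implementation.

```python
# IDNA2003: Python's built-in "idna" codec (nameprep maps ß -> "ss").
def a_label_2003(name: str) -> bytes:
    return name.encode("idna")

# Hypothetical simplified stand-in for IDNA2008 behaviour: ß is kept and
# punycoded directly. Assumes an already-lowercase, non-ASCII label.
def a_label_2008_style(label: str) -> bytes:
    return b"xn--" + label.encode("punycode")

print(a_label_2003("fußball.de") == a_label_2003("fussball.de"))  # True
print(a_label_2003("fußball.de"))     # b'fussball.de'
print(a_label_2008_style("fußball"))  # a separate xn--fuball-... label
```

In neither case does the A-label itself carry a displayable ß; display has to come from somewhere else.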

The problem with ß is how you type it (which works in 2003) and how you display it (which kinda works in 2003).  fussball.de is no worse for the intent/linguistics than haveityourway.com (I'd even argue it's better, since there are a lot more cases where PetsMart and PetSmart would have different meanings than cases where the ß/ss distinction is mildly interesting).

Apparently we haven't even figured out the input side, but if we're concerned enough about display to break 2003 compatibility, then we should think about display.  If we had a way of determining a good display form, then we'd need 0 of the 4 breaking changes from IDNA2003, because they could all be resolved by specifying the display form (which would obviously have to map to the U-label).  That'd solve HaveItYourWay.com and AAA.com as well.
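One possible reading of "a display form that maps to the U-label", sketched in Python: a registrant-chosen display string is accepted only if the IDNA2003 mapping (assumed here to be Python's built-in `idna` codec plus ASCII lowercasing) turns it into the registered name. `acceptable_display_form` is a hypothetical helper, not anything defined by either draft.

```python
def acceptable_display_form(display: str, registered: str) -> bool:
    """True if `display` maps (IDNA2003 + ASCII case folding) to `registered`."""
    def mapped(name: str) -> str:
        return name.encode("idna").decode("ascii").lower()
    return mapped(display) == mapped(registered)

print(acceptable_display_form("HaveItYourWay.com", "haveityourway.com"))  # True
print(acceptable_display_form("AAA.com", "aaa.com"))                      # True
print(acceptable_display_form("Fußball.de", "fussball.de"))               # True
print(acceptable_display_form("HaveItMyWay.com", "haveityourway.com"))    # False
```

The registry would still resolve the one registered name; the display form only controls how it's shown.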

-Shawn



More information about the Idna-update mailing list