Stop me if I've misunderstood...

John C Klensin klensin at jck.com
Fri Jul 10 17:57:32 CEST 2009


--On Thursday, July 09, 2009 21:13 +0000 Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> 
>> I don't think IDNA2008, with or without the most recent
>> proposals, changes that property.  The main thing IDNA2008
>> does that is different from IDNA2003 is to strongly
>> discourage any string that requires mapping from those
>> adverts.
> 
> That's not gonna happen.  Burger King isn't going to write
> "haveityourway.com" on the side of the bus, it's gonna be
> "HaveItYourWay.com".  Sure, mapping in ASCII is free, but
> there's a need for mapping in non-ASCII contexts as well.
> Specifying or recommending something we know is going to be
> ignored is bad.  A) it encourages people to interpret the
> standard how they see fit, and B) developers can't count on
> the language because they know it'll be ignored.

And you are reasoning from analogies that may not hold up.
Before I try to explain that (following up part of Elizabeth's
note), I want to stress that...

First of all, our role here is to make things work well and
predictably, with catering to the inclinations of various
marketing and branding departments (Burger King or otherwise) a
secondary goal at best.  For whatever it is worth, we get more
predictability when we have fewer variations in what is
possible.   Probably we all believe the latter; the question is
how it should properly interact with the user experience.  That
is not an easy question and I don't think that hyperbole (from
either side) or games about who has to prove what move us
forward.

Second, my experience with marketing people is that, while they
would like a perfect world in which every campaign was
successful and competitors were stupid and ineffective, and will
make all sorts of demands in the hope of realizing one or both,
they are ultimately very pragmatic.    If one is faced with a
choice between "haveityourway.com" or "have-it-your-way.com",
either of which works 100% of the time, and "HaveItYourWay.com",
which works only fairly often, I know that they --or at least the
subset who expect to survive in the business-- will pick one of
the first two.   I note that, as far as the DNS is concerned,
"Have It Your Way.com" is a perfectly valid domain name.

While ICANN rules prohibit names with embedded blanks at the
second level, just as they prohibit raw, non-ASCII, UTF-8, I
assume that I'm not the only one here who has had to listen to
some marketing type complain that "Our Favorite Slogan" could
not be used as a domain name and that "we" had to be smart
enough to make it happen and just weren't trying hard enough.
The response is to explain that

    Our Favorite Slogan.MyCompany.com

is actually a valid domain name that they are welcome to use if
they like; they just wouldn't find that it was very useful in
practice.   Each time I've had that conversation --and there
have been several times-- there has been much complaining but,
eventually, there has been no insistence on domain names with
embedded spaces.
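
A minimal Python sketch of the distinction drawn above: the DNS
itself only limits a label's length, while the conventional
hostname rule is what excludes embedded spaces.  The function
names and the use of UTF-8 to count octets are assumptions made
for illustration, not part of any registry's actual checks.

```python
import re

MAX_LABEL_OCTETS = 63  # per-label length limit in the DNS

# Letter-digit-hyphen (LDH) form conventionally required for hostnames.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_dns_label(label: str) -> bool:
    """True if the label fits the DNS length limit (1 to 63 octets).

    Assumption: the label is counted as UTF-8 octets for this sketch.
    """
    return 1 <= len(label.encode("utf-8")) <= MAX_LABEL_OCTETS


def is_ldh_label(label: str) -> bool:
    """True if the label also satisfies the letter-digit-hyphen rule."""
    return _LDH.fullmatch(label) is not None


for candidate in ("Our Favorite Slogan", "ourfavoriteslogan"):
    print(candidate, is_dns_label(candidate), is_ldh_label(candidate))
```

The slogan with spaces passes the length check but fails the LDH
check, which is why it is valid in the DNS and still not very
useful in practice.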

Now, coming back to your example, we have to realize how
culturally- and historically-sensitive this is.  A decision was
made in the early 1970s that names of hosts and networks were
going to be treated case-insensitively.  At the time, that
decision had very little to do with user experiences: we had
hosts that really couldn't handle lower case, hosts that could
but treated the two cases as globally equivalent, and hosts that
were case-sensitive but on which upper case was considered a
little strange.   Case-insensitive identifiers seemed to be the
way to go.  A decade later, that decision was carried forward
into the DNS world without, if I recall, a lot of thought or
discussion, largely because, by then, it had been embedded into
a number of application protocols.  Had the original decision
been made differently -- either to treat identifiers as
case-sensitive or to prohibit one case or the other-- we
probably would be having a different discussion today (not
necessarily an easier one, but different).

Second, the way in which one gets the equivalent of
"HaveItYourWay" in German is traditionally to make a new word,
"haveityourway", with no capital letters in the middle.  If one
wants to maintain distinct word-components, one uses spaces or
maybe hyphens.   There are, in principle, two ways to do it in
Arabic -- the use of initial-form and final-form characters to
denote boundaries or the use of ZWNJ.  But we've been told by
Unicode experts that initial, final, isolated, and medial forms
should all match and the Arabic language community has been
reasonably clear that they do not want or need ZWNJ for writing
the Arabic language.

So I wouldn't generalize much from "Have It Your Way" (with or
without spaces).

> I'm not saying that the U-label form shouldn't be encouraged
> in the bowels of the system, that'd clearly be good.  I am
> saying that anything potentially user facing shouldn't have
> this recommendation.  Especially if "marketing" is going to
> have a voice ;-)

I think maybe we agree, but I'm not sure which "this
recommendation" you are referring to, partially because, as you
and others have pointed out, "user facing" is not itself
unambiguous.

Because of the greater distinguishability of lower case
characters and because it lets reverse-mapping work out, I would
tend to recommend that those who are more worried about
precision and avoidance of attacks based on recognition of
characters stick with U-labels and hence with lower case.  I
would not require that in UIs, but I would probably recommend it
to both advertisers and users.
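
The reverse-mapping point can be sketched in Python using the
standard library's "punycode" codec.  This is only an
illustration under stated assumptions: str.lower() stands in for
whatever case mapping is applied before lookup, the helper names
are hypothetical, and the label "bücher" is just an example.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a single non-ASCII label as an A-label (Punycode form)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its Unicode form."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def survives_round_trip(typed: str) -> bool:
    """True if what was typed is what comes back after mapping.

    Assumption: lowercasing approximates the mapping step.
    """
    mapped = typed.lower()
    return to_u_label(to_a_label(mapped)) == typed


print(to_a_label("bücher"))
print(survives_round_trip("bücher"), survives_round_trip("Bücher"))
```

A label that is already lower case comes back exactly as typed;
the mixed-case one comes back as something the user did not
enter.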

Where the design questions get controversial, and despite many
concerns, I'd encourage people who are designing highly
localized UIs to consider forgoing case mapping (and to present
lower case) where the community involved was extra-vulnerable to
confusion in scripts with which they were not familiar enough to
easily do the case conversions without looking (e.g., "Q" and
"q" may look alike to you or me, but, to someone with very low
familiarity with Latin scripts and fonts, "Q" might look a lot
more like "o" than it does like "q").    That is a tradeoff with
the principle that anyone who types a given string should get
the same interpretation as anyone else who types that string,
but it may be worth pointing out that, if familiarity with Latin
characters is low enough for my suggestion to apply, there
probably are no Latin characters on the keyboard, so the same
string is _not_ being typed as would be typed by someone with a
Latin-based keyboard.    I don't think we should be trying to
make the decisions involved in this, or in forcing one
particular UI behavior, in the protocol -- partially because I'm
convinced that, after a few bad experiences, we will find UI
software ignoring any rules we write in favor of protecting
users (either by reducing the amount of mapping that is done or
by insisting on user entry of A-labels for labels in unfamiliar
scripts).

To turn that same comment around, I'd think that the designers
of any localized UI that is expected to be used in locales with
Latin-based scripts, or scripts that have variant-width
characters in Unicode, would be nuts not to make the obvious
mappings.  Clearly the spec permits that.   
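
As a rough sketch of what "the obvious mappings" might look like
in a localized UI, the Python below folds width variants and case
before lookup.  It assumes NFKC normalization as a stand-in for
width mapping (NFKC does more than that), and the function name
is hypothetical.

```python
import unicodedata


def map_for_lookup(typed: str) -> str:
    """Fold width variants and case in user input before lookup.

    Assumption: NFKC approximates width mapping, and str.lower()
    approximates case mapping.
    """
    return unicodedata.normalize("NFKC", typed).lower()


print(map_for_lookup("HaveItYourWay"))
print(map_for_lookup("ＥＸＡＭＰＬＥ"))  # fullwidth Latin input
```

Both inputs end up as plain lower-case strings, so two users
typing the "same" name on different keyboards reach the same
label.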

If you can suggest a better way to make this clear, I'm
listening and I assume that Pete and Paul are too.

    john


