AW: Eszett and IDNAv2 vs IDNA2008

John C Klensin klensin at jck.com
Fri Mar 13 21:01:35 CET 2009



--On Friday, March 13, 2009 17:22 +0000 "Shawn Steele"
<Shawn.Steele at microsoft.com> wrote:

> If Herr Weiss and Herr Weiß get different DNS names then when
> they tell an acquaintance to look them up there's going to be
> confusion.
> 
> I may not mind owning "steele.com" but there are lots of
> steele's, we can't all get our preferred choice.
>...

Shawn,

Let me try again to explain what I think several others have
tried to say as well.   Perhaps a slightly different point of
view will help. 

First of all, as you indirectly point out, if Herr A. Weiss gets
"weiss.de" then, if Herr F. Weiss tells an acquaintance to look
him up using only the DNS, there is going to be confusion.  That
problem is nothing new and has nothing to do with Eszett.

Second, your notes and comments keep assuming that "ss" and "ß"
are simply different display forms of the same thing.  They are
not.  To quote Marcos, George, and others, they are different
things.  If my memory is correct, there are places and dialects
in Germany in which "Weiss" and "Weiß" would even be pronounced
differently.  There is really little difference between the
comment you make above and my saying "well, if Mr. Steele, Mr.
Steel, and Mr. Steal all get different domain names, there
is going to be confusion".  The statement is true, but no one
has proposed seriously that the DNS should do anything about it.


Because of some history of typographic conventions, there is an
argument for making "ß" and "ss" match.  From a conceptual
standpoint (as compared to some Unicode decisions and issues),
that is not significantly different than arguing that "ö" and
"oe" should match (for German, maybe they should) or that "ö"
and "ø" should match (in the Nordic countries, maybe they
should).  Martin would probably prefer that Dürst and Duerst
match, but they don't and he is very clear that the first is the
correct spelling of his name and the second is a second-best
approximation.

Fully as important is the fact that we can't just match with
IDNA, we actually have to turn one form (the correct one) into
the approximation and lose information in the process.
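
A minimal sketch of that information loss, assuming Python's
built-in "idna" codec, which implements the IDNA2003 (RFC 3490)
ToASCII and ToUnicode operations.  The domain names are only
illustrative.

```python
# Python's built-in "idna" codec implements IDNA2003 (RFC 3490 with Nameprep).
with_eszett = "weiß.de".encode("idna")
with_ss = "weiss.de".encode("idna")

# Nameprep maps "ß" to "ss", so both spellings produce the same name.
print(with_eszett == with_ss)

# Converting back cannot recover which spelling was originally intended.
round_trip = with_eszett.decode("idna")
print(round_trip)
```

The mapping is one-way: once "ß" has been turned into "ss",
nothing in the resulting name records that an Eszett was ever
there.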

There are also people out there who are convinced that my last
name should be spelled Klemson or Clemson or Kleinsin.  They are
wrong.  That causes confusion.  The confusion isn't all bad,
e.g., it probably reduces the amount of spam I get. But we have
no expectation that the DNS should make the adjustments for them.

We do have systems that we call directories, search engines, and
other things whose purpose it is to resolve those sorts of
spelling errors by knowing about alternate spelling conventions,
performing "sound alike" tests, and so on.  They aren't the DNS
and your belief that "Weiss" and "Weiß" should match takes you
into their territory.

Your other argument, as I understand it, is that we made this
mistake in IDNA2003 and therefore must continue to make it
forever.  I don't agree.  First of all, a similar argument could
have been made earlier in this decade.  From that point of view,
we made a Host table design decision in the 70s, and then a DNS
one, when identifiers were restricted to ASCII.  That decision
(again from that point of view) was stupid and narrow-minded,
but it was a decision we made and any attempt to change it to
allow for other characters would cause nasty transition
problems.  And they were right: the mere possibility of
introducing labels that contain "ä" into the DNS immediately
creates a transition problem for registries that want to permit
that character because people might have previously registered
names that are properly spelled with it by substituting "a" or
"ae" instead, and there is no way to tell from the earlier
registrations what they actually intended.
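
To make that ambiguity concrete, here is a small sketch.  The
helper ascii_fallbacks and the sample names are hypothetical,
and the two substitution rules (use "a", or use "ae") are just
the conventions mentioned above, not anything a registry is
known to run.

```python
def ascii_fallbacks(name):
    """Hypothetical helper: ASCII spellings a registrant might have used."""
    # Two conventions from the text: substitute "a", or substitute "ae".
    return {name.replace("ä", "a"), name.replace("ä", "ae")}

# Three different intended names (illustrative only).
intended_names = ["bär", "bar", "baer"]

# Group the intended names by the ASCII label that would have been registered.
intended_by_label = {}
for name in intended_names:
    for label in ascii_fallbacks(name):
        intended_by_label.setdefault(label, set()).add(name)

for label in sorted(intended_by_label):
    print(label, "<-", sorted(intended_by_label[label]))
```

Each ASCII label ends up with more than one possible intended
spelling, which is why the old registration alone cannot settle
the question.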

I also have noted earlier that one cannot move IDNs from Unicode
3.2 to Unicode 5.1 --even with the IDNAv2 model-- and maintain
strict backward compatibility.   While the transition issues are
larger with IDNA2008 than they are with IDNAv2, it isn't a
choice between "big transition problems" and "no transition
problems".   If you are going to argue for strict and absolute
backward compatibility, then we are either at "IDNA2003 (and
Unicode 3.2) forever" or we are involved with a third model that
adopts Unicode 5.1 but excludes, on a character-by-character
basis, any Unicode 5.1 character that would introduce new or
transition issues.  I don't imagine that the folks who proposed
adding those characters to Unicode because they needed them
would like that very much, but...  This isn't the first time
that has been explained either.

Those arguments actually were made about incompatibility,
transition problems, and ASCII-only decisions that we had made
and were stuck with.  Had the community agreed, we wouldn't have
IDNs at all.  We concluded instead that we were more concerned
about making the Internet more accessible to future users and
users of languages and scripts that required additional
characters than we were about preserving strict backward
compatibility with things that some people might expect to
match.  I think that logic still holds and that adding Eszett as
a distinct character now is really not significantly different.

I would also suggest that a slight extension of your strict
backward compatibility argument would require your company to
continue supporting all of the interfaces of Windows 3.1
(including its DOS underpinnings), to say nothing of Windows 95,
98, etc.  After all, they are all called Windows and switching
interfaces is disruptive.  No one (I hope) really expects that,
but it does emphasize the fact that moving toward better
functionality, interfaces, and user experience can be more
important than strict backward compatibility.

Others have addressed the argument that we have to stick with
IDNA2003 because of MSIE7.  I don't feel any need to add to
those comments.

    john




