AW: Eszett and IDNAv2 vs IDNA2008

John C Klensin klensin at jck.com
Fri Mar 13 22:31:56 CET 2009



--On Friday, March 13, 2009 20:25 +0000 "Shawn Steele"
<Shawn.Steele at microsoft.com> wrote:

>> Second, your notes and comments keep assuming that "ss" and
>> "ß" are simply different display forms of the same thing.
>> They are not.
> 
> My German is really bad, but I think I understand the
> difference.  Despite the difference, ss is sometimes used
> instead of ß, even in Germany where it is linguistically
> incorrect.  I believe there should be little difficulty
> finding both examples on street signs within Germany.

Just as you will find both "ö" and "oe" on street signs, except
on the popular street name Goethestraße where Göthestraße
would clearly be wrong.

> I certainly agree that if my name were spelled "Weiß", then
> I'd be pretty picky about not using "Weiss".  However if it
> were my business name, I certainly would want both Weiss and
> Weiß to go to my web server.  Same way I'd want bücher.de
> and bucher.de to go to the same place. 

And you might want "steele.redmond.wa.us" and
"steal.redmond.wa.us" to go to the same place, or maybe you
wouldn't.  I might want "color.info" and "colour.info" to go to
the same place.  Again, there is nothing about this that is
specific to either Eszett or German.  A registry might decide to
make this its problem ... or it might not.

> I think you'll find
> that many string comparisons would also return them as equal,
> particularly in German locales, even on differing OS's.

As I've tried to say several times, I have an entirely different
set of assumptions about what one might do on matching if
matching didn't imply information-losing character mapping.  On
the other hand, I think that such comparisons would probably be
widely (although not universally) considered wrong, simply
because there are a significant number of cases in which strings
with the "ss" substituted for Eszett are actually different
words.  Georg has given several examples.
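
To make the collision concrete, here is a minimal Python sketch.  It
assumes the "matching" in question is Unicode full case folding
(Python's str.casefold), which is only one of the comparisons an OS or
library might apply.  The word pair is an illustrative choice and is
not necessarily one of Georg's examples.

```python
# Illustrative pair: two different German words that differ only in
# Eszett versus "ss".
masse = "Masse"   # "mass"
masze = "Maße"    # "measurements" (plural of "Maß")

# The strings are distinct as written.
distinct_as_written = masse != masze

# Full case folding maps "ß" to "ss", so the folded forms collide and
# the original spelling cannot be recovered from the folded string.
equal_after_folding = masse.casefold() == masze.casefold()

print(distinct_as_written, equal_after_folding)
```

A comparison built this way reports the two words as equal even though
a German reader would not.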

> I am certainly not going to argue that words SHOULD be spelled
> with ss instead of ß.  Obviously that is wrong, or at least
> klutzy.  I do think that it is a subset of a bigger display
> problem.  None of the information about the mappings in
> IDNA2003 is retained.

And that is one of the reasons for removing mappings from the
protocol... taking information about differences among
characters, pretending (giving the user the impression) that the
distinctions are real and being maintained, and then discarding
that information and not being able to recover it is bad news...
and not just because of the display issues.
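
The loss can be seen with any IDNA2003 implementation.  The sketch
below uses Python's built-in "idna" codec, which follows RFC 3490 and
its Nameprep mapping step.  The variable names are illustrative, and
the domain is the one from this thread.

```python
# Python's built-in "idna" codec follows the IDNA2003 rules (RFC 3490),
# including the Nameprep mapping step.
typed = "fußball.de"

# ToASCII: the mapping step turns "ß" into "ss" before lookup.
on_the_wire = typed.encode("idna")

# ToUnicode: nothing in the result records that an Eszett was typed.
shown_back = on_the_wire.decode("idna")

print(on_the_wire, shown_back)
```

Once the name has been through the mapping, there is no way to tell
from the result which spelling the user entered.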

> How come nobody has insisted that fussball.de and fußball.de
> should be discrete names?  I'd be happier about ß being
> unique if there was a convincing argument for that.

Please read Georg's recent notes, where he has made exactly that
argument.  If there were no desire to register them
independently, I would be much more sympathetic to the position
you are expressing -- in my words, not yours, it would turn this
discussion and the change into a lot of work and effort for
nothing.

>  If
> fussball should be treated the same, whether or not it is
> proper spelling, then making ß unique causes the same problem
> as the Greek Tonos does for Greek (and for that matter any of
> the umlaut characters in German).

But those "umlaut characters" are actually excellent examples of
why we should stay out of this business, and away from these
slightly-fuzzy mappings at the protocol level.  Note, for
example, that "fältström" might plausibly be considered to match
"faeltstroem" if it were a German name, but "faltstrom" might or
might not be acceptable.  It is not a German name, however, but
a Swedish one, and, while I'll let Patrik tell you how he feels
about Faltstrom, I know that matching it to "faeltstroem" would
be bad news and mapping it into "faeltstroem" would be a
complete non-starter.
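
A small Python sketch of the two candidate ASCII forms, for
illustration only.  Both helper functions are hypothetical, the
German-style table covers only the lowercase umlauts, and neither rule
is something IDNA itself defines.

```python
import unicodedata

# German-style convention: write the umlaut as base vowel plus "e".
# Escapes are used so the keys are single precomposed code points.
GERMAN_STYLE = str.maketrans({
    "\u00e4": "ae",  # a with diaeresis
    "\u00f6": "oe",  # o with diaeresis
    "\u00fc": "ue",  # u with diaeresis
})

def german_style(text: str) -> str:
    """Hypothetical helper: apply the German ae/oe/ue convention."""
    return text.translate(GERMAN_STYLE)

def strip_marks(text: str) -> str:
    """Hypothetical helper: drop combining marks, keep base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

name = "f\u00e4ltstr\u00f6m"  # the Swedish name discussed above

as_german = german_style(name)
as_stripped = strip_marks(name)

print(as_german, as_stripped)
```

The two rules give different ASCII strings for the same name, and the
string alone does not say which language's convention applies.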

And that takes us into the realm of language-sensitive matching.
Even if one wants to go there at all, I think most of us
understand that it is well beyond anything that can be done with
IDNA or the DNS as we know it.  One of the things that
facilitates this long discussion about Eszett is that it isn't
used in any other Germanic (or even Latin-based) writing system,
much less being used in a different way.
 
> For the record, I'm not arguing back-compat because I'm too
> lazy to fix IE7/IE8.  Were there no IDNA2003 I'd probably make
> the same arguments about ß.  Sure, it is not the same as ss,
> but it is related.

I appreciate that clarification.

best,
    john


