AW: Eszett and IDNAv2 vs IDNA2008
g.ochsner at revolistic.com
Fri Mar 13 09:40:45 CET 2009
> - "ß" and "ss" are linguistically two different things.
> true: and so are "Polish" and "polish", or "therapist" and "the rapist"
> - neither of these differences can be directly represented in domain
That kind of comparison does not become truer by being repeated. ß is not simply the lowercase of SS or vice versa. ß used to have no uppercase form (in Unicode); now it has one. Regarding your second example, do you mean that therapist.com should be bundled with the-rapist.com? Or, as another idea, should wwwapple.com be bundled with apple.com because it is a very common typing error?
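For anyone who wants to check the case mappings in question, here is a minimal Python sketch. It uses only the standard str methods, which follow the Unicode character database shipped with the interpreter; the variable names are purely illustrative.

```python
# Case mappings for ß (U+00DF) and capital ẞ (U+1E9E, added in Unicode 5.1).
small = "\u00df"    # ß
capital = "\u1e9e"  # ẞ

# The default uppercase mapping of ß is the two-letter string "SS" ...
upper_of_small = small.upper()
# ... but "SS" lowercases to "ss", not back to ß, so the round trip is lossy.
lower_of_ss = "SS".lower()
# The capital ẞ, by contrast, lowercases to ß.
lower_of_capital = capital.lower()
# Case folding (used for caseless matching) turns ß into "ss".
folded = small.casefold()

print(upper_of_small, lower_of_ss, lower_of_capital, folded)
```

The point of the sketch is only that ß/SS is not an ordinary one-to-one case pair, which is why it cannot be treated like a/A.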
> - Many people do now think that the mapping in IDNA2003 was a (big)
> mistake, which can be corrected now.
> It may or may not have been a mistake. Having the uppercase of a string
> map to a different place than the original is a bad thing, in many
> peoples' minds. I think the real problem is not that "ß" and "ss" have
> the same canonical form, it is that the preferred display form for a
> given string is not maintained by the Punycode encoding.
> And the "can" is at issue. It certainly can be done, but the cost is not
> insignificant. At least some people are very worried about the
> compatibility and security issues.
Sure, the WG must address the compatibility and security issues, but making ß PVALID is a big gain. In the future, people will be able to choose freely which domains they use, just as they can decide to use ß or ss when they write. I think people are intelligent enough to deal with ß in domains if they deal with it in everyday life. That is nothing the protocol must dictate. People today already deal with similar issues, e.g. www.whitehouse.org leads to Mr. Bush while www.white-house.org leads to advertisements. Choose for yourself which one you like better.
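To make the difference concrete, here is a small Python sketch. It assumes Python's built-in "idna" codec, which implements the older IDNA2003 rules (RFC 3490 with nameprep) and therefore maps ß to ss. The second half is only an approximation of what a PVALID ß would mean: it Punycode-encodes the label directly, skipping the mapping step, and does not perform the other checks a real IDNA2008 implementation would.

```python
# IDNA2003 behaviour via Python's built-in codec: ß is mapped to "ss",
# so the name becomes indistinguishable from weiss.de.
mapped = "weiß.de".encode("idna")
print(mapped)  # b'weiss.de'

# Rough sketch of the "ß stays as it is" case: encode the label with raw
# Punycode and add the ACE prefix, without any nameprep mapping.
# (Assumption: this skips all validity checks of a real IDNA2008 library.)
label = "weiß"
ace_label = b"xn--" + label.encode("punycode")
print(ace_label)

# The ACE form decodes back to the original label, ß included.
decoded = ace_label[len(b"xn--"):].decode("punycode")
print(decoded == label)
```

With the mapping, weiß.de and weiss.de are one name; without it, they are two separate names on the wire, and the registries decide how to handle the pair.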
> - There is consensus to make ß PVALID in IDNA2008.
> I don't think the situation is that clear-cut: see Marcos's mail.
There has been a consensus call with a clear outcome.
> - In the future domains with ß and ss should be autarchic domains in the
> Not sure what you mean by "autarchic". Do you mean "separate"?
Yes, I mean separate at the protocol level. The registries can solve the rest (sunrise periods, cloning registrant and NS data, etc.). They have many native speakers and know best about their local situation.
> - As a registrant it can, but not necessarily must be interesting to
> have two domains, that just vary in ß and ss. (e.g. buße.de means
> penance.de where busse.de means busses.de in English - two completely
> different meanings)
> Yet somehow the Swiss manage to understand busse with both meanings, and
> all Germans manage with BUSSE having both meanings. When I've asked for
Yes, but "somehow" doesn't mean things can't be made better.
> examples, the number of cases where there are two distinct meanings
> appears to be extremely small; any ambiguity introduced is orders of
> magnitude smaller than ambiguities introduced by omitting spaces between
> words, for example.
> If you want to give some data as to the percentage of German words that
> are distinguished in meaning by ß and ss -- and of course omitting those
> affected by the latest spelling reform, which caused the preferred
> display form to shift from one to the other.
Let me give you thousands of relevant examples at once. I am sure you agree that surnames are often used as (parts of) domain names, e.g. smith-books.com. I queried the German telephone book, and this is the result: 1.5 million Germans have a surname containing ß. There are over 3,100 pairs of surnames (2 x 3,100 names) that differ only in ß versus ss (e.g. Abeßer/Abesser, Ablaß/Ablass, Abstoß/Abstoss ...). Over 2.5 million people (!) have one of these surnames, spelled with either ß or ss. I think it would be right if Mr. Weiß could register weiß.de while Mr. Weiss has weiss.de. For a user looking for the website, it is just like looking up a number in the phone book: he has to know whether he is looking for Mr. Weiß or Mr. Weiss.
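A count of this kind can be sketched in a few lines of Python. This is not the query that was actually run against the telephone book; the function name eszett_pairs and the short sample list are hypothetical and only illustrate the idea of grouping names by their ss spelling.

```python
from collections import defaultdict

def eszett_pairs(names):
    """Return sorted (ß-form, ss-form) pairs that differ only in ß versus ss."""
    groups = defaultdict(set)
    for name in names:
        # Names that collapse to the same "ss" spelling land in one group.
        groups[name.replace("ß", "ss")].add(name)
    pairs = []
    for ss_form, spellings in groups.items():
        # A pair exists only if the plain "ss" spelling is itself in the data.
        if ss_form in spellings:
            pairs.extend((s, ss_form) for s in spellings if s != ss_form)
    return sorted(pairs)

# Illustrative sample only, not real directory data.
sample = ["Abeßer", "Abesser", "Ablaß", "Ablass", "Weiß", "Weiss", "Schmidt", "Groß"]
print(eszett_pairs(sample))
# [('Abeßer', 'Abesser'), ('Ablaß', 'Ablass'), ('Weiß', 'Weiss')]
```

"Groß" is not reported here because the sample contains no "Gross" to pair it with.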