Re: Names

Fri Mar 20 08:24:27 CET 2009

Hello Mark,

> -----Original Message-----
> From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Mark Davis
> Sent: Thursday, 19 March 2009 21:31

> > Let me give you thousands of relevant examples at once. I am sure you
> > agree that surnames are often used as (parts of) domain names e.g.
> > smith-books.com . Now I queried the German telephone book and this is
> > the result. 1,5 Mio Germans have a surname with ß. There are over 3'100
> > different pairs of surnames (2 x 3'100 names) which do only differ in ß
> > and ss instead. Over 2,5 Mio people (!) have one of these surnames
> > either with ß or ss. (e.g. Abeßer/Abesser, Ablaß/Ablass, Abstoß/Abstoss
> > ...) I think it would be the right thing, if Mr. Weiß could register
> > weiß.de while Mr. Weiss has weiss.de. For a user looking for the website
> > it is just like looking up his number in a phonebook, he has to know if
> > he is looking for Mr. Weiß or Mr. Weiss.
> 
> Your numbers are a bit unclear to me. You are saying that 0.4% of German
> names in the phonebook are distinguished only by ß vs ss. You then say
> that 2.5M people have those names, which would be 3% of the German
> population. So you are saying that those people are proportionately
> overrepresented in the population by 700%? Or do you mean that 2.5M
> people have names containing either ß or ss? (Could you point to your
> data sources also?)

For example, "Weiß" and "Weiss" form one pair of surnames that differ only in ß versus ss. There are over 3,100 such pairs.
The data source has 31,948 entries for "Weiß" and 8,961 entries for "Weiss". Together, 40,909 entries carry one of these two surnames.
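
To make the notion of a "pair" concrete, here is a minimal Python sketch of how such pairs could be found in a table of surname counts. The function name and the sample dictionary are only illustrative assumptions; this is not the query that was actually run against the phonebook data. Only the two "Weiß"/"Weiss" counts come from the data source.

```python
def eszett_pairs(counts):
    """Return (name_with_ß, name_with_ss, combined_entries) for each
    surname containing ß whose ss spelling also occurs in `counts`."""
    pairs = []
    for name, entries in counts.items():
        if "ß" in name:
            twin = name.replace("ß", "ss")  # e.g. "Weiß" -> "Weiss"
            if twin in counts:
                pairs.append((name, twin, entries + counts[twin]))
    return sorted(pairs)

# Hypothetical sample table; only the Weiß/Weiss counts are real figures.
sample = {
    "Weiß": 31948,
    "Weiss": 8961,
    "Strauß": 1,  # placeholder count, not from the data; no "Strauss" here, so it is skipped
}

pairs = eszett_pairs(sample)
print(pairs)
```

A surname counts towards a pair only if both spellings occur, which is why the placeholder "Strauß" entry does not show up in the result.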

The data source consists of the private entries in the German phonebook (2006), as I got it from http://christoph.stoepel.net/geogen/v3/Software.aspx
It contains 29.4 million entries, so I extrapolated the results to match the German population of 82.3 million. I assumed that surnames are distributed in much the same way among people who have an entry in the phonebook and those who do not.
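
As a rough sketch of that extrapolation, every count from the phonebook is scaled by the ratio of population to phonebook entries. The constant and function names below are illustrative assumptions, and the linear scaling relies on the assumption that the phonebook is representative with respect to surnames.

```python
PHONEBOOK_ENTRIES = 29.4e6  # private entries in the 2006 phonebook data
POPULATION = 82.3e6         # German population used for the extrapolation

def extrapolate(entries):
    """Scale a phonebook count linearly to the whole population."""
    return entries * POPULATION / PHONEBOOK_ENTRIES

# Weiß + Weiss entries from the data source
weiss_entries = 31948 + 8961
weiss_estimate = extrapolate(weiss_entries)
print(f"{weiss_entries} entries -> roughly {weiss_estimate:,.0f} people")
```

The scaling factor is about 2.8, so the 40,909 entries for "Weiß" and "Weiss" correspond to roughly 114,500 people under this assumption.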

I am not sure what the 0.4% you mentioned refers to, but I hope the example and the description above show how the totals were calculated. If not, just let me know.

> If we didn't have the compatibility problems, separating eszett from ss
> would not be an issue. There would still be the issue of casing, but
> that would be trumped by the reasonable desire to distinguish them. It
> is the compatibility issue that most concerns me.

I am glad to hear that. :)

Best regards
Georg