Mappings - some examples

Alexander Mayrhofer alexander.mayrhofer at nic.at
Mon Nov 30 15:57:15 CET 2009


> > You know very well that we don't know the level of 
> contamination, because it's all hidden by the client right now.
> 
> Yup, but, that was also why I wanted to know whether you had 
> new data as you wrote "heavily contaminated" and not only 
> "contaminated".

Unfortunately I haven't, because it's all "hidden". Let's say that the client side is definitely more than heavily contaminated ;)
 
> > The only numbers i could gather was Erik van Poel's 
> statistics from Google's inventory of crawled host names - he 
> came up with 0.00001% of the domain names containing an "ß" 
> (compared to 0.00122% for "ü").
> > 
> > That could mean two things:
> > 
> > - Either there's really not just that much 
> "web-contamination" (but how much is there elsewhere?)
> 
> My guess (he he he, we continue to guess here) is that the 
> "web contamination" is higher than for example "email contamination".

Yep, it's unfortunately an area of guessing and wishful thinking :-/ 
I *could* imagine that the web contamination is so low because if you enter an "ß" into a browser, it automatically switches over to "ss", so people get "trained" to not use it.
 
> I do now (compared to a year ago) see web-URLs live in Sweden 
> (on TV ads for example) that do contain the Swedish 
> characters, but so far not one single email address with 
> Swedish characters in the domain part.

[slightly OT] 

Yep, same here. We did a research project among our 900k domain names, and (SMTP-)talked to each and every MX. Not a single MX host supports UTF8SMTP :-/ ... Definitely space for 15 minutes of geekery fame ;)

It's hard to explain to customers why "Umlauts" work in the domain part of the email address, but not in the user part... mueller at müller.at just looks silly, and people are "conditioned" to expect that it doesn't work.
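
A survey like the one described above boils down to checking each MX host's EHLO reply for the extension keyword. The following is only a minimal sketch of that check, not the tooling actually used: the function name and the sample reply are made up for illustration, and it assumes the extension is advertised under the keyword UTF8SMTP, as named above.

```python
def advertises_extension(ehlo_reply: str, keyword: str = "UTF8SMTP") -> bool:
    """Return True if a multi-line EHLO reply lists the given keyword."""
    wanted = keyword.upper()
    lines = ehlo_reply.splitlines()
    # The first reply line carries the server's own name and greeting,
    # so only the remaining lines can list extension keywords.
    for line in lines[1:]:
        # Extension lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if len(line) < 4 or not line.startswith("250"):
            continue
        fields = line[4:].split()
        if fields and fields[0].upper() == wanted:
            return True
    return False


# Hypothetical reply from a server that does not offer the extension.
SAMPLE_REPLY = (
    "250-mx.example.at\r\n"
    "250-8BITMIME\r\n"
    "250-PIPELINING\r\n"
    "250 SIZE 10240000"
)
```

Against a live host, `smtplib.SMTP.ehlo()` followed by `has_extn("utf8smtp")` from the Python standard library should do the same job without manual parsing.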

> > - Or nobody is interested in using it anyway (because even 
> though it works right now, nobody is doing it...)
> 
> I have a third alternative:
> 
> - People do understand ß is mapped to "ss" so they use "ss" 
> in the published URLs.

Ah, ok, just had sort of the same thought above.

> This last is my "hope", and by making "ß" PVALID, it would be 
> possible for those parties to use the character in the domain names.
> 
> The real hidden question is how many people have "ss" in 
> their URLs while they use ß on their keyboards? No searches 
> in Google or elsewhere can say how much that is in use. And 
> this is, if I understand things correctly, both what Mark is 
> worried about, and what me and Harald see as a good thing. 

I've tried to get figures from browser vendors... For example, Google Chrome *does* send the "ß", rather than the "ss", to Google when it does the scary "typeahead" query in the address bar... (from tcpdump):

GET /complete/search?client=chrome&hl=en-US&q=http%3A%2F%2Ffa%C3%9F

(but it also suggests a domain name with "ss" right below.)

So, Google *should* know how many Chrome users type a "ß" into the address bar...
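
To see that the captured request really carries the "ß" and not "ss", the query string can be percent-decoded. This is just a small sketch using the Python standard library; the helper name `typed_text` is made up for illustration, and the request target is the one from the tcpdump capture above.

```python
from urllib.parse import parse_qs, urlsplit

# Request target as captured above.
REQUEST_TARGET = "/complete/search?client=chrome&hl=en-US&q=http%3A%2F%2Ffa%C3%9F"


def typed_text(request_target: str) -> str:
    """Return the percent-decoded value of the `q` query parameter."""
    query = urlsplit(request_target).query
    # parse_qs decodes percent-escapes as UTF-8 by default.
    return parse_qs(query)["q"][0]


# typed_text(REQUEST_TARGET) evaluates to "http://faß":
# the bytes C3 9F are the UTF-8 encoding of U+00DF (ß).
```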

> That if ß is PVALID, then the domain name holder can decide 
> (modulo the registry policy) whether a typed ß should result 
> in a successful lookup of "ss".

Well, it would be a successful lookup of "ß" on IDNA2008 clients, and a successful lookup of "ss" on IDNA2003 clients, potentially yielding different results :)
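
The divergence can be sketched with the Python standard library, whose built-in "idna" codec implements IDNA2003. The second function is only an approximation of what an IDNA2008 client would do if "ß" is PVALID: it Punycode-encodes non-ASCII labels as they are and performs none of the IDNA2008 validity checks. The function names and the domain "faß.at" are hypothetical and chosen purely for illustration.

```python
def idna2003_lookup_name(domain: str) -> str:
    # The built-in "idna" codec applies Nameprep, which maps "ß" to "ss"
    # before the label is converted to ASCII.
    return domain.encode("idna").decode("ascii")


def idna2008_style_lookup_name(domain: str) -> str:
    # Assumption: "ß" is PVALID, so it is kept and Punycode-encoded.
    # No IDNA2008 validity checks are performed in this sketch.
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


# idna2003_lookup_name("faß.at")       evaluates to "fass.at"
# idna2008_style_lookup_name("faß.at") evaluates to "xn--fa-hia.at"
```

The same typed string thus produces two different names in the DNS, and nothing forces them to belong to the same registrant or to point at the same host.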

> >> So if 'ä' would have been mapped in IDNA2003, and I now would 
> >> have been asked if I thought 'ä' should be introduced, I 
> >> would say "go for it, but speed up!!!".
> > 
> > Again, I think it's a *very* significant difference whether 
> you open up a new part of a namespace, or you re-define the 
> properties of existing namespace - re-definition is risky. 
> Particularly if such a re-definition is combined with the 
> effects of potentially incompatible mappings...
> 
> Agreed, but I think where we disagree here is "how much" this 
> is "opening up a namespace" and how much is a "redefinition". 
> I see it as opening the namespace (same as when we started to 
> allow 'ä' in .SE, while previously people have registered 
> domain names with just 'a') while you see it as a redefinition.

On the lookup level, it's opening up a new namespace, agreed. On the client side, it is re-definition.

> I would though be more "on your side" if the number of domain 
> names that contained ß were say 100 times higher than today 
> in published documents. Because then people would be TOLD to 
> type in something (ß) that mapped to something else (ss) that 
> was registered. That, I claim, is not the case. At least not 
> "heavily".

I understand that. And I'm saying that the potential of around 500 useful "ß" registrations (based on looking through our inventory of 900k domains) is nowhere near worth the effort.

Alex

