Consensus Call on Latin Sharp S and Greek Final Sigma

Shawn Steele Shawn.Steele at microsoft.com
Tue Dec 1 10:42:23 CET 2009


> This I am not worried about. We already have these 
> problems in applications all over the place. Different 
> locales give different results.

Huh?  http://anything always goes to the same place.  We may have different behaviors in word processors or something, but when I click on a link from your email, it'll go to the same place as it did when you clicked on it.  THAT must be inviolate.  

> You argue for exactly the same mapping all over the world, but not what it should be.

Yes

> Could happen. Just like today when I have to enter a decimal
> comma on some webpages, and decimal period on some.

Not just like today.  Just like today, http://someserver.com/query?price=12.25 will go to the same server regardless of your locale, AND, for that web site, you must use ., and not ,.  (Unless it remembers locale settings).  But the domain name part always gets you to the same server.  That server may decide to be smart or not, but at least by that time you know it's not a phishing attack trying to outsmart you.

> I do not, unfortunately, see the ability to completely get rid of all
> problems here. Simply because we talk about _languages_ and 
> _humans_ and that these two requirements can not be implemented
> at the same time:

Exactly.  The IDN rules need to be human-friendly rules that work consistently for the machines.  It's good for the rules to try to be flexible for real language use, but they can't match human languages and behavior perfectly.  I can't even put a space in my business name, nor use camel casing reliably.  They aren't supposed to be perfect rules, just usable rules.

> 1. Users want the matching algorithm they are used to
> 2. Engineers want the same matching algorithm all over the world

> We just can not get that.

Correct, and unfortunately users will have to "lose" a bit, because it is more important that the same domain name gets to the same place regardless of its context.  This isn't surprising: there're a lot of "Steeles" out there, and we can't all get Steele.com.  We adapt.  

Personally I'd have even more relaxed rules, like mapping all four i's (I, i, İ, ı) to i so that Turkish and US users had consistent experiences.  The cost would be that some names would be "taken" even though linguistically they weren't really the same.  (I'm not sure that could be extended to all cases, though; there're a LOT of languages and I only know the painful edge cases.)
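A minimal sketch of what "mapping all four i's to i" could look like, assuming a hypothetical folding step applied before comparison. This table is illustrative only; it is not part of IDNA2008 or UTS #46, and `fold_label` is an invented name.

```python
import unicodedata

# Hypothetical table: dotted capital I (U+0130) and dotless small i (U+0131)
# both collapse to ASCII "i". Not an IDNA2008 or UTS #46 rule.
FOUR_I_FOLD = str.maketrans({"\u0130": "i", "\u0131": "i"})

def fold_label(label: str) -> str:
    """Fold a label so Turkish and non-Turkish spellings of i compare equal."""
    # Apply the table first: plain casefold() turns U+0130 into
    # "i" + U+0307 (combining dot above) and leaves U+0131 untouched.
    folded = label.translate(FOUR_I_FOLD).casefold()
    return unicodedata.normalize("NFC", folded)

for spelling in ["ISTANBUL", "istanbul", "\u0130STANBUL", "\u0131stanbul"]:
    print(spelling, "->", fold_label(spelling))
```

All four spellings fold to "istanbul", which is exactly the trade-off described: consistent matching, at the cost of some linguistically distinct names becoming "taken".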

> If we now get back to the PVALID/DISALLOWED question that
> this was about, I claim you, apart from the transition, have a 
> HIGHER chance of being able to produce a good mapping table 
> that you want if Eszett and Final Sigma are allowed.

I don't want to make a mapping table :) I want to use one provided by a standards body.  

> Then in the longer run you do not have to map those characters out.
> You avoid the risk of governments and others that do care about their
> characters telling you to change once there actually are (compared to today)
> deployed hrefs etc.

What if we took two inputs, ran them through a mechanism to determine a hash, and then compared the hashes for equivalence?  Or used the hash to find an IP address?  Then, when we displayed the found server, we'd use the "correct" form according to the owner of the name(s)/IP.  That makes everyone happy: you can type in somewhat flexible forms, but they get displayed correctly per the owner.  The hash isn't human readable, so you can't complain that eszett (or ss) isn't available or whatever; it just goes to a server.  
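The hash-then-lookup idea can be sketched as follows, assuming a hypothetical key of NFKC normalization plus full Unicode case folding fed into SHA-256, and a hypothetical in-memory registry that stores the owner's display form next to the address. None of these names come from any IDNA specification; the address is from the 192.0.2.0/24 documentation range.

```python
import hashlib
import unicodedata
from typing import Dict, Optional, Tuple

def label_key(label: str) -> str:
    """Hypothetical matching key: NFKC + full case folding, then SHA-256."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return hashlib.sha256(folded.encode("utf-8")).hexdigest()

# Hypothetical registry: key -> (owner's display form, server address).
registry: Dict[str, Tuple[str, str]] = {}

def register(display_form: str, address: str) -> None:
    registry[label_key(display_form)] = (display_form, address)

def resolve(typed: str) -> Optional[Tuple[str, str]]:
    """Return (owner's display form, address) for whatever the user typed."""
    return registry.get(label_key(typed))

register("Fußball", "192.0.2.10")
print(resolve("FUSSBALL"))  # ('Fußball', '192.0.2.10')
```

The user may type "FUSSBALL", but what comes back for display is the owner's "Fußball"; the hash itself is never shown.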

Some labels wouldn't be available.  (You couldn't use Fussball if Fußball was already taken, but it wouldn't really say one spelling was "better" than the other, just that some (perhaps odd) forms went to the same server as the preferred form.)  So when you registered "Fußball", you'd get "Fußball" in your display.  Sure, fussball'd go there too, but CoKe goes to coke, and AAA (locksmithing) can't register their name because the auto club got there first.
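A minimal first-come-first-served sketch of the registration side, again assuming a hypothetical equivalence of NFKC plus `str.casefold()`, which maps "ß" to "ss" and final sigma "ς" to "σ". `match_key`, `try_register`, and `taken` are invented for illustration.

```python
import unicodedata

def match_key(label: str) -> str:
    # Hypothetical equivalence: NFKC, then full case folding.
    # casefold() maps "ß" -> "ss" and final sigma "ς" -> "σ".
    return unicodedata.normalize("NFKC", label).casefold()

taken = {}  # match key -> display form chosen by the first registrant

def try_register(display_form: str) -> bool:
    key = match_key(display_form)
    if key in taken:
        return False  # an equivalent spelling already belongs to someone
    taken[key] = display_form
    return True

print(try_register("Fußball"))       # True: first come, first served
print(try_register("Fussball"))      # False: folds to the same key
print(taken[match_key("FUSSBALL")])  # Fußball: the owner's form is displayed
print(try_register("λόγος"))         # True
print(try_register("λόγοσ"))         # False: final sigma folds to σ
```

Neither spelling is declared "better"; the second one simply lands on a key that is already in use.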

The only difference is that our "hash" is close enough to human readable (if you ignore the xn-- hack anyway), that we don't think we need a display form.
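To see how close the existing "hash" is to readable text, Python's standard library is handy: its built-in `idna` codec implements IDNA2003 (RFC 3490 with nameprep), which maps ß to "ss", while the raw `punycode` codec performs no mapping. The latter yields the A-label a registry would store if ß is kept as a valid character, as IDNA2008 proposes.

```python
# IDNA2003 via Python's built-in codec: nameprep folds case and maps ß to "ss".
print("Fußball".encode("idna"))  # b'fussball'

# Keeping ß: the label is Punycode-encoded and given the "xn--" prefix.
a_label = "xn--" + "fußball".encode("punycode").decode("ascii")
print(a_label)  # xn--fuball-cta
```

Either way the result is a machine key that happens to be mostly legible; "xn--fuball-cta" is the point where the illusion of readability breaks down.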

- Shawn


More information about the Idna-update mailing list