Consensus Call on Latin Sharp S and Greek Final Sigma
patrik at frobbit.se
Tue Dec 1 09:57:38 CET 2009
On 1 dec 2009, at 09.19, Shawn Steele wrote:
> Which is even worse. To take eszett, a Swiss context may produce different results than a German context.
This I am not worried about. We already have these problems in applications all over the place. Different locales give different results.
> So a German user ends up at a different web site than a Swiss user.
That happens only if the resulting DNS lookups are for different domain names. Which I think is exactly what you are saying.
What I do not understand, though, is what this has to do with whether the discussed code points are PVALID or DISALLOWED in the long run. You argue for exactly the same mapping all over the world, but not for what that mapping should be.
> Or, worse, a German user visiting a Swiss airport kiosk ends up someplace unexpected.
Could happen. Just like today, when I have to enter a decimal comma on some web pages and a decimal period on others, depending not on my locale but on the locale the CGI is operating in.
But I do understand your point.
> No matter what else, inconsistent mappings MUST be verboten. I cannot think of any practical mechanism that would allow different registrars to specify their contexts in a consistent way. (They'd have to dump mapping tables along with the zone files or something.)
They already have bundles and other mechanisms today, as described in the IANA language tables, to minimize confusion. What then happens if the "wrong" alternative is looked up is that you get a redirect, the same response, or no response. This minimizes the risk of getting the wrong response.
Of course, that is only on registry level (TLD). Anyone can have whatever they want in their zone.
> (Yes, I realize eszett mappings could be forbidden if it's PVALID, but that's not the point, this example exists for many cases where mapping is interesting, such as dot/dotless capital letter I, etc.)
Or accented capital Latin characters, which are to be treated differently depending on locale.
Unfortunately, I do not see any way to completely get rid of all problems here, simply because we are talking about _languages_ and _humans_, and these two requirements cannot be satisfied at the same time:
1. Users want the matching algorithm they are used to
2. Engineers want the same matching algorithm all over the world
We just cannot have both.
Browsers already play games with TLDs today, for example by appending ".COM". That is not fun, and gives tons of bad results in many cases. And operating systems use search domains in their resolver configuration. So the time when people really ended up at the site they "typed in" is long gone.
> I cannot update IdnToAscii, IdnToUnicode, IdnToNameprepUnicode (Windows), and the IdnMapping class (.Net), without a well defined and consistent mapping table. Other implementors seem to share that view regardless of other disagreement about individual characters. Without well defined, consistent mappings, I'd have to ignore IDNA2008 because I can't have random behavior from system APIs.
If we now get back to the PVALID/DISALLOWED question that this was about, I claim that you -- apart from the transition -- have a HIGHER chance of being able to produce the good mapping table you want if Eszett and Final Sigma are allowed. Then, in the longer run, you do not have to map those characters out. You avoid the risk that governments and others who do care about their characters tell you to change once there are actually (compared to today) deployed hrefs, etc.
The real problem is the transition. How do we get rid of the mapping from IDNA2003? First of all, I would not be so nervous, as the use of ß is so low. Secondly, I see many ways forward, of which the most important one has to do with information, sunrise periods at the registries, etc.
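To make the IDNA2003 mapping concrete: the following sketch uses Python's built-in "idna" codec, which implements the IDNA2003 nameprep behavior discussed here (an IDNA2008 implementation that treats these code points as PVALID would instead preserve ß and ς and produce distinct labels).

```python
# Python's stdlib "idna" codec implements IDNA2003, i.e. nameprep
# mapping before Punycode encoding.

# Eszett: nameprep case-folds U+00DF to "ss", so the original
# character is irreversibly mapped away before the DNS lookup.
print("faß.de".encode("idna"))  # b'fass.de'

# Greek final sigma: U+03C2 is case-folded to medial sigma U+03C3,
# so both forms collapse to the same ASCII label.
final_sigma = "ς".encode("idna")
medial_sigma = "σ".encode("idna")
print(final_sigma, medial_sigma)  # both b'xn--4xa'
assert final_sigma == medial_sigma
```

This is exactly the one-way mapping that makes the transition hard: once ß has been folded to "ss" on the wire, the lookup side cannot tell which form the user typed.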