Consensus Call on Latin Sharp S and Greek Final Sigma
patrik at frobbit.se
Tue Dec 1 05:15:47 CET 2009
On 1 dec 2009, at 01.04, Shawn Steele wrote:
> In Switzerland users would certainly expect eszett to map to ss. In Germany and Austria they wouldn’t be too surprised by the behavior.
Unfortunately, in Sweden for example, that mapping would be really bad, simply because the expected behavior depends to a large degree on what the person using the user interface is used to, just as with any kind of locale definition.
I claim that we get more confusion if the mapping differs from what the user is used to, _for the same user_, than we get from, for example, having the same mapping for two different users.
This might sound weird, but we already have this problem with other characters, such as accented characters: what is to be treated as the same and what is not? And, as some people have pointed out, the color / colour issue and similar ones already exist.
I think the current architecture is correct: a core protocol that does not include mapping, but instead provides very, very stable ground for which characters might "float around" (the PVALID ones), and on top of that a mapping architecture that maps various characters (or combinations thereof) to the set of PVALID characters.
There MIGHT be multiple ways of doing mappings: maybe the same mappings everywhere, maybe the same mappings within the same application, maybe the same mappings in the same context across applications. I do not know. But of course there is a need for mapping to help the user, just as there are already mappings from what a user types on a keyboard to what is sent to the application. (I have a key for 'ä' on my keyboard, while others might have to press '¨' and then 'a'.)
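A minimal sketch of that layering may make it more concrete. Everything in it is an illustrative assumption: `LOCALE_MAPPINGS`, `map_for_user()` and `is_core_valid()` are hypothetical names, and `is_core_valid()` is only a toy stand-in for the real PVALID rules, which come from the IDNA2008 code point tables (RFC 5892). The only point is that the core check never maps, while the layer on top may map differently per context.

```python
# Toy sketch of the layering: an optional, context-dependent mapping step that
# runs *before* a strict core check which never maps anything.
# LOCALE_MAPPINGS and is_core_valid() are hypothetical illustrations; the real
# PVALID set comes from the IDNA2008 derived-property tables (RFC 5892).

LOCALE_MAPPINGS = {
    # Hypothetical: a Swiss German UI might choose to map sharp s to "ss" ...
    "de-CH": {"ß": "ss"},
    # ... while a Swedish UI leaves it alone.
    "sv-SE": {},
}


def map_for_user(label: str, locale: str) -> str:
    """Mapping layer: lowercase, then apply locale-specific substitutions."""
    # str.lower() keeps "ß" as "ß"; str.casefold() would turn it into "ss".
    mapped = label.lower()
    for src, dst in LOCALE_MAPPINGS.get(locale, {}).items():
        mapped = mapped.replace(src, dst)
    return mapped


def is_core_valid(label: str) -> bool:
    """Toy stand-in for the core protocol: accept only lowercase letters,
    digits and hyphens, and never map anything."""
    return all(c == c.lower() and (c.isalnum() or c == "-") for c in label)


if __name__ == "__main__":
    for locale in ("de-CH", "sv-SE"):
        label = map_for_user("Buß", locale)
        print(locale, label, is_core_valid(label))
    # The core itself rejects unmapped input instead of fixing it.
    print(is_core_valid("Buß"))
```

With these toy tables, "Buß" becomes "buss" for `de-CH` and stays "buß" for `sv-SE`; both results pass the core check, while the unmapped "Buß" does not, because the core never lowercases anything itself.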
The mapping specifications, the real ones, can be developed in whichever SDO is appropriate. W3C? The Unicode Consortium? What I am more and more certain of is that the IETF is not the correct venue.
Now, I think the problem is *NOT* the mappings but, as you say, Shawn, how to handle Sharp S and Final Sigma *specifically*.
We have two alternatives for the core protocol:
1. Have it as PVALID
2. Have it as DISALLOWED
Both have negative consequences:
1. We have a transition problem, specifically while IDNA2003 and IDNA2008 are both out in the wild in parallel, including various sunrise periods at registries that have allowed registration of sharp s (i.e., of its IDNA2003 equivalent 'ss')
2. We require some mapping, including a definition of when this mapping is to happen and when it is not
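The difference between the two alternatives can be seen with Python's standard library. Its built-in `idna` codec implements IDNA2003 (RFC 3490 with Nameprep), which maps ß to "ss" and final sigma ς to σ. The IDNA2008 side is only a hypothetical sketch (`idna2008_alabel_sketch()` is not a real API): it Punycode-encodes a label assumed to be lowercase and PVALID, and performs none of the actual IDNA2008 checks.

```python
# Compare IDNA2003 (Python's built-in "idna" codec, RFC 3490 + Nameprep)
# with a PVALID treatment of sharp s, as in alternative (1).


def idna2003(domain: str) -> str:
    """ToASCII as done by IDNA2003: Nameprep maps ß -> ss and ς -> σ."""
    return domain.encode("idna").decode("ascii")


def idna2008_alabel_sketch(label: str) -> str:
    """Hypothetical sketch: Punycode-encode a label assumed to be already
    lowercase and PVALID. No IDNA2008 validity checks are performed here."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    # IDNA2003: both spellings end up as the same name on the wire.
    print(idna2003("buß.se"), idna2003("buss.se"))
    # Greek final sigma is folded to ordinary sigma in the same way.
    print(idna2003("λόγος.gr") == idna2003("λόγοσ.gr"))
    # Alternative (1): sharp s stays distinct, so buß.se is a different name.
    print(".".join(idna2008_alabel_sketch(l) for l in "buß.se".split(".")))
```

Under IDNA2003 both spellings become `buss.se`, while keeping ß PVALID gives `buß.se` its own A-label, `xn--bu-hia.se`. During the transition, the same typed name can therefore lead to two different names depending on which protocol the client implements.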
My view, as you have all seen, is that (1) is better, simply because IF sharp s is PVALID, then the domain holder can choose whether ß and ss should refer to the same website or not. As the holder of the domain name, I can implement that with CNAME/DNAME at the DNS level, but also with a redirect in HTTP (and similar). If I find it really important, I can also actually use the sharp s for _everything_.
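As one illustration of the HTTP option, here is a rough sketch of a holder who owns both names and decides they should reach the same site. The host names, the 301 status and the handler are assumptions for illustration only; a holder who wants the two names to be different sites would simply not redirect.

```python
# Hypothetical sketch: the holder of both buß.se and buss.se decides that they
# should reach the same website, and redirects at the HTTP level.
# Host names are illustrative assumptions only.
from http.server import BaseHTTPRequestHandler
from typing import Optional

CANONICAL_HOST = "buss.se"
# Clients send the A-label of buß.se in the Host header, not the U-label.
SHARP_S_HOST = "xn--" + "buß".encode("punycode").decode("ascii") + ".se"


def redirect_location(host_header: str, path: str) -> Optional[str]:
    """Return a Location URL if this request should be redirected, else None."""
    host = host_header.split(":", 1)[0].lower()  # drop any port, normalize case
    if host == SHARP_S_HOST:
        return f"https://{CANONICAL_HOST}{path}"
    return None


class AliasHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = redirect_location(self.headers.get("Host", ""), self.path)
        if location:
            self.send_response(301)  # permanent redirect to the chosen name
            self.send_header("Location", location)
            self.end_headers()
            return
        body = b"Served directly\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Something like `HTTPServer(("", 8080), AliasHandler).serve_forever()` (with `HTTPServer` from `http.server`) would run it locally. Because the comparison is done on the A-label, `xn--bu-hia.se`, the holder stays in control of which spelling is canonical.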
In case (2) the user does not have any choice: ß and ss are to be treated the same, even in language contexts like Swedish where they are not (ß is just weird, but it is definitely not the same as ss). People can then type buß.se and reach buss.se, which is not really what would be expected.
In both cases the end user can let their browser (or whatever) do whatever the user finds easiest.
So for me this is a question of what choice the domain holder has. Can they choose to differentiate between ß and ss or not? Can they actually use ß for real or not?
And for me, both of those questions should be answered with "yes", and yes, I think a transition strategy from IDNA2003 to IDNA2008 is worth it. We should not make it impossible to use ß in domain names just because of a requirement for backward compatibility with the very, very low number of hrefs that are out there today (data from Erik).
There *is* a way to handle this backward-compatibility issue, for example by doing double lookups together with registry policy. People will learn. In the longer run it is better to have ß as a separate character.
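A double lookup could look roughly like the sketch below, assuming it means: first look up the name with ß kept as its own character, then fall back to the IDNA2003 form. `double_lookup()`, `to_alabels_keep_sharp_s()` and the dictionary standing in for DNS are all hypothetical; a real client would use an actual resolver and real IDNA2008 validation.

```python
# Hypothetical sketch of a "double lookup" during the transition: try the name
# with ß kept as a distinct character first, then fall back to the IDNA2003
# form (ß folded to "ss"). The zone data and resolver are toy stand-ins.
from typing import Callable, Dict, Optional


def to_alabels_keep_sharp_s(domain: str) -> str:
    """Sketch only: Punycode each non-ASCII label of an already-lowercase
    name, without performing any IDNA2008 validity checks."""
    out = []
    for label in domain.split("."):
        if label.isascii():
            out.append(label)
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


def double_lookup(domain: str,
                  resolve: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the answer for the new form if it exists, else the IDNA2003 form."""
    candidates = [to_alabels_keep_sharp_s(domain),
                  domain.encode("idna").decode("ascii")]
    for name in dict.fromkeys(candidates):  # de-duplicate, keep order
        answer = resolve(name)
        if answer is not None:
            return answer
    return None


# Toy "DNS": only buss.se exists, as after an IDNA2003-era registration.
FAKE_ZONE: Dict[str, str] = {"buss.se": "192.0.2.10"}

if __name__ == "__main__":
    print(double_lookup("buß.se", FAKE_ZONE.get))
```

With only `buss.se` registered, `buß.se` still resolves via the fallback; once the registry also delegates `xn--bu-hia.se`, that answer wins. Whether falling back is safe depends on registry policy, for example on whether the two names are guaranteed to have the same holder.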