Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Martin Duerst duerst at it.aoyama.ac.jp
Wed Jan 23 07:44:10 CET 2008


At 13:33 08/01/22, Vint Cerf wrote:
>please keep in mind that the purpose for Domain Names is identifiers,  
>not necessarily orthographically correct natural language. It is  
>precisely the potential confusion between these two uses of scripts  
>that leads to potentially hazardous choices of permitted characters  
>in domain names.

Very much so, of course, but let's take an example from English.
People using other languages or scripts might observe that
it is sometimes unclear whether to use an 's' or a 'z'.
For example, "analyze" is the US spelling, but "analyse" is the British one.
Now let's assume these people decide that this problem is
best solved by ruling out the 'z' and converting all 'z' to 's'.
In the DNS, it would be "sebra" instead of "zebra", and so on.
(English spelling is particularly irregular, so many similar
examples are possible.)

Surely everybody in the US should be able to live with that,
because we are dealing with identifiers, not orthographically
correct language? And this restriction actually removes some
potentially hazardous alternatives, doesn't it?

The above example is hypothetical, but I hope it helps illustrate
how people using other scripts might feel. I very much think
that we have to distinguish between identifiers and natural
language,

I think the reason why final sigma is excluded in IDNA2003
(if it indeed is) is that it turns into a plain sigma
when upper-cased and then lower-cased again. That's just
a consequence of the rather roundabout way that the Unicode
casing table was used in IDNA2003.
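
To make the round trip concrete, here is a minimal sketch in Python.
It uses only the Unicode case mappings built into Python's str type,
not the actual IDNA2003/Nameprep mapping table, and the helper name
roundtrip_per_char is just something I made up for illustration.
The mapping is applied one character at a time so that no
context-sensitive rule can put the final sigma back.

```python
# Sketch only: Python's built-in Unicode case mappings, not the Nameprep table.
FINAL_SIGMA = "\u03c2"    # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"          # GREEK SMALL LETTER SIGMA
CAPITAL_SIGMA = "\u03a3"  # GREEK CAPITAL LETTER SIGMA


def roundtrip_per_char(text: str) -> str:
    """Upper-case, then lower-case, each character in isolation.

    Hypothetical helper for illustration; it is not an IDNA API.
    """
    return "".join(ch.upper().lower() for ch in text)


# A Greek word ending in final sigma (omicron, delta, omicron with tonos, final sigma).
word = "\u03bf\u03b4\u03cc" + FINAL_SIGMA

# Final sigma upper-cases to the one capital sigma ...
assert FINAL_SIGMA.upper() == CAPITAL_SIGMA
# ... which, taken on its own, lower-cases to the plain (non-final) sigma.
assert CAPITAL_SIGMA.lower() == SIGMA

print(roundtrip_per_char(word))  # last letter is now the plain sigma
print(word.casefold())           # Unicode case folding maps it the same way
```

Either way, the distinction between the two lower-case sigmas is
lost once the capital form has been passed through.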

Regards,   Martin.


>On Jan 21, 2008, at 9:47 PM, Martin Duerst wrote:
>
>> I'm sure this has already been discussed, probably in several
>> places, but thinking from a simple user perspective, why should
>> final small sigma be disallowed? After all, writing a word ending
>> in sigma with a non-final sigma would look really strange, or
>> wouldn't it? And likewise writing a word containing a sigma in
>> the middle with a final sigma would look really strange, or
>> wouldn't it? So in my view, it would be better to address this
>> e.g. at the registry level rather than to produce bad typography.
>>
>> Regards,   Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


