Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Vint Cerf vint at google.com
Tue Jan 22 05:33:26 CET 2008


Please keep in mind that the purpose of domain names is to serve as
identifiers, not necessarily as orthographically correct natural
language. It is precisely the potential confusion between these two
uses of scripts that leads to potentially hazardous choices of
permitted characters in domain names.

vint


On Jan 21, 2008, at 9:47 PM, Martin Duerst wrote:

> I'm sure this has already been discussed, probably in several
> places, but thinking from a simple user perspective, why should
> final small sigma be disallowed? After all, writing a word ending
> in sigma with a non-final sigma would look really strange, or
> wouldn't it? And likewise writing a word containing a sigma in
> the middle with a final sigma would look really strange, or
> wouldn't it? So in my view, it would be better to address this
> e.g. at the registry level rather than to produce bad typography.
>
> Regards,   Martin.
>
> At 09:24 08/01/22, Kenneth Whistler wrote:
>> Harald wondered:
>>
>>> I do wonder what your mapping tables look like for the trailing
>>> Greek sigma - that's the canonical case of a context dependent
>>> case-mapping, just as the dotless I is the canonical case of a
>>> language dependent case-mapping.
>>
>> There's no particular need to wonder -- the answers are
>> right there in the data tables. CaseFolding.txt:
>>
>> 03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
>> 03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
>>
>> In other words, U+03A3 and U+03C2 both case fold to
>> U+03C3 GREEK SMALL LETTER SIGMA.
>>
>> And this accounts for why, in the derivation that I posted about
>> a couple of weeks ago, U+03C3 is in the IDN_Always.txt
>> table, but U+03A3 and U+03C2 are not, but are in IDN_Never.txt
>> instead.
>>
>> draft-faltstrom-idnabis-tables-03.txt has not yet
>> fully taken case folding stability into account, IMO,
>> so it has:
>>
>> 03A3 NEVER
>>
>> but
>>
>> 03C2 ALWAYS
>> 03C3 ALWAYS
>>
>> 03C2 should be NEVER, by the Category C, Casefolding rule.
>>
>> --Ken
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
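
For readers who want to check the folding Ken describes (U+03A3 and
U+03C2 both folding to U+03C3), here is a minimal Python sketch. It
assumes that Python's built-in str.casefold() follows the Unicode
CaseFolding.txt data. The helper name is_fold_stable is made up for
illustration and is only a simplified stand-in for the case folding
rule mentioned in the quoted text, not the actual table derivation.

```python
import unicodedata

CAPITAL_SIGMA = "\u03A3"  # GREEK CAPITAL LETTER SIGMA
FINAL_SIGMA = "\u03C2"    # GREEK SMALL LETTER FINAL SIGMA
SMALL_SIGMA = "\u03C3"    # GREEK SMALL LETTER SIGMA


def is_fold_stable(ch: str) -> bool:
    """Hypothetical helper: True if case folding leaves ch unchanged."""
    return ch.casefold() == ch


for ch in (CAPITAL_SIGMA, FINAL_SIGMA, SMALL_SIGMA):
    folded = ch.casefold()
    # Print each code point, its name, what it folds to, and whether
    # it is stable under folding (a simplified view of the rule).
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        f" -> U+{ord(folded):04X}"
        f" (stable: {is_fold_stable(ch)})"
    )
```

Under that assumption, only U+03C3 is reported as stable, which matches
Ken's point that U+03A3 and U+03C2 fold away to U+03C3.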


