AW: AW: sharp s (Eszett)

Mon Mar 17 05:49:57 CET 2008

At 00:05 08/03/12, John C Klensin wrote:

>Our general rule should be to avoid information loss.  While we
>assume both for historical reasons and due to the symmetry of
>case folding them, that there is no information loss in
>transforming ordinary Latin upper case characters into lower
>case ones, when we start talking about non-reversible mappings
>and spelling rules, we should keep the characters separate
>and registerable, rather than taking excursions into the
>peculiarities of casefolding (which the Unicode book
>acknowledges loses information and recommends against precisely
>the way we are using it if that can be avoided).  

I agree that the concept of "information loss" is an important
and useful criterion. But I'm a bit wary of making it a (potentially
close to absolute) rule.

>Now, if we can agree on that principle, we need to examine the
>set of characters that are transformed in non-obvious ways by
>case folding.   For those that are like this, we have to figure
>out how to implement the principle, which may require some
>additions to the exception list.

The next example on which to test this approach would be the issue
of the (Turkish, ...) dotless i. My guess is that things would
work out fine (i.e. the concept of information loss would show
the desirability of having both dot-ful and dot-less 'i').
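
To make the criterion a bit more concrete, here is a rough sketch in
Python. It relies on the built-in str.casefold(), which applies Unicode
full case folding; the helper name folds_together is made up for this
illustration and is not part of any proposal discussed here.

```python
def folds_together(a: str, b: str) -> bool:
    """True if two distinct strings become identical after case folding."""
    return a != b and a.casefold() == b.casefold()


# Sharp s (U+00DF) folds to "ss", so after folding the two spellings
# can no longer be told apart: information is lost.
sharp_s_lost = folds_together("stra\u00dfe", "strasse")

# Dotless i (U+0131) is left alone by default case folding, so it
# stays distinct from ordinary "i" ...
dotless_kept = not folds_together("\u0131", "i")

# ... but an upper/lower round trip turns it into plain "i", which is
# the non-reversible kind of mapping at issue.
round_trip = "\u0131".upper().lower()

print(sharp_s_lost, dotless_kept, repr(round_trip))
```

The point is only that a check of this kind flags sharp s and dotless
i as candidates for separate treatment; it says nothing about how an
exception list should be written.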

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp      mailto:duerst at it.aoyama.ac.jp