Possible definition for MVALID and a mapping table

Sun Apr 12 02:51:03 CEST 2009

--On Sunday, April 12, 2009 02:24 +0200 Patrik Fältström
<patrik at frobbit.se> wrote:

> On 12 apr 2009, at 02.15, John C Klensin wrote:
> 
>> (3) Of the case-related operations, only toLowerCase is used
>> to form mapping functions.
> 
> I would like you to be more specific here (as it took me a
> while to understand what these things mean if you look in the
> Unicode tables).
> 
> What you talk about are cases C and S in CaseFolding.txt? Or
> just C?
> 
> Today, when talking about CaseFold, I use C+F for the
> stability tests in tables.

According to TUS 5.0, Section 5.18, toLowerCase is an entirely
separate operation from toCaseFold.  toCaseFold converts a
string to upper case and then back to lower case, and contains a
collection of special cases for characters that lack one or the
other.  It is those special cases that I'm trying to avoid
because, in general, they are not going to be consistently
intuitive to users.
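
As a rough illustration of the difference, the sketch below uses
Python's built-in str.lower() and str.casefold() as stand-ins for
the two operations.  That choice is an assumption made purely for
illustration: str.lower() applies Python's full lowercase mappings
rather than simple case conversion, and the sample words are
hypothetical.

```python
# Illustration only: Python built-ins stand in for the Unicode
# operations toLowerCase and toCaseFold; they are not the definitions
# used in this thread.
words = ["Stra\u00dfe", "\u017ftop"]  # "Straße" and "ſtop" (LATIN SMALL LETTER LONG S)

lowered = [w.lower() for w in words]    # lowercasing leaves U+00DF and U+017F alone
folded = [w.casefold() for w in words]  # case folding rewrites both characters

for word, low, fold in zip(words, lowered, folded):
    print(f"{word!r}: lower={low!r} casefold={fold!r}")
```

Case folding changes characters that are already lower case, which
is the kind of special case referred to above.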

My understanding is that, if one does "simple case conversion",
as described in TUS 5.0 Section 3.13, and does not apply the
context-dependent casing operations of Table 3-14, then one can
do toLowerCase based entirely on UnicodeData.txt (i.e., without
getting anywhere near CaseFolding.txt).  I assume Mark or Ken
will correct me on that inference if I am wrong.
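
A minimal sketch of that approach, assuming the usual
semicolon-separated UnicodeData.txt layout in which field 13
(counting from 0) holds the simple lowercase mapping.  The sample
lines, the function names, and the use of Python are all assumptions
made for illustration; a real implementation would read the full
file.

```python
# Sample lines in UnicodeData.txt format (illustrative excerpt only).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
03A3;GREEK CAPITAL LETTER SIGMA;Lu;0;L;;;;;N;;;;03C3;
"""

LOWERCASE_FIELD = 13  # Simple_Lowercase_Mapping column (assumed layout)


def load_simple_lowercase(lines):
    """Build a code point -> code point table from UnicodeData.txt lines."""
    table = {}
    for line in lines:
        fields = line.strip().split(";")
        # Skip malformed lines and characters with no lowercase mapping.
        if len(fields) < 15 or not fields[LOWERCASE_FIELD]:
            continue
        table[int(fields[0], 16)] = int(fields[LOWERCASE_FIELD], 16)
    return table


def simple_to_lower(text, table):
    """Map each character independently; no context rules, no folding."""
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in text)


table = load_simple_lowercase(SAMPLE.splitlines())
print(simple_to_lower("A\u03a3\u00df", table))
```

Every mapping here is one character to one character, so a
word-final capital sigma becomes U+03C3 regardless of position and
sharp s is left untouched.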

best,
    john