Casefolding Sigma (was: Re: IDNAbis Preprocessing Draft)

Kenneth Whistler kenw at sybase.com
Tue Jan 22 01:24:55 CET 2008


Harald wondered:

> I do wonder what your mapping tables look like for the trailing Greek
> sigma - that's the canonical case of a context dependent case-mapping,
> just as the dotless I is the canonical case of a language dependent
> case-mapping.

There's no particular need to wonder -- the answers are
right there in the data tables. CaseFolding.txt:

03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA

In other words, U+03A3 and U+03C2 both case fold to
U+03C3 GREEK SMALL LETTER SIGMA.
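
For anyone who wants to check this mechanically, here is a minimal
sketch in Python. It parses only the two records quoted above; the
helper name parse_case_folding is made up for illustration and is not
part of any Unicode tooling. As a cross-check it compares the result
with Python's built-in str.casefold(), which applies full case folding
and should agree for these characters.

```python
# The two CaseFolding.txt records quoted above.
# Field layout: code; status; mapping; # name
RECORDS = """\
03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
"""


def parse_case_folding(text):
    """Hypothetical helper: return {source_char: (status, folded_string)}."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment and skip blank lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        # A mapping may be several space-separated code points.
        folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
        table[chr(int(code, 16))] = (status, folded)
    return table


folds = parse_case_folding(RECORDS)
for src, (status, folded) in folds.items():
    agrees = src.casefold() == folded  # cross-check against the built-in
    print(f"U+{ord(src):04X} -> U+{ord(folded):04X} (status {status}), "
          f"str.casefold() agrees: {agrees}")
```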

And this accounts for why, in the derivation that I posted about
a couple of weeks ago, U+03C3 is in the IDN_Always.txt table,
while U+03A3 and U+03C2 are not: they are in IDN_Never.txt
instead.

draft-faltstrom-idnabis-tables-03.txt has not yet
fully taken case folding stability into account, IMO,
so it has:

03A3 NEVER

but

03C2 ALWAYS
03C3 ALWAYS

03C2 should be NEVER, by the Category C, Casefolding rule.
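
To illustrate what "case folding stability" means for these three
code points, here is a rough sketch in Python. It is only one reading
of the casefolding rule, not the actual derivation behind either set
of tables: it assumes a code point is excluded whenever case folding
changes it. The function names are hypothetical, and str.casefold()
(full case folding) stands in for the Category C data.

```python
def is_casefold_stable(ch):
    """True if case folding leaves the character unchanged."""
    return ch.casefold() == ch


def casefold_verdict(ch):
    """Hypothetical classification under the casefolding rule alone.

    A character that case folding changes is treated as NEVER. A stable
    character is merely not excluded by this rule; other rules in a full
    derivation would still have to be applied to it.
    """
    return "NEVER" if not is_casefold_stable(ch) else "not excluded"


for cp in (0x03A3, 0x03C2, 0x03C3):
    print(f"{cp:04X} {casefold_verdict(chr(cp))}")
```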

--Ken


