Greek Casefolding sigma

Markus Scherer mscherer at google.com
Mon Mar 31 11:09:01 CEST 2008


On Sat, Mar 29, 2008 at 7:49 PM, Mark Davis <mark.davis at icu-project.org> wrote:
> The simplest mechanism would be to then take that set of bits and walk
> through the Punycode, and for each bit in the vector changing each cased
> letter to uppercase to represent a 1 bit, and leaving it lowercase to
> represent a 0 bit.

I recommend against inventing a new mechanism here. Punycode already
provides an "originally-uppercase" bit per source character. Within
IDNA, the uppercase information could be extracted before or during
folding, and then passed into the Punycode-encoding function.
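
As a rough sketch of that flow (not taken from any existing implementation):
the encoder below follows the algorithm in RFC 3492 and carries the flag the
way its Appendix A describes, in the case of each basic code point and in the
case of the last digit of each delta. The names split_case and
encode_with_case are made up for illustration, lowercasing stands in for the
real IDNA mapping, and overflow checks are omitted.

```python
# Punycode parameters from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta, numpoints, first_time):
    """Bias adaptation from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def encode_digit(d, upper):
    """Map 0..25 to a..z (A..Z if upper is set) and 26..35 to 0..9."""
    if d < 26:
        return chr(d + (ord("A") if upper else ord("a")))
    return chr(d - 26 + ord("0"))


def split_case(label):
    """Hypothetical helper: extract the uppercase flags before folding.

    Assumption: lowercasing is one-to-one, so flags stay aligned with the
    folded characters. A real IDNA mapping step is more involved.
    """
    folded, flags = [], []
    for ch in label:
        low = ch.lower()
        if len(low) != 1:
            raise ValueError("sketch assumes one-to-one lowercasing")
        folded.append(low)
        flags.append(ch.isupper())
    return "".join(folded), flags


def encode_with_case(folded, flags):
    """Punycode-encode folded text, annotating it with the per-character flags."""
    # Basic (ASCII) code points are copied literally; the flag restores their case.
    out = [ch.upper() if up else ch
           for ch, up in zip(folded, flags) if ord(ch) < INITIAL_N]
    handled = basic_count = len(out)
    if basic_count:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while handled < len(folded):
        m = min(ord(ch) for ch in folded if ord(ch) >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for ch, up in zip(folded, flags):
            if ord(ch) < n:
                delta += 1
            elif ord(ch) == n:
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(encode_digit(t + (q - t) % (BASE - t), False))
                    q = (q - t) // (BASE - t)
                    k += BASE
                # The last digit of a delta is always a letter, so its case
                # can carry the flag of the non-basic character.
                out.append(encode_digit(q, up))
                bias = adapt(delta, handled + 1, handled == basic_count)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(out)
```

With this sketch, encode_with_case(*split_case("bÜcher")) gives "bcher-kvA",
while the all-lowercase "bücher" gives the usual "bcher-kva". A decoder that
ignores the annotation sees the same code points either way, since Punycode
digits are case-insensitive.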

Unfortunately, there is only one bit per character, which as you point
out is insufficient in some cases for precise representation of the
original character. I am not sure if there is room to reliably extend
the mechanism to 2 bits per character while maintaining compatibility
and not confusing existing implementations that use the predefined
mechanism.
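
To make the one-bit limit concrete, here is a small sketch using sigma. It
assumes Python's str.casefold as a stand-in for the IDNA folding step, and
fold_with_flag is a made-up name. Three source characters fold to the same
letter, but one bit can only tell two of them apart.

```python
def fold_with_flag(ch):
    """Hypothetical helper: return (folded character, originally-uppercase bit)."""
    return ch.casefold(), ch.isupper()


# Capital sigma, small sigma, and final small sigma.
results = {ch: fold_with_flag(ch) for ch in "\u03a3\u03c3\u03c2"}

# Small sigma and final sigma end up with the same folded letter and the
# same bit, so the original cannot be recovered from one bit alone.
collision = results["\u03c3"] == results["\u03c2"]
```

Here capital sigma keeps its own flag, but small sigma and final sigma become
indistinguishable, which is the case a second bit would have to cover.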

markus
-- 
Google Internationalization

