Greek Casefolding sigma

Mark Davis mark.davis at icu-project.org
Mon Mar 31 18:03:46 CEST 2008


I don't think the original Punycode mechanism would work: I believe it
would change the result incompatibly relative to strings encoded under
IDNA2003 (especially since it allows only 1 bit per character, as you
say).

This is all blue-skying at this point, but the more I think about it, the
more promising a bit-vector approach looks for handling the handful of
peculiar cases (eszett, sigma, i) in a compatible way.
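
To make the idea concrete, here is a rough sketch in Python of the case-walking
mechanism quoted below: the bits are written into the case of the ASCII letters
of an already-encoded Punycode label, and read back the same way. The function
names (embed_bits, extract_bits) and the choice to pad unused letters with 0
bits are assumptions made for illustration; nothing here is specified by
IDNA or Punycode, and what each bit would mean (eszett, final sigma, dotless i)
is left open.

```python
def embed_bits(punycode_label: str, bits: list[int]) -> str:
    """Store a bit vector in the letter case of a Punycode label.

    Hypothetical helper: each ASCII letter carries one bit
    (uppercase = 1, lowercase = 0). Digits and hyphens carry nothing.
    The label is assumed to be given without the "xn--" prefix.
    """
    capacity = sum(1 for ch in punycode_label if ch.isascii() and ch.isalpha())
    if len(bits) > capacity:
        raise ValueError("more bits than cased letters in the label")

    remaining = iter(bits)
    out = []
    for ch in punycode_label:
        if ch.isascii() and ch.isalpha():
            bit = next(remaining, 0)  # pad unused letters with 0 bits
            out.append(ch.upper() if bit else ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def extract_bits(punycode_label: str) -> list[int]:
    """Read the bit vector back: one bit per ASCII letter, in order."""
    return [
        1 if ch.isupper() else 0
        for ch in punycode_label
        if ch.isascii() and ch.isalpha()
    ]


# "bcher-kva" is the Punycode form of "bücher" (ACE prefix omitted).
marked = embed_bits("bcher-kva", [0, 1])
print(marked)                # bCher-kva
print(extract_bits(marked))  # [0, 1, 0, 0, 0, 0, 0, 0]
```

An implementation that ignores the extra information would simply lowercase the
label and get the original string back, which is the property that would make
such a scheme compatible. The number of bits is limited by the number of
letters in the encoded label.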

Mark

On Mon, Mar 31, 2008 at 2:09 AM, Markus Scherer <mscherer at google.com> wrote:

> On Sat, Mar 29, 2008 at 7:49 PM, Mark Davis <mark.davis at icu-project.org>
> wrote:
> > The simplest mechanism would be to then take that set of bits and walk
> > through the Punycode, and for each bit in the vector change a cased
> > letter to uppercase to represent a 1 bit, and leave it lowercase to
> > represent a 0 bit.
>
> I recommend against inventing a new mechanism here. Punycode already
> provides an "originally-uppercase" bit per source character. Within
> IDNA, the uppercase information could be extracted before or during
> folding, and then passed into the Punycode-encoding function.
>
> Unfortunately, there is only one bit per character, which as you point
> out is insufficient in some cases for precise representation of the
> original character. I am not sure if there is room to reliably extend
> the mechanism to 2 bits per character while maintaining compatibility
> and not confusing existing implementations that use the predefined
> mechanism.
>
> markus
> --
> Google Internationalization
>



-- 
Mark