Greek Casefolding sigma
Vint Cerf
vint at google.com
Tue Apr 1 00:06:55 CEST 2008
Sorry, I must not have read carefully!
----- Original Message -----
From: mark.edward.davis at gmail.com <mark.edward.davis at gmail.com>
To: Vint Cerf
Cc: Markus Scherer; klensin at jck.com <klensin at jck.com>; patrik at frobbit.se <patrik at frobbit.se>; panaretou.sotiris at ucy.ac.cy <panaretou.sotiris at ucy.ac.cy>; segred at ics.forth.gr <segred at ics.forth.gr>; idna-update at alvestrand.no <idna-update at alvestrand.no>
Sent: Mon Mar 31 14:16:00 2008
Subject: Re: Greek Casefolding sigma
I'm not talking about modifying Punycode -- we've all agreed that that's out of scope for the charter. What I was thinking about was postprocessing the Punycode result, using the case of the letters in the Punycoded string to carry information about the case of the original string. This is just blue-skying -- nothing should distract from the main order of business, which is getting the charter done.
Mark
On Mon, Mar 31, 2008 at 1:10 PM, Vint Cerf <vint at google.com> wrote:
I suspect we will spiral into a nonconvergent path if we start modifying Punycode. It is out of bounds in any case for the proposed working group charter.
----- Original Message -----
From: idna-update-bounces at alvestrand.no <idna-update-bounces at alvestrand.no>
To: Mark Davis <mark.davis at icu-project.org>
Cc: Sotiris Panaretou <panaretou.sotiris at ucy.ac.cy>; Patrik Fältström <patrik at frobbit.se>; John C Klensin <klensin at jck.com>; Vaggelis Segredakis <segred at ics.forth.gr>; idna-update at alvestrand.no <idna-update at alvestrand.no>
Sent: Mon Mar 31 02:09:01 2008
Subject: Re: Greek Casefolding sigma
On Sat, Mar 29, 2008 at 7:49 PM, Mark Davis <mark.davis at icu-project.org> wrote:
> The simplest mechanism would be to then take that set of bits and walk
> through the Punycode, and for each bit in the vector change the next cased
> letter to uppercase to represent a 1 bit, or leave it lowercase to represent
> a 0 bit.
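One way to read the quoted proposal, as a minimal Python sketch. The function names embed_bits and extract_bits are hypothetical, the input is assumed to be the Punycode portion of a label without the ACE prefix, and unused letters are assumed to stay lowercase (0 bits):

```python
def embed_bits(punycode: str, bits: list[int]) -> str:
    """Carry a bit vector in the letter case of an ASCII Punycode string.

    Each ASCII letter consumes one bit: uppercase for 1, lowercase for 0.
    Digits and hyphens are not cased, so they carry nothing.
    """
    remaining = iter(bits)
    out = []
    for ch in punycode:
        if ch.isascii() and ch.isalpha():
            bit = next(remaining, 0)  # letters past the end of the vector stay 0
            out.append(ch.upper() if bit else ch.lower())
        else:
            out.append(ch)
    if next(remaining, None) is not None:
        raise ValueError("more bits than cased letters in the string")
    return "".join(out)


def extract_bits(punycode: str) -> list[int]:
    """Read back one bit per ASCII letter (1 if uppercase, else 0)."""
    return [1 if ch.isupper() else 0
            for ch in punycode if ch.isascii() and ch.isalpha()]


marked = embed_bits("bcher-kva", [1, 0, 1])
print(marked)                # BcHer-kva
print(extract_bits(marked))  # [1, 0, 1, 0, 0, 0, 0, 0]
```

Because the bits ride only on letter case, lowercasing the marked string gives back the original Punycode. The capacity is limited to the number of letters in the string, which is why the sketch rejects a vector that is too long.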
I recommend against inventing a new mechanism here. Punycode already
provides an "originally-uppercase" bit per source character. Within
IDNA, the uppercase information could be extracted before or during
folding, and then passed into the Punycode-encoding function.
Unfortunately, there is only one bit per character, which as you point
out is insufficient in some cases for precise representation of the
original character. I am not sure if there is room to reliably extend
the mechanism to 2 bits per character while maintaining compatibility
and not confusing existing implementations that use the predefined
mechanism.
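For reference, a rough Python sketch of the existing one-bit mechanism (the mixed-case annotation in RFC 3492, Appendix A): basic code points keep their case directly, and for each non-basic code point the flag is carried by the case of the last digit of its delta. The names split_case and encode_with_flags are made up for this illustration, plain lowercasing stands in for the real IDNA folding step, and the ACE prefix and overflow checks are left out.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def split_case(text):
    """Return (lowercased text, one 'originally uppercase' flag per character).

    Assumption: simple per-character lowercasing stands in for IDNA folding.
    """
    flags = [ch.isupper() for ch in text]
    folded = "".join(ch.lower() if len(ch.lower()) == 1 else ch for ch in text)
    return folded, flags


def encode_digit(d, upper=False):
    """Digits 0..25 map to a..z (A..Z if flagged), 26..35 map to 0..9."""
    if d < 26:
        c = chr(ord("a") + d)
        return c.upper() if upper else c
    return chr(ord("0") + d - 26)


def adapt(delta, numpoints, firsttime):
    """Bias adaptation from RFC 3492, section 6.1."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode_with_flags(folded, flags):
    """Punycode-encode folded text, annotating case from the flags."""
    cps = [ord(c) for c in folded]
    # Basic code points are copied through, uppercased when flagged.
    out = [chr(cp).upper() if f else chr(cp)
           for cp, f in zip(cps, flags) if cp < 128]
    b = h = len(out)
    if b:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(cps):
        m = min(cp for cp in cps if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp, flag in zip(cps, flags):
            if cp < n:
                delta += 1
            elif cp == n:
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    out.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                # The final digit is always below 26, so it is a letter
                # and can carry the flag in its case.
                out.append(encode_digit(q, flag))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)


print(encode_with_flags(*split_case("bücher")))  # bcher-kva
print(encode_with_flags(*split_case("BÜcher")))  # Bcher-kvA
```

The sketch shows why there is only one bit per source character: each character contributes exactly one letter whose case is free to vary, either the basic code point itself or the final digit of its delta.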
markus
--
Google Internationalization
_______________________________________________
Idna-update mailing list
Idna-update at alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update
--
Mark