Greek Casefolding sigma

Tue Apr 1 00:06:55 CEST 2008

Sorry, I must not have read carefully!

----- Original Message -----
From: mark.edward.davis at gmail.com <mark.edward.davis at gmail.com>
To: Vint Cerf
Cc: Markus Scherer; klensin at jck.com <klensin at jck.com>; patrik at frobbit.se <patrik at frobbit.se>; panaretou.sotiris at ucy.ac.cy <panaretou.sotiris at ucy.ac.cy>; segred at ics.forth.gr <segred at ics.forth.gr>; idna-update at alvestrand.no <idna-update at alvestrand.no>
Sent: Mon Mar 31 14:16:00 2008
Subject: Re: Greek Casefolding sigma

I'm not talking about modifying Punycode -- we've all agreed that that's out of scope for the charter. What I was thinking about was postprocessing the Punycode result, using the case of the letters in the Punycoded string to carry information about the case of the original string. This is just blue-skying -- nothing should distract from the main order of business, which is getting the charter done.
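
A rough sketch of that postprocessing idea in Python, purely for illustration. The function names overlay_case_bits and extract_case_bits are made up here and are not part of any proposal or library. It assumes the bit vector has already been computed from the original string, and that it is applied to the Punycode part of a label (without the "xn--" prefix):

```python
def overlay_case_bits(punycode: str, bits: list[int]) -> str:
    """Walk the Punycode string; the i-th ASCII letter is uppercased
    when bits[i] is 1 and lowercased when it is 0.
    Letters beyond the end of the bit vector are treated as 0 bits.
    Digits and hyphens carry no case and are copied unchanged."""
    out = []
    remaining = iter(bits)
    for ch in punycode:
        if ch.isascii() and ch.isalpha():
            bit = next(remaining, 0)
            out.append(ch.upper() if bit else ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def extract_case_bits(label: str) -> list[int]:
    """Recover one bit per ASCII letter: 1 for uppercase, 0 for lowercase."""
    return [1 if ch.isupper() else 0
            for ch in label if ch.isascii() and ch.isalpha()]


# Hypothetical example: "bcher-kva" is the Punycode form of "bücher".
marked = overlay_case_bits("bcher-kva", [1, 0, 1])
print(marked)                     # BcHer-kva
print(extract_case_bits(marked))  # [1, 0, 1, 0, 0, 0, 0, 0]
```

Because DNS compares ASCII letters case-insensitively, the marked label still matches the all-lowercase one. The capacity is limited to the number of letters in the Punycode output, which may be fewer than the bits one would want to carry.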

Mark

On Mon, Mar 31, 2008 at 1:10 PM, Vint Cerf <vint at google.com> wrote:

	I suspect we will spiral into a nonconvergent path if we start modifying Punycode. It is out of bounds in any case for the proposed working group charter.

	----- Original Message -----
	From: idna-update-bounces at alvestrand.no <idna-update-bounces at alvestrand.no>
	To: Mark Davis <mark.davis at icu-project.org>
	Cc: Sotiris Panaretou <panaretou.sotiris at ucy.ac.cy>; Patrik Fältström <patrik at frobbit.se>; John C Klensin <klensin at jck.com>; Vaggelis Segredakis <segred at ics.forth.gr>; idna-update at alvestrand.no <idna-update at alvestrand.no>
	Sent: Mon Mar 31 02:09:01 2008
	Subject: Re: Greek Casefolding sigma

	On Sat, Mar 29, 2008 at 7:49 PM, Mark Davis <mark.davis at icu-project.org> wrote:
	> The simplest mechanism would be to then take that set of bits and walk
	> through the Punycode, and for each bit in the vector change the next cased
	> letter to uppercase to represent a 1 bit, or leave it lowercase to represent
	> a 0 bit.

	I recommend against inventing a new mechanism here. Punycode already
	provides an "originally-uppercase" bit per source character. Within
	IDNA, the uppercase information could be extracted before or during
	folding, and then passed into the Punycode-encoding function.

	Unfortunately, there is only one bit per character, which as you point
	out is insufficient in some cases for precise representation of the
	original character. I am not sure if there is room to reliably extend
	the mechanism to 2 bits per character while maintaining compatibility
	and not confusing existing implementations that use the predefined
	mechanism.

	markus
	--
	Google Internationalization

	_______________________________________________
	Idna-update mailing list
	Idna-update at alvestrand.no
	http://www.alvestrand.no/mailman/listinfo/idna-update


-- 
Mark