ALWAYS/MAYBE and CJK (was: Re: IDNAbis Main Open Issues)
mark.davis at icu-project.org
Thu Jan 24 01:48:53 CET 2008
It is not quite as simple as you say, because of multiple words. The rules
for when to change a sigma C into final sigma are in Section 3.13, Default
Case Algorithms, Table 3-14, p. 124:
C is preceded by a sequence consisting
of a cased letter and a case-ignorable
sequence, and C is not followed by a
sequence consisting of a case ignorable
sequence and then a cased letter.
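As a rough illustration of the condition quoted above, here is a minimal Python sketch. The helper names are hypothetical, and both property tests are approximations built from General_Category values only: the real 'cased' and 'case-ignorable' definitions cover more characters (for example, the Word_Break-based part of case-ignorable is left out).

```python
import unicodedata

SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA


def is_cased(ch):
    # Approximation: upper-, lower-, and titlecase letters only.
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt"}


def is_case_ignorable(ch):
    # Approximation: general categories only (marks, format characters,
    # modifier letters, modifier symbols).
    return unicodedata.category(ch) in {"Mn", "Me", "Cf", "Lm", "Sk"}


def nearest_significant(text, start, step):
    """Return the first character at or beyond `start`, moving by `step`,
    that is not case-ignorable; None if the string ends first."""
    i = start
    while 0 <= i < len(text):
        if not is_case_ignorable(text[i]):
            return text[i]
        i += step
    return None


def is_final_sigma_context(text, i):
    """True if the sigma at index i is preceded by (cased letter +
    case-ignorable sequence) and not followed by (case-ignorable
    sequence + cased letter)."""
    before = nearest_significant(text, i - 1, -1)
    after = nearest_significant(text, i + 1, +1)
    preceded = before is not None and is_cased(before)
    followed = after is not None and is_cased(after)
    return preceded and not followed


def apply_final_sigma(text):
    # Replace every sigma that sits in a final-sigma context.
    return "".join(
        FINAL_SIGMA if ch == SIGMA and is_final_sigma_context(text, i) else ch
        for i, ch in enumerate(text)
    )
```

Note that a sigma standing alone, or at the start of a string, is left unchanged because nothing cased precedes it.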
Because the IDN is in NFC, the above formulation can be simplified by
dropping the 'case-ignorable sequence' if we are restricted to normal modern
Greek:
C is preceded by a cased letter, and C is not followed by a cased letter.
That page also gives the exact meanings of 'cased letter' and
'case-ignorable sequence', which are derived from standard Unicode properties.
If you used those, then in the vast majority of normal Greek text the sigma
would be correct. So the following would display correctly
// 1. display
As I already noted, IDNs are often not "normal" text because of the use of
words run together. So it would fail in that case. For example, the
following wouldn't work
// display 2
// desired 3
For IDNA2003, because people do have the choice of an optional hyphen for
disambiguation when registering the names, this is probably reasonable as a
display step. That is, it will help in many cases, and shouldn't hurt in
any. It can be implemented right now in browsers or other user agents --
without any problem -- since the input of the resulting display forms will
continue to work (because of StringPrep).
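To make the display step and the effect of the optional hyphen concrete, here is a small Python sketch of the simplified rule (sigma preceded by a cased letter and not followed by one). Everything in it is an assumption for illustration: the function names are hypothetical, the two sample words are made up and are not the examples from this message, 'cased' is approximated by general categories Lu/Ll/Lt, and `str.casefold()` stands in for the StringPrep mapping rather than reimplementing it.

```python
import unicodedata

SIGMA, FINAL_SIGMA = "\u03c3", "\u03c2"


def is_cased(ch):
    # Approximation of the 'cased letter' definition.
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt"}


def to_display(label):
    """Turn sigma into final sigma when it follows a cased letter and is
    not followed by one."""
    chars = list(label)
    for i, ch in enumerate(chars):
        if ch != SIGMA:
            continue
        preceded = i > 0 and is_cased(label[i - 1])
        followed = i + 1 < len(label) and is_cased(label[i + 1])
        if preceded and not followed:
            chars[i] = FINAL_SIGMA
    return "".join(chars)


def to_lookup(display):
    # Case folding maps final sigma back to sigma; used here only as a
    # stand-in for the StringPrep mapping step.
    return display.casefold()


# Hypothetical two-word label: "odos" + "mas", each ending in sigma.
WORD1 = "\u03bf\u03b4\u03bf\u03c3"
WORD2 = "\u03bc\u03b1\u03c3"
run_together = WORD1 + WORD2
hyphenated = WORD1 + "-" + WORD2

print(to_display(run_together))  # only the last sigma becomes final
print(to_display(hyphenated))    # both word-final sigmas become final
```

In the run-together label the first word's sigma is followed by a cased letter, so it stays a normal sigma; with the hyphen, both are converted. In either case `to_lookup(to_display(x))` gives back `x`, which is the sense in which typing the display form continues to work.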
For IDNAbis, changing the wire form would introduce compatibility problems,
On Jan 24, 2008 5:33 AM, Gervase Markham <gerv at mozilla.org> wrote:
> Michael Everson wrote:
> > I was cheered by what Gervase said about displaying final sigma. I hope
> > that the Mozilla and Safari and IE get together and agree on a method to
> > do the right thing for the Greeks.
> I can't bind any of those organisations, even my own, but if it turns
> out that the final-sigma problem is intractable at the protocol level
> (and perhaps esszet as well, I don't know), and it's clear that we're
> not opening an enormous can of worms which will lead to proliferating
> special-case code, then if it helps us get this done quicker, the
> IDN200x standard can punt on the issues and we can attempt to reach a
> consensus higher up.
> It seems to me that final sigma's a fairly easy case, as the rules are
> simple. On lookup, s/<final sigma>/<normal sigma>/. On display,
> s/<normal sigma at end of label>/<final sigma>/. Other troublesome edge
> cases may not be so easy.
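For comparison, the two substitutions proposed in the quoted message (final sigma to normal sigma on lookup, normal sigma at the end of a label to final sigma on display) could be sketched in Python roughly as follows. The function names and the sample domain are hypothetical, and labels are assumed to be separated by U+002E only.

```python
import re

SIGMA, FINAL_SIGMA = "\u03c3", "\u03c2"


def for_lookup(name):
    # s/<final sigma>/<normal sigma>/ everywhere in the name.
    return name.replace(FINAL_SIGMA, SIGMA)


def for_display(name):
    # s/<normal sigma at end of label>/<final sigma>/: a sigma directly
    # before a dot or at the end of the whole name.
    return re.sub(SIGMA + r"(?=\.|$)", FINAL_SIGMA, name)


name = "\u03bf\u03b4\u03bf\u03c3.example"
shown = for_display(name)
print(shown)
print(for_lookup(shown) == name)
```

This version only looks at label boundaries, so it never touches a sigma in the middle of a label, and it would also convert a label consisting of a single sigma, which the cased-letter rule would leave alone.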