ALWAYS/MAYBE and CJK (was: Re: IDNAbis Main Open Issues)

Mark Davis mark.davis at icu-project.org
Thu Jan 24 01:48:53 CET 2008


It is not quite as simple as you say, because of multiple words. The rules
for when to change a sigma C into final sigma are in 3.13 Default Case
Algorithms <http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G33992>,
Table 3-14, p124.

C is preceded by a sequence consisting
of a cased letter and a case-ignorable
sequence, and C is not followed by a
sequence consisting of a case ignorable
sequence and then a cased letter.

Because the IDN is in NFC, the above formulation can be simplified by
dropping the 'case ignorable sequence' if we are restricted to normal modern
Greek.

C is preceded by a cased letter, and C is not followed by a cased letter.

That page also gives the exact meanings of 'cased letter' and
'case-ignorable sequence', derived from standard Unicode properties.

If you used those, then in the vast majority of normal Greek text the sigma
would be correct. So the following would display correctly

χαρακτήρες-αντιστοιχώντας.com
<http://%CF%87%CE%B1%CF%81%CE%B1%CE%BA%CF%84%CE%AE%CF%81%CE%B5%CF%82-%CE%B1%CE%BD%CF%84%CE%B9%CF%83%CF%84%CE%BF%CE%B9%CF%87%CF%8E%CE%BD%CF%84%CE%B1%CF%82.com>
// display 1

As I already noted, IDNs are often not "normal" text, because words are
run together, and the rule fails in that case. For example, the
following wouldn't work:

χαρακτήρεσαντιστοιχώντας.com
<http://%CF%87%CE%B1%CF%81%CE%B1%CE%BA%CF%84%CE%AE%CF%81%CE%B5%CF%83%CE%B1%CE%BD%CF%84%CE%B9%CF%83%CF%84%CE%BF%CE%B9%CF%87%CF%8E%CE%BD%CF%84%CE%B1%CF%82.com>
// display 2
χαρακτήρεςαντιστοιχώντας.com
<http://%CF%87%CE%B1%CF%81%CE%B1%CE%BA%CF%84%CE%AE%CF%81%CE%B5%CF%82%CE%B1%CE%BD%CF%84%CE%B9%CF%83%CF%84%CE%BF%CE%B9%CF%87%CF%8E%CE%BD%CF%84%CE%B1%CF%82.com>
// desired 3
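
To make the simplified rule concrete, here is a rough Python sketch of it as a
display step. It is only an illustration, not anything a browser actually
ships: the function names are made up, and 'cased letter' is approximated by
the general categories Lu, Ll and Lt rather than the exact property from
Table 3-14.

```python
import unicodedata

SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA


def is_cased(ch: str) -> bool:
    # Assumption: approximate "cased letter" by general category.
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")


def display_final_sigma(name: str) -> str:
    """Simplified rule: sigma becomes final sigma when it is preceded by a
    cased letter and not followed by a cased letter."""
    chars = list(unicodedata.normalize("NFC", name))
    for i, ch in enumerate(chars):
        if ch != SIGMA:
            continue
        before = i > 0 and is_cased(chars[i - 1])
        after = i + 1 < len(chars) and is_cased(chars[i + 1])
        if before and not after:
            chars[i] = FINAL_SIGMA
    return "".join(chars)


# Hyphenated name: both word-final sigmas are found.
print(display_final_sigma("χαρακτήρεσ-αντιστοιχώντασ.com"))
# Run-together name: the sigma at the end of the first word is followed by a
# cased letter, so it stays medial.
print(display_final_sigma("χαρακτήρεσαντιστοιχώντασ.com"))
```

The first call gives the hyphenated form with both final sigmas; the second
gives the "display 2" form, not the "desired 3" form, since nothing in the
string marks the word boundary.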

For IDNA2003, because people do have the choice of an optional hyphen for
disambiguation when registering the names, this is probably reasonable as a
display step. That is, it will help in many cases, and shouldn't hurt in
any. It can be implemented right now in browsers or other user agents --
without any problem -- since the input of the resulting display forms will
continue to work (because of StringPrep).
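
A quick way to see the StringPrep point, assuming Python's built-in "idna"
codec (which implements IDNA2003 with Nameprep) as a stand-in for what a user
agent does on lookup; the variable names are just for illustration:

```python
# Display form with final sigmas, as a user might type or paste it.
displayed = "χαρακτήρες-αντιστοιχώντας.com"
# The same name written with medial sigmas only.
plain = "χαρακτήρεσ-αντιστοιχώντασ.com"

# Nameprep maps final sigma to sigma, so both inputs should produce the
# same ASCII wire form.
wire_displayed = displayed.encode("idna")
wire_plain = plain.encode("idna")

print(wire_displayed == wire_plain)
```

So a display-only change of sigma to final sigma does not alter what goes on
the wire under IDNA2003.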

For IDNAbis, changing the wire form would introduce the compatibility
problems already mentioned.

Mark

On Jan 24, 2008 5:33 AM, Gervase Markham <gerv at mozilla.org> wrote:

> Michael Everson wrote:
> > I was cheered by what Gervase said about displaying final sigma. I hope
> > that the Mozilla and Safari and IE get together and agree on a method to
> > do the right thing for the Greeks.
>
> I can't bind any of those organisations, even my own, but if it turns
> out that the final-sigma problem is intractable at the protocol level
> (and perhaps esszet as well, I don't know), and it's clear that we're
> not opening an enormous can of worms which will lead to proliferating
> special-case code, then if it helps us get this done quicker, the
> IDN200x standard can punt on the issues and we can attempt to reach a
> consensus higher up.
>
> It seems to me that final sigma's a fairly easy case, as the rules are
> simple. On lookup, s/<final sigma>/<normal sigma>/. On display,
> s/<normal sigma at end of label>/<final sigma>/. Other troublesome edge
> cases may not be so easy.
>
> Gerv



-- 
Mark