Tonos

Thu Jan 31 21:07:36 CET 2008

Simon Josefsson said:

> What isn't clear in this thread is that the _reason_ IDNA works the way
> it does is because it chose to use Unicode NFKC for normalization.  That
> isn't something that the Unicode specifications required IDNA to do.  I
> recall discussions of which Unicode normalization form to use in the IDN
> WG, and the eventual choice of NFKC was deliberate.  That may or may not
> have been the right choice, but that's water under the bridge.  So if I
> understand correctly, to fix this issue, we would need to replace NFKC
> with something else in IDNAbis.

Just to clear up one bit of topic drift here. If "this issue"
is the matching of final sigma, then it has nothing to do with
Unicode normalization form NFKC -- it is the result of the
Unicode casefolding. I think Vint was using "normalization rules"
in a more generic sense of folding two or more strings together
to some normative form for comparison.
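As an illustration (not part of the original discussion), assume Python 3, whose str.casefold() applies Unicode full case folding and whose unicodedata.normalize() implements the normalization forms. A quick check then shows that the sigma match comes from casefolding, which maps final sigma U+03C2 to U+03C3, while NFKC alone leaves final sigma unchanged:

```python
import unicodedata

FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA

# Unicode case folding maps final sigma to ordinary sigma.
folded = FINAL_SIGMA.casefold()

# NFKC normalization by itself does not change final sigma.
nfkc = unicodedata.normalize("NFKC", FINAL_SIGMA)

print(folded == SIGMA)      # True
print(nfkc == FINAL_SIGMA)  # True
```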

If "this issue" is the casing of tonos in Greek, that is also
a casefolding issue, and has nothing to do with Unicode normalization
form NFKC.

The issue that Vaggelis Segredaki has raised is that for one
reason or another, tonos-accented vowels in Greek often are
displayed without tonos when uppercased. This is reminiscent
of the once widespread practice of omitting accents on uppercase
French vowels, in part because of technological constraints.

Whatever the detailed reason in Greek, casefolding of tonos-accented
lowercase vowels together with unaccented uppercase vowels doesn't
happen automatically under Unicode casefolding rules, any more
than it does for accented French vowels. The default
case mappings simply map precomposed lowercase tonos-accented
Greek vowels (U+03AC) to precomposed uppercase tonos-accented
Greek vowels (U+0386), etc. And the values for Unicode casefolding
of tonos-accented vowels derive directly from those mappings.
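The same point can be checked with Python's standard library (an illustration added here, assuming Python 3's built-in Unicode case mapping and casefolding). The tonos survives uppercasing and casefolding, casefolding an unaccented capital alpha gives plain U+03B1, and NFKC does not strip the tonos either:

```python
import unicodedata

ALPHA_TONOS_LOWER = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
ALPHA_TONOS_UPPER = "\u0386"  # GREEK CAPITAL LETTER ALPHA WITH TONOS
ALPHA_UPPER = "\u0391"        # GREEK CAPITAL LETTER ALPHA (no tonos)

# Default case mapping keeps the tonos: U+03AC <-> U+0386.
print(ALPHA_TONOS_LOWER.upper() == ALPHA_TONOS_UPPER)     # True
print(ALPHA_TONOS_UPPER.casefold() == ALPHA_TONOS_LOWER)  # True

# The unaccented capital folds to plain alpha U+03B1, not U+03AC,
# so the two spellings do not match after casefolding.
print(ALPHA_UPPER.casefold() == ALPHA_TONOS_LOWER)        # False

# NFKC recomposes alpha + U+0301 back to U+03AC; the tonos stays.
print(unicodedata.normalize("NFKC", ALPHA_TONOS_LOWER) == ALPHA_TONOS_LOWER)  # True
```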

So if Greek users feel that uppercase vowels without tonos
accent and uppercase vowels with tonos accent need to be folded
together, this needs to be handled in some other way, via
preprocessing and/or bundling and the like.
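One possible shape for such preprocessing, sketched in Python purely as an assumption: the function name fold_greek_tonos is hypothetical and is not part of IDNA, IDNAbis, or any Unicode specification. It relies on the fact that tonos-accented Greek vowels decompose canonically to a base letter plus U+0301 COMBINING ACUTE ACCENT, drops that mark only when it follows a Greek base letter (so Latin accents such as French e-acute are untouched), and then applies ordinary casefolding:

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # tonos decomposes canonically to U+0301

def fold_greek_tonos(label: str) -> str:
    """Hypothetical preprocessing: strip tonos from Greek letters,
    then apply Unicode full case folding."""
    decomposed = unicodedata.normalize("NFD", label)
    out = []
    base = ""  # most recent non-combining character seen
    for ch in decomposed:
        # Drop U+0301 only when it sits on a Greek and Coptic block letter.
        if ch == COMBINING_ACUTE and "\u0370" <= base <= "\u03ff":
            continue
        if unicodedata.combining(ch) == 0:
            base = ch
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out)).casefold()

print(fold_greek_tonos("ΆΘΗΝΑ") == fold_greek_tonos("αθήνα"))  # True
print(fold_greek_tonos("café"))  # café (Latin acute kept)
```

Because dialytika (U+0308) is left alone, a letter such as U+0390 loses only its tonos and folds to U+03CA. Whether a registry would want exactly this folding is a policy question this sketch does not settle.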

> (Fwiw, for non-DNS purposes of string preparation, the choice of NFKC is
> not so clearly the best choice.)

I agree, but that has nothing to do with either final sigma or
tonos-accented vowels in Greek.

--Ken

> 
> /Simon
> 
> Vint Cerf <vint at google.com> writes:
> 
> > Patrik is correct, Michael. The matching process IS destructive
> > because of the Unicode normalization rules that are applied to allow
> > for matching of the two kinds of sigma.