Consensus Call Tranche 8 (Character Adjustments)

Martin Duerst duerst at it.aoyama.ac.jp
Thu Oct 16 06:22:01 CEST 2008


At 20:56 08/10/15, John C Klensin wrote:
>
>
>--On Tuesday, 14 October, 2008 20:04 +0900 Martin Duerst
><duerst at it.aoyama.ac.jp> wrote:
>
>> At 18:25 08/10/12, Vint Cerf wrote:
>>> Consensus Call Tranche 8 (character adjustments)
>>> 
>>> Place your reply here: [YES or NO]
>> 
>> [Vint, if you need to count this only one way,
>> just count it as a NO to be on the safe side.]
>> 
>> 
>> YES for 8.a and 8.b. Despite the transition issues
>> mentioned by Mark, the long discussion on this list has
>> shown that these are the right things to do in the long term.
>> While I'm not aware of any concrete examples of similar
>> cases, I think it would be worthwhile to check with other
>> potentially affected script/language communities.
>> What, for example, about the few final letters in Hebrew?
>
>Or the many initial and final letters in Arabic?  The answer in
>both cases is that these are individual characters and are
>PROTOCOL-VALID.

I have to apologize for picking the Hebrew finals example.
I was on a train, guessing. The answer is that the Hebrew
finals are PROTOCOL-VALID. But that's not the case for
Arabic. In Hebrew, there are just a few final variants,
and they got encoded as first-class letters, and because
Hebrew doesn't have case, they didn't get excluded by
special case folding the way the Greek final sigma was.
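
As a rough illustration of the difference, here is a minimal Python
sketch. It uses str.casefold() plus NFKC only as a stand-in for the
IDNA2003-style mapping step; that is an assumption for illustration,
not the actual derivation of PROTOCOL-VALID. The helper is_stable()
is a hypothetical name, and the results depend on the Unicode data
bundled with the interpreter.

```python
import unicodedata

# Hebrew final kaf, final mem, final nun, final pe, final tsadi
HEBREW_FINALS = "\u05DA\u05DD\u05DF\u05E3\u05E5"
FINAL_SIGMA = "\u03C2"   # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03C3"         # GREEK SMALL LETTER SIGMA


def is_stable(ch):
    """True if case folding and NFKC both leave ch unchanged.

    Hypothetical helper: a crude proxy for "a mapping step in the
    IDNA2003 style would not alter this character".
    """
    return ch.casefold() == ch and unicodedata.normalize("NFKC", ch) == ch


if __name__ == "__main__":
    for ch in HEBREW_FINALS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"category={unicodedata.category(ch)} stable={is_stable(ch)}")
    # The final sigma is folded to the ordinary small sigma.
    print(f"U+{ord(FINAL_SIGMA):04X} stable={is_stable(FINAL_SIGMA)}, "
          f"folds to U+{ord(FINAL_SIGMA.casefold()):04X}")
```

The Hebrew finals are caseless letters, so folding has nothing to do
to them, whereas the final sigma does not survive case folding.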

However, Arabic has a lot of initial/final/medial/isolated
glyph variants. These are context-dependent and are produced
by rendering engines from the base letters, not encoded as
separate letters. There are encodings of these variants in the
compatibility area, but they should be excluded (DISALLOWED)
because they have compatibility mappings to the base letters.
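
The compatibility mappings can be inspected with Python's unicodedata
module. The sketch below looks only at the four presentation forms of
ARABIC LETTER BEH as an example; compat_tag() is a hypothetical
helper, and the check is an illustration of the mappings, not the
IDNA2008 derivation itself.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH (the base letter)
# Presentation forms of BEH: isolated, final, initial, medial
BEH_FORMS = "\uFE8F\uFE90\uFE91\uFE92"


def compat_tag(ch):
    """Return the compatibility tag ('final', 'medial', ...) or None.

    unicodedata.decomposition() returns a string such as
    '<final> 0628' for compatibility mappings, and '' when the
    character has no decomposition at all.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        return decomp[1:decomp.index(">")]
    return None


if __name__ == "__main__":
    for ch in BEH_FORMS:
        mapped = unicodedata.normalize("NFKC", ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"<{compat_tag(ch)}> -> U+{ord(mapped):04X}")
```

Each of the four forms carries a compatibility tag and normalizes to
the base letter under NFKC, while the base letter itself has no
decomposition.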


>What I believe got us into difficulty with
>Eszett and Final Sigma wasn't the positioning issue or an
>alternate shaping one but the intersection between them and the
>case-folding rules.

Yes indeed, I should have thought about that earlier.

>Since, at least as of Unicode 3.2, neither
>of them had upper-case forms and IDNA2003 violated the Unicode
>Standard's advice against using case-folding to actually map
>characters (rather than using it only in comparison but
>retaining the original forms), the only result consistent with
>the general IDNA2003 model was Eszett -> "ss" and Final Sigma ->
>Medial Lower Case Sigma.
>
>Since neither Hebrew nor Arabic (nor any of the other scripts
>that have position-sensitive characters) have case, they cannot
>get into the same problem.
>
>Since we don't do case mapping in IDNA2008, the case folding
>issue does not apply, regardless of what one thinks of that
>operation and its applicability.  Without it, the only issue is
>whether it is worth banning the characters to preserve part of
>the IDNA2003 behavior (or making a major exception and
>preserving the IDNA2003 mapping behavior) for the long term even
>though it is clear that, were the decision being made for the
>first time with the IDNA2008 rules, we would not even be asking
>the question.

Yes indeed. But eszett and final sigma are not the only ones
affected by casing. The data that deals with cases where casing
isn't one-to-one is http://unicode.org/Public/UNIDATA/SpecialCasing.txt.
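
A first rough pass over such characters can be made without parsing
the file, by asking Python which characters expand to more than one
code point under its own case operations. This is only a sketch under
assumptions: multi_char_casing() and scan() are hypothetical helpers,
the results depend on the Unicode version bundled with the
interpreter, and characters whose casing is one-to-one but
context-sensitive or language-specific (final sigma, dotless i) are
not caught by this test.

```python
def multi_char_casing(ch):
    """Return the case operations that expand ch to several code points."""
    results = {
        "lower": ch.lower(),
        "upper": ch.upper(),
        "title": ch.title(),
        "fold": ch.casefold(),
    }
    return {name: out for name, out in results.items() if len(out) > 1}


def scan(limit=0x10000):
    """Collect every character below limit with a multi-character casing."""
    found = {}
    for cp in range(limit):
        ch = chr(cp)
        hits = multi_char_casing(ch)
        if hits:
            found[ch] = hits
    return found


if __name__ == "__main__":
    for ch, hits in scan().items():
        shown = ", ".join(f"{name}={out!r}" for name, out in hits.items())
        print(f"U+{ord(ch):04X} {ch!r}: {shown}")
```

The default limit covers only the Basic Multilingual Plane; pass
sys.maxunicode + 1 to scan every code point.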

That includes a lot of data that may be irrelevant for us,
but I think it would be worthwhile to carefully examine it
so that we can fix everything that we need to fix.
The first character that comes to my mind is the lowercase
dotless i, used for Turkish and other Turkic languages.
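
The dotless i problem can be shown with Python's default
(language-independent) case operations. turkish_lower() below is a
hypothetical helper that sketches the Turkish/Azerbaijani tailoring by
hand; it is an assumption for illustration, not a locale-aware
library function.

```python
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def round_trips(ch):
    """True if upper-casing and then lower-casing gives ch back."""
    return ch.upper().lower() == ch


def turkish_lower(text):
    """Hand-written sketch of Turkish lower-casing (hypothetical helper).

    Maps dotted capital I to 'i' and plain capital I to dotless i
    before applying the default lower-casing to everything else.
    """
    return text.replace(DOTTED_CAP_I, "i").replace("I", DOTLESS_I).lower()


if __name__ == "__main__":
    # Default casing: dotless i upper-cases to plain 'I', which
    # lower-cases to dotted 'i', so the distinction is lost.
    print(repr(DOTLESS_I.upper()), repr(DOTLESS_I.upper().lower()))
    print("round trip preserved:", round_trips(DOTLESS_I))
    # Default lower-casing of dotted capital I yields two code points.
    print([f"U+{ord(c):04X}" for c in DOTTED_CAP_I.lower()])
    print(turkish_lower("DIYARBAKIR"))
```

With default casing the two kinds of i collapse into one, which is
the same kind of loss that case mapping caused for eszett and final
sigma.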



>> [Just as a hopefully far-fetched example, assume that
>> one day in North Korea, a few Hangul syllables containing some
>> historic Jamos gain crucial importance.]
>
>I'll have more to say about this in another note, but I would
>assume that, were such a situation to arise, North Korea would
>make an appearance in JTC1/SC2 and insist, in the most vigorous
>terms, that code points be allocated to those crucial syllables.

Possible. They, as well as other national bodies, have made
vigorous attempts of one fashion or another, but up to now,
careful discussion and explanations have been able to convince
them that the things they asked for aren't necessary, at least
not in the form they asked for.

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


