Consensus Call Tranche 8 (Character Adjustments)

Martin Duerst duerst at it.aoyama.ac.jp
Thu Oct 16 05:48:04 CEST 2008


At 20:56 08/10/15, John C Klensin wrote:
>--On Tuesday, 14 October, 2008 22:56 +0200 Mark Davis
><mark at macchiato.com> wrote:
>
>>> For Korean, there is no equivalent because NFC doesn't produce
>>> the relevant precomposed forms. And, because it doesn't, our
>>> problem is not one of confusing similarity (a registry problem)
>>> but one of having comparisons work correctly (a much deeper
>>> issue which we have generally dealt with in the protocol, in
>>> the analogous case by the requirement for NFC.
>>
>> John, your first premise, and thus your whole argument is
>> incorrect. The combining Jamo *do* form composed characters
>> under NFC. Here is an example:
>>
>> U+1100 <http://unicode.org/cldr/utility/character.jsp?a=1100>
>> ( ᄀ ) HANGUL CHOSEONG KIYEOK
>> U+1161 <http://unicode.org/cldr/utility/character.jsp?a=1161>
>> ( ᅡ ) HANGUL JUNGSEONG A
>> U+11A8 <http://unicode.org/cldr/utility/character.jsp?a=11A8>
>> ( ᆨ ) HANGUL JONGSEONG KIYEOK
>> =>
>> U+AC01 <http://unicode.org/cldr/utility/character.jsp?a=AC01>
>> ( 각 ) HANGUL SYLLABLE GAG
>

>Mark,
>
>I didn't see that happening when I ran a few tests of my own,
>but certainly have to defer to your example and experience.

John, if your tests show something different, then please debug them.
It's a poor basis for IETF work if the architect of some of the main
changes from a previous edition, the editor of the main documents,
and one of the major channels to experts on the script
in question uses tests that contain bugs
(or whatever it is that makes them go wrong).
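
For anyone who wants to re-run Mark's example independently, here is a
minimal sketch in Python. The language is just a convenient choice, it
assumes the standard unicodedata module, and the helper name
compose_lvt is made up for illustration. It checks the library result
against the arithmetic used for Hangul syllable composition.

```python
import unicodedata

# Mark's example: leading consonant, vowel, trailing consonant.
jamo = "\u1100\u1161\u11A8"

# NFC as implemented by the Python standard library.
composed = unicodedata.normalize("NFC", jamo)
print([f"U+{ord(c):04X}" for c in composed])

# The same result from the Hangul composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose_lvt(l, v, t):
    """Compose one modern L, V, T Jamo triple into a syllable (hypothetical helper)."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print(f"U+{ord(compose_lvt(*jamo)):04X}")

# NFD takes the syllable back to the original three Jamo.
print(unicodedata.normalize("NFD", composed) == jamo)
```

If a test gives a different answer for these three code points, the
first things to check are whether the input really contains the
conjoining Jamo (U+11xx) rather than the compatibility Jamo (U+31xx),
and whether the library is applying NFC rather than NFD.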

>My instinct is still to defer to the national experts and the registry, 
>but, if the [pre]composed characters are consistently formed by NFC, I 
>agree that consistency with decisions made elsewhere would disallow the 
>problematic comparison cases and dictate that we leave this to registry 
>restrictions.


>I am a bit concerned about the hypothetical case that Martin
>raised and my reaction, at least if I correctly understand Unicode's 
>stability rules. If a few syllables that are now considered archaic (or, 
>if such cases exist, ones that have never been used) abruptly become, to 
>use Martin's term, of crucial importance, would the syllable forms be 
>allocated code points?

With extremely high likelihood, NO.


>If so, am I correct in assuming that stability 
>rules would require that NFC would actually decompose the 
>newly-added syllables (presumably composing the individual Jamo to the 
>new syllables would result in an incompatible change to normalization)?

You can call this an incompatible change to normalization, but it's
actually designed the other way round: Any data that already exists
(and is decomposed, because there was no precomposed form) is already
normalized according to these rules. In this very important(*) sense,
the change to normalization is backwards-compatible. This then
creates a strong argument for not encoding anything new
in the first place.

(*) While in the IETF, there is a high awareness of how difficult it
is to change an installed base of software, data isn't usually much
discussed, but it should be obvious that changing an existing base
of data is way tougher. That's why the stability rules for normalization
are the way they are.
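
To make the stability point concrete, here is a small sketch in Python
(again assuming the standard unicodedata module; is_nfc is a
hypothetical helper, and the archaic sequence is just one I picked for
illustration). The first part shows that a Jamo sequence with no
precomposed syllable is already in NFC. The second part uses U+2ADC
FORKING, which as far as I know is one of the characters on the
composition exclusion list because it was encoded after the
composition rules were fixed, to show what NFC does with such a
latecomer: it is decomposed, and the decomposed data stays as it is.

```python
import unicodedata


def is_nfc(s):
    """True if s is unchanged by NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", s) == s


# Existing data: KIYEOK followed by the archaic vowel ARAEA (U+119E).
# There is no precomposed syllable, so the sequence is already NFC.
archaic = "\u1100\u119E"
print(is_nfc(archaic))

# A precomposed character added late: NFC decomposes it instead of
# composing the older sequence into it.
forking = "\u2ADC"
nfc_forking = unicodedata.normalize("NFC", forking)
print([f"U+{ord(c):04X}" for c in nfc_forking])
print(is_nfc(forking))
```

A newly encoded archaic syllable would presumably be handled the same
way, so it would never appear in normalized text at all.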

>However, I note that prohibiting the Jamo in IDNA would prevent the 
>problem, at the cost of requiring anyone who wants to use a syllable that 
>is not now assigned a code point in a domain name to persuade UTC and SC2 
>to add that code point.

That would be exactly what we do NOT want.

>Unless the national experts and registry can make a much
>stronger case than I can make on their behalf (that ought to be easy 
>for them, but they have not yet been heard from), I think the NFC 
>relationships still shift the balance toward making this a registry 
>restriction.

Yes.


>However, I don't think the answer is quite as obvious and 
>one-sided as your note seems to imply.

At the moment, I think it is. Of course, if we get new information
from Korean experts, that may change, but we can only tell once
we have a chance to look at it.

Regards,    Martin.
 


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


