Consensus Call Tranche 8 (Character Adjustments)

John C Klensin klensin at jck.com
Wed Oct 15 13:56:38 CEST 2008



--On Tuesday, 14 October, 2008 22:56 +0200 Mark Davis
<mark at macchiato.com> wrote:

>> For Korean, there is no
>> equivalent because NFC doesn't produce the relevant precomposed
>> forms. And, because it doesn't, our problem is not one of
>> confusing similarity (a registry problem) but one of having
>> comparisons work correctly (a much deeper issue which we have
>> generally dealt with in the protocol, in the analogous case by
>> the requirement for NFC).
>
> John, your first premise, and thus your whole argument is
> incorrect. The combining Jamo *do* form composed characters
> under NFC. Here is an example:
> 
> U+1100 <http://unicode.org/cldr/utility/character.jsp?a=1100>
> ( ᄀ ) HANGUL CHOSEONG KIYEOK
> U+1161 <http://unicode.org/cldr/utility/character.jsp?a=1161>
> ( ᅡ ) HANGUL JUNGSEONG A
> U+11A8 <http://unicode.org/cldr/utility/character.jsp?a=11A8>
> ( ᆨ ) HANGUL JONGSEONG KIYEOK
> =>
> U+AC01 <http://unicode.org/cldr/utility/character.jsp?a=AC01>
> ( 각 ) HANGUL SYLLABLE GAG

Mark,

I didn't see that happening when I ran a few tests of my own,
but I certainly have to defer to your example and experience.  My
instinct is still to defer to the national experts and the
registry, but, if the [pre]composed characters are consistently
formed by NFC, I agree that consistency with decisions made
elsewhere would disallow the problematic comparison cases and
dictate that we leave this to registry restrictions.
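
For anyone who wants to reproduce the composition in the example
quoted above, a minimal sketch follows. It assumes Python and its
standard unicodedata module (my choice of tool for illustration, not
something taken from the quoted note), and the result depends on the
Unicode data shipped with the interpreter.

```python
import unicodedata

# The three conjoining Jamo from the quoted example:
# CHOSEONG KIYEOK, JUNGSEONG A, JONGSEONG KIYEOK.
jamo = "\u1100\u1161\u11A8"

# NFC composes the sequence into a single precomposed syllable.
composed = unicodedata.normalize("NFC", jamo)
print([f"U+{ord(c):04X}" for c in composed])  # ['U+AC01']

# NFD takes the syllable back to the original Jamo sequence.
roundtrip = unicodedata.normalize("NFD", composed)
print(roundtrip == jamo)  # True
```

This only checks one modern syllable; it says nothing about Jamo
sequences that have no precomposed form.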

I am a bit concerned about the hypothetical case that Martin
raised and my reaction, at least if I correctly understand
Unicode's stability rules.  If a few syllables that are now
considered archaic (or, if such cases exist, ones that have
never been used) abruptly become, to use Martin's term, of
crucial importance, would the syllable forms be allocated code
points?   If so, am I correct in assuming that stability rules
would require that NFC would actually decompose the newly-added
syllables (presumably composing the individual Jamo to the new
syllables would result in an incompatible change to
normalization)?  That isn't an attractive answer because it
makes the behavior dependent on when a particular character code
point is added to Unicode.  The other alternatives are certainly
worse for general applications of Unicode.  However, I note that
prohibiting the Jamo in IDNA would prevent the problem, at the
cost of requiring anyone who wants to use a syllable that is not
now assigned a code point in a domain name to persuade UTC and
SC2 to add that code point.
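
The two behaviors in question can be observed today with a small
sketch, again assuming Python's unicodedata module. The first case
uses an archaic vowel (U+119E, JUNGSEONG ARAEA) that has no
precomposed syllable, so NFC leaves the Jamo alone. The second uses
U+0958 (DEVANAGARI LETTER QA), a character listed in the composition
exclusions, which NFC decomposes. The second case is only an analogy
that I picked for illustration; it is not a statement about what UTC
would actually do for newly-added Hangul syllables.

```python
import unicodedata

def nfc_points(text):
    """Return the code points of the NFC form of text as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]

# CHOSEONG KIYEOK + JUNGSEONG ARAEA: no precomposed syllable exists,
# so NFC leaves the sequence as separate Jamo.
archaic = "\u1100\u119E"
print(nfc_points(archaic))   # ['U+1100', 'U+119E']

# DEVANAGARI LETTER QA is a composition exclusion: NFC decomposes the
# precomposed code point rather than composing toward it.
excluded = "\u0958"
print(nfc_points(excluded))  # ['U+0915', 'U+093C']
```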

Unless the national experts and registry can make a much
stronger case than I can make on their behalf (that ought to be
easy for them, but they have not yet been heard from), I think
the NFC relationships still shift the balance toward making this
a registry restriction.  However, I don't think the answer is
quite as obvious and one-sided as your note seems to imply.

      john


