Jamo [RE: Consensus Call Tranche 8 (Character Adjustments)]

Martin Duerst duerst at it.aoyama.ac.jp
Fri Oct 17 11:35:55 CEST 2008


At 16:45 08/10/17, Kent Karlsson wrote:
>Michel SUIGNARD wrote:
>> I would like to know where in ISO/IEC 10646 the type of
>> sequence described in 3 is "allowed" to represent such Hangul
>> syllables. Because to the best of my knowledge it is not.
>
>10646 is rather silent on that matter. But see the Unicode
>standard. In version 5.0 this is discussed in section 3.12,
>"Conjoining jamo behaviour". The key sentence there states:
>
>Unicode> Standard Korean syllable block: A sequence of one or more L
>Unicode> followed by a sequence of one or more V and a sequence of zero
>Unicode> or more T, or any other sequence that is canonically equivalent.

Reading through section 3.12 of Unicode 5.0 is somewhat confusing,
because it tries to be very, very general in determining syllable
boundaries (virtually everything goes, as long as you can somehow
imagine that you might make a Korean syllable block out of it,
even if no such block has ever been made), whereas the descriptions
for canonical composition and decomposition are quite limited
(one block <=> two or three Jamo, depending on whether there is
a final consonant (group) or not). As an example, the sequence

U+1101 (GG) U+1100 (G) U+1100 (G) U+1100 (G) U+1161 (A),
summarily written GGGGGA, would be a "Standard Korean syllable
block", too, in the same way that we would probably expect GGGGGA
not to be broken up by a hyphenation algorithm, whether or not it
looks totally silly (and in the Korean case, there's no way to
display it as a reasonable-looking syllable block).
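
To make the contrast concrete, here is a rough sketch in Python. The
regular expression is only my approximation of the "one or more L,
one or more V, zero or more T" wording, using simplified ranges from
the Hangul Jamo block (an assumption: it ignores precomposed
syllables and the "canonically equivalent" clause). The names
SYLLABLE_BLOCK, seq, is_block and nfc_points are made up for this
illustration; only re and unicodedata are real library modules.

```python
import re
import unicodedata

# Simplified ranges from the Hangul Jamo block (assumption: fillers are
# treated as ordinary L/V, and precomposed syllables are not handled).
L = "[\u1100-\u115F]"   # leading consonants
V = "[\u1160-\u11A7]"   # vowels
T = "[\u11A8-\u11FF]"   # trailing consonants

# One or more L, then one or more V, then zero or more T.
SYLLABLE_BLOCK = re.compile(f"{L}+{V}+{T}*")

# GG G G G A, i.e. "GGGGGA"
seq = "\u1101\u1100\u1100\u1100\u1161"

# The very general boundary definition accepts the whole sequence ...
is_block = SYLLABLE_BLOCK.fullmatch(seq) is not None

# ... while canonical composition only combines the last L with the V.
nfc = unicodedata.normalize("NFC", seq)
nfc_points = [f"U+{ord(c):04X}" for c in nfc]

print(is_block)    # True
print(nfc_points)  # ['U+1101', 'U+1100', 'U+1100', 'U+AC00']
```

So the sequence counts as one syllable block for boundary purposes,
but NFC leaves three stray leading consonants in front of GA.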


>KIM, Kyongsok wrote:
>> ... each of the following three can represent Hangul syllable GGA:
>> 1) UAC01 (GGA)
>> 2) U1101 (GG), U1161 (A)
>> 3) U1100 (G), U1100 (G), U1161 (A)
>>  - By NFC, 2) U1101 (GG), U1161 (A) will be changed to 1) UAC01 (GGA);
>>  - However, by NFC, 3) U1100 (G), U1100 (G), U1161 (A) will
>> be changed to
>> U1100 (G), UAC00 (GA), which is "different" from 1) UAC01.
>
>This is indeed the correct analysis. I find it very unfortunate
>that U1101 (GG) does not have a *canonical* decomposition mapping
>to <U1100 (G), U1100 (G)> (etc. for all the other multi-letter
>Hangul Jamos). The Hangul script does NOT have a primitive Jamo
>GG. The Hangul GG is, by design, composed of two G Jamos, just
>like Latin GG is composed of two G letters.
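
The NFC behaviour described in the quoted analysis can be checked
with a few lines of Python, sketched here under the assumption that
the interpreter's unicodedata module reflects the current Unicode
data. The helper code_points and the variable names are hypothetical,
chosen for this illustration. Note that the precomposed GGA syllable
comes out as U+AE4C under this data, not the UAC01 written in the
quoted text (U+AC01 is GAG).

```python
import unicodedata

def code_points(s):
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

two_jamo = "\u1101\u1161"           # 2) GG, A
three_jamo = "\u1100\u1100\u1161"   # 3) G, G, A

nfc_two = unicodedata.normalize("NFC", two_jamo)
nfc_three = unicodedata.normalize("NFC", three_jamo)

# GG has no decomposition mapping at all, canonical or otherwise.
gg_mapping = unicodedata.decomposition("\u1101")

print(code_points(nfc_two))    # U+AE4C
print(code_points(nfc_three))  # U+1100 U+AC00
print(repr(gg_mapping))        # ''
```

Sequence 2) composes to a single syllable, while 3) ends up as G
followed by GA, so the two are not canonically equivalent.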

Well, it's very easy to take this position indeed. But it's
also possible to take the position that U+110F is the result of
adding a stroke to U+1100 (the equivalent, although in this day
and age much less clear, example would be that G is just a
Latin (in the true sense of the old Romans) C with a stroke or
hook added). The Korean script is so well designed that it's
difficult to know where to stop these decompositions.

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


