Jamo [RE: Consensus Call Tranche 8 (Character Adjustments)]

Kent Karlsson kent.karlsson14 at comhem.se
Fri Oct 17 18:07:24 CEST 2008


Martin Dürst wrote:
> >Unicode> Standard Korean syllable block: A sequence of one or more L
> >Unicode> followed by a sequence of one or more V and a sequence of zero
> >Unicode> or more T, or any other sequence that is canonically equivalent.
>
> Reading through section 3.12 of Unicode 5.0 is somewhat confusing,
> because it tries to be very, very general for determining syllable
> boundaries (virtually everything goes, as long as you can somehow
> imagine that you might make a Korean syllable block out of it,
> even if no such block ever has been made),

"no such block ever has been made" is not a consideration for an
alphabetic script, like Hangul. But there are practical limitations
of size in this case, since one tries (in display/print) to fit all
letters of a syllable into a graphical block the size of an ideograph. 
Some "syllables", like GGGGGA in Hangul, would simply be too crammed
(unless the block size was gigantic). On the other hand, GGGGGA is
not a very reasonable "syllable" in text representing real (and
reasonably spelled) words.
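
The quoted definition (one or more L, one or more V, zero or more T, or
anything canonically equivalent) can be sketched as a pattern match. This
is only an illustration: the function name is hypothetical, and the three
ranges are an assumption that covers just the basic Hangul Jamo block
(U+1100..U+11FF), not any jamo encoded elsewhere.

```python
import re
import unicodedata

# Assumed ranges, basic Hangul Jamo block only (a simplification).
L = "\u1100-\u115F"   # leading consonants (choseong), incl. the filler
V = "\u1160-\u11A7"   # vowels (jungseong), incl. the filler
T = "\u11A8-\u11FF"   # trailing consonants (jongseong)

BLOCK = re.compile(f"[{L}]+[{V}]+[{T}]*")

def is_standard_syllable_block(s):
    """True if s, after canonical decomposition, has the shape L+ V+ T*."""
    return BLOCK.fullmatch(unicodedata.normalize("NFD", s)) is not None

# GGGGGA as spelled out in the quoted message: GG G G G A
ggggga = "\u1101\u1100\u1100\u1100\u1161"
print(is_standard_syllable_block(ggggga))    # True
print(is_standard_syllable_block("\u1161"))  # False: no leading consonant
```

Decomposing first is what handles the "canonically equivalent" clause, so
a precomposed syllable, or a jamo followed by a precomposed syllable, is
judged by its jamo spelling.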

> whereas the descriptions
> for canonical composition and decomposition are quite limited
> (one block <=> two or three Jamo, depending on whether there is
> a final consonant (group) or not).

Yes, but that is only a subset of the possible (and reasonable)
syllables that can be written in Hangul. It only covers (a superset
of) what occurs in "modern Hangul" (modulo the multiletter issue),
but has really nothing to do with how the Hangul script is constructed.
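
How limited the canonical composition is can be seen from its arithmetic.
A minimal sketch, using the constants from section 3.12 of the Unicode
Standard; the function name is hypothetical, and the built-in NFC is used
only as a cross-check.

```python
import unicodedata

# Constants from the Hangul syllable composition in Unicode section 3.12.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def compose_syllable(l, v, t=None):
    """Compose exactly one L, one V and at most one T into a syllable.

    Returns None when a jamo lies outside the composable ranges,
    which is the case for every jamo not used in modern Hangul.
    """
    li = ord(l) - L_BASE
    vi = ord(v) - V_BASE
    ti = ord(t) - T_BASE if t is not None else 0
    if not (0 <= li < L_COUNT and 0 <= vi < V_COUNT):
        return None
    if t is not None and not (0 < ti < T_COUNT):
        return None
    return chr(S_BASE + (li * V_COUNT + vi) * T_COUNT + ti)

gga = compose_syllable("\u1101", "\u1161")      # GG + A
print(f"U+{ord(gga):04X}")                      # U+AE4C
print(gga == unicodedata.normalize("NFC", "\u1101\u1161"))  # True
print(L_COUNT * V_COUNT * T_COUNT)              # 11172 precomposed syllables
```

Only one L, one V and at most one T ever take part, so any syllable that
needs more letters than that stays a jamo sequence.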

> As an example, the sequence
>
> U+1101 (GG) U+1100 (G) U+1100 (G) U+1100 (G) U+1161 (A),
> summarily written GGGGGA, would be a "Standard Korean syllable
> block", too, the same way we would probably expect GGGGGA not to
> be broken up by a hyphenation algorithm, whether it looks totally
> silly (and in the Korean case, there's no way to display it as
> a reasonably-looking syllable block) or not.
>
>
> > >KIM, Kyongsok wrote:
> >> ... each of the following three can represent Hangul syllable GGA:
> >> 1) UAE4C (GGA)
> >> 2) U1101 (GG), U1161 (A)
> >> 3) U1100 (G), U1100 (G), U1161 (A)
> >>  - By NFC, 2) U1101 (GG), U1161 (A) will be changed to 1) UAE4C (GGA);
> >>  - However, by NFC, 3) U1100 (G), U1100 (G), U1161 (A) will be
> >>    changed to U1100 (G), UAC00 (GA), which is "different" from 1) UAE4C.
> >
> >This is indeed the correct analysis. I find it very unfortunate
> >that U1101 (GG) does not have a *canonical* decomposition mapping
> >to <U1100 (G), U1100 (G)> (etc. for all the other multi-letter
> >Hangul Jamos). The Hangul script does NOT have a primitive Jamo
> >GG. The Hangul GG is, by design, composed of two G Jamos, just
> >like Latin GG is composed of two G letters.
>
> Well, it's very easy to take this position indeed.

And it is also how the Hangul script was actually designed.
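
The NFC behaviour of the three spellings of GGA discussed above can be
checked with a short sketch, here using Python's standard unicodedata
module (the helper names are just for illustration):

```python
import unicodedata

G, GG, A = "\u1100", "\u1101", "\u1161"  # KIYEOK, SSANGKIYEOK, vowel A
GGA = "\uAE4C"   # precomposed syllable GGA (note that U+AC01 is GAG)
GA = "\uAC00"    # precomposed syllable GA

def nfc(s):
    return unicodedata.normalize("NFC", s)

def code_points(s):
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(code_points(nfc(GG + A)))            # U+AE4C
print(code_points(nfc(G + G + A)))         # U+1100 U+AC00
print(repr(unicodedata.decomposition(GG))) # '' (no decomposition mapping)
```

Since U+1101 has no decomposition mapping at all, none of the
normalization forms will make spellings 2) and 3) compare equal.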

> Also, it's
> possible to take the position that U+110F is the result of
> adding a stroke to U+1100

That is not how the Hangul script was designed, and is thus a
misinterpretation.

> (the equivalent, although in this day
> and age much less clear, example would be that G is just a
> Latin (in the true sense of the old Romans) C with a stroke or
> hook added). The Korean script is so well designed that it's
> difficult to know where to stop these decompositions.

While that parallel holds for the strokes of the vowels (originally dots,
which was a bit unfortunate graphically; for instance, Hangul O is NOT a
composition of EU and ARAEA) and for certain of the consonants (e.g.
THIEUTH is a primitive letter that happens to have one more stroke than
TIKEUT), it does not hold for the doubled consonants (like SSANGKIYEOK)
nor for any of the other multiletter Jamos (for instance, Hangul E *is* a
composition of Hangul EO and Hangul I).

The design documents, both editions, are quite clear on these matters.
So there is no reason to guess how the Hangul letters, and letter
combinations, are constructed. While one may find the philosophy
behind the graphical design of the individual letters to be a bit
doubtful at times, esp. for the vowels, it is clear which are individual
letters and which are compositions of letters.

See the original design document, translated to English in

	The Korean Language, Ho-Min Sohn, Cambridge University Press, 1999,
	ISBN 0-521-36123-0 or 0-521-36943-6. (Section 6.3 gives a translation
	to English of the 1444 design document for the Hangul alphabet.)

Also (facsimile only, no translation), in

	A history of Korean Alphabet and Movable Types, Ministry of Culture
	and Information, Republic of Korea, 1970. (Part 1 reproduces the 1444
	official design document for the Hangul alphabet.)

The revised and extended Hangul design document, reproduced,
translated to English (and analysed) in:

	The Korean alphabet of 1446 – Expositions, OPA, The visible speech
	sounds, Annotated translation, Future applicability; Hwun Min Ceng
	Um, Sek Yen Kim-Cho, Humanity Books and AC Press, New York, 2002,
	ISBN 89-428-1587-1. (Reproduces, translates and analyses (in English)
	the 1446 official design document for the Hangul alphabet.)

The extended document from 1446 introduces the kapyeoun- combinations as
compositions with IEUNG at the end. It is clear that the little circle
below is really an IEUNG, not something else.

This book from 2002 also introduces an interesting possible extension
to Hangul, putting "annotations" on the (primitive) Hangul letters
within syllable blocks, for use as a phonetic notation.

Also of relevance:

	The Korean Alphabet, its history and structure, ed. Young-Key
	Kim-Renaud, University of Hawai'i Press, 1997, ISBN 0-824-81989-6.



		/kent k


