Normalization of Hangul

Kent Karlsson kent.karlsson14 at comhem.se
Thu Feb 21 00:59:21 CET 2008


John C Klensin wrote:
> I hope we all understand the subtleties of this thread generally
> and your specific comments above in particular.  However, to a
> casual reader, it sounds very much like "Hangul and the
> surrounding operations are still unstable (in the normal, not
> necessarily Unicode, sense of that term) and that four full
> versions of Unicode after the drastic changes to Hangul
> handling, there still isn't a definition of the processes of
> normalization and comparison other than by a set of ad-hoc
> tables which are not quite complete".  
> 
> Presumably, that isn't what was intended, but...

Just to clarify:

The Hangul script has 17 consonant letters and 11 vowel letters,
plus a small number of variants that were added later and have
since gone out of use. Apart from the short-lived extra variants
(and the merger of ieung and yesieung), this has been stable for
over 550 years, since 1446 (though the spelling of Korean in
Hangul has not been stable, nor has the ordering of the letters,
but those are different matters). The very deliberate design of
the script is very elegant.

One also needs a way of determining syllable boundaries
(encoding each consonant twice, once as a syllable-initial and
once as a syllable-final letter, as Unicode now does, is fine).
The Jamo fillers have a function too (for partial syllables).
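
To illustrate the double encoding, here is a minimal Python sketch.
The function names jamo_role and starts_syllable are my own, the
ranges are those of the Hangul Jamo block (U+1100..U+11FF) only,
and the boundary test is a deliberate simplification (a leading
consonant starts a syllable unless it follows another leading
consonant), not the full segmentation rule set.

```python
import unicodedata

def jamo_role(ch):
    """Classify a conjoining jamo by its position in the Hangul Jamo block."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:   # leading consonants, incl. CHOSEONG FILLER
        return "leading"
    if 0x1160 <= cp <= 0x11A7:   # vowels, incl. JUNGSEONG FILLER
        return "vowel"
    if 0x11A8 <= cp <= 0x11FF:   # trailing consonants
        return "trailing"
    return None

def starts_syllable(prev, cur):
    """Simplified test: does `cur` begin a new syllable after `prev`?"""
    return jamo_role(cur) == "leading" and jamo_role(prev) != "leading"

# The same letter (kiyeok) has two code points, one per position.
print(unicodedata.name("\u1100"))  # HANGUL CHOSEONG KIYEOK
print(unicodedata.name("\u11A8"))  # HANGUL JONGSEONG KIYEOK
```

Because the position is part of the code point, no separate
syllable-break character is needed in the text.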

The rest of the encoded Hangul characters are unnecessary for
representing any text in the Hangul script (modern or historic),
aside from halfwidth (which is really just a display style).
The HANGUL SYLLABLEs have canonical decompositions, so they
are not so bad.
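
As a concrete illustration, the decomposition of a HANGUL SYLLABLE
is purely arithmetic. The sketch below uses the constants from the
Unicode Standard's Hangul algorithm; the function name
decompose_syllable is mine, and the check against Python's
unicodedata module is only there to show the two agree.

```python
import unicodedata

# Constants of the Hangul syllable arithmetic in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11172 precomposed syllables

def decompose_syllable(ch):
    """Return the canonical jamo sequence for one precomposed syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""  # index 0: no trailing
    return lead + vowel + trail

han = "\uD55C"  # the syllable HAN
assert decompose_syllable(han) == unicodedata.normalize("NFD", han)
assert unicodedata.normalize("NFC", decompose_syllable(han)) == han
```

So the syllables round-trip through normalization and cost nothing
in terms of equivalence.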

No other script has gotten lots of codes for multi-letter
combinations. "gg" is not eligible for encoding as a single
character, nor is "ou" or "sk", etc. But Hangul has been
given hundreds of extra codes for letter combinations. Most
of these letter combinations only occur in historic texts.
Unfortunately, canonical decompositions for the Hangul letter
combinations are missing and cannot now be added. This is
very far from elegant...
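
A small Python check shows the consequence, taking U+1101 HANGUL
CHOSEONG SSANGKIYEOK as one example of such a letter combination.
The helper name normalization_unifies is mine; the rest is the
standard unicodedata module.

```python
import unicodedata

SSANGKIYEOK = "\u1101"       # HANGUL CHOSEONG SSANGKIYEOK (one code point)
TWO_KIYEOK = "\u1100\u1100"  # HANGUL CHOSEONG KIYEOK, twice

def normalization_unifies(a, b):
    """True if any of the four normalization forms makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(f, a) == unicodedata.normalize(f, b)
        for f in forms
    )

# The combination carries no decomposition mapping at all.
print(repr(unicodedata.decomposition(SSANGKIYEOK)))
# So no normalization form treats the two spellings as the same text.
print(normalization_unifies(SSANGKIYEOK, TWO_KIYEOK))
```

The two spellings of the same double letter therefore stay distinct
for any comparison that relies on normalization alone.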


	/kent k



More information about the Idna-update mailing list