Normalization of Hangul

Mark Davis mark.davis at icu-project.org
Thu Feb 21 02:21:16 CET 2008


I think this conversation is muddying the waters thoroughly.

When normalization was defined, it was clear that it would not do everything
that everyone could possibly have wanted. Speaking to John's "casual
reader": what Kent is describing is something he has raised repeatedly
before, namely something he would have liked normalization to do.

But it wasn't done, and won't be done, and its absence has no impact on the
stability or utility of normalization.

Mark

On Wed, Feb 20, 2008 at 3:59 PM, Kent Karlsson <kent.karlsson14 at comhem.se>
wrote:

> John C Klensin wrote:
> > I hope we all understand the subtleties of this thread generally
> > and your specific comments above in particular.  However, to a
> > casual reader, it sounds very much like "Hangul and the
> > surrounding operations are still unstable (in the normal, not
> > necessarily Unicode, sense of that term) and that four full
> > versions of Unicode after the drastic changes to Hangul
> > handling, there still isn't a definition of the processes of
> > normalization and comparison other than by a set of ad-hoc
> > tables which are not quite complete".
> >
> > Presumably, that isn't what was intended, but...
>
> Just to clarify:
>
> The Hangul script has 17 consonant letters and 11 vowel letters,
> plus a small number of variants added later that have since gone
> out of use. Apart from the short-lived extra variants (and the
> merge of ieung and yesieung), this has been stable for over 550
> years, since 1446 (though the spelling of Korean in Hangul has
> not been stable, nor has the ordering of the letters, but those
> are different matters). The very deliberate design of the
> script is very elegant.
>
> One also needs a way of determining syllable boundaries
> (doubly encoding the consonants, as in Unicode now, is fine).
> The Jamo fillers have a function too (for partial syllables).
>
> The rest of the encoded Hangul characters are unnecessary for
> representing any text in the Hangul script (modern or historic),
> aside from halfwidth (which is really just a display style).
> The HANGUL SYLLABLEs have canonical decompositions, so they
> are not so bad.
>
> No other script has gotten lots of codes for multi-letter
> combinations. "gg" is not eligible for encoding as a single
> character, nor is "ou" or "sk", etc. But Hangul has been
> given hundreds of extra codes for letter combinations. Most
> of these letter combinations only occur in historic texts.
> Unfortunately canonical decompositions for the Hangul letter
> combinations are missing and cannot now be added. This is
> very far from elegant...
>
>
>        /kent k
>
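A minimal sketch of the decomposition point Kent raises above, assuming
Python's standard unicodedata module. The helper names (decompose_syllable,
codepoints) are illustrative and not from this thread; the constants are
those of the Unicode Standard's arithmetic for Hangul syllable decomposition.
It shows that a precomposed HANGUL SYLLABLE decomposes canonically into
conjoining jamo, while a jamo letter combination such as U+1101 is not
canonically equivalent to the two single letters it is built from.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables


def decompose_syllable(ch):
    """Hypothetical helper: canonical decomposition of one Hangul syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable, leave unchanged
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    out = chr(lead) + chr(vowel)
    if trail != T_BASE:  # T_BASE itself means "no trailing consonant"
        out += chr(trail)
    return out


def codepoints(s):
    """Hypothetical helper: render a string as U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# A precomposed syllable has a canonical decomposition.
print(codepoints(decompose_syllable("\uD55C")))            # U+1112 U+1161 U+11AB
print(codepoints(unicodedata.normalize("NFD", "\uD55C")))  # same result

# A jamo letter combination does not: U+1101 (SSANGKIYEOK) stays as is,
# and two U+1100 (KIYEOK) are never composed into it.
print(codepoints(unicodedata.normalize("NFD", "\u1101")))        # U+1101
print(codepoints(unicodedata.normalize("NFC", "\u1100\u1100")))  # U+1100 U+1100
```

The last two lines are the case Kent describes as missing: under the
existing normalization forms the two spellings remain distinct strings.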



-- 
Mark