[Almost OT] Re: Hangul jamo issues - are jamo sequences legitimate?

Kent Karlsson kent.karlsson14 at comhem.se
Thu Jan 11 11:37:56 CET 2007


Michel Suignard wrote:
> You could possibly argue for having them as input, but because they
> get normalized into Hangul syllables by NFKC (except for rare old hangul
> syllables which can only be represented by Jamo or a mix of jamo and
> modern hangul syllables), I don't see the point in allowing them in
> labels. They all get filtered out by NFKC (with the notable

1) This thread has been a bit confusing in that the term "jamo" has been
used for what Unicode calls "Hangul compatibility letters", not just for
the conjoining Hangul jamo. NFKC does *not* "filter away" *any* conjoining
jamo. It does compose some sequences of them into precomposed Hangul
syllables, but in a correct implementation of Hangul that does not change
the display of the string at all.
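
To make the point above concrete, here is a minimal sketch using Python's
standard unicodedata module. The helper name nfkc_codepoints is only for
illustration, and the output depends on the Unicode data shipped with the
Python version in use; the three sample strings are my own choice.

```python
import unicodedata


def nfkc_codepoints(text: str) -> list[str]:
    """Return the NFKC form of text as a list of U+XXXX strings."""
    normalized = unicodedata.normalize("NFKC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# Modern leading consonant + vowel (U+1100, U+1161): composed into one
# precomposed syllable; nothing is dropped.
print(nfkc_codepoints("\u1100\u1161"))  # ['U+AC00']

# A lone leading consonant has nothing to compose with and is kept.
print(nfkc_codepoints("\u1100"))  # ['U+1100']

# The old-hangul vowel ARAEA (U+119E) has no precomposed syllable, so the
# conjoining sequence survives NFKC unchanged.
print(nfkc_codepoints("\u1100\u119E"))  # ['U+1100', 'U+119E']
```

In all three cases the conjoining jamo are either kept or composed; none is
removed.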

2) NFKC, while quite OK in the IDN context otherwise, does a really bad
job for the Hangul compatibility letters (see the paper I wrote on the
subject, which I've referenced before).
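
One effect that may be part of the problem can be sketched as follows. This
is an illustration chosen here, not a summary of the paper mentioned above,
and the helper name to_hex is hypothetical. It again assumes Python's
standard unicodedata module.

```python
import unicodedata


def to_hex(text: str) -> list[str]:
    """Format each character of text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Two separate compatibility letters: HANGUL LETTER KIYEOK, HANGUL LETTER A.
compat_pair = "\u3131\u314F"
print(to_hex(unicodedata.normalize("NFKC", compat_pair)))

# A lone compatibility letter becomes a bare leading consonant, with no
# filler to complete the syllable block.
lone_letter = "\u3131"
print(to_hex(unicodedata.normalize("NFKC", lone_letter)))
```

The pair, which displays as two independent letters, comes out of NFKC as
the single syllable U+AC00, and the lone letter comes out as U+1100 alone.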

3) In most contexts (IMO including IDN) Hangul compatibility letters
are best seen as equivalent to (partial) Hangul syllables constructed
from conjoining jamo, including the conjoining jamo fillers. E.g.
U+3131 (HANGUL LETTER KIYEOK) -> <U+1100, U+1160> (HANGUL CHOSEONG
KIYEOK followed by HANGUL JUNGSEONG FILLER). In a correct implementation
of Hangul, including the conjoining jamo, these two representations
display identically. Ideally, those should have been the Unicode
decomposition mappings for the Hangul compatibility letters.
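
A rough sketch of such a mapping, built on top of the existing compatibility
decompositions in Python's unicodedata module, could look like this. The
function name and the range checks are assumptions made for illustration.
Only the leading-consonant case (U+3131 -> <U+1100, U+1160>) is taken from
the example above; the handling of vowels and trailing consonants is a guess
at how the fillers would be used, not an established mapping.

```python
import unicodedata

CHOSEONG_FILLER = "\u115F"   # HANGUL CHOSEONG FILLER
JUNGSEONG_FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def compat_to_partial_syllable(ch: str) -> str:
    """Map one Hangul compatibility letter to a filler-padded jamo sequence.

    Hypothetical helper: characters outside U+3131..U+318E, or without a
    single-character <compat> decomposition, are returned unchanged.
    """
    if not 0x3131 <= ord(ch) <= 0x318E:
        return ch
    fields = unicodedata.decomposition(ch).split()
    if len(fields) != 2 or fields[0] != "<compat>":
        return ch
    jamo = chr(int(fields[1], 16))
    cp = ord(jamo)
    if 0x1100 <= cp <= 0x115F:
        # Leading consonant: add the vowel filler (the case given above).
        return jamo + JUNGSEONG_FILLER
    if 0x1160 <= cp <= 0x11A7:
        # Vowel: assumed to take a leading-consonant filler in front.
        return CHOSEONG_FILLER + jamo
    if 0x11A8 <= cp <= 0x11FF:
        # Trailing consonant: assumed to take both fillers in front.
        return CHOSEONG_FILLER + JUNGSEONG_FILLER + jamo
    return ch


print([f"U+{ord(c):04X}" for c in compat_to_partial_syllable("\u3131")])
```

Because U+1160 is not a vowel that takes part in Hangul syllable composition,
the sequence <U+1100, U+1160> is left alone by NFC and NFKC rather than being
merged with neighbouring jamo.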

		/kent k


