hangul jamo and unicode 5.0 vs 3.1 NFC

Soobok Lee lsb at lsb.org
Wed Jan 3 05:30:43 CET 2007


On Tue, Jan 02, 2007 at 02:22:24PM +0900, Martin Duerst wrote:
> Hello Soobok,
> 
> While there were several very small error corrections for NFC
> (and thus NFKC) between Unicode 3.0 and Unicode 5.0, it is in
> no way possible that something that was defined to work only
> halfway in NFKC in Unicode 3.x
>   (your example L + V + T => LV + T below)
> is now working fully in NFC (which is in a way a subset of NFKC)
> in Unicode 5.0
>   (your example L + V + T => LV + T => LVT)
> 
> Please check your sources again.

You are right. I misread the document.
L + V + T is always composed completely into LVT, in Unicode 3.x as well as in Unicode 5.0.
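
A quick way to check this is a minimal Python sketch with the standard
unicodedata module. Note the assumption: the module uses whatever Unicode
version ships with the interpreter, not 3.x or 5.0 specifically, and the
syllable chosen here is only an illustration.

```python
import unicodedata

# Conjoining jamo: L = U+1112 (CHOSEONG HIEUH), V = U+1161 (JUNGSEONG A),
# T = U+11AB (JONGSEONG NIEUN). Chosen only as an example.
lvt = "\u1112\u1161\u11ab"

nfc = unicodedata.normalize("NFC", lvt)

# L + V + T composes into one precomposed LVT syllable, U+D55C.
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+D55C']
```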

The Hangulchar draft described a problem that originates from NFKC,
which is not context-sensitive enough to deal with a compat jamo consonant
that is semantically/contextually a final consonant.
Such a consonant is converted into L instead of T, so the result is
L+V+L instead of L+V+T.
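
This can be seen in a small Python sketch, again with the standard
unicodedata module (whose Unicode version is that of the interpreter, an
assumption here). The three compatibility jamo are my own example, not one
taken from the draft.

```python
import unicodedata

# Compatibility jamo: U+314E (HIEUH), U+314F (A), U+3134 (NIEUN).
# The trailing NIEUN is meant as a final consonant, but its compat
# mapping is U+1102 (CHOSEONG NIEUN, an L), not U+11AB (JONGSEONG, a T).
compat = "\u314e\u314f\u3134"

nfkc = unicodedata.normalize("NFKC", compat)

# Only L + V composes; the would-be final stays behind as a separate L.
print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+D558', 'U+1102']
```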

So we always fail when we input some archaic hangul syllables from KSC5601 and
convert them into an NFKC-normalized Unicode string. We get half-way composed
hangul syllables if we input those syllables from a KSC5601 locale.

To solve this, the compat mapping would have to be context-sensitive, so that
it can determine correctly whether a compat jamo consonant is converted into
L or T. But I can't suggest this here.
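
Just to illustrate what "context-sensitive" could mean, here is a rough
Python sketch. It is only an assumption of mine, not anything defined by
Unicode or by the draft: the function names are hypothetical, and the rule
(a compat consonant right after a compat vowel and not followed by a vowel
is treated as a final) is a naive heuristic that will not cover every case.

```python
import unicodedata

def is_compat_vowel(ch):
    # Compatibility jamo vowels, U+314F (A) .. U+3163 (I).
    return "\u314f" <= ch <= "\u3163"

def is_compat_consonant(ch):
    # Compatibility jamo consonants, U+3131 (KIYEOK) .. U+314E (HIEUH).
    return "\u3131" <= ch <= "\u314e"

def to_final(ch):
    # Hypothetical helper: find the JONGSEONG (T) jamo with the same name,
    # e.g. "HANGUL LETTER NIEUN" -> "HANGUL JONGSEONG NIEUN".
    name = unicodedata.name(ch).replace("HANGUL LETTER", "HANGUL JONGSEONG")
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return ch  # no T counterpart known; leave the character alone

def contextual_nfkc(text):
    # Naive heuristic: a consonant after a vowel and not before a vowel
    # is mapped to T before ordinary NFKC is applied.
    out = []
    for i, ch in enumerate(text):
        after_vowel = i > 0 and is_compat_vowel(text[i - 1])
        before_vowel = i + 1 < len(text) and is_compat_vowel(text[i + 1])
        if is_compat_consonant(ch) and after_vowel and not before_vowel:
            ch = to_final(ch)
        out.append(ch)
    return unicodedata.normalize("NFKC", "".join(out))

# The same three compat jamo as input now compose into a single LVT.
result = contextual_nfkc("\u314e\u314f\u3134")
print([f"U+{ord(c):04X}" for c in result])  # ['U+D55C']
```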

NFC's composition algorithm itself has had _NO problem_ at all ever since Unicode 3.0.

So Unicode 5.0 NFC does not pose any new compatibility problems with
Unicode 3.0 wrt jamo sequences, and that relieves me. Sorry for my misreading.

Thanks

Soobok




More information about the Idna-update mailing list