[Almost OT] Re: Hangul jamo issues - are jamo sequences legitimate?

Soobok Lee lsb at lsb.org
Thu Jan 11 01:33:26 CET 2007


On Wed, Jan 10, 2007 at 04:21:28PM -0800, Mark Davis wrote:
> I don't think we are anticipating allowing any non-NFKC characters in the
> output IDNs. I also tend to agree with Michel and others that input should
> also be much more restricted, probably also to only NFKC characters, so that
> the main mappings done from input to output are case mapping and deletion.

I understand why you think NFKC-normalized strings are safe.
But, as I noted below, U+31xx is the only gateway for jamo input under KSC5601
and later. If stringprep200x neither allows them nor maps them to U+11xx
in a context-sensitive way, following Kent Karlsson's suggestion,
we lose the only input method for jamo characters ...

Soobok
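
For readers who want to check the NFC vs. NFKC behaviour discussed in this
thread, here is a minimal sketch using Python's standard unicodedata module.
The helper names (codepoints, compare) and the sample strings are my own
illustration, not part of any stringprep proposal. It only shows what the two
normalization forms do to U+31xx compatibility jamo and to a circled letter;
it does not implement any context-sensitive mapping.

```python
import unicodedata


def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def compare(s):
    """Return the (NFC, NFKC) normalizations of s as a tuple."""
    return unicodedata.normalize("NFC", s), unicodedata.normalize("NFKC", s)


# Illustrative inputs only (assumed samples, not taken from any spec).
samples = {
    "compatibility jamo KIYEOK": "\u3131",
    "compatibility jamo KIYEOK + A": "\u3131\u314F",
    "circled capital A": "\u24B6",
}

for name, text in samples.items():
    nfc, nfkc = compare(text)
    print(f"{name}: input {codepoints(text)}")
    print(f"  NFC  -> {codepoints(nfc)}")
    print(f"  NFKC -> {codepoints(nfkc)}")
```

NFC leaves the U+31xx letters and the circled A untouched, while NFKC replaces
them with their compatibility equivalents and then composes where it can, so
a U+31xx consonant followed by a U+31xx vowel ends up as a precomposed syllable.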

> 
> Mark
> 
> On 1/10/07, Soobok Lee <lsb at lsb.org> wrote:
> >
> >On Wed, Jan 10, 2007 at 10:22:07AM -0800, Michel Suignard wrote:
> >> > Moreover, U+31xx compat jamo letters are the only input method
> >> > for jamo chars under NFC and KSC5601. We have no direct input
> >> > method for U+11xx, which is not in the KSC5601->UNICODE table.
> >> >
> >> > So, U+31xx and U+11xx both should be allowed in labels.
> >> >
> >> You could possibly argue for having them as input, but because they get
> >> normalized into Hangul syllables by NFKC (except for rare old hangul
> >> syllables which can only be represented by Jamo or a mix of jamo and
> >> modern hangul syllables), I don't see the point in allowing them in
> >> labels. They all get filtered out by NFKC (with the notable exception of
> >> Old Hangul which should not belong imo in the IDN name space).
> >> Based on this I don't even think they belong to the input set, because
> >> of the confusion. The only difference between the input set and the
> >> output set (if any) should be the uppercase forms for bicameral scripts.
> >
> >Under NFKC, you are right. But, this IDNAbis may have NFC instead of NFKC,
> >because NFKC changes the glyphs like in the case of circled A -> A.
> >NFC preserves the glyphs (display) of input characters.
> >All my previous arguments are based on the adoption of NFC in IDNAbis.
> >
> >Soobok
> >_______________________________________________
> >Idna-update mailing list
> >Idna-update at alvestrand.no
> >http://www.alvestrand.no/mailman/listinfo/idna-update
> >
> 
> 
> 
> -- 
> Mark

