[Almost OT] Re: Hangul jamo issues - are jamo sequences legitimate?

Mark Davis mark.davis at icu-project.org
Thu Jan 11 01:38:34 CET 2007


A UI is certainly free to remap characters beyond what is done by
StringPrep, if it is faced with odd input like non-syllabic Hangul.

Mark

On 1/10/07, Soobok Lee <lsb at lsb.org> wrote:
>
> On Wed, Jan 10, 2007 at 04:21:28PM -0800, Mark Davis wrote:
> > I don't think we are anticipating allowing any non-NFKC characters in
> > the output IDNs. I also tend to agree with Michel and others that input
> > should also be much more restricted, probably also to only NFKC
> > characters, so that the main mappings done from input to output are
> > case mapping and deletion.
>
> I understand the reason why you think NFKC-normalized strings are safe.
> But, as I noted below, U+31xx is the only gateway for jamo input under
> KSC5601 and later. If stringprep200x neither allows them nor maps them to
> U+11xx in a context-sensitive way, according to Kent Karlsson's
> suggestion, we lose the only input method for jamo characters ...
>
> Soobok
>
> >
> > Mark
> >
> > On 1/10/07, Soobok Lee <lsb at lsb.org> wrote:
> > >
> > >On Wed, Jan 10, 2007 at 10:22:07AM -0800, Michel Suignard wrote:
> > >> > Moreover, U+31xx compat jamo letters are the only input method
> > >> > for jamo chars under NFC and KSC5601. We have no direct input
> > >> > method for U+11xx, which is not in the KSC5601->UNICODE table.
> > >> >
> > >> > So, U+31xx and U+11xx both should be allowed in labels.
> > >> >
> > >> You could possibly argue for having them as input, but because they
> > >> get normalized into Hangul syllables by NFKC (except for rare old
> > >> hangul syllables which can only be represented by Jamo or a mix of
> > >> jamo and modern hangul syllables), I don't see the point in allowing
> > >> them in labels. They all get filtered out by NFKC (with the notable
> > >> exception of Old Hangul which should not belong imo in the IDN name
> > >> space).
> > >> Based on this I don't even think they belong to the input set,
> > >> because of the confusion. The only difference between the input set
> > >> and the output set (if any) should be the uppercase forms for
> > >> bicameral scripts.
> > >
> > >Under NFKC, you are right. But this IDNAbis may use NFC instead of
> > >NFKC, because NFKC changes glyphs, as in the case of circled A -> A.
> > >NFC preserves the glyphs (display) of input characters.
> > >All my previous arguments are based on the adoption of NFC in IDNAbis.
> > >
> > >Soobok
> > >_______________________________________________
> > >Idna-update mailing list
> > >Idna-update at alvestrand.no
> > >http://www.alvestrand.no/mailman/listinfo/idna-update
> > >
> >
> >
> >
> > --
> > Mark
>



-- 
Mark
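
The NFC versus NFKC difference argued over in this thread can be
checked with a minimal sketch using Python's standard unicodedata
module. The helper names (codepoints, forms) and the sample strings are
illustrative assumptions, not anything proposed in the thread, and the
results depend on the Unicode data shipped with the interpreter.

```python
import unicodedata


def codepoints(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def forms(text):
    """Return the (NFC, NFKC) normalizations of text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


# Illustrative inputs only (hypothetical sample set).
samples = {
    "compatibility jamo U+3131 U+314F": "\u3131\u314F",
    "conjoining jamo U+1100 U+1161": "\u1100\u1161",
    "circled A U+24B6": "\u24B6",
    "compatibility jamo U+3131 U+314F U+3134": "\u3131\u314F\u3134",
}

for label, text in samples.items():
    nfc, nfkc = forms(text)
    print(f"{label}: NFC = {codepoints(nfc)}; NFKC = {codepoints(nfkc)}")
```

In this sketch NFC leaves the U+31xx compatibility jamo and the circled
A untouched, while NFKC folds them. The last sample is the interesting
one: NFKC maps the final consonant to a leading (choseong) jamo, so it
does not join the preceding syllable, which is presumably the kind of
case a context-sensitive mapping would have to handle.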