A UI is certainly free to remap characters beyond what StringPrep does if it is faced with odd input such as non-syllabic Hangul.<br><br>Mark<br><br><div><span class="gmail_quote">On 1/10/07, <b class="gmail_sendername">
Soobok Lee</b> <<a href="mailto:lsb@lsb.org">lsb@lsb.org</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Wed, Jan 10, 2007 at 04:21:28PM -0800, Mark Davis wrote:
<br>> I don't think we are anticipating allowing any non-NFKC characters in the<br>> output IDNs. I also tend to agree with Michel and others that input should<br>> also be much more restricted, probably also to only NFKC characters, so that
<br>> the main mappings done from input to output are case mapping and deletion.<br><br>I understand why you think NFKC-normalized strings are safe.<br>But, as I noted below, U+31xx is the only gateway for jamo input under KSC5601
<br>and later. If stringprep200x neither allows them nor maps them to U+11xx<br>in a context-sensitive way, as Kent Karlsson suggested,<br>we lose the only input method for jamo characters ...<br><br>Soobok
<br><br>><br>> Mark<br>><br>> On 1/10/07, Soobok Lee <<a href="mailto:lsb@lsb.org">lsb@lsb.org</a>> wrote:<br>> ><br>> >On Wed, Jan 10, 2007 at 10:22:07AM -0800, Michel Suignard wrote:<br>> >> > Moreover, U+31xx compat jamo letters are the only input method
<br>> >> > for jamo chars under NFC and KSC5601. We have no direct input<br>> >> > method for U+11xx, which is not in the KSC5601->UNICODE table.<br>> >> ><br>> >> > So both U+31xx and U+11xx should be allowed in labels.
<br>> >> ><br>> >> You could possibly argue for having them as input, but because they get<br>> >> normalized into Hangul syllables by NFKC (except for rare old hangul<br>> >> syllables which can only be represented by Jamo or a mix of jamo and
<br>> >> modern hangul syllables), I don't see the point in allowing them in<br>> >> labels. They all get filtered out by NFKC (with the notable exception of<br>> >> Old Hangul, which IMO should not belong in the IDN namespace).
<br>> >> Based on this I don't even think they belong to the input set, because<br>> >> of the confusion. The only difference between the input set and the<br>> >> output set (if any) should be the uppercase forms for bicameral scripts.
<br>> ><br>> >Under NFKC, you are right. But IDNAbis may use NFC instead of NFKC,<br>> >because NFKC changes glyphs, as in the case of circled A -> A.<br>> >NFC preserves the glyphs (display) of input characters.
<br>> >All my previous arguments are based on the adoption of NFC in IDNAbis.<br>> ><br>> >Soobok<br>> >_______________________________________________<br>> >Idna-update mailing list<br>> >
<a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>> ><a href="http://www.alvestrand.no/mailman/listinfo/idna-update">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>> ><br>
><br>><br>><br>> --<br>> Mark<br></blockquote></div><br><br clear="all"><br>-- <br>Mark
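<br><br>For reference, the normalization behavior debated in this thread can be checked with a minimal sketch using Python's standard `unicodedata` module. The helper name `show`, the constant names, and the sample strings are illustrative assumptions, not something taken from the thread or from any StringPrep profile; the sketch only shows how NFC and NFKC treat U+31xx compatibility jamo, U+11xx conjoining jamo, and a circled letter.

```python
import unicodedata


def show(label, text):
    """Print the code points of `text` after NFC and after NFKC."""
    for form in ("NFC", "NFKC"):
        normalized = unicodedata.normalize(form, text)
        points = " ".join(f"U+{ord(ch):04X}" for ch in normalized)
        print(f"{label:<18} {form:<5} {points}")


# Hypothetical sample inputs chosen for illustration.
COMPAT_JAMO = "\u3131\u314F"      # HANGUL LETTER KIYEOK + HANGUL LETTER A (U+31xx block)
CONJOINING_JAMO = "\u1100\u1161"  # HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A (U+11xx block)
CIRCLED_A = "\u24B6"              # CIRCLED LATIN CAPITAL LETTER A

# NFC leaves the compatibility jamo untouched; NFKC first maps them to
# conjoining jamo and then composes those into a precomposed syllable.
show("compat jamo", COMPAT_JAMO)

# Conjoining jamo are composed into a syllable by both NFC and NFKC.
show("conjoining jamo", CONJOINING_JAMO)

# NFC keeps the circled letter; NFKC folds it to a plain letter.
show("circled A", CIRCLED_A)
```

Under NFC the U+31xx pair survives as typed, which is the case Soobok's argument depends on; under NFKC it ends up as the same syllable that the U+11xx pair composes to, which is Michel's point.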