Sounds to me like the MVALID idea needs some tuning. I think John's point about operating on individual characters or full labels, rather than on substrings, is a pretty critical one. Can we make this work on a purely character-by-character mapping basis? <br>
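A minimal Python sketch of what a purely character-by-character mapping might look like. HYPOTHETICAL_PVALID is an invented, tiny stand-in for the real IDNA2008 PVALID table (lowercase ASCII letters, digits, hyphen, and Eszett, as if Eszett were allowed). The per-character mapping reuses the toNFKC(toCaseFold(toNFKC(...))) chain from the proposal quoted below, without the removeDI step. Only characters outside the stand-in set are mapped, each on its own, and the whole result is then put into NFC.<br>

```python
import unicodedata

# HYPOTHETICAL stand-in for the real IDNA2008 PVALID table; illustration only.
HYPOTHETICAL_PVALID = set("abcdefghijklmnopqrstuvwxyz0123456789-\u00df")


def map_char(c: str) -> str:
    """Map one character; characters already in the stand-in set pass through."""
    if c in HYPOTHETICAL_PVALID:
        return c
    # toNFKC(toCaseFold(toNFKC(c))), with the optional removeDI step left out.
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", c).casefold())


def map_label_per_char(label: str) -> str:
    """Map every character independently, then normalize the whole label to NFC."""
    mapped = "".join(map_char(c) for c in label)
    return unicodedata.normalize("NFC", mapped)


print(map_label_per_char("Stra\u00dfe"))  # straße
print(map_label_per_char("\uff21bc"))     # abc (fullwidth A folds to a)
```

With this approach, whether Eszett survives does not depend on its neighbours: "Straße" and "straße" map to the same label.<br>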
<br><br><br><div class="gmail_quote">2009/4/11 John C Klensin <span dir="ltr"><<a href="mailto:klensin@jck.com">klensin@jck.com</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
<br>
--On Saturday, April 11, 2009 09:14 -0700 Mark Davis<br>
<div class="im"><<a href="mailto:mark@macchiato.com">mark@macchiato.com</a>> wrote:<br>
<br>
> We are thinking along very similar lines. Yes, I think what we<br>
> want to do is have the definition of MVALID as those<br>
> characters that are subject to IDNA2003-style mapping. I think<br>
> it is best to call it a slightly different name, since it is<br>
> those characters subject to mapping, and we don't want people<br>
> to think it is all those characters valid in an M-Label. I'll<br>
> use the working name MSUBJECT.<br>
> The process in Protocol would be along the following lines.<br>
><br>
> 1. For any substring of the input whose characters are all in<br>
> MSUBJECT,<br>
<br>
</div>I think this defines MSUBJECT as a superset of PVALID, not as<br>
the set of characters that are actually mapped under IDNA2003-like<br>
rules. I don't necessarily have a problem with that, but we need<br>
to be very, very clear, especially since I'm not at all sure what<br>
"IDNA2003-style" covers.<br>
<br>
If my supposition about the subset relationship is incorrect,<br>
I'm still not sure about the implications of selecting a<br>
substring. It would seem to introduce additional confusion; I<br>
think it would be much more sensible to operate on either<br>
individual characters or full labels rather than dividing labels<br>
into substrings that have to be recombined.<br>
<div class="im"><br>
> convert that substring via the following mapping,<br>
> and replace in the source.<br>
><br>
> substring = toNFKC(removeDI(toCaseFold(toNFKC(substring))))<br>
><br>
> // the "removeDI" step would be dropped if we decide not to<br>
> remove them<br>
<br>
</div>Note another peculiarity of this rule. Suppose we side with<br>
the language authorities, rather than with the registries' position<br>
of "having accepted the mistake in IDNA2003, we would rather live<br>
with it than change", and allow, e.g., Eszett. Then a substring<br>
containing Eszett but no other non-PVALID characters skips the<br>
"toCaseFold(toNFKC(..." step, so Eszett is kept in the substring<br>
to be looked up. But a substring that contains at least one<br>
character requiring mapping ends up with "ss" in the final<br>
result. I believe a user would find that even more astonishing<br>
than being forced to use lowercase.<br>
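A minimal Python sketch of this peculiarity, under one reading of the proposal: the whole label is treated as a single MSUBJECT substring, and the toNFKC(toCaseFold(toNFKC(...))) chain (removeDI omitted) is applied only when at least one character is non-PVALID. HYPOTHETICAL_PVALID is an invented, tiny stand-in for the real PVALID table, with Eszett included as if the language authorities prevailed. Python's str.casefold() applies full case folding, so "ß" folds to "ss".<br>

```python
import unicodedata

# HYPOTHETICAL stand-in for the real IDNA2008 PVALID table; Eszett included.
HYPOTHETICAL_PVALID = set("abcdefghijklmnopqrstuvwxyz0123456789-\u00df")


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def map_substring_if_needed(sub: str) -> str:
    """Map a substring only if it contains at least one non-PVALID character."""
    if all(c in HYPOTHETICAL_PVALID for c in sub):
        return sub  # nothing requires mapping, so Eszett is kept
    # toNFKC(toCaseFold(toNFKC(sub))): full case folding turns Eszett into "ss"
    return nfkc(nfkc(sub).casefold())


print(map_substring_if_needed("stra\u00dfe"))  # straße
print(map_substring_if_needed("Stra\u00dfe"))  # strasse
```

In this sketch, capitalizing one unrelated letter is enough to change the lookup from "straße" to "strasse".<br>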
<br>
Similarly, while we have already decided to DISALLOW Hangul<br>
Jamo, this rule would allow those combinations of Jamo that NFKC<br>
maps into Hangul syllables while rejecting those that it does<br>
not. I'm not nearly familiar enough with Korean usage to know how<br>
problematic that would be in practice, but it is not what I think<br>
we agreed to.<br>
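A small Python illustration of the Jamo point, assuming NFKC is the only step that matters here (Jamo have no case, so case folding leaves them alone). contains_conjoining_jamo is a hypothetical helper that only checks the Hangul Jamo block U+1100..U+11FF, a simplification of the actual DISALLOWED rule. A leading consonant followed by a vowel composes into a syllable and so would pass, while the same two Jamo in the opposite order stay as Jamo and would be rejected.<br>

```python
import unicodedata


def contains_conjoining_jamo(s: str) -> bool:
    # Simplification: only the Hangul Jamo block U+1100..U+11FF is checked.
    return any(0x1100 <= ord(c) <= 0x11FF for c in s)


def passes_after_nfkc(s: str) -> bool:
    """True if no conjoining Jamo remain once NFKC has been applied."""
    return not contains_conjoining_jamo(unicodedata.normalize("NFKC", s))


L_KIYEOK, V_A = "\u1100", "\u1161"  # leading consonant and vowel Jamo

print(unicodedata.normalize("NFKC", L_KIYEOK + V_A) == "\uac00")  # True
print(passes_after_nfkc(L_KIYEOK + V_A))  # True: composed into a syllable
print(passes_after_nfkc(V_A + L_KIYEOK))  # False: no composition, Jamo remain
```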
<br>
The latter is one of the reasons why Protocol now says "must be<br>
in NFC form" rather than "apply NFC".<br>
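The difference between the two wordings can be sketched in Python with unicodedata.is_normalized (available since Python 3.8): a check rejects the decomposed Jamo sequence, whereas applying NFC silently turns it into a syllable. The function names must_be_nfc and apply_nfc are hypothetical.<br>

```python
import unicodedata


def must_be_nfc(label: str) -> bool:
    """'Must be in NFC form': reject input that is not already NFC."""
    return unicodedata.is_normalized("NFC", label)


def apply_nfc(label: str) -> str:
    """'Apply NFC': repair the input instead of rejecting it."""
    return unicodedata.normalize("NFC", label)


jamo_pair = "\u1100\u1161"  # conjoining Jamo that compose to U+AC00
print(must_be_nfc(jamo_pair))            # False: rejected
print(apply_nfc(jamo_pair) == "\uac00")  # True: quietly becomes a syllable
print(must_be_nfc("\uac00"))             # True
```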
<div class="im"><br>
> 2. Transform the entire string via NFC.<br>
><br>
> // we need to do this to make sure the result is NFC, because<br>
> of possible interactions between characters that are inside<br>
> and outside MSUBJECT.<br>
<br>
</div>I agree, but, again, decomposing labels into substring<br>
components and then recombining them seems exceptionally likely<br>
to lead us into surprises, and implementations into states of<br>
confusion.<br>
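A minimal Python sketch of why step 2 is needed, this time reading MSUBJECT as only the characters that actually require mapping. HYPOTHETICAL_PVALID is again an invented stand-in (here including U+0301 COMBINING ACUTE ACCENT), and map_by_substrings is a hypothetical helper that maps each maximal run of non-PVALID characters and glues the pieces back together. Mapping "E" on its own and reattaching the combining accent leaves a string that is not in NFC until the final normalization.<br>

```python
import itertools
import unicodedata

# HYPOTHETICAL stand-in for PVALID; includes U+0301 COMBINING ACUTE ACCENT.
HYPOTHETICAL_PVALID = set("abcdefghijklmnopqrstuvwxyz0123456789-\u0301")


def map_run(run: str) -> str:
    # toNFKC(toCaseFold(toNFKC(run))), removeDI omitted
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", run).casefold())


def map_by_substrings(label: str) -> str:
    """Map each maximal run of non-PVALID characters, then recombine (step 1)."""
    parts = []
    for is_pvalid, chars in itertools.groupby(label, key=lambda c: c in HYPOTHETICAL_PVALID):
        run = "".join(chars)
        parts.append(run if is_pvalid else map_run(run))
    return "".join(parts)


after_step1 = map_by_substrings("E\u0301")  # "e" followed by U+0301
after_step2 = unicodedata.normalize("NFC", after_step1)
print(unicodedata.is_normalized("NFC", after_step1))  # False
print(after_step2 == "\u00e9")                        # True
```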
<div class="im"><br>
> 3. Proceed with the rest of Protocol<br>
</div>>...<br>
<font color="#888888"><br>
john<br>
</font><div><div></div><div class="h5"><br>
_______________________________________________<br>
Idna-update mailing list<br>
<a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
</div></div></blockquote></div><br>