<br clear="all">Mark<br>

<br><br><div class="gmail_quote">On Mon, Jun 29, 2009 at 10:55, John C Klensin <span dir="ltr">&lt;<a href="mailto:klensin@jck.com">klensin@jck.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Mark,<br>

<br>

Several comments inline...<br>

<br>

--On Sunday, June 28, 2009 21:28 -0700 Mark Davis ⌛<br>

<div class="im">&lt;<a href="mailto:mark@macchiato.com">mark@macchiato.com</a>&gt; wrote:<br>

<br>

&gt; Returning to the discussion, now that some of my other<br>

&gt; standards work is under control (RFC4646bis was approved,<br>

&gt; whew!)<br>

</div>&gt;...<br>

<div class="im"><br>

&gt; Now, my position is still that the simplest and most<br>

&gt; compatible option open to us is to simply map with NFKC +<br>

&gt; Casefold.<br>

<br>

</div>I continue to believe that CaseFold is a showstopper.  When its<br>

results are not identical to those produced by LowerCase, it<br>

produces results that are astonishing to some users and leads us<br>

into the &quot;is that a separate character or not&quot; trap that we&#39;ve<br>

seen manifested at least twice.  I note that TUS recommends<br>

against its use for mapping (as distinct from comparison) and<br>

appears to do so for just the reason that it involves too much<br>

information loss.</blockquote><div><br>You need to provide actual data behind this. Please list exactly which characters you mean, and why you think they are problematic. Note also that under the formulation I gave, any character that is PVALID is automatically excluded; e.g., if final sigma is PVALID, then it is unaffected. And we can certainly introduce other exceptions.<br>
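<br>One way to gather that kind of data, as a minimal sketch rather than a definitive list: the Python function below collects the code points whose full case folding differs from their lowercase mapping. It assumes Python's built-in str.casefold and str.lower, whose results depend on the Unicode version bundled with the interpreter; the function name casefold_differs_from_lower is hypothetical.<br>
<pre>
```python
import sys
import unicodedata


def casefold_differs_from_lower(limit=sys.maxunicode + 1):
    """Return assigned characters below `limit` whose case folding
    differs from their lowercase mapping."""
    found = []
    for cp in range(limit):
        ch = chr(cp)
        # Skip unassigned code points and surrogates; they have no case mappings.
        if unicodedata.category(ch) in ("Cn", "Cs"):
            continue
        if ch.casefold() != ch.lower():
            found.append(ch)
    return found
```
</pre>
Calling casefold_differs_from_lower(0x400) covers Latin and Greek and includes U+00DF (eszett) and U+03C2 (final sigma); calling it with no argument scans the whole code space. Each result would still have to be checked against its PVALID status before drawing conclusions.<br>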

<br>And I know full well about the issues in TUS, having written or participated in the writing of them.<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


&gt;...<br>

<div class="im">&gt; Proposal: A. Tables document<br>

&gt;<br>

&gt; Add a new type of character: REMAP. A character is REMAP if it<br>

&gt; meets *all of* the following criteria:<br>

&gt;<br>

</div>&gt;    1. The character is not PVALID or CONTEXTO<br>

&gt;    2. If remapped by the Unicode property NFKC_Casefold*, then<br>

<div class="im">&gt; the resulting    character(s) are all PVALID or CONTEXTO<br>

</div>&gt;    3. The character is a LetterDigit or Pd<br>

&gt;    4. The character has one of the following<br>

<div class="im">&gt; Decomposition_Type values: initial, medial, final,<br>

&gt; isolated, wide, narrow, or compat<br>

<br>

</div>I am very concerned that collapsing initial, medial, and final<br>

together may get us into problems with other language<br>

communities similar to those we have gotten into with Final<br>

Sigma, especially when those communities denote word boundaries<br>

by the appearance of final or initial forms and hence would use<br>

those forms in a style similar to the way &quot;BigCompany&quot; or<br>

&quot;big-company&quot; might be used in ASCII.</blockquote><div><br>The mechanism used to indicate boundaries is not, as you think, the use of the presentation forms; it is the use of the ZWNJ/J, which we already provide for.<br>
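<br>For reference, a minimal sketch of how the Decomposition_Type test in criterion 4 could be checked, and of what NFKC does to a presentation form. It assumes Python's standard unicodedata module; the names REMAP_DECOMPOSITION_TYPES, decomposition_type and has_remap_decomposition_type are hypothetical, and the check covers criterion 4 only, not the full REMAP definition.<br>
<pre>
```python
import unicodedata

# Decomposition_Type values listed in criterion 4 of the proposal.
REMAP_DECOMPOSITION_TYPES = {
    "initial", "medial", "final", "isolated", "wide", "narrow", "compat",
}

# Angle brackets written as escapes so the listing survives inside HTML.
OPEN_TAG = "\u003c"
CLOSE_TAG = "\u003e"


def decomposition_type(ch):
    """Return the Decomposition_Type tag of ch (such as 'final'),
    or None for canonical or absent decompositions."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag in angle brackets.
    if fields and fields[0].startswith(OPEN_TAG):
        return fields[0].strip(OPEN_TAG + CLOSE_TAG)
    return None


def has_remap_decomposition_type(ch):
    """True if ch satisfies criterion 4 (this is not the full REMAP test)."""
    return decomposition_type(ch) in REMAP_DECOMPOSITION_TYPES
```
</pre>
For example, decomposition_type("\ufeea") (ARABIC LETTER HEH FINAL FORM) gives "final", and unicodedata.normalize("NFKC", "\ufeea") yields the plain letter U+0647, so the presentation form itself does not survive the mapping.<br>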

<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

As I&#39;ve said several times before, even if we disallow the<br>

NFKC-affected forms of those characters, if a need arises, we can<br>

(painfully) redefine them as PVALID and allow them.  But, if we<br>

map them to something else, we lose all information about what<br>

was intended/desired and end up in precisely the mess we have<br>

with e.g., Final Sigma  and ZWJ/ZWNJ in which &quot;the right thing<br>

to do&quot; poses enough compatibility problems to cause opposition<br>

to making changes.</blockquote><div><br>You make it sound like final sigma, ZWJ/NJ, eszett and the other cases under discussion were oversights in the process of developing the current IDNA. That wasn&#39;t the case; these were deliberate choices made at the time. A case mapping is also a &#39;loss of information&#39;, but one that people clearly want.<br>

<br>If you have any particular characters that you think would be of concern, you should raise them as issues.<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

<br>

&gt;    5. The character does not have the Script value: Hangul<br>

<div class="im">&gt;<br>

&gt; The REMAP characters are removed from DISALLOWED, so that the<br>

&gt; TABLES values form a partition (all the values are disjoint).<br>

<br>

</div>This strikes me as dangerous -- see below.<br>

<br>

&gt; B. Protocols document: Change sections 4.2.1 and 5.3 so as to<br>

&gt; require:<br>

&gt;<br>

&gt;    1. Mapping all REMAP characters according to the Unicode<br>

&gt; property    NFKC_Casefold,<br>

&gt;    2. Then normalizing the result according to NFC.<br>

<br>

Making this change to 4.2.1 eliminates the requirement that the<br>

registrant understand _exactly_ what is being registered, i.e.,<br>

that the communication path between the registrant and registry<br>

occur only using U-labels and/or A-labels.  My understanding was<br>

that we had reached one of the more clear consensus we had in<br>

these discussions that the &quot;no mapping on registration&quot;<br>

restriction was appropriate.  Are you proposing to reopen that<br>

question?</blockquote><div><br>Sorry, you are correct. This would only affect the lookup part.<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

<div class="im"><br>

&gt; The rest of the tests for U-Label remain unchanged.<br>

<br>

</div>I believe that doing this by the type of change to Tables that<br>

you recommend either requires a change to the way that the<br>

definition of U-label is stated or requires us to abandon the<br>

very clear concept of a U-label that is completely symmetric,<br>

with no information loss in either direction, with an A-label.</blockquote><div><br>I don&#39;t see why you would think that.  A U-Label remains just the way it is, and has a 1-1 relation with an A-Label. The only difference is that we have an additional category of M-Label; one that is not a U-Label but maps to one.<br>
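<br>To make the relationship concrete, here is a minimal sketch of the lookup-side mapping (steps B1 and B2) and of the M-Label test. It is an illustration under assumptions, not a definitive implementation: Python's unicodedata module does not expose the NFKC_Casefold property, so nfkc_casefold_approx approximates it (in particular, it does not remove default-ignorable code points), and the is_remap and is_u_label predicates are hypothetical stand-ins that a real implementation would derive from the Tables document.<br>
<pre>
```python
import unicodedata


def nfkc_casefold_approx(text):
    """Approximate NFKC_Casefold: normalize, case-fold, normalize again.
    Unlike the real property, default-ignorable code points are kept."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)


def remap_for_lookup(label, is_remap):
    """Step B1: map only the REMAP characters; step B2: normalize to NFC.
    `is_remap` is a caller-supplied predicate on single characters."""
    mapped = "".join(
        nfkc_casefold_approx(ch) if is_remap(ch) else ch for ch in label
    )
    return unicodedata.normalize("NFC", mapped)


def is_m_label(label, is_u_label, is_remap):
    """True if label is not itself a U-label but remaps to one."""
    if is_u_label(label):
        return False
    return is_u_label(remap_for_lookup(label, is_remap))
```
</pre>
With a toy predicate that leaves U+00DF alone, remap_for_lookup("\uff21\uff22C\u00df", is_remap) maps the fullwidth and uppercase letters to "abc" and keeps the eszett, which is the sense in which PVALID characters are unaffected. The U-label that comes out still converts to and from its A-label exactly as before.<br>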

<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

There is also a subtle interaction with Section 5.5: if the<br>

mapping is performed by the time Section 5.3 concludes (or,<br>

under special circumstances, not applied at all), then Section<br>

5.5 must also prohibit REMAP.</blockquote><div><br>You are correct; that was my intention, but I forgot to mention it. Yes, there needs to be a change in 5.5.<br><br>So below:<br><pre class="newpage">   o  Labels containing prohibited code points, i.e., those that are<br>

      assigned to the &quot;DISALLOWED&quot; category in the permitted character<br>      table [<a href="http://tools.ietf.org/html/draft-ietf-idnabis-protocol-12#ref-IDNA2008-Tables" title="&quot;The Unicode Codepoints and IDNA&quot;">IDNA2008-Tables</a>].<br>

</pre> add<br><pre class="newpage">   o  Labels containing remapped code points, i.e., those that are<br>      assigned to the &quot;REMAP&quot; category in the permitted character<br>      table [<a href="http://tools.ietf.org/html/draft-ietf-idnabis-protocol-12#ref-IDNA2008-Tables" title="&quot;The Unicode Codepoints and IDNA&quot;">IDNA2008-Tables</a>].<br>

</pre><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

&gt; C. Defs document<br>

&gt;<br>

&gt;    1. Define REMAP<br>

&gt;    2. Define an M-Label to be one which if remapped according<br>

<div class="im">&gt; to B1+B2,    results in a U-Label.<br>

<br>

</div>The idea of an M-Label still makes me uncomfortable.  Again, we<br>

have had that discussion before.<br>

<br>

regards,<br>

<font color="#888888">   john<br>

</font><div><div></div><div class="h5"><br>

<br>

<br>

<br>


</div></div></blockquote></div><br>