Jefsey has gone to the hospital emergency room. He has limited Internet access and may be hampered for a few weeks. He says that we should stop using "Unicode strings" when we talk about "U-labels". This is extremely confusing, because it mixes end-to-end and fringe-to-fringe terminology.<br>


<br>Users may enter Unicode strings, or whatever they want; it is up to the application, its UI, or ML-DNS to transform that entry into U-labels. Also, ML-DNS is not limited to the Internet naming system: we will meet many applications with different naming rules, and hence different preparations for conversion to U-labels. Many will be based upon ISO 10646, others upon entirely different standards or technologies. We will therefore adapt from the user's point of view, as exemplified in RFC 5895. However, we will strive to respect the forthcoming IAB positions on IDNA as far as the inside of the Internet (end to end) is concerned.<br>
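To make the "user entry to U-label" preparation concrete, here is a rough sketch of the RFC 5895 mapping steps as I read them: lowercase, map fullwidth/halfwidth characters to their decomposition mappings, normalize to NFC, and map the ideographic full stop U+3002 to "." before splitting into labels. The helper name prepare_user_entry is hypothetical, the result is only a candidate list of U-labels that still has to pass the IDNA2008 validity checks of RFC 5891/5892, and the character data used depends on the Unicode version bundled with Python.

```python
import unicodedata


def _map_wide_narrow(ch: str) -> str:
    # Replace a <wide> or <narrow> compatibility character with its decomposition mapping
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] in ("<wide>", "<narrow>"):
        return "".join(chr(int(cp, 16)) for cp in parts[1:])
    return ch


def prepare_user_entry(entry: str) -> list[str]:
    """Hypothetical helper: turn a raw user entry into candidate U-labels."""
    s = entry.lower()                                  # step 1: uppercase to lowercase
    s = "".join(_map_wide_narrow(ch) for ch in s)      # step 2: fullwidth/halfwidth
    s = unicodedata.normalize("NFC", s)                # step 3: Normalization Form C
    s = s.replace("\u3002", ".")                       # step 4: ideographic full stop
    return s.split(".")                                # split into labels


print(prepare_user_entry("ＥＸＡＭＰＬＥ。Com"))
```

The design keeps the mapping on the user side, in the application, which is the fringe role described above; nothing here changes what IDNA2008 itself accepts.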


<br>This is why we want to keep the term "U-label" (for "User-label") for what the Internet receives from the user, and "Unicode/ISO 10646 string" or "User-entry" for what the user enters. At this stage this does not create a problem, "if RFC 5892 has no bug", as previously discussed.<br>

<br>Please remember that ML-DNS is fringe to fringe and will accept semiotic entries; working examples today include Kinect entries, and audio entries such as finger snaps or audionames. The way this works is asymmetric and based upon IDNA2008. An audio entry is converted into a pseudo-U-label, which the local fringe processes as per IDNA2008: conversion to an A-label, resolution, then conversion back to the pseudo-U-label at the other end for the usual processing on the other fringe.<br>
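The A-label step in that round trip can be sketched with Python's built-in "punycode" codec (RFC 3492) plus the "xn--" ACE prefix. This is a minimal sketch assuming the pseudo-U-label has already been produced and validated; the function names are hypothetical, it deliberately omits the IDNA2008 validity checks of RFC 5891/5892, and it does not model how an audio or Kinect entry becomes a pseudo-U-label.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Convert an already-validated U-label to an A-label (no IDNA2008 checks)."""
    if u_label.isascii():
        return u_label  # pure-ASCII labels are carried as-is, not Punycode-encoded
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Reverse step on the other fringe: A-label back to U-label (no validation)."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


a = to_a_label("bücher")
print(a, to_u_label(a))
```

Only the A-label travels through resolution; each fringe keeps its own U-label (or pseudo-U-label) processing, which is the asymmetry described above.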

<br>The working position of our emerging IUse community is expressed in the introductory note at <a href="http://incsa.org" target="_blank">http://incsa.org</a>. Comments are welcome.<br><br>I hope my rendition of Jefsey's input is understandable.<br>


<br>Portzamparc<br><br><br><br><div class="gmail_quote">2011/1/5 Kenneth Whistler <span dir="ltr"><<a href="mailto:kenw@sybase.com" target="_blank">kenw@sybase.com</a>></span><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


Simon asked:<br>

<div><br>

> I need a clarification regarding this paragraph in section 4.2.3.2 of<br>

> RFC 5891:<br>

><br>

>    The Unicode string MUST NOT begin with a combining mark or combining<br>

>    character (see The Unicode Standard, Section 2.11 [Unicode] for an<br>

>    exact definition).<br>

<br>

</div>Mark Davis suggested that this would better read:<br>

<div><br>

     The Unicode string MUST NOT begin with a character having a General<br>

     Category property value of Mark (M).<br>

<br>

</div>and I concur that that would be more precise.<br>
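As an illustration of the reworded rule, a check that a string does not begin with a character whose General Category is Mark (M) might look like the sketch below. The function names are hypothetical, this covers only the single rule from section 4.2.3.2 of RFC 5891, and the category data comes from whatever Unicode version the running Python's unicodedata module ships with.

```python
import unicodedata


def begins_with_mark(label: str) -> bool:
    """True if the first code point has General Category M (Mn, Mc or Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


def check_leading_mark(label: str) -> None:
    # Enforce only: "MUST NOT begin with a character having a General
    # Category property value of Mark (M)."
    if begins_with_mark(label):
        raise ValueError(f"label begins with combining character U+{ord(label[0]):04X}")


check_leading_mark("bücher")          # accepted
print(begins_with_mark("\u0301abc"))  # starts with COMBINING ACUTE ACCENT
```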

<br>

And to add to Mark Davis' clarification and respond further<br>

to one of Simon's questions:<br>

<div><br>

> There is one section 3.6 on "Combination" that gives the precise<br>

> definition of a "Combining character":<br>

><br>

>    Combining character: A character with the General Category of<br>

>    Combining Mark (M).<br>

<br>

</div><div>> 3) What is the precise definition of a "combining mark"?<br>

<br>

</div>In the Unicode Standard, "combining character" is the term<br>

of art. That is the general term which is used throughout<br>

the standard for referring to characters which combine.<br>

Hence the normative definition D52 for "Combining character"<br>

which Simon quotes from Section 3.6 of Unicode 5.0.<br>

<br>

"Mark" is a property value alias which refers to any of the<br>

three possible General Category values for a combining character<br>

in the standard. This is defined by the following entries in<br>

the UCD data file, PropertyValueAliases.txt:<br>

<br>

gc ; M         ; Mark                             # Mc | Me | Mn<br>

gc ; Mc        ; Spacing_Mark<br>

gc ; Me        ; Enclosing_Mark<br>

gc ; Mn        ; Nonspacing_Mark<br>

<br>
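The alias entries above say that gc=M is just the union of Mc, Me and Mn. A small sketch with Python's unicodedata module shows one sample character for each of the three values; the choice of sample characters is mine, and the reported categories depend on the Unicode version bundled with Python.

```python
import unicodedata

# gc=M (Mark) is the union of these three values, per PropertyValueAliases.txt
MARK_VALUES = {"Mn", "Mc", "Me"}


def is_combining_character(ch: str) -> bool:
    """True if ch has one of the General Category values aliased by M."""
    return unicodedata.category(ch) in MARK_VALUES


# Illustrative samples: one nonspacing, one spacing, one enclosing mark
for ch in ("\u0301", "\u0903", "\u20DD"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: gc={unicodedata.category(ch)}")
```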

"Combining mark" is an unofficial synonym for "combining character".<br>

It occurs occasionally in Unicode-related documents, including<br>

the text of the standard itself, because Unicode implementers<br>

often talk about "spacing marks" and "nonspacing marks" and<br>

"enclosing marks" and then treat the union of all those as<br>

"combining marks" by force of habit in talking about "marks".<br>

<br>

My advice for external standards referring to the Unicode<br>

Standard would be to stick to "combining character", which is<br>

and will remain the term with the normative definition in<br>

the Unicode Standard. And the best point of reference is<br>

to Section 3.6, "Combination", which is where this term (and<br>

related terms) have their normative definitions in the standard.<br>

<br>

--Ken<br>

<div><div></div><div><br>

_______________________________________________<br>

Idna-update mailing list<br>

<a href="mailto:Idna-update@alvestrand.no" target="_blank">Idna-update@alvestrand.no</a><br>

<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>

</div></div></blockquote></div><br><div></div>
