Patrik,<br><br>Here are the references you can use. Some of the web pages
are not live yet; they will be by the end of March. Once they go live,
the links are permanent.<br>
<ul><li>toNFC and toNFKC (and isNFC, isNFKC) are defined in <i>Section 2 Notation</i> of <i>Unicode Standard Annex #15: Unicode Normalization Forms</i> by
Mark Davis and Martin Dürst, an integral part of The Unicode Standard,
Version 5.1.0. (<a href="http://www.unicode.org/reports/tr15/tr15-29.html" target="_blank">http://www.unicode.org/reports/tr15/tr15-29.html</a>)</li><li>toCaseFold is defined in <i>Section 3.13 Default Case Algorithms</i> of The Unicode Standard,
Version 5.1.0.</li></ul>The reference for Unicode 5.1.0 is:<br><ul><li>The Unicode Consortium. The Unicode Standard,
Version 5.1.0, defined by: <i>The Unicode Standard, Version 5.0 </i>(Boston, MA, Addison-Wesley, 2007. ISBN
                                0-321-48091-0) (<a href="http://www.unicode.org/versions/Unicode5.0.0/" target="_blank">http://www.unicode.org/versions/Unicode5.0.0/</a>), as
amended by <i>Unicode 5.1.0</i> (<a href="http://www.unicode.org/versions/Unicode5.1.0/" target="_blank">http://www.unicode.org/versions/Unicode5.1.0/</a>).</li></ul>Note:
We've been planning on 5.1 anyway (it is due for release in March), and it is important for the references, since it adds clarifying text for toCaseFold
and for a number of other areas that should be referenced.<br>
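<br>As a rough illustration only (not part of the Unicode text), the string functions named above and the construct Erik quotes from tables-04 can be sketched in Python. The helper names to_nfkc, is_nfkc, to_case_fold and is_unstable are hypothetical. The sketch assumes that Python's str.casefold() (full case folding) is an acceptable stand-in for toCaseFold, and it uses whatever Unicode version ships with the interpreter rather than 5.1.0.<br>

```python
import unicodedata


def to_nfkc(s: str) -> str:
    """Hypothetical stand-in for toNFKC: return the NFKC form of a string."""
    return unicodedata.normalize("NFKC", s)


def is_nfkc(s: str) -> bool:
    """Hypothetical stand-in for isNFKC: true if the string is already in NFKC."""
    return to_nfkc(s) == s


def to_case_fold(s: str) -> str:
    """Hypothetical stand-in for toCaseFold on strings.

    Assumption: str.casefold() applies full (not simple) case folding,
    so a single character may fold to several characters.
    """
    return s.casefold()


def is_unstable(cp: str) -> bool:
    """The tables-04 style check: toNFKC(toCaseFold(toNFKC(cp))) != cp."""
    return to_nfkc(to_case_fold(to_nfkc(cp))) != cp


if __name__ == "__main__":
    for ch in ["a", "A", "\u00df", "\ufb01"]:
        print(f"U+{ord(ch):04X} unstable={is_unstable(ch)}")
```

Under these assumptions a lowercase ASCII letter passes through unchanged, while an uppercase letter, U+00DF (which full case folding expands to "ss") and the U+FB01 ligature (which NFKC decomposes to "fi") all come out different from the input.<br>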
<br>Mark<br><br><div class="gmail_quote">On Sat, Feb 9, 2008 at 3:49 AM, Patrik Fältström &lt;<a href="mailto:patrik@frobbit.se">patrik@frobbit.se</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
All good comments, Erik. Mark, I need to hear from you on the Unicode<br>
view on this. I have no problems changing according to what Erik<br>
suggests, as long as I get the "correct" names from you.<br>
<br>
Patrik<br>
<div><div></div><div class="Wj3C7c"><br>
On 9 feb 2008, at 03.32, Erik van der Poel wrote:<br>
<br>
> Patrik and Mark,<br>
><br>
> I'm reading tables-04 now. I noticed a few things that could be<br>
> improved, in terms of standard, stable and unambiguous references to<br>
> Unicode. This is important since IDNA200X is supposed to evolve with<br>
> Unicode. We need to be able to generate the pvalid/disallowed/etc<br>
> table every time Unicode releases a new version. So here are a few<br>
> suggestions and questions:<br>
><br>
> Standard. IDNA200X should use the standard names of Unicode properties<br>
> and processes, and Unicode should try not to change those names. For<br>
> example, tables-04 refers to NFKC(...) while Unicode calls that<br>
> toNFKC(...):<br>
><br>
> <a href="http://www.unicode.org/reports/tr15/#Notation" target="_blank">http://www.unicode.org/reports/tr15/#Notation</a><br>
><br>
> There is another function called isNFKC(...), so it would be nice to<br>
> get the right one (toNFKC).<br>
><br>
> Stable. IDNA200X should use stable references to Unicode documents,<br>
> and Unicode should make sure those references keep working. For<br>
> example, the normalization spec mentioned above could be referenced<br>
> using the stable URI:<br>
><br>
> <a href="http://www.unicode.org/reports/tr15/" target="_blank">http://www.unicode.org/reports/tr15/</a><br>
><br>
> Unambiguous. IDNA200X should use unambiguous names, and Unicode should<br>
> offer them. For example, tables-04 refers to casefold(...). Unicode<br>
> has something called Case_Folding(c) that only applies to single<br>
> characters:<br>
><br>
> <a href="http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf" target="_blank">http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf</a><br>
><br>
> Unicode also has something called toCasefolding(x) for strings of<br>
> characters on page 125 of the above chapter, labelled R4. However, the<br>
> paragraph above that says that there is a simple and a full variant of<br>
> that. IDNA200X needs the string function (not the single character<br>
> function) in the "NFKC(casefold(NFKC(cp))) != cp" construct. I believe<br>
> IDNA200X also needs the full variant, not the simple variant. But<br>
> Unicode does not appear to have an unambiguous name for the full<br>
> variant of the function that works on strings. (Or, if R4 *is* the<br>
> full variant, the paragraph above it needs tweaking.) In the meantime,<br>
> IDNA200X can disambiguate it by explicitly saying that<br>
> toCasefolding(...) refers to the full variant.<br>
><br>
> Yes, this is just nit-picking, but at least we have gotten to the<br>
> point where we're just tweaking the IDNA200X drafts! We're nearly<br>
> done. :-)<br>
><br>
> Erik<br>
</div></div>> _______________________________________________<br>
> Idna-update mailing list<br>
> <a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
> <a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
<br>
<br></blockquote></div><br><br clear="all"><br>-- <br>Mark