standard, stable and unambiguous references to Unicode

Mark Davis mark.davis at icu-project.org
Sun Feb 17 00:15:14 CET 2008


Will do.

On Feb 16, 2008 2:05 PM, Patrik Fältström <patrik at frobbit.se> wrote:

>
> On 15 feb 2008, at 03.45, Mark Davis wrote:
>
> > Patrik,
> >
> > Here are the references you can use. Some of the web pages won't be
> > live yet; they will be by the end of March. They are permanent links
> > once they go live.
>
> Thanks. Can you let me know when these are live?
>
>    Patrik
>
> >
> >
> >   - toNFC and toNFKC (and isNFC, isNFKC) are defined in *Section 2
> >   Notation* of *Unicode Standard Annex #15: Unicode Normalization
> >   Forms* by Mark Davis and Martin Dürst, an integral part of The
> >   Unicode Standard, Version 5.1.0.
> >   (http://www.unicode.org/reports/tr15/tr15-29.html)
> >   - toCaseFold is defined in *Section 3.13 Default Case Algorithms* of
> >   The Unicode Standard, Version 5.1.0.
> >
> > The reference for Unicode 5.1.0 is:
> >
> >   - The Unicode Consortium. The Unicode Standard, Version 5.1.0,
> >   defined by: *The Unicode Standard, Version 5.0* (Boston, MA,
> >   Addison-Wesley, 2007. ISBN 0-321-48091-0)
> >   (http://www.unicode.org/versions/Unicode5.0.0/), as amended by
> >   *Unicode 5.1.0* (http://www.unicode.org/versions/Unicode5.1.0/).
> >
> > Note: We've been planning for 5.1 anyway (release in March), and it
> > matters for references, since it has clarifying text for toCaseFold
> > and for a number of other areas that should be referenced.
> >
> > Mark
> >
> > On Sat, Feb 9, 2008 at 3:49 AM, Patrik Fältström <patrik at frobbit.se> wrote:
> >
> >> All good comments, Erik. Mark, I need to hear from you on the Unicode
> >> view on this. I have no problem making the changes Erik suggests, as
> >> long as I get the "correct" names from you.
> >>
> >>   Patrik
> >>
> >> On 9 feb 2008, at 03.32, Erik van der Poel wrote:
> >>
> >>> Patrik and Mark,
> >>>
> >>> I'm reading tables-04 now. I noticed a few things that could be
> >>> improved, in terms of standard, stable and unambiguous references to
> >>> Unicode. This is important since IDNA200X is supposed to evolve with
> >>> Unicode. We need to be able to generate the pvalid/disallowed/etc
> >>> table every time Unicode releases a new version. So here are a few
> >>> suggestions and questions:
> >>>
> >>> Standard. IDNA200X should use the standard names of Unicode
> >>> properties and processes, and Unicode should try not to change those
> >>> names. For example, tables-04 refers to NFKC(...) while Unicode calls
> >>> that toNFKC(...):
> >>>
> >>> http://www.unicode.org/reports/tr15/#Notation
> >>>
> >>> There is another function called isNFKC(...), so it would be nice to
> >>> get the right one (toNFKC).
> >>>
> >>> Stable. IDNA200X should use stable references to Unicode documents,
> >>> and Unicode should make sure those references keep working. For
> >>> example, the normalization spec mentioned above could be referenced
> >>> using the stable URI:
> >>>
> >>> http://www.unicode.org/reports/tr15/
> >>>
> >>> Unambiguous. IDNA200X should use unambiguous names, and Unicode
> >>> should offer them. For example, tables-04 refers to casefold(...).
> >>> Unicode has something called Case_Folding(c) that only applies to
> >>> single characters:
> >>>
> >>> http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
> >>>
> >>> Unicode also has something called toCasefolding(x) for strings of
> >>> characters on page 125 of the above chapter, labelled R4. However,
> >>> the paragraph above it says that there are simple and full variants
> >>> of that. IDNA200X needs the string function (not the single-character
> >>> function) in the "NFKC(casefold(NFKC(cp))) != cp" construct. I
> >>> believe IDNA200X also needs the full variant, not the simple variant.
> >>> But Unicode does not appear to have an unambiguous name for the full
> >>> variant of the function that works on strings. (Or, if R4 *is* the
> >>> full variant, the paragraph above it needs tweaking.) In the
> >>> meantime, IDNA200X can disambiguate it by explicitly saying that
> >>> toCasefolding(...) refers to the full variant.
> >>>
> >>> Yes, this is just nit-picking, but at least we have gotten to the
> >>> point where we're just tweaking the IDNA200X drafts! We're nearly
> >>> done. :-)
> >>>
> >>> Erik
> >>> _______________________________________________
> >>> Idna-update mailing list
> >>> Idna-update at alvestrand.no
> >>> http://www.alvestrand.no/mailman/listinfo/idna-update
> >>
> > --
> > Mark
>


-- 
Mark
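The functions named in this thread (toNFKC, isNFKC, and full case folding of strings) map roughly onto Python's standard library, so the "NFKC(casefold(NFKC(cp))) != cp" construct can be sketched as below. This is a minimal sketch, assuming Python 3.8+ (for unicodedata.is_normalized) and assuming that str.casefold(), which Python documents as the case folding of Section 3.13, stands in for the full variant of toCasefold. The helper names to_nfkc, is_nfkc, to_casefold and changes_under_nfkc_casefold are hypothetical, not names from IDNA200X or Unicode. Python's unicodedata also tracks a newer Unicode version than 5.1.0, so results for some code points may differ from tables generated against 5.1.0.

```python
import unicodedata

# Hypothetical helper names; the thread uses the Unicode names
# toNFKC, isNFKC and toCaseFold / toCasefolding.


def to_nfkc(s: str) -> str:
    """Return s in Normalization Form KC (UAX #15 toNFKC)."""
    return unicodedata.normalize("NFKC", s)


def is_nfkc(s: str) -> bool:
    """Return True if s is already in NFKC (UAX #15 isNFKC); Python 3.8+."""
    return unicodedata.is_normalized("NFKC", s)


def to_casefold(s: str) -> str:
    """Case-fold a whole string; assumed here to be the full variant."""
    return s.casefold()


def changes_under_nfkc_casefold(cp: int) -> bool:
    """Evaluate NFKC(casefold(NFKC(cp))) != cp for one code point."""
    s = chr(cp)
    return to_nfkc(to_casefold(to_nfkc(s))) != s


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # U+00DF LATIN SMALL LETTER SHARP S: full folding maps it to "ss"
    # (two characters), while simple folding would leave it unchanged.
    print(repr(to_casefold("\u00df")))
    for cp in (0x0041, 0x0061, 0x00DF, 0xFB01):
        print(f"U+{cp:04X}", changes_under_nfkc_casefold(cp))
    # toNFKC returns a string; isNFKC only answers yes or no.
    print(is_nfkc("\ufb01"), is_nfkc("fi"))
```

U+00DF shows why the choice of variant matters: with full folding the construct flags the character, whereas a simple (single-character) folding, which the standard library does not expose, would leave it unchanged and the construct would not flag it.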