standard, stable and unambiguous references to Unicode

Patrik Fältström patrik at frobbit.se
Sat Feb 16 23:05:34 CET 2008


On 15 feb 2008, at 03.45, Mark Davis wrote:

> Patrik,
>
> Here are the references you can use. Some of the web pages won't be live
> yet; they will be by the end of March. They are permanent links, once they
> go live.

Thanks. Can you let me know when these are live?

    Patrik

>
>
>   - toNFC and toNFKC (and isNFC, isNFKC) are defined in *Section 2
>   Notation* of *Unicode Standard Annex #15: Unicode Normalization
>   Forms* by Mark Davis and Martin Dürst, an integral part of The Unicode
>   Standard, Version 5.1.0. (http://www.unicode.org/reports/tr15/tr15-29.html)
>   - toCaseFold is defined in *Section 3.13 Default Case Algorithms* of
>   The Unicode Standard, Version 5.1.0.
>
> The reference for Unicode 5.1.0 is:
>
>   - The Unicode Consortium. The Unicode Standard, Version 5.1.0, defined
>   by: *The Unicode Standard, Version 5.0* (Boston, MA, Addison-Wesley,
>   2007. ISBN 0-321-48091-0)
>   (http://www.unicode.org/versions/Unicode5.0.0/), as amended by *Unicode
>   5.1.0* (http://www.unicode.org/versions/Unicode5.1.0/).
>
> Note: We've been planning for 5.1 anyway (release in March), and for
> references it is important, since it has clarifying text for toCaseFold,
> and a number of other areas that should be referenced.
>
> Mark
>
> On Sat, Feb 9, 2008 at 3:49 AM, Patrik Fältström <patrik at frobbit.se> wrote:
>
>> All good comments, Erik. Mark, I need to hear from you on the Unicode
>> view on this. I have no problem changing according to what Erik
>> suggests, as long as I get the "correct" names from you.
>>
>>   Patrik
>>
>> On 9 feb 2008, at 03.32, Erik van der Poel wrote:
>>
>>> Patrik and Mark,
>>>
>>> I'm reading tables-04 now. I noticed a few things that could be
>>> improved, in terms of standard, stable and unambiguous references to
>>> Unicode. This is important since IDNA200X is supposed to evolve with
>>> Unicode. We need to be able to generate the pvalid/disallowed/etc
>>> table every time Unicode releases a new version. So here are a few
>>> suggestions and questions:
>>>
>>> Standard. IDNA200X should use the standard names of Unicode properties
>>> and processes, and Unicode should try not to change those names. For
>>> example, tables-04 refers to NFKC(...) while Unicode calls that
>>> toNFKC(...):
>>>
>>> http://www.unicode.org/reports/tr15/#Notation
>>>
>>> There is another function called isNFKC(...), so it would be nice to
>>> get the right one (toNFKC).
>>>
>>> Stable. IDNA200X should use stable references to Unicode documents,
>>> and Unicode should make sure those references keep working. For
>>> example, the normalization spec mentioned above could be referenced
>>> using the stable URI:
>>>
>>> http://www.unicode.org/reports/tr15/
>>>
>>> Unambiguous. IDNA200X should use unambiguous names, and Unicode should
>>> offer them. For example, tables-04 refers to casefold(...). Unicode
>>> has something called Case_Folding(c) that only applies to single
>>> characters:
>>>
>>> http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
>>>
>>> Unicode also has something called toCasefolding(x) for strings of
>>> characters on page 125 of the above chapter, labelled R4. However, the
>>> paragraph above that says that there is a simple and a full variant of
>>> that. IDNA200X needs the string function (not the single character
>>> function) in the "NFKC(casefold(NFKC(cp))) != cp" construct. I believe
>>> IDNA200X also needs the full variant, not the simple variant. But
>>> Unicode does not appear to have an unambiguous name for the full
>>> variant of the function that works on strings. (Or, if R4 *is* the
>>> full variant, the paragraph above it needs tweaking.) In the meantime,
>>> IDNA200X can disambiguate it by explicitly saying that
>>> toCasefolding(...) refers to the full variant.
>>>
>>> Yes, this is just nit-picking, but at least we have gotten to the
>>> point where we're just tweaking the IDNA200X drafts! We're nearly
>>> done. :-)
>>>
>>> Erik
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>
>
> -- 
> Mark
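
For readers who want to try out the distinctions discussed in the quoted
thread (toNFKC versus isNFKC, and full case folding applied to strings),
here is a minimal Python sketch. It is an illustration under assumptions,
not the procedure used to build the tables: the helper names to_nfkc,
is_nfkc, to_casefold and is_unstable are hypothetical, Python's str.casefold()
is used as a stand-in for full case folding on strings, and the results
follow whatever Unicode version the Python interpreter bundles (see
unicodedata.unidata_version), which will not be 5.1.0.

```python
import unicodedata


def to_nfkc(s: str) -> str:
    """Return the NFKC-normalized form of s (the toNFKC role)."""
    return unicodedata.normalize("NFKC", s)


def is_nfkc(s: str) -> bool:
    """Report whether s is already in NFKC (the isNFKC role).

    Implemented by comparison so it runs on any Python 3 version.
    """
    return to_nfkc(s) == s


def to_casefold(s: str) -> str:
    """Case-fold a whole string.

    Assumption: str.casefold() stands in for the full (not simple)
    string case folding that the thread asks for.
    """
    return s.casefold()


def is_unstable(cp: int) -> bool:
    """Evaluate the construct NFKC(casefold(NFKC(cp))) != cp for one code point."""
    c = chr(cp)
    return to_nfkc(to_casefold(to_nfkc(c))) != c


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (0x0041, 0x0061, 0x00DF, 0xFB01):
        print(f"U+{cp:04X} unstable: {is_unstable(cp)}")
```

The difference between the full and simple variants shows up with U+00DF
(LATIN SMALL LETTER SHARP S): full folding maps it to the two-character
string "ss", which a single-character mapping cannot express, so the string
function is the one that matters in the construct above.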
