New version, draft-faltstrom-idnabis-tables-02.txt, available

Patrik Fältström patrik at frobbit.se
Fri Jun 8 09:40:33 CEST 2007


On 8 jun 2007, at 02.29, Kenneth Whistler wrote:

>> I want you to specifically comment on the definitions of the rules
>> used to select codepoints in section 2,
>
> Section 2.2, Rule B - Normalization, contains some incorrect
> statements about the data files. In particular:
>
> "Normalization rules are found in UnicodeData.txt in the sixth  
> column."
>
> That is not true. Those are not rules. What is in the sixth
> column is the decomposition *mapping* for a character (if
> not trivially mapped to itself). And that mapping is then
> used in the set of rules in UAX #15 that define the various
> normalization forms.

So if I change the text to "The decomposition mapping is found in the
sixth column of UnicodeData.txt; together with the rules in UAX #15,
it defines the various normalization forms", is that correct?

> So the following statement is also not true:
>
> "The data (sixth column) include both the normalization and, ..."

Understood.

> The decomposition mapping for a character is not necessarily
> the same as its "normalization", for any of the 4 defined
> normalization forms. That is because normalization is
> defined in terms of the recursive application of all
> decomposition mappings.
>
> A similar confusion is in the wording of the paragraph
> discussing LATIN SMALL LETTER L WITH MIDDLE DOT:
>
> "... while the normalized data is U+006C U+00B7 ..."
>
> Actually, <U+006C, U+00B7> is the (compatibility) decomposition
> mapping. The normalized form of LATIN SMALL LETTER L WITH MIDDLE DOT
> varies, depending on the normalization form chosen.

OK, I should state which specific normalization form is used.
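
The distinction between a decomposition mapping and a normalized form
can be checked with a minimal Python sketch. It assumes the
standard-library unicodedata module, whose data reflects whatever
Unicode version ships with the interpreter (not necessarily the one
discussed in this thread). The helper name describe() and the second
example character, U+01D5, are chosen here for illustration only.

```python
import unicodedata

def describe(ch):
    """Return the raw decomposition mapping and all four normalized forms."""
    forms = {
        form: [f"{ord(c):04X}" for c in unicodedata.normalize(form, ch)]
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }
    return unicodedata.decomposition(ch), forms

# U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT has only a compatibility
# mapping, so NFC/NFD leave it alone while NFKC/NFKD replace it.
mapping, forms = describe("\u0140")
print(mapping, forms)

# U+01D5 maps one step to <U+00DC, U+0304>; NFD applies the mappings
# recursively, so U+00DC is decomposed further.
mapping2, forms2 = describe("\u01D5")
print(mapping2, forms2)
```

The mapping is a single-step property of the character; the normalized
form is the result of running the UAX #15 rules over those mappings.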

> Section 2.3. Rule C - Casefolding.
>
> The data that specifies if a rule in SpecialCasing.txt
> is conditional is actually in the *5th* column of SpecialCasing.txt,
> not the *6th* column.

Oh, that was a typo/bug. Thanks.
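
For reference, a rough sketch of reading the condition from the fifth
field. It assumes the usual SpecialCasing.txt layout (code; lower;
title; upper; optional condition list; then a "#" comment). The
function name parse_special_casing() and the returned dictionary keys
are hypothetical, not taken from my code or from the draft.

```python
def parse_special_casing(line):
    """Parse one SpecialCasing.txt line; return None for blank/comment lines."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    # Fields 1-4 are code, lower, title, upper. The optional condition
    # list is the 5th field (index 4); it is absent for unconditional rules.
    condition = fields[4] if len(fields) > 4 and fields[4] else ""
    return {
        "code": int(fields[0], 16),
        "lower": [int(cp, 16) for cp in fields[1].split()],
        "title": [int(cp, 16) for cp in fields[2].split()],
        "upper": [int(cp, 16) for cp in fields[3].split()],
        "conditions": condition.split(),
    }

unconditional = parse_special_casing(
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S")
conditional = parse_special_casing(
    "03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA")
print(unconditional["conditions"], conditional["conditions"])
```

A rule is conditional exactly when the "conditions" list is non-empty.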

> Section 2.4. Rule D - Ignorables
>
> For the statement of the derivation of Default_Ignorable_Code_Point,
> please see the erratum of 2007-January-25 posted at:
>
> http://www.unicode.org/errata/
>
> The correct statement is:
>
> Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
> + Noncharacter_Code_Point + Variation_Selector - White_Space
> - FFF9..FFFB (Annotation Characters)

I took the text from DerivedCoreProperties.txt (as you might have
guessed).

I will correct this.
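
Read as set arithmetic, the corrected statement can be sketched as
below. This is only an illustration: the function name
derive_default_ignorable() is hypothetical, the caller is assumed to
supply each property as a set of code points, and the sample sets are
tiny hand-picked subsets of the real properties, not the full data.

```python
INCLUDED = (
    "Other_Default_Ignorable_Code_Point", "Cf", "Cc", "Cs",
    "Noncharacter_Code_Point", "Variation_Selector",
)

def derive_default_ignorable(props):
    """Union the included properties, then subtract the exclusions."""
    included = set()
    for name in INCLUDED:
        included |= props[name]
    # Exclusions: White_Space and the annotation characters FFF9..FFFB.
    excluded = set(props["White_Space"]) | set(range(0xFFF9, 0xFFFC))
    return included - excluded

# Hand-picked sample members only (assumption: NOT the complete sets).
sample = {
    "Other_Default_Ignorable_Code_Point": {0x034F},
    "Cf": {0x00AD, 0xFFF9, 0xFFFA, 0xFFFB},
    "Cc": {0x0000, 0x0009, 0x0085},
    "Cs": {0xD800},
    "Noncharacter_Code_Point": {0xFFFE},
    "Variation_Selector": {0xFE00},
    "White_Space": {0x0009, 0x0020, 0x0085},
}
result = derive_default_ignorable(sample)
print(sorted(f"{cp:04X}" for cp in result))
```

With these samples, U+0009 and U+0085 drop out as White_Space and
U+FFF9..U+FFFB drop out as annotation characters.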

> (Variation_Selector was inadvertently left out of the description
> of the derivation, although it is correctly part of the actual
> derivation of the list in the data file.)
>
> And the statement "Noncharacters is a property only existing in the
> NamesList.txt." is unnecessary. The normative listing of the
> Noncharacter_Code_Point property is in PropList.txt. And that
> is the property used for derivation.

Hmmm... According to my reading of the two files, U+FEFF is a
Noncharacter_Code_Point according to NamesList.txt, but not according
to PropList.txt. This is why I refer to NamesList.txt.

>> and the algorithm of how to
>> calculate the value of the derived property in section 3.
>
> I'll provide feedback on that separately. I don't think the
> specification of Rule H (distinguishing Latin, Greek, and
> Cyrillic as "Stable scripts" in contradistinction to all
> other scripts) makes sense -- and so the algorithm, which
> makes distinctions based very prominently on Rule H, is,
> in my opinion, overly complex and unclear.
>
>> On top of that, of course, I would also like you to compare the
>> results you get from the same calculations with the results my code
>> produced in section 4.
>
> I can, of course, grab the I-D text and do a bunch of editing,
> in an attempt to turn the listing in Section 4.1 into
> something that is machine-readable. But it would be really
> helpful for these kinds of evaluations if the machine readable
> form of these tables were simply posted in a specified location
> for use in comparisons of results. That would avoid the
> extra work of repeated manual editing to extract, as well
> as avoiding the probable introduction of extraneous errors
> just from the manual editing.

I will make sure the machine readable list is available online as  
well as in the draft.

For now, see http://stupid.domain.name/idnabis/idnabis-070608.txt

That is the same table as you find in the draft.

    Patrik


