New version, draft-faltstrom-idnabis-tables-02.txt, available

Patrik Fältström patrik at
Fri Jun 8 09:40:33 CEST 2007

On 8 jun 2007, at 02.29, Kenneth Whistler wrote:

>> I want you to specifically comment on the definitions of the rules
>> used to select codepoints in section 2,
> Section 2.2, Rule B - Normalization, contains some incorrect
> statements about the data files. In particular:
> "Normalization rules are found in UnicodeData.txt in the sixth  
> column."
> That is not true. Those are not rules. What is in the sixth
> column is the decomposition *mapping* for a character (if
> not trivially mapped to itself). And that mapping is then
> used in the set of rules in UAX #15 that define the various
> normalization forms.

So if I change the text to "The decomposition mapping is found in the
sixth column of UnicodeData.txt, which together with UAX #15 defines the
various normalization forms", is that correct?
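
To make the distinction concrete, here is a minimal sketch that pulls the decomposition mapping out of a single UnicodeData.txt record. The sample record for U+0140 is typed in by hand and should be checked against the real file, and the helper name `decomposition_field` is invented for this illustration. It only shows where the field sits, not how UAX #15 uses it.

```python
# Sample UnicodeData.txt record for U+0140 (semicolon-separated fields).
# Typed in by hand for illustration; verify against the real data file.
SAMPLE = ("0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;"
          "<compat> 006C 00B7;;;;N;LATIN SMALL LETTER L DOT;;013F;;013F")


def decomposition_field(record):
    """Return (tag, code_points) from the sixth field of a record.

    tag is None when the mapping is canonical (no <...> prefix).
    An empty field means the character has no listed mapping.
    """
    fields = record.split(";")
    raw = fields[5]  # sixth column, zero-based index 5
    if not raw:
        return None, []
    parts = raw.split()
    tag = None
    if parts[0].startswith("<"):
        tag = parts.pop(0).strip("<>")
    return tag, [int(p, 16) for p in parts]


tag, mapping = decomposition_field(SAMPLE)
print(tag, ["U+%04X" % cp for cp in mapping])
```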

> So the following statement is also not true:
> "The data (sixth column) include both the normalization and, ..."


> The decomposition mapping for a character is not necessarily
> the same as its "normalization", for any of the 4 defined
> normalization forms. That is because normalization is
> defined in terms of the recursive application of all
> decomposition mappings.
> A similar confusion is in the wording of the paragraph
> "... while the normalized data is U+006C U+00B7 ..."
> Actually, <U+006C, U+00B7> is the (compatibility) decomposition  
> mapping.
> The normalized data varies, depending on the normalization form chosen.

Ok, I should clarify with the specific normalization form used.
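
A quick way to see the difference is Python's standard `unicodedata` module, which exposes both the raw decomposition mapping and the four normalization forms. This is only a sketch for checking individual characters. The results depend on the Unicode version bundled with the Python interpreter, which may differ from the version the draft targets.

```python
import unicodedata

ch = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

# The raw decomposition mapping, i.e. what the sixth column holds.
print("mapping:", unicodedata.decomposition(ch))

# The normalized result depends on which form is chosen.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, ch)
    print(form, " ".join("U+%04X" % ord(c) for c in result))

# Recursive application: the mapping for U+01D5 is one step only,
# while NFD keeps decomposing until nothing more can be decomposed.
nested = "\u01d5"  # LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON
print("mapping:", unicodedata.decomposition(nested))
print("NFD:", " ".join("U+%04X" % ord(c)
                       for c in unicodedata.normalize("NFD", nested)))
```

The canonical forms (NFC, NFD) leave U+0140 untouched because its mapping is a compatibility one, while NFKC and NFKD replace it with the two-character sequence.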

> Section 2.3. Rule C - Casefolding.
> The data that specifies if a rule in SpecialCasing.txt
> is conditional is actually in the *5th* column of SpecialCasing.txt,
> not the *6th* column.

Oh, that was a typo/bug. Thanks.
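
For reference, a minimal sketch of reading the condition from the fifth column. The two sample lines are typed in by hand in the SpecialCasing.txt layout (code; lower; title; upper; optional condition list; then a `#` comment), and the function name `parse_special_casing` is invented for this example.

```python
# Hand-typed sample lines in SpecialCasing.txt layout; verify against the file.
SAMPLE_LINES = [
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S",
    "03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA",
]


def parse_special_casing(line):
    """Return a dict for a data line, or None for blank/comment lines."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 5:
        return None
    code, lower, title, upper = fields[:4]
    return {
        "code": int(code, 16),
        "lower": lower.split(),
        "title": title.split(),
        "upper": upper.split(),
        # Fifth column (index 4): empty for unconditional mappings.
        "condition": fields[4] or None,
    }


for line in SAMPLE_LINES:
    entry = parse_special_casing(line)
    print("U+%04X" % entry["code"], "conditional:", entry["condition"])
```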

> Section 2.4. Rule D - Ignorables
> For the statement of the derivation of Default_Ignorable_Code_Point,
> please see the erratum of 2007-January-25 posted at:
> The correct statement is:
> Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
> + Noncharacter_Code_Point + Variation_Selector - White_Space
> - FFF9..FFFB (Annotation Characters)

I picked the text from DerivedCoreProperties.txt (as you might  

I will correct this.
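
Expressed as set operations, the corrected derivation quoted above could be sketched as follows. The function name and the tiny input sets are made up for illustration. Real inputs would have to be loaded from PropList.txt and UnicodeData.txt.

```python
def derive_default_ignorable(other_di, cf, cc, cs, nonchar, var_sel,
                             white_space):
    """Apply the derivation quoted above to sets of code points."""
    annotation = set(range(0xFFF9, 0xFFFB + 1))  # FFF9..FFFB
    included = other_di | cf | cc | cs | nonchar | var_sel
    return included - white_space - annotation


# Hand-picked toy subsets, NOT the full property values.
result = derive_default_ignorable(
    other_di={0x034F},                        # COMBINING GRAPHEME JOINER
    cf={0x00AD, 0xFFF9, 0xFFFA, 0xFFFB},      # SOFT HYPHEN, annotation chars
    cc={0x0000, 0x0009},                      # NULL, TAB
    cs={0xD800},                              # a surrogate
    nonchar={0xFFFE},
    var_sel={0xFE00},                         # VARIATION SELECTOR-1
    white_space={0x0009, 0x0020},             # TAB, SPACE
)
print(sorted("U+%04X" % cp for cp in result))
```

In this toy run TAB drops out because it is White_Space, and FFF9..FFFB drop out as annotation characters, while the variation selector stays in.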

> (Variation_Selector was inadvertently left out of the description
> of the derivation, although it is correctly part of the actual
> derivation of the list in the data file.)
> And the statement "Noncharacters is a property only existing in the
> NamesList.txt." is unnecessary. The normative listing of the
> Noncharacter_Code_Point property is in PropList.txt. And that
> is the property used for derivation.

Hmmm...according to my reading of the two files, U+FEFF is a  
Noncharacter_Code_Point according to NamesList.txt, but not in  
PropList.txt. This is why I refer to NamesList.txt.
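
As a cross-check that does not depend on either file, the Unicode Standard defines the noncharacters as U+FDD0..U+FDEF plus the last two code points of every plane. A minimal sketch of that test, with the function name `is_noncharacter` chosen for this example:

```python
def is_noncharacter(cp):
    """True if cp is one of the 66 noncharacters of the Unicode Standard."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF, or the last two code points of any plane (xxFFFE/xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


for cp in (0xFEFF, 0xFFFE, 0xFFFF, 0xFDD0, 0x1FFFE):
    print("U+%04X" % cp, is_noncharacter(cp))
```

Under that definition the check is true for U+FFFE and false for U+FEFF.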

>> and the algorithm of how to
>> calculate the value of the derived property in section 3.
> I'll provide feedback on that separately. I don't think the
> specification of Rule H (distinguishing Latin, Greek, and
> Cyrillic as "Stable scripts" in contradistinction to all
> other scripts) makes sense -- and so the algorithm, which
> makes distinctions based very prominently on Rule H, is,
> in my opinion, overly complex and unclear.
>> On top of that of course also a comparison of your results when doing
>> the same calculations with the result I got with my code in  
>> section 4.
> I can, of course, grab the I-D text and do a bunch of editing,
> in an attempt to turn the listing in Section 4.1 into
> something that is machine-readable. But it would be really
> helpful for these kinds of evaluations if the machine readable
> form of these tables were simply posted in a specified location
> for use in comparisons of results. That would avoid the
> extra work of repeated manual editing to extract, as well
> as avoiding the probable introduction of extraneous errors
> just from the manual editing.

I will make sure the machine readable list is available online as  
well as in the draft.

For now, see

That is the same table as you find in the draft.
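
Once both sides have a machine-readable list, comparing results could look roughly like the sketch below. The line format (`XXXX ; VALUE` or `XXXX..YYYY ; VALUE`, with optional `#` comments) is an assumption made for this example and not necessarily the format of the posted table. `VALUE_A` and `VALUE_B` are placeholders rather than the draft's property values.

```python
def parse_table(text):
    """Map each code point to its value, expanding XXXX..YYYY ranges."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        span, value = (part.strip() for part in line.split(";", 1))
        first, _, last = span.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            table[cp] = value
    return table


def diff_tables(a, b):
    """Return {code_point: (value_in_a, value_in_b)} where the tables differ."""
    return {cp: (a.get(cp), b.get(cp))
            for cp in sorted(a.keys() | b.keys())
            if a.get(cp) != b.get(cp)}


# Placeholder data in the assumed format.
mine = parse_table("0061..0063 ; VALUE_A\n00B7 ; VALUE_B  # comment\n")
theirs = parse_table("0061..0062 ; VALUE_A\n0063 ; VALUE_B\n00B7 ; VALUE_B\n")
for cp, (left, right) in diff_tables(mine, theirs).items():
    print("U+%04X: %s vs %s" % (cp, left, right))
```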

