New version, draft-faltstrom-idnabis-tables-02.txt, available
patrik at frobbit.se
Fri Jun 8 09:40:33 CEST 2007
On 8 jun 2007, at 02.29, Kenneth Whistler wrote:
>> I want you to specifically comment on the definitions of the rules
>> used to select codepoints in section 2,
> Section 2.2, Rule B - Normalization, contains some incorrect
> statements about the data files. In particular:
> "Normalization rules are found in UnicodeData.txt in the sixth
> That is not true. Those are not rules. What is in the sixth
> column is the decomposition *mapping* for a character (if
> not trivially mapped to itself). And that mapping is then
> used in the set of rules in UAX #15 that define the various
> normalization forms.
So if I change the text to "The decomposition mapping is found in the
sixth column of UnicodeData.txt and, together with UAX #15, defines
the various normalization forms", is that correct?
> So the following statement is also not true:
> "The data (sixth column) include both the normalization and, ..."
> The decomposition mapping for a character is not necessarily
> the same as its "normalization", for any of the 4 defined
> normalization forms. That is because normalization is
> defined in terms of the recursive application of all
> decomposition mappings.
> A similar confusion is in the wording of the paragraph
> discussing LATIN SMALL LETTER L WITH MIDDLE DOT:
> "... while the normalized data is U+006C U+00B7 ..."
> Actually, <U+006C, U+00B7> is the (compatibility) decomposition
> The normalized form of LATIN SMALL LETTER L WITH MIDDLE DOT
> varies, depending on the normalization form chosen.
Ok, I should clarify with the specific normalization form used.
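To make the distinction concrete, here is a minimal Python sketch using
the standard unicodedata module. It assumes the Unicode tables bundled
with the interpreter, which are newer than the version discussed in this
thread. The helper names code_points() and describe() are made up for
illustration. It puts the raw decomposition mapping next to the four
normalization forms for a character.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def code_points(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

def describe(ch):
    """Return (decomposition mapping, {form: normalized code points})."""
    # The raw decomposition mapping field, as listed in UnicodeData.txt.
    mapping = unicodedata.decomposition(ch)
    # The result of actually normalizing the character in each form.
    forms = {f: code_points(unicodedata.normalize(f, ch)) for f in FORMS}
    return mapping, forms

# LATIN SMALL LETTER L WITH MIDDLE DOT: the mapping is a compatibility
# mapping, so only the K forms change the character.
print(describe("\u0140"))

# LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE: the mapping is a
# single step, while NFD applies the mappings recursively.
print(describe("\u1E69"))
```

For U+0140 the mapping is "<compat> 006C 00B7", NFC and NFD leave the
character unchanged, and NFKC and NFKD both give U+006C U+00B7. For
U+1E69 the mapping is "1E63 0307", but NFD is U+0073 U+0323 U+0307,
because the mapping of U+1E63 is applied as well.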
> Section 2.3. Rule C - Casefolding.
> The data that specifies if a rule in SpecialCasing.txt
> is conditional is actually in the *5th* column of SpecialCasing.txt,
> not the *6th* column.
Oh, that was a typo/bug. Thanks.
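As a sanity check on the column numbering, here is a rough Python sketch
that splits SpecialCasing.txt-style lines into fields. The three sample
lines are modeled on entries in that file and are only an assumption
about its exact contents. The function name parse_special_casing() and
the dictionary keys are hypothetical.

```python
SAMPLE = """\
00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
0307; 0307; ; ; lt After_Soft_Dotted; # COMBINING DOT ABOVE
"""

def parse_special_casing(text):
    """Parse lines of the form: code; lower; title; upper; [condition;] # comment"""
    entries = []
    for line in text.splitlines():
        # Drop the trailing comment and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        # Every entry ends with ';', which leaves one empty trailing field.
        if fields[-1] == "":
            fields.pop()
        code, lower, title, upper = fields[:4]
        # The optional condition list is the 5th field (index 4).
        condition = fields[4] if len(fields) > 4 else None
        entries.append({"code": code, "lower": lower, "title": title,
                        "upper": upper, "condition": condition})
    return entries

for entry in parse_special_casing(SAMPLE):
    print(entry["code"], "conditional" if entry["condition"] else "unconditional")
```

An entry with only four fields is unconditional. When a fifth field is
present, it carries the condition (and possibly a language tag such as
"lt").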
> Section 2.4. Rule D - Ignorables
> For the statement of the derivation of Default_Ignorable_Code_Point,
> please see the erratum of 2007-January-25 posted at:
> The correct statement is:
> Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
> + Noncharacter_Code_Point + Variation_Selector - White_Space
> - FFF9..FFFB (Annotation Characters)
I picked the text from DerivedCoreProperties.txt (as you might
I will correct this.
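For reference, the corrected derivation quoted above can be sketched as
plain set arithmetic in Python. This is only an illustration of that one
statement, not the code used for the draft. The function name
default_ignorable() is hypothetical, and the TOY sets are tiny
hand-picked stand-ins for the real property data in PropList.txt and
UnicodeData.txt.

```python
def default_ignorable(props):
    """Apply the quoted derivation to a dict of property name -> set of code points."""
    included = (
        props["Other_Default_Ignorable_Code_Point"]
        | props["Cf"]
        | props["Cc"]
        | props["Cs"]
        | props["Noncharacter_Code_Point"]
        | props["Variation_Selector"]
    )
    # U+FFF9..U+FFFB, the annotation characters, are explicitly excluded.
    annotation = set(range(0xFFF9, 0xFFFC))
    return included - props["White_Space"] - annotation

# Toy data: a handful of code points per property, NOT the full sets.
TOY = {
    "Other_Default_Ignorable_Code_Point": {0x034F},
    "Cf": {0x00AD, 0xFFF9, 0xFFFA, 0xFFFB},
    "Cc": {0x0000, 0x0009},
    "Cs": {0xD800},
    "Noncharacter_Code_Point": {0xFFFE},
    "Variation_Selector": {0xFE00},
    "White_Space": {0x0009, 0x0020},
}

print([f"U+{cp:04X}" for cp in sorted(default_ignorable(TOY))])
```

With the toy data, U+0009 drops out because it is White_Space, and
U+FFF9..U+FFFB drop out as annotation characters, while U+FE00 stays in
through Variation_Selector.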
> (Variation_Selector was inadvertently left out of the description
> of the derivation, although it is correctly part of the actual
> derivation of the list in the data file.)
> And the statement "Noncharacters is a property only existing in the
> NamesList.txt." is unnecessary. The normative listing of the
> Noncharacter_Code_Point property is in PropList.txt. And that
> is the property used for derivation.
Hmmm...according to my reading of the two files, U+FEFF is a
Noncharacter_Code_Point according to NamesList.txt, but not in
PropList.txt. This is why I refer to NamesList.txt.
>> and the algorithm of how to
>> calculate the value of the derived property in section 3.
> I'll provide feedback on that separately. I don't think the
> specification of Rule H (distinguishing Latin, Greek, and
> Cyrillic as "Stable scripts" in contradistinction to all
> other scripts) makes sense -- and so the algorithm, which
> makes distinctions based very prominently on Rule H, is,
> in my opinion, overly complex and unclear.
>> On top of that of course also a comparison of your results when doing
>> the same calculations with the result I got with my code in
>> section 4.
> I can, of course, grab the I-D text and do a bunch of editing,
> in an attempt to turn the listing in Section 4.1 into
> something that is machine-readable. But it would be really
> helpful for these kinds of evaluations if the machine readable
> form of these tables were simply posted in a specified location
> for use in comparisons of results. That would avoid the
> extra work of repeated manual editing to extract, as well
> as avoiding the probable introduction of extraneous errors
> just from the manual editing.
I will make sure the machine readable list is available online as
well as in the draft.
For now, see http://stupid.domain.name/idnabis/idnabis-070608.txt
That is the same table as you find in the draft.