New version, draft-faltstrom-idnabis-tables-02.txt, available
Kenneth Whistler
kenw at sybase.com
Mon Jun 11 20:29:25 CEST 2007
> On 8 jun 2007, at 02.29, Kenneth Whistler wrote:
> > ... What is in the sixth
> > column is the decomposition *mapping* for a character (if
> > not trivially mapped to itself). And that mapping is then
> > used in the set of rules in UAX #15 that define the various
> > normalization forms.
>
> So if I change the text to "The decomposition mapping is found in the
> sixth column of UnicodeData.txt and, together with UAX #15, defines the
> various normalization forms", that is correct?
Correct.
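
To make the distinction concrete, here is a minimal sketch in Python. It
assumes the standard unicodedata module, whose decomposition() function
returns the decomposition mapping field as a string. The helper name
decomposition_mapping is hypothetical, and the data reflects whatever
Unicode version the interpreter ships with, not necessarily the one the
draft targets.

```python
import unicodedata


def decomposition_mapping(ch):
    """Return (tag, code_points) for the decomposition mapping of ch.

    tag is None for a canonical mapping, or the compatibility tag
    (e.g. "fraction") without angle brackets. A character with an
    empty field maps trivially to itself.
    """
    raw = unicodedata.decomposition(ch)
    if not raw:
        # Empty field: the character is trivially mapped to itself.
        return None, [ord(ch)]
    parts = raw.split()
    tag = None
    if parts[0].startswith("<"):
        # Compatibility mappings carry a leading <tag>.
        tag = parts[0].strip("<>")
        parts = parts[1:]
    return tag, [int(p, 16) for p in parts]


# The mapping is only the per-character data ...
print(decomposition_mapping("\u00e9"))   # canonical mapping of e-acute
print(decomposition_mapping("\u00bd"))   # compatibility mapping of 1/2

# ... while the normalization forms are what UAX #15 builds from it.
print(ascii(unicodedata.normalize("NFD", "\u00e9")))
print(ascii(unicodedata.normalize("NFKD", "\u00bd")))
```

The first two calls show the raw per-character mapping; the last two show
normalization forms that the UAX #15 rules derive from those mappings.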
> > Section 2.4. Rule D - Ignorables
> >
> > For the statement of the derivation of Default_Ignorable_Code_Point,
> > please see the erratum of 2007-January-25 posted at:
> >
> > http://www.unicode.org/errata/
> >
> > The correct statement is:
> >
> > Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
> > + Noncharacter_Code_Point + Variation_Selector - White_Space
> > - FFF9..FFFB (Annotation Characters)
>
> I picked the text from DerivedCoreProperties.txt (as you might
> understand).
>
> I will correct this.
>
> > (Variation_Selector was inadvertently left out of the description
> > of the derivation, although it is correctly part of the actual
> > derivation of the list in the data file.)
> >
> > And the statement "Noncharacters is a property only existing in the
> > NamesList.txt." is unnecessary. The normative listing of the
> > Noncharacter_Code_Point property is in PropList.txt. And that
> > is the property used for derivation.
>
> Hmmm...according to my reading of the two files, U+FEFF is a
> Noncharacter_Code_Point according to NamesList.txt, but not in
> PropList.txt. This is why I refer to NamesList.txt.

No. U+FEFF is *not* a Noncharacter_Code_Point. U+FFFE (the
byte-swapped version of a BOM, if used as a byte-order sentinel)
*is* a Noncharacter_Code_Point.

U+FEFF is valid for interchange in text. U+FFFE is not.

U+FEFF has a general category: gc=Cf. That makes it, formally,
a Unicode format control character. It is a Default_Ignorable_Code_Point
by virtue of that general category.

The names list has a crossreference from U+FEFF to U+FFFE,
but that is a crossreference only, and not an annotation of
U+FEFF as being a noncharacter.
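
The difference between the two code points, and the corrected derivation
quoted earlier, can be checked with a short Python sketch. The function
names is_noncharacter and is_default_ignorable are hypothetical. The three
property sets are assumed to be parsed from PropList.txt by the caller,
since Python's unicodedata module does not expose them. The noncharacter
test uses the fixed set U+FDD0..U+FDEF plus the last two code points of
each plane. The formula is the one stated in this thread; other versions of
the standard may define the derivation differently.

```python
import unicodedata


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF on every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_default_ignorable(cp, other_di, variation_selectors, white_space):
    """Apply the derivation as quoted in this thread.

    other_di, variation_selectors and white_space are sets of code points
    supplied by the caller (assumed to come from PropList.txt).
    """
    included = (
        cp in other_di
        or unicodedata.category(chr(cp)) in ("Cf", "Cc", "Cs")
        or is_noncharacter(cp)
        or cp in variation_selectors
    )
    # Subtract White_Space and the annotation characters U+FFF9..U+FFFB.
    excluded = cp in white_space or 0xFFF9 <= cp <= 0xFFFB
    return included and not excluded


# U+FEFF is a format control (gc=Cf), not a noncharacter.
print(unicodedata.category("\ufeff"), is_noncharacter(0xFEFF))
# U+FFFE is a noncharacter.
print(unicodedata.category("\ufffe"), is_noncharacter(0xFFFE))

# Toy property sets for illustration only, not the real PropList.txt data.
toy_white_space = {0x0009, 0x0020}
print(is_default_ignorable(0xFEFF, set(), set(), toy_white_space))
print(is_default_ignorable(0xFFF9, set(), set(), toy_white_space))
```

Under this formula U+FEFF qualifies through its general category alone, so
no appeal to NamesList.txt is needed.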
> I will make sure the machine readable list is available online as
> well as in the draft.
>
> For now, see http://stupid.domain.name/idnabis/idnabis-070608.txt
>
> That is the same table as you find in the draft.

Thanks. I'll check it against my data tables.

--Ken