New version, draft-faltstrom-idnabis-tables-02.txt, available

Kenneth Whistler kenw at sybase.com
Mon Jun 11 20:29:25 CEST 2007


> On 8 jun 2007, at 02.29, Kenneth Whistler wrote:

> > ... What is in the sixth
> > column is the decomposition *mapping* for a character (if
> > not trivially mapped to itself). And that mapping is then
> > used in the set of rules in UAX #15 that define the various
> > normalization forms.
> 
> So if I change the text to "Decomposition mapping is found in the  
> sixth column in UnicodeData.txt that together with UAX #15 define the  
> various normalization forms", that is correct?

Correct.
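
As a rough illustration of the point above, the sketch below reads the
sixth semicolon-separated field from one UnicodeData.txt record and
compares it with what Python's standard unicodedata module reports. The
helper name decomposition_field and the SAMPLE constant are made up for
this example; the record is the entry for U+00E9. The mapping alone is
only the input: the normalization forms themselves come from applying
the UAX #15 rules to it.

```python
import unicodedata

# One record from UnicodeData.txt (the entry for U+00E9), used as sample input.
SAMPLE = ("00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;"
          "LATIN SMALL LETTER E ACUTE;;00C9;;00C9")


def decomposition_field(record):
    """Return the sixth field (index 5) of a UnicodeData.txt record.

    Hypothetical helper for illustration; an empty string means the
    character has no decomposition mapping listed.
    """
    return record.rstrip("\n").split(";")[5]


mapping = decomposition_field(SAMPLE)

# The standard library exposes the same field for the Unicode version
# bundled with the interpreter.
stdlib_mapping = unicodedata.decomposition("\u00e9")

# Applying the UAX #15 rules (here, NFD) to the mapping yields the
# normalized form: base letter followed by the combining acute accent.
nfd = unicodedata.normalize("NFD", "\u00e9")

print(mapping, "|", stdlib_mapping, "|", [hex(ord(c)) for c in nfd])
```
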


> > Section 2.4. Rule D - Ignorables
> >
> > For the statement of the derivation of Default_Ignorable_Code_Point,
> > please see the erratum of 2007-January-25 posted at:
> >
> > http://www.unicode.org/errata/
> >
> > The correct statement is:
> >
> > Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
> > + Noncharacter_Code_Point + Variation_Selector - White_Space
> > - FFF9..FFFB (Annotation Characters)
> 
> I picked the text from DerivedCoreProperties.txt (as you might  
> understand).
> 
> I will correct this.
> 
> > (Variation_Selector was inadvertently left out of the description
> > of the derivation, although it is correctly part of the actual
> > derivation of the list in the data file.)
> >
> > And the statement "Noncharacters is a property only existing in the
> > NamesList.txt." is unnecessary. The normative listing of the
> > Noncharacter_Code_Point property is in PropList.txt. And that
> > is the property used for derivation.
> 
> Hmmm...according to my reading of the two files, U+FEFF is a  
> Noncharacter_Code_Point according to NamesList.txt, but not in  
> PropList.txt. This is why I refer to NamesList.txt.

No. U+FEFF is *not* a Noncharacter_Code_Point. U+FFFE (the
byte-swapped version of a BOM, if used as a byte-order sentinel)
*is* a Noncharacter_Code_Point.

U+FEFF is valid for interchange in text. U+FFFE is not.

U+FEFF has a general category: gc=Cf. That makes it, formally,
a Unicode format control character. It is a Default_Ignorable_Code_Point
by virtue of that general category.

The names list has a cross-reference from U+FEFF to U+FFFE,
but that is a cross-reference only, not an annotation marking
U+FEFF as a noncharacter.
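
The distinction between the two code points can be checked
mechanically. The sketch below assumes the fixed set of 66
noncharacters (U+FDD0..U+FDEF plus the last two code points of every
plane) and uses Python's standard unicodedata module for the general
category. The function names are illustrative only, and the reported
categories reflect whatever Unicode version ships with the interpreter.

```python
import unicodedata


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def general_category(cp):
    """General category (gc) of a code point, per the bundled Unicode data."""
    return unicodedata.category(chr(cp))


for cp in (0xFEFF, 0xFFFE):
    print("U+%04X gc=%s noncharacter=%s"
          % (cp, general_category(cp), is_noncharacter(cp)))
```
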


> I will make sure the machine readable list is available online as  
> well as in the draft.
> 
> For now, see http://stupid.domain.name/idnabis/idnabis-070608.txt
> 
> That is the same table as you find in the draft.

Thanks. I'll check it against my data tables.

--Ken



