Table issues (was: Re: IDNAbis documents)

Patrik Fältström patrik at frobbit.se
Wed Dec 5 07:46:51 CET 2007


Thanks!

I'll have a look...

    Patrik

On 4 dec 2007, at 15.42, Kenneth Whistler wrote:

> Patrik,
>
>> We have worked quite hard to, for the first time, really have all four
>> documents that are the core of IDNAbis in sync.
>>
>> They are:
>>
>> - draft-klensin-idnabis-issues-05.txt
>> - draft-klensin-idnabis-protocol-02.txt
>> - draft-faltstrom-idnabis-tables-03.txt
>> - draft-alvestrand-idna-bidi-01.txt
>
> I'll focus on issues I've found in draft-faltstrom-idnabis-tables-03.txt,
> leaving to others more qualified the concerns regarding the overall
> architecture, the articulation of the four documents, implementation
> issues, and so on.
>
> My feedback will come in parts, as my analysis is ongoing.
> I just thought, given the time constraints here, that it
> might be useful to get some of the more evident feedback
> to you quickly.
>
> Re. Appendix A.
>
> There seem to be some errors in the generation
> of this table.
>
> The code point range should be "0x0000 - 0x10FFFF", rather
> than "0x0000 - 0x10FFFD", as there is no principled reason
> to exclude consideration of the last two noncharacter
> code points, U+10FFFE..U+10FFFF, when other noncharacter
> code points such as U+FFFFE..U+FFFFF *are* included
> in the table.
>
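
To illustrate the range point above, here is a minimal sketch in Python that counts noncharacter code points under each upper bound. The helper names `is_noncharacter` and `count_noncharacters` are hypothetical and are not taken from the tables draft. The test itself rests on the Unicode definition of noncharacters: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the noncharacter code points defined by Unicode."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane n (0..16).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def count_noncharacters(last: int) -> int:
    """Count noncharacters in the inclusive range 0..last."""
    return sum(1 for cp in range(last + 1) if is_noncharacter(cp))


if __name__ == "__main__":
    # Stopping at 0x10FFFD drops exactly U+10FFFE and U+10FFFF.
    print(count_noncharacters(0x10FFFD))
    print(count_noncharacters(0x10FFFF))
```

With the table cut off at 0x10FFFD, the scan covers every noncharacter except the final two, which is the inconsistency described above.
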
> The derivation of the table did not correctly distinguish
> *unassigned* code points from *noncharacter* code points.
> Unassigned code points are "<reserved>" and are available
> for future encoding of characters, whereas noncharacter
> code points are *not* "<reserved (for future assignment)>" --
> they are designated functions, constitute a kind of internal
> private use, and are disallowed for interchange. (See Table 2-3,
> TUS 5.0, p. 27.) If PUA code points (e.g. U+E000..U+F8FF)
> are to be NEVER in this table, then the noncharacters
> should be NEVERNEVERNEVER! ;-), rather than UNASSIGNED.
>
> In general, having this Appendix A listing include UNASSIGNED
> code points is both distracting (from the other, more
> meaningful values) and an error-prone reduplication of
> effort. The listing of gc=Cn values is already available
> directly from:
>
> http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt
>
> And that file *does* make the distinction between true
> unassigned code points and noncharacter code points
> (both of which are gc=Cn, but which differ in the
> Noncharacter_Code_Point property [see PropList.txt].)
> The derivation for the IDN inclusion table needs to
> pay attention to *both* gc=Cn and Noncharacter_Code_Point=True.
> What *would* make sense is for the Appendix listing to
> correctly identify the noncharacters as NEVER. The
> fact that it doesn't suggests that there is an error
> in the way the calculation is handling Category D.
>
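
The two-property check described above could be sketched roughly as follows. This is only an illustration, not the calculation used in the tables draft: the function names and the labels "NONCHARACTER", "UNASSIGNED" and "OTHER" are hypothetical, and the General_Category lookup uses Python's standard `unicodedata` module, so which code points count as unassigned depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point=True: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Separate noncharacters from truly unassigned code points.

    Both groups have gc=Cn, so the noncharacter test has to run first.
    The returned labels are illustrative, not the draft's property values.
    """
    if is_noncharacter(cp):
        return "NONCHARACTER"
    if unicodedata.category(chr(cp)) == "Cn":
        return "UNASSIGNED"
    return "OTHER"


if __name__ == "__main__":
    for cp in (0x0041, 0x0378, 0xE000, 0xFDD0, 0xFFFFE, 0x10FFFF):
        print(f"U+{cp:04X} {classify(cp)}")
```

Checking gc=Cn alone would put U+FDD0 and U+10FFFF into the same bucket as an ordinary reserved code point such as U+0378, which matches the symptom seen in Appendix A.
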
> Another general issue with the document, table, and
> Section 3, Calculation of the Derived Property: The
> possible values of the IDN property still include
> a value MAYBE NOT, but in fact the calculation has no
> branch now that assigns a MAYBE NOT value, and the
> table contains no MAYBE NOT characters. Either the
> thinking about "MAYBE NOT" has changed, and the
> document hasn't caught up to that yet, or there
> is an error in how the calculation has been
> set up. As it is now, nearly all of the "MAYBE NOT"
> values from the 01 version of this ID are now listed
> in the Appendix as "NEVER". As "NEVER", they would be
> prohibited from any future consideration for IDN, which
> seems at odds with the tenor of the text describing "MAYBE NOT".
>
> I have a number of issues with the new Category J
> and its relation to the newly suggested "CONTEXT"
> value for the property, but I'll take those up
> separately.
>
> Regards,
>
> --Ken
>
>
>


