tables document [Re: IDNA comments]

Patrik Fältström patrik at
Sat Jul 12 09:12:43 CEST 2008

On 7 jul 2008, at 17.05, Mark Davis wrote:

> Comments on tables-01

Let's see if I understand all of your comments.

>   1. The use of human-readable names in this version is a big plus,  
> thanks!


>   2. "Codepoints with this property value will never be permitted in  
> IDNs."
>   Aside from the stability issue, this is a promise that cannot be  
> kept,
>   since a future RFC could modify this for IDNs (as pointed out on  
> the list).
>   Replace by:
>   "are not permitted", or something like "will never be permitted  
> unless
>   this document were obsoleted".


>   3. "It should be suitable for newer revisions of Unicode, as long  
> as the
>   Unicode properties on which it is based remain stable."
>   Replace by
>   "This is suitable for any newer versions of Unicode as well.  
> Changes in
>   Unicode properties that do not affect the outcome of this process  
> do not
>   affect IDN. For example, a character can move from So to Sm, or  
> from Lo to
>   Lu, without affecting the table results. Moreover, even if such  
> changes were
>   to result, the BackwardCompatible list (2.2.3.) will be adjusted to
>   ensure the stability of the results."

Thanks. In the last sentence, though, I will probably say "...can be..."
rather than "...will be...". My view is that when such an unfortunate
incompatibility is detected, this document has to be updated. At that
update one can choose whether or not to add the codepoint to 2.2.3 (in
reality, the choice is between adding it to 2.2.3 and updating the
document, or not updating the document and accepting the
incompatibility).

Ok with people?
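
To illustrate the point in comment 3, here is a minimal Python sketch. The rule in derive() is a toy stand-in that I made up for illustration, not the rules from the tables document, and the names LETTER_DIGITS, derive and outcome_changed are hypothetical. It only shows that a General_Category change matters when it crosses a boundary the rule actually tests.

```python
# Toy rule, NOT the algorithm from the tables document: a code point is
# "PVALID" if its General_Category is in a fixed set, else "DISALLOWED".
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}


def derive(general_category: str) -> str:
    """Map a General_Category value to a (toy) derived property value."""
    return "PVALID" if general_category in LETTER_DIGITS else "DISALLOWED"


def outcome_changed(old_gc: str, new_gc: str) -> bool:
    """True if moving from old_gc to new_gc changes the derived value."""
    return derive(old_gc) != derive(new_gc)


print(outcome_changed("So", "Sm"))  # both outside the set
print(outcome_changed("Lo", "Lu"))  # both inside the set
print(outcome_changed("So", "Lo"))  # crosses the boundary
```

Under this toy rule only the last change would be a candidate for 2.2.3.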

>   4. ... on a two step procedure... => on a two-step procedure


>   5. ... That a label consists only of codepoints... => However,  
> that a
>   label consists only of codepoints


>   6. Section 2.1.3: there was a change in the definition of DICP in
>   preparation for IDNA. See Derived Property:  
> Default_Ignorable_Code_Point in
> for the
>   text for the updated text.


>   7. "In many cases aliases are used in the data in the Unicode  
> Standard.
>   This document uses both the alias and the spelled out terms (for  
> example
>   alias Ll for the General Category Lowercase_Letter)."
>   Replace with:
>   "Unicode property names and property value names may have short
>   abbreviations, such as gc for the General_Category property, and  
> Ll for the
>   Lowercase_Letter property value of that property."

Is it only property names and property values that have short forms?
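
For reference, the short/long pairing from comment 7 can be seen with Python's unicodedata module, which returns the short General_Category alias. The GC_LONG_NAMES table below is a small hand-copied subset of the value aliases, and describe_gc is a hypothetical helper name; this is a sketch, not a complete alias list.

```python
import unicodedata

# Hand-copied subset of General_Category short -> long value names.
GC_LONG_NAMES = {
    "Ll": "Lowercase_Letter",
    "Lu": "Uppercase_Letter",
    "Lo": "Other_Letter",
    "Sm": "Math_Symbol",
    "So": "Other_Symbol",
    "Cn": "Unassigned",
    "Cs": "Surrogate",
}


def describe_gc(ch: str) -> tuple[str, str]:
    """Return (short alias, long name) of the General_Category of ch."""
    short = unicodedata.category(ch)
    return short, GC_LONG_NAMES.get(short, "(not in this subset)")


print(describe_gc("a"))
print(describe_gc("A"))
print(describe_gc("+"))
```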

>   8. Sort the following by value instead of code point, for clarity.
>   Ideally each value would be in its own subsection: PVALID,  
>      ...

Hmm... what do people think here? I can see reasons to keep the
codepoints (in the same script) "close" to each other (as it is now),
though I do of course understand this suggestion.

Should the appendix also be sorted in a different way (adding an
Appendix B alongside the existing Appendix A)?

>   9. "The characters 02B9, 0375 and 0483..." In Unicode we have the
>   convention that characters are represented by the format "U+02B9  
>   MODIFIER LETTER PRIME" in free-flowing text, that is, always including the  
> name. I
>   strongly recommend that practice be followed in all of these  
> documents; it
>   makes it far easier for someone to follow what is going on (since  
> most
>   people don't memorize these numbers ;-). You can use
> to get the  
> name, or
>   just grep the main unicode property file.
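
One way to produce the "U+XXXX NAME" form from comment 9 mechanically is sketched below, using Python's standard unicodedata module. The helper name label is hypothetical, and the names returned depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata


def label(cp: int) -> str:
    """Format a code point as 'U+XXXX CHARACTER NAME'."""
    # Unnamed code points (e.g. unassigned ones) get a placeholder
    # instead of raising ValueError.
    name = unicodedata.name(chr(cp), "<no name>")
    return f"U+{cp:04X} {name}"


# The three code points mentioned in the quoted sentence.
for cp in (0x02B9, 0x0375, 0x0483):
    print(label(cp))
```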


>   10. "This category includes the codepoints that property values in
>   versions of Unicode after 5.0". The 5.0 value was changed to 5.1  
> in most
>   cases, but not here. Search all the documents for 5.0 in case any  
> others
>   were missed.


>   11. "As the requirement is that codepoints having either of these
>   derived..." Missing reference. What requirement?

Will check and expand explanation.

>   12. "This category consists of codepoints in the Unicode character  
> set
>   that are not (yet) assigned. It should be noted that the set of  
> unassigned
>   characters is the larger set {Cn, Cs}."
>   The last sentence needs clarification: suggest
>   "It should be noted that Unicode distinguishes between 'unassigned  
> code
>   points' and 'unassigned characters'. The unassigned code points  
> are all but
>   (Cn - Noncharacters), while the unassigned *characters* are all  
> but (Cn +
>   Cs)."
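
The categories in comment 12 can be inspected with the small Python sketch below. It assumes the standard unicodedata module, which reports noncharacters and not-yet-assigned code points alike as Cn, so the noncharacters have to be separated out by an explicit range check. The helper names gc and is_noncharacter are hypothetical.

```python
import unicodedata


def gc(cp: int) -> str:
    """Short General_Category alias of a code point."""
    return unicodedata.category(chr(cp))


def is_noncharacter(cp: int) -> bool:
    """The 66 noncharacters: U+FDD0..U+FDEF plus the last two code
    points of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


print(gc(0xD800))                           # a surrogate code point
print(gc(0xFFFF), is_noncharacter(0xFFFF))  # a noncharacter
# U+0378 is unassigned in the Unicode versions I know of; the result
# depends on the Unicode data shipped with the Python build.
print(gc(0x0378), is_noncharacter(0x0378))
```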


>   13. "If needed, IANA should (with the help of an appointed expert)
>   suggest updates of this RFC where BackwardCompatible (Section  
> 2.2.3) is
>   updated, a set that is at
>   release of this document is empty."
>   This isn't going to work. I suggest that the backwards compatible
>   character list, the exceptions list, and the context rules all be  
> in a
>   single document published by IANA, and controlled by the group  
> discussed in
>   rationale. We then need to provide guidance and constraints on  
> this group.
>   This kind of process is not new: for example, BCP 47 has very  
> stringent
>   guidelines on how the IANA language-subtag-registry is to be  
> changed. In
>   this case, the text should read something like:
>   "If as a result of property changes in a version of Unicode, any  
> assigned
>   character under the old version of Unicode would have a different  
> value
>   according to this document than in the new version, then the IANA  
>   committee must amend the BackwardsCompatible List to ensure that  
> the value
>   remains stable. This must be published by IANA immediately upon  
> release of
>   the new version of Unicode (such timing is easily feasible because  
> of the
>   long lead times for Unicode beta versions)."
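
The stability check described in the proposed text for comment 13 could be sketched roughly as follows. This assumes the derived values for each Unicode version are already available as plain dictionaries; the function name, the "UNASSIGNED" marker string, and the sample data are all made up for illustration and do not reflect real table values.

```python
def backward_compatible_additions(old_table: dict, new_table: dict) -> dict:
    """Return {code point: old value} for every code point that was
    assigned under the old version and whose derived value differs
    under the new version."""
    return {
        cp: old_value
        for cp, old_value in old_table.items()
        if old_value != "UNASSIGNED"
        and new_table.get(cp, "UNASSIGNED") != old_value
    }


# Invented sample data, for illustration only.
old = {0x0061: "PVALID", 0x00B7: "CONTEXTO", 0x0378: "UNASSIGNED"}
new = {0x0061: "PVALID", 0x00B7: "DISALLOWED", 0x0378: "PVALID"}

for cp, value in backward_compatible_additions(old, new).items():
    print(f"U+{cp:04X} must keep {value}")
```

Code points that were unassigned under the old version are skipped on purpose: receiving a value for the first time is not a stability problem.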


