Comments on idnabis-protocol-02

Marcos Sanz/Denic sanz at
Thu Jul 17 09:50:53 CEST 2008


And finally, some assorted comments on protocol-02:

* Section 3.2.1: s/and because/because/

* Section 3.2.1: s/hyphen./hyphen.)/

* Section 4.1: The text block starting with "The registry MAY permit 
[...]" and ending with "[...] MUST be rejected" could be better placed 
under Section 4.3, since the subsections of section 4 are meant as logical 
steps in time.

* Section 4.2: This step starts with "Some system routine [...] ensures 
that the proposed label is a Unicode string". But it may *not* be a 
Unicode string, since the output of section 4.1 is, as defined, in a local 
native character set. That is, the step of conversion to Unicode is 

* Section 4.2: "U-labels actually produced from A-labels". Doesn't the 
definition of "U-label", as of idnabis-rationale, include the assumption 
that it actually must be produced (or have been produced) from some 
A-label? So the formulation is redundant/misleading.

* Section As a matter of fact, this step is a special 
instantiation of ("all combining marks have a contextual rule that 
does not allow them to appear at the beginning of a label"). Shouldn't 
it thus be subsumed into it? This way there would be different "kinds" of 
rules and would contribute to simplicity.

* Section 4.3.3: I'd drop the sentence starting with "For example", since 
this section is a summary of the rest of the section and it should be kept 
crisp. If at all, the example should appear in the corresponding 
subsection of 4.3.

* Section 4.4: s/SHOULD/should/. See my comment on section 6.2 of the 
rationale document (sent in separate mail). Usage of 2119-language should 
be motivated by interoperability, which is not an issue here.

* Section 4.5: There should be a hint for implementors on how to act if 
the Punycode operation fails (or, alternatively, an explanation for why 
the failure situations described in 3492 cannot happen here at all).

* Section 4.5: s/the prefix/the ACE prefix/

* Section 5: "The resolution-side tests are more permissive and rely 
heavily on the assumption that names that are present in the DNS are 
valid". This is a dangerous assumption and can lead to careless 
programming. See my comment on section 10.1.2 of idnabis-rationale, 
sent in a separate e-mail.

* Section 5: Sentence starting with "Among other things, this distinction 
[...]" is a bit irrelevant here; it should be moved to the rationale 
document (and I even think that I already read about this issue there).

* Section 5.3, 2nd paragraph: 'mapping different "width" forms of the same 
character'. Without context, it is a bit difficult to understand what is 
meant here (I happen to know, but I think it needs a bit more phrasing for 
a casual reader).

* Section 5.3: "Such localization changes are even further outside the 
scope of this specification than the ones mentioned above". I think the 
language is not appropriate for a standards document; it would 
suffice to say "Such localization changes are also outside the scope of 
this specification".

* Section 5.5: "In parallel with the registration procedure [...], the 
Unicode string is checked...". What does "in parallel" mean here? 
Certainly not time synchronicity. Wouldn't "Similar to the 
registration procedure" be clearer?

* Section 5.5: The six bullets are very similar in content (but not in 
wording) to those under section 4.3. That makes it difficult for 
implementors to follow ("why is the text different? is there some subtle 
meaning I am missing?") and adds unnecessary verbosity to the specification. 
I suggest collecting the steps which are identical in registration and in 
lookup, putting them in just one separate section called "Basic 
Registration And Lookup Checks" and referring to that section from 4.3 and 

* Section 5.5, regarding anchor 20: if a label not satisfying the 
idna2008-bidi requirements is not IDNA-valid, there is no point in letting 
a resolver query that U-label; it can deliver a failure straight away. So 
IMHO the "SHOULD" should be a "MUST".

* Section 5.5: "the resolver MUST rely on the presence or absence of 
labels in the DNS to determine the validity of those labels". Actually, it 
can only be "to determine the existence of those labels", nothing further.

* Section 5.6: "[...] is converted to an A-label using the punycode 
algorithm." Add: "and prepending the ACE prefix". Btw, Punycode is 
sometimes capitalized, sometimes not.

* Section 7: s/compatable/compatible/

* Section 7: The "E" in "ACE encoding" stands for "Encoding". I'd rather 
write "ASCII compatible encoding".

* Section 7, 3rd paragraph: "privileged or anti privileged domains". I 
haven't the slightest idea what that is supposed to mean.

* Appendix A: Neither here nor in rationale-01, section 13.2, can I find a 
requirement for the IANA Contextual Rules Registry to be versioned. It 
might be obvious, but it should be made explicit. This versioning need not 
necessarily follow Unicode versioning (one could imagine changes in 
it that are not directly bound to Unicode progress). The same goes, btw, 
for the derived property registry.

* Appendix A, about U+002D: Typo in the regexp, it should be \u002D 
instead of \u00SD

* Appendix A, about U+00B7: Typo in the regexp, it should be \u006C 
instead of \u006c

* Appendix A, about U+0375: Typo in the regexp, missing \u for the char 
and \p for the script

* Appendix A, about U+02B9: Typo in the regexp, missing \u for the char 
and \p for the script

* Appendix A, about U+05F4: Copy&paste typo in the regexp, it should be 
\u05F4 instead of \u05F3

* Appendix A, about U+3005: Copy&paste typo in the regexp, it should be 
\u3005 instead of \u30FB

* Appendix B: I am not sure of the usefulness of this whole Appendix; 
major programming languages directly support Unicode regexps, and if one 
doesn't, the programmer can check widely available documentation. 
Regarding anchor41: What part of a construction like "\p(Script:XXX)" is 
fairly exotic? And how exotic is it in comparison with the bidi rules or 
with the elaboration of the derived property? Keeping Appendix B will lead 
to duplication of effort and chances for inconsistency (for instance, 
right at the beginning on the character hyphen-minus: "Must appear [sic] 
at the beginning or end of a label"...). Though well-intended, I suggest 
dropping the effort of Appendix B.
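
To illustrate the comments on Sections 4.5 and 5.6 above, here is a 
minimal Python sketch of the conversion step with an explicit failure 
path. It is not taken from the draft: the helper name to_a_label and the 
two constants are hypothetical, it assumes Python's built-in "punycode" 
codec, and it performs none of the validity checks of Sections 4.3 or 5.5.

```python
ACE_PREFIX = "xn--"      # the ACE prefix prepended to the Punycode output
MAX_LABEL_OCTETS = 63    # DNS limit on the length of a single label


def to_a_label(u_label: str) -> str:
    """Hypothetical helper: Punycode-encode a label and prepend the ACE prefix.

    Raises ValueError instead of returning a partial or oversized result,
    so the caller has to decide what to do when the conversion fails.
    """
    try:
        encoded = u_label.encode("punycode").decode("ascii")
    except UnicodeError as exc:
        # Defensive: surface any codec error as an explicit failure.
        raise ValueError(f"Punycode encoding failed for {u_label!r}") from exc

    a_label = ACE_PREFIX + encoded
    if len(a_label) > MAX_LABEL_OCTETS:
        # The encoding itself succeeded, but the result is not a usable label.
        raise ValueError(f"A-label is longer than {MAX_LABEL_OCTETS} octets")
    return a_label


if __name__ == "__main__":
    print(to_a_label("bücher"))  # prints "xn--bcher-kva"
```

The point of the sketch is only that both outcomes (a complete A-label 
including the prefix, or a clearly signalled failure) are visible to the 
caller; which failures can actually occur is what the specification 
should spell out.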

Best regards,
Marcos Sanz
