Comments on idnabis-rationale-01

Thu Jul 17 09:50:33 CEST 2008

All,

here follows a healthy mixture of much nitpicking and some more important 
comments on rationale-01. All in all, the document has improved a lot 
since the last time (pre wg) I read it.

* Section 1.4, about ACEs: "they allow [...] clicking on URLs even though 
the domain name displayed is incomprehensible to the user". Sorry, but I 
couldn't help laughing at the draft presenting as a positive trait the 
fact that users click around on URLs that are incomprehensible to them 
(all the more so in the context of IDNs and security, and all the more so 
right in the introductory text). So I would suggest a different, somewhat 
more generic formulation (with the same rationale), such as "they are a 
last resort that would allow rudimentary IDN usage, for instance, when the 
necessary fonts are not installed on the user's computer".

* Section 1.5.3: s/regardless of that actual administrative arrangements 
or level in the tree/regardless of actual administrative arrangements or 
level in the DNS tree/

* Section 1.5.3: The sentence starting with "Further, because those 
documents were not terribly clear" seems to be a jab at something 
(somebody?), but the meaning gets lost without the context. Moreover, I 
don't think the wording is formal enough for a standards document. I 
suggest changing it to a simple "Lack of clarity in those documents has 
contributed to confusion about these terms".

* Section 1.5.4.1.1, 2nd bullet: "described in RFC 1034, RFC 1123 and 
elsewhere". That formulation is not very serious, and certainly not 
helpful. I'd go for "described in RFC 952 and RFC 1123", or maybe expand 
that list with RFC 1034/1035, but in any case drop the "elsewhere".

* Section 1.5.4.1.1, 3rd bullet: The concepts of "valid U-labels" and 
"valid A-labels" appear here for the first time, but... isn't that a 
pleonasm? The current definitions (in the very same section) of A-label 
and U-label already require *validity*. The next paragraph tries to 
clarify "To be valid, U-labels and A-labels must obey...", but again, 
that's a constraint that is implicit in the current definition. So either 
we have a pleonasm, and should thus s/valid U-label/U-label/g and 
s/valid A-label/A-label/g, or the concept of "invalid [A/U]-label" really 
exists, in which case it should be defined (and the definition of 
[A/U]-label modified accordingly).

* Section 1.5.4.1.1: "[...] both U-labels and A-labels must represent 
strings in normalized form". I think s/represent/be/ would be technically 
more correct. Besides that: what does "normalized" mean here? It should be 
specified (NFC?).
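
To illustrate why the normalization form needs to be named, here is a 
minimal Python sketch using only the standard unicodedata module. It 
assumes NFC is the intended form (which is exactly the open question 
above); the helper name is_nfc is hypothetical. The same visible label 
can be encoded as two different code point sequences, and only one of 
them is in NFC:

```python
import unicodedata

def is_nfc(label: str) -> bool:
    """Hypothetical helper: True if the label is already in NFC."""
    return unicodedata.normalize("NFC", label) == label

# "ü" as one precomposed code point (U+00FC) ...
precomposed = "b\u00fccher"
# ... and as "u" followed by COMBINING DIAERESIS (U+0308)
decomposed = "bu\u0308cher"

print(is_nfc(precomposed))  # True
print(is_nfc(decomposed))   # False
# NFC composes the decomposed form into the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Without naming the form, a registry and a resolver could legitimately 
disagree on which of the two strings is "normalized".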

* Section 1.5.4.1.1: I cannot parse the sentence that starts with "Strings 
that do not conform [...]" and ends with "[...] similar resources". 
Additionally, I cannot tell into which category all existing domain names 
with hyphens in the third and fourth positions fall. We have thousands of 
those in our zone (many starting with "bq--", you probably know why). Can 
you clarify whether those domain names, according to IDNA2008, "can 
actually appear in DNS zone files or queries" or not? Are they (valid) 
LDH-labels?
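
Labels with hyphens in the third and fourth positions can at least be 
identified mechanically, which is why the draft should say where they 
belong. The following Python sketch is purely illustrative: the function 
classify_label and its category names are hypothetical, not taken from 
the draft. It checks only the LDH syntax of RFC 952 as relaxed by RFC 
1123, plus the hyphen positions; it does not validate any embedded ACE:

```python
import re

# LDH syntax: letters, digits and hyphens, 1..63 octets,
# not starting or ending with a hyphen.
LDH_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def classify_label(label: str) -> str:
    """Hypothetical classification of a single ASCII DNS label."""
    if not LDH_RE.match(label):
        return "not-ldh"
    if label[2:4] == "--":
        if label.lower().startswith("xn--"):
            return "ace-prefixed"      # candidate A-label
        return "hyphens-3-4"           # e.g. the "bq--" labels in our zone
    return "plain-ldh"

for lbl in ["denic", "xn--bcher-kva", "bq--abcdefg", "-bad"]:
    print(lbl, classify_label(lbl))
```

The open question is precisely which IDNA2008 category the "hyphens-3-4" 
bucket maps to.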

* Section 1.5.4.2: "LDH-labels are not IDNs". This sentence contradicts 
section 1.5.6: "An [...] IDN is a domain name that may contain any mixture 
of LDH-labels, A-labels or U-labels. This implies that every conventional 
domain is an IDN (which implies that it is possible for a domain name to 
be an IDN without it containing any non-ASCII characters)." It is very 
important to settle that question, so that the applicability of IDNA2008 
can be precisely defined (it is related to the previous issue).

* Section 1.5.5: s/include the prefix/include the ACE prefix/

* Section 1.5.5: s/output of ToASCII/output of the ToASCII operation/
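
As a quick illustration of why "the ACE prefix" and "the ToASCII 
operation" are the precise terms: Python's built-in "idna" codec 
implements the IDNA2003 ToASCII/ToUnicode operations of RFC 3490 (not 
IDNA2008), so it only approximates the new protocol, but it shows that 
the ACE prefix "xn--" appears only on labels that contained non-ASCII 
characters:

```python
# Python's built-in "idna" codec follows IDNA2003 (RFC 3490), not IDNA2008.
ace = "bücher.example".encode("idna")
print(ace)  # b'xn--bcher-kva.example'

# Only the label that contained non-ASCII characters gets the ACE prefix.
labels = ace.decode("ascii").split(".")
print([lbl.startswith("xn--") for lbl in labels])  # [True, False]

# ToUnicode brings the name back.
print(ace.decode("idna"))  # bücher.example
```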

* Section 1.5.6: I suggest moving the whole paragraph starting with 'An 
"internationalized domain name" [...]' to section 1.5.4 (Terminology 
Specific to IDNA), maybe between the current sections 1.5.4.2 and 1.5.4.3. 
It fits best there.

* Section 1.5.6, still the same paragraph: "such restrictions are 
mandatory for IDN registries". I still don't see the point since, as 
somebody said on the list, one possible policy could be "we allow the full 
variety and complexity of the IDNA2008 protocol including all code points 
in all possible valid combinations". In any case, making policy 
restrictions mandatory contradicts idnabis-protocol-02, section 4.4, that 
states "there SHOULD be policies" (more on that normative language later). 
Thus, if anything, I would go for a plain "we recommend such restrictions 
for IDN registries" here in section 1.5.6.

* Section 1.5.6: s/"The key words/The key words/

* Section 2.9: Duplicates section 2.8. Suggestion: Drop it.

* Section 3: All the information in that paragraph is already in section 
1.4 and, on its own, section 3 serves no structural purpose. 
Suggestion: Drop it.

* Section 6.1.1, 1st paragraph: s/character in this group/character in 
this category/ for nomenclature reasons

* Section 6.1.1: s/right to left/right-to-left/g for coherence with 
idnabis-bidi-01

* Section 6.1.1, 2nd paragraph: s/VALID",/VALID"/

* Section 6.1.1, 3rd paragraph: The subordinate clause starting with 
"[...] unless the code points themselves are removed from Unicode [...]" 
is irrelevant in practice (deprecated Unicode characters are retained in 
the standard) and creates nothing but confusion. Is this again a pun I am 
missing? To my eyes, this sentence is a rationale for nothing and is 
better left out.

* Section 6.1.1.1, 1st paragraph: invert the order of ZWNJ and ZWJ code 
points within the brackets to match the order in the explanatory text.

* Section 6.1.1.1, 2nd paragraph: "Only the former are fully tested at 
lookup time": the verb tense doesn't match the context; it would be better 
as "should be fully tested".

* Section 6.1.1.2: RFC 2119 language appears here for the first time 
("MUST NOT appear in putative labels"), but the prologue of section 6 
states "[The information given in this section] is not normative". I 
actually recommend sticking to non-normative language (since this is only 
a rationale document) and dropping the capitalization. Btw, what is a 
"putative label"? It is not defined earlier and is not clear to me...

* Section 6.1.1.2: s/more more/more/

* Section 6.1.2, 2nd paragraph: In line with my previous comment on 
hypothetical Unicode character removal, I'd just drop this sentence.

* Section 6.1.2, last bullet: s/used to form a letter/used to combine with 
a letter/

* Section 6.1.3: s/MUST NOT/must not/ for reasons explained above

* Section 6.2: Again and FWIW, "no restrictions" is also a possible 
policy, so I don't see the point. In any case: s/SHOULD/should/ for the 
reasons explained above.

* Section 7.2: s/only be exposed to users and in contexts/only be exposed 
to users in contexts/

* Section 7.3: Just a naive question, no second meanings: what reasons 
speak at the moment *against* including the Eszett in the PVALID list 
under the category of exceptions? Thanks.

* Section 7.3, last paragraph: "[...] a registry [...] should give serious 
consideration to applying a 'variant' model". Well, the adoption of 
"variants" is just a possibility among many others to deal with these 
issues (btw, wasn't the "fraud" mentioned later in the sentence explicitly 
out of scope for the wg?), so I don't understand why this document should 
declare a preference for the variant model now in this context. So please, 
change "should give serious consideration" to "could give consideration", 
if anything.

* Section 8: "no one has ever seriously claimed that being liberal in what 
is accepted requires being stupid". Sorry, but I don't find this 
appropriate for a standards text.

* Section 8: "Conversely, resolvers can (and SHOULD or maybe MUST) reject 
labels that clearly violate global (protocol) rules". idnabis-protocol-02 
section 5.5 however says that they MUST, so we can update the text here 
accordingly. Further, resolvers must reject labels that violate rules, 
regardless of whether the violation is *clear* or very subtle. Maybe the 
adverb in the sentence was in the wrong position. Text suggestion: 
"Conversely, 
resolvers clearly must reject labels that violate global (protocol) 
rules".

* Section 8: "If a string doesn't resolve, it makes no difference whether 
it simply wasn't registered or was prohibited by some rule". Maybe I am 
misreading here, but I certainly think that there's a difference (at least 
one round trip). I don't get the point of the statement in this context.

* Section 9, last paragraph: I agree with removing that paragraph, as the 
anchor suggests. It's off-topic and provides no obvious rationale for 
anything.

* Section 10.1.2, last sentence: "Systems looking up or resolving DNS 
labels, especially IDN DNS labels, MUST be able to assume that applicable 
registration rules were followed for names entered into the DNS". I can't 
figure out why this must be so and why it could be relevant to the 
standard in any way, and I even think that this is very brittle from a 
security point of view. No, not only brittle: it's actually dangerous. The 
right phrasing should be: "Systems looking up or resolving DNS labels MUST 
make no assumptions about the data they are going to receive".

* Section 10.1.3, last bullet: "MUST NOT validate other contextual rules 
about characters, including mixed-script label prohibitions". There is no 
such thing as a general prohibition of mixed-script labels, like the text 
might suggest, so I'd just drop the "including [...]" part of the 
sentence. The text is not normative anyway.

* Section 10.2: s/they prohibits/they prohibit/

* Section 10.5: This section duplicates bullets 1 and 2 of section 
10.1.1, so I concur with the suggestion in anchor36 to eliminate the 
whole section. Should it be kept, here are some further comments.

* Section 10.5: s/ICANN guidelines/ICANN Guidelines for the Implementation 
of Internationalized Domain Names/

* Section 10.5, 3rd bullet: s/the those/those/

* Section 10.5, 4th bullet: s/The actual situation is even worse than 
this.  //

* Section 10.6: This section partially duplicates bullet 3 of section 
10.1.1.

* Section 14: For the first time the term "Stringprep2003" appears. Change 
it to plain "Stringprep", to be coherent with the rest of the document.

Good work. Thanks for your time, John.

Best regards,
Marcos Sanz
DENIC eG