comments on defs

Andrew Sullivan ajs at shinkuro.com
Fri Jul 24 15:53:14 CEST 2009


Dear colleagues,

I have read draft-ietf-idnabis-defs-09.  I have some comments.

For the most part, the document is in very good shape, and what
follows are the sort of picky things that one might expect when
refining the final versions of a document.

§1.3, final bullet.  I presume mapping isn't explicitly noted as
non-normative because we haven't come to that conclusion yet?

§2.2.  Should the terminological assumptions extend to RFC 1123 and
RFC 2181?

Also, I thought we'd moved away from "registry" and towards "zone
administrator" or something like that?  I know this confusion is
partly my fault.  Maybe we just need to identify these terms
explicitly as (rough?) synonyms?

It might not be a bad idea to add even more emphasis to the domain
name/FQDN/hostname distinction.  Maybe

    … in some cases.  The strict meaning of host name, and
    restrictions on such names, is defined in RFC 952.  These
    documents …

?

The para starting "A label is…" makes it sound like IDNA is changing
the definition of labels, but the A-label/U-label distinction makes it
plain that such a change is not really contemplated.  I think this
could be cleared up thus:

   IDNA extends the set of usable characters in labels that are
   treated as text (as distinct from the binary string labels
   discussed in RFC 1035 and RFC 2181 [RFC2181] and the bitstring ones
   described in RFC 2673 [RFC2673]), but only in certain contexts.
   The different contexts for different sets of usable characters are
   outlined in the next section.  For the rest of this document and in
   the related ones, the term "label" is shorthand for "text label",
   and "every label" means "every text label", including the expanded
   context.

§2.3.2.1 bullet 1.  I am inclined to think that removing the 63
character remark in this bullet will help with clarity.  It strikes me
that such a remark offers considerable potential for someone to
insist that a candidate U-label that is <63 characters but that
corresponds to an A-label >63 characters is still a candidate U-label
because it is IDNA-valid.  Really, I don't think the DNS reference
helps.  
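
To make the length concern concrete, here is a rough Python sketch.  It
assumes the standard library's raw "punycode" codec as a stand-in for
the conversion to A-label form (it performs none of the IDNA2008
validity checks), and the helper name and sample strings are made up
for illustration.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def to_ace_form(candidate: str) -> str:
    """Hypothetical helper: Punycode-encode a candidate U-label and add the prefix.

    Only the encoding step is performed; no IDNA2008 validation is attempted.
    """
    return ACE_PREFIX + candidate.encode("punycode").decode("ascii")


short = "b\u00fccher"      # 6 characters
long_one = "\u00e9" * 60   # 60 characters, still fewer than 63

for candidate in (short, long_one):
    ace = to_ace_form(candidate)
    # Character count of the Unicode form, octet count of the ASCII form,
    # and whether the ASCII form fits in a DNS label.
    print(len(candidate), len(ace), len(ace) <= MAX_LABEL_OCTETS)
```

The 60-character string is under the limit when counted in characters,
yet its encoded form cannot fit in 63 octets: the prefix takes four, and
each non-ASCII character needs at least one more.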

bullet 2 (end): is a string that meets the listed requirements and can
also be decoded into a U-label an A-label?  If so, the positive
definition would be better than this not-quite-complete one.

bullet 3: the "normally UTF-8" makes me nervous.  Is any Unicode label
a U-label, or not?  (I think "yes".)

anchor 14: no, it's not clear.  I was in fact going to ask whether
defs was the right place to be specifying what an IDNA-aware
application might do.  I'm not sure how to fix this.

I found this sentence so mystifying I needed to read it four times
before I got it:

   In the operations of [IDNA2008-Protocol] strings are processed that
   appear to be A-labels or U-labels --i.e., they appear as input to
   operations or in other contexts where A-labels or U-labels would be
   expected and are, respectively, ASCII strings starting in "xn--"
   (case independent) or strings that contain one or more non-ASCII
   characters-- but that are in the process of validation rather than
   having been demonstrated to conform to all of the conditions outlined
   above. 

I'm dim, so that explains why it took me so long, but I worry about
the comprehensibility of this for others.  What about this:

    During processing (according to the working of
    [IDNA2008-Protocol]), a string that appears to be an A-label or a
    U-label is handled.  Such strings are not yet demonstrably
    conformant with the conditions outlined above, because they are in
    the process of validation.  These strings …

?  Alternatively, just take this bit out, because it seems to me to be
covered under 2.3.3 anyway.

§2.3.2.3

   An "internationalized domain name" (IDN) is a domain name that may
   contain any mixture of NR-LDH-labels, A-labels, or U-labels. 

This entails that www.example.com is an IDN: it contains three
NR-LDH-labels, which meets the "any mixture" rule, since one possible
mixture is "none".  That might be surprising to some people.  What
about

   An "internationalized domain name" (IDN) is a domain name that
   contains at least one A-label or U-label, but that otherwise may
   contain any mixture of NR-LDH-labels, A-labels, or U-labels.

?
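
The difference between the two wordings can be sketched in Python.  The
classification below is only the surface test from the sentence quoted
earlier (an ASCII string starting with "xn--", or a string with at
least one non-ASCII character), not real validation, and the function
names are invented for illustration.

```python
def looks_like_a_label(label: str) -> bool:
    # ASCII string starting with "xn--", compared case-independently.
    return label.isascii() and label.lower().startswith("xn--")


def looks_like_u_label(label: str) -> bool:
    # Contains one or more non-ASCII characters.
    return not label.isascii()


def is_idn_any_mixture(name: str) -> bool:
    # Current wording: "any mixture" is satisfied by every name, including
    # one made up solely of NR-LDH-labels.  (Sketch only: every label is
    # assumed to fall into one of the three categories.)
    return True


def is_idn_at_least_one(name: str) -> bool:
    # Proposed wording: at least one A-label or U-label must be present.
    labels = name.rstrip(".").split(".")
    return any(looks_like_a_label(l) or looks_like_u_label(l) for l in labels)


for name in ("www.example.com", "www.b\u00fccher.example", "xn--bcher-kva.example"):
    print(name, is_idn_any_mixture(name), is_idn_at_least_one(name))
```

Under the first reading www.example.com counts as an IDN; under the
second it does not.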

I continue to find this text a little strange:

   Because of the diversity of characters that can be used in a
   U-label and the confusion they might cause, such restrictions are
   mandatory for IDN registries and zones even though the particular
   restrictions are not part of these specifications.

I know that what this is really saying is, "You MUST have a policy,
even if that policy is NULL."  I might find this less perplexing if
the text then contained a reference to mappings, and some advice to
use it -- something along the following lines:

    A minimal set of restrictions advisable for all registries is
    found in [mappings].

§2.3.2.4

    o  Exact (bit-string identity) matches between a pair of A-labels

Surely this isn't quite right?  Since every A-label is an LDH-label,
A-labels match according to the DNS rules (that is, without regard to
ASCII case), no?  If not, we have a problem: A-label matching would
_not_ conform to DNS rules, and we would have to figure out some way
to reconcile the two.
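
A small Python sketch of the discrepancy, assuming that "DNS rules"
means the usual ASCII case-insensitive comparison of labels; the two
function names are hypothetical.

```python
def bitstring_match(a: str, b: str) -> bool:
    # Exact (bit-string identity) comparison, as the quoted bullet words it.
    return a.encode("ascii") == b.encode("ascii")


def dns_match(a: str, b: str) -> bool:
    # DNS-style comparison: ASCII letters compare equal regardless of case.
    # bytes.lower() only touches the ASCII letters A-Z.
    return a.encode("ascii").lower() == b.encode("ascii").lower()


lower = "xn--bcher-kva"
upper = "XN--BCHER-KVA"

print(bitstring_match(lower, upper))
print(dns_match(lower, upper))
```

The two spellings differ bit for bit, but a DNS server would treat them
as the same label.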

Isn't 2.3.3 a restatement of the discussion in 2.3.2.1 (see above for
proposed text for that section)? 

Those are all the comments I have today.

Best regards,

Andrew


-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.


More information about the Idna-update mailing list