Comments on idnabis-rationale-01

John C Klensin klensin at jck.com
Tue Jul 22 16:32:56 CEST 2008



--On Tuesday, 22 July, 2008 12:20 +0200 Frank Ellermann
<hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com> wrote:

> John C Klensin wrote:
>  
>> (1)  At present, LDH-label, A-label, and U-label are disjoint
>> categories.
> 
> Yes, and that cannot work, because everybody knows what "LDH"
> means, as specified in 2821bis and numerous other RFCs.  Any
> newspeak "LDH-label" has no chance against this, it would be
> hopelessly confusing.

"LDH" there, and, as far as I know, everywhere else it is used,
is simply used as a list of characters, i.e., A..Z, a..z, 0..9,
"-".  Since the term was introduced into popular discussion,
there has been a good deal of use of terms like "strings
containing LDH characters" or "conformance to the LDH rule".
2821bis doesn't use "LDH-label" at all; it uses 

   sub-domain     = Let-dig [Ldh-str]
   Let-dig        = ALPHA / DIGIT
   Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig

It also uses Ldh-str as the RHS of "Standardized-tag", which has
nothing to do with the DNS.   RFC3490 uses "LDH code points" and
notes the source of "LDH" as a string, but does not talk about
"LDH labels".

<ldh-str> is, by the way, present in RFC 821, so that
terminology, but not "LDH-label", has been with us for a very
long time.
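
For illustration only, the 2821bis productions quoted above can be
transcribed into a regular expression.  This is a minimal Python
sketch, not text from any RFC: the names SUB_DOMAIN and is_sub_domain
are invented here, and the 63-octet DNS label limit is deliberately
ignored because the ABNF itself does not express it.

```python
import re

# sub-domain = Let-dig [Ldh-str]
# Let-dig    = ALPHA / DIGIT
# Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
SUB_DOMAIN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_sub_domain(label: str) -> bool:
    """Return True if label matches the sub-domain production.

    Hypothetical helper: a purely syntactic check that says nothing
    about IDNA validity or label length.
    """
    return SUB_DOMAIN.fullmatch(label) is not None
```

Under this sketch "example" and "xn--0zwm56d" are both accepted,
while "-abc", "abc-", and "a_b" are rejected.  The syntax by itself
does not say whether a string starting in "xn--" is a valid A-label.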

To the best of my knowledge "LDH-label" is used only in IDNA2008
and discussions referring to it.   While I grant that this is
not ideal, "no chance" and "hopelessly confusing" are hyperbole
unless you believe that any string that contains "LDH" must be a
reference to a label conforming to the traditional LDH
("hostname") rule.  Indeed, I suggest that your repeated efforts
to turn A-label back into a subset of LDH-label are part of what
is causing the confusion you cite.

> Whatever else "xn--0zwm56d" might be, it IS an LDH-label, it
> does work in hostnames for IETF protocols expecting hostnames.

Again, I would welcome a different term, although I would also
welcome comments from others, especially Tina and Cary, as to
whether "LDH-label" has become sufficiently entrenched using the
IDNA200X/IDNA2008 definition, that trying to retire it would
create excessive confusion.

Perhaps we could try "traditional label"?

>> if LDH-label is to include both A-labels and traditional 
>> ASCII labels (i.e., labels that do not start in "xn--"),
> 
> ...and maybe <xn-label>s which are no A-label under IDNAbis...

I hope not.  As explained earlier, I think that leads us into
other trouble.  It was precisely an attempt to make that
terminology clear that led to the attempt, in earlier versions
of Rationale, to prohibit "--" entirely in the third and fourth
positions.  Perhaps the right way out of that problem would be
to define:

  DNS-label-in-Class-IN = LDH-label-or-some-other-term / 
      A-label / binary-label / SRV-label /
      special-form-including-double-hyphen-in-3-and-4 /
      special-forms-as-yet-undefined

and then insert an explicit note that only the first two are
used in IDNA2008 and that other protocols and definitions may
make fewer, or different, decisions.
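
To make the "disjoint categories" idea concrete, here is a rough
Python sketch that sorts a label's presentation form into exactly one
bucket.  Everything in it is an assumption made for illustration: the
LabelKind names, the classify function, and the simplification that a
leading underscore marks an SRV-style label.  It cannot tell a valid
A-label from any other string starting in "xn--", since that needs
the full IDNA2008 validity checks, and it ignores label length.

```python
import re
from enum import Enum


class LabelKind(Enum):
    TRADITIONAL = "LDH-label-or-some-other-term"
    XN_PREFIXED = "A-label candidate (still needs IDNA validation)"
    DOUBLE_HYPHEN_3_4 = "special form with -- in positions 3 and 4"
    UNDERSCORE = "SRV-style label"
    OTHER = "binary label or as-yet-undefined form"


# Traditional LDH ("hostname") rule: letters, digits, hyphen,
# with no hyphen at either end.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def classify(label: str) -> LabelKind:
    """Assign a label to exactly one category (hypothetical scheme)."""
    if label.startswith("_"):
        return LabelKind.UNDERSCORE
    if _LDH.fullmatch(label) is None:
        return LabelKind.OTHER
    if label[2:4] == "--":
        # Positions 3 and 4 are both hyphens; only "xn" is the IDNA prefix.
        if label[:2].lower() == "xn":
            return LabelKind.XN_PREFIXED
        return LabelKind.DOUBLE_HYPHEN_3_4
    return LabelKind.TRADITIONAL
```

Because each branch returns as soon as it matches, no label can land
in two categories, and the OTHER bucket makes the set span the label
space.  Those are the two properties being argued for, whatever names
the categories end up with.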

>> then we need a term for an LDH label that is not an A-label.
> 
> What for ?  We need <top-label> and likely <xn--label>. 

Based on earlier discussion, I believe you are the only one who
believes that <top-label> is a protocol matter at all.  It
certainly is not a definitional requirement.   <xn--label>,
either as a piece of syntax or as a term we intend to introduce
into popular usage, immediately leads us into having to discuss
"valid" and "invalid" ones, and ambiguity as to what the term by
itself means, as described earlier.

> But
> other LDH-labels not starting with "xn--" are irrelevant for
> the IDNAbis purposes.  

Unless they are needed as part of a definitional mechanism --a
vocabulary for talking about IDNs-- that does not have any
effect on other protocols or purposes.  I note that, as implied
by the syntax example above, binary labels and all-ASCII labels
that do not conform to the LDH rules are no more or less
irrelevant to IDNA.

> LDH-labels are by definition no U-label, if U-label requires
> at least one non-ASCII character.  And that is the case, for
> a simple disambiguation from A-label.  Therefore the rest of
> the LDH-labels cannot affect us - there are more than enough
> RFCs talking about them.
> 
> We also have no compelling reasons to talk about LDH-labels
> with "--" at positions 3+4 of the label.  Therefore in the
> spirit of KISS simply don't talk about it.  Only "xn--" is
> what we are interested in wrt <ldh-label> and <top-label>.

Unless it is needed to clarify the terminology, I agree.  You
will note that the discussion of "--" in positions 3 and 4 was
removed from rationale-01, so, unless someone wants to object,
that is no longer an issue, at least in the context of a
prohibition on non-IDNA-aware uses of the DNS.

>> the general idea is to have categories that are disjoint
>> and that, ideally, span the label space, not ones that
>> overlap in some fuzzy way and therefore require additional
>> qualification.
> 
> Sure, but the overall design principle is that all A-labels
> are LDH-labels.  That's the one and only idea of IDNA.  This
> discussion reminds me of discussions when folks proposed to
> remove the one and only idea of SPF from SPF.  Some of these
> folks had no problem with an 1 - 1 = 0 result, but I digress.

The "overall design principle" is that all A-labels conform to
the hostname syntax as defined in RFC 952, the preferred syntax
as defined in RFC 1035, the sub-domain syntax as defined in RFC
2821 (and 2821bis), the "<name>" syntax as defined in RFC 821,
and so on.  Since the term LDH-label is new, it is not bound to
that "overall design principle".

>> people talked about "punycode" as a label type
> 
> For my simplified Chinese test "xn--0zwm56d" I'd guess that
> would be "0zwm56d" with a required hyphen MIA.

You would guess that.  Many people would agree with you.   But
experience indicates that many would not and that, when people
from those groups talk with each other, confusion follows.
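
The two readings can be put side by side in a few lines of Python.
This is only a sketch: split_ace and ACE_PREFIX are names invented
here, and the standard library's "punycode" codec implements the RFC
3492 encoding only, with no IDNA validity checking.

```python
from typing import Tuple

ACE_PREFIX = "xn--"


def split_ace(label: str) -> Tuple[str, str]:
    """Split an ACE-prefixed label into (Punycode part, decoded text).

    Hypothetical helper: it strips the prefix and runs the RFC 3492
    decoder, nothing more.
    """
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError("label does not carry the ACE prefix")
    encoded = label[len(ACE_PREFIX):]
    decoded = encoded.encode("ascii").decode("punycode")
    return encoded, decoded


encoded, decoded = split_ace("xn--0zwm56d")
```

Here encoded is "0zwm56d", the output of the Punycode algorithm
proper, while the string that appears in the DNS is the whole label
with its prefix.  Someone who says "the punycode" may mean either
one.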

>  IMO A-label
> is fine, either as an "IDNAbis valid" subset of <xn--label>,
> or instead of <xn--label> resulting in valid versus invalid
> A-labels.  The latter is IMHO less desirable, but might work
> if used consistently in all IDNAbis documents.

It is not the IDNAbis documents that I'm concerned about.  It is
popular usage that then gets reflected into implementations and
policies.  I believe that the path for which you are arguing
would be perfectly reasonable for document-writing, but that it
is less reasonable for implementers who do not read carefully
and much less reasonable for people who have to deal with IDNs
while suffering from clue deficiency.

>> This WG's scope rather clearly does not including modifying
>> the DNS specifications, particularly 1034, 1035, and 2181).
> 
> ACK, but IDNAbis needs the <top-label> 1123-fix for IDN TLDs.
> Therefore it's the job of IDNAbis to do this, especially now,
> after the simple errata-approach was rejected.  

I continue to believe that fixing the text in 1123 is a matter
for some DNS WG, not for this one, especially since the issue of
services at the top level should probably be addressed at the
same time.

>> getting entangled in debates similar to those that recently
>> raged on the IETF list about domain names and host names
>> would be unwise
> 
> Most of these articles had "2606" in the subject and did not
> even remotely talk about RFC 2606, let alone the 2606bis I-D.
> 
> That is IMO no problem, I read all the articles, and where it
> was remotely related I added the contributors to the credits.
> 
> There were a few tangible ideas:  Notably ".tld" in examples;
> why I18N of ".invalid" or ".localhost" would be wrong; and
> why ".bar", ",bat", ".baz" might be ".bad" ideas.  
> 
> No new insights wrt <top-label>, and your 1123-erratum would 
> kill all reported oddities (my 3696-version missed 0x).  Now
> after both errata were rejected we need one line of ABNF to
> define it, and the four digits 1123 (in this order) in an
> XML source to get an "updates: 1123" effect.

Updating 1123 is definitely not part of our charter and it is
not required for our definitions or protocol.  Formally, I don't
think this WG has any responsibility for caring whether there is
ever an IDN TLD or not.

> We already agree that 2606bis is not the place to pull this.
> 
> Let's do it elsewhere, rationale or protocol or in a unified
> IDNAbis draft.  Doing it nowhere is no option, after all we
> want IDN TLDs.  

From the standpoint of the IDNA2008 definition, I think "we" are
agnostic on the subject of "wanting IDN TLDs".

> Asking DNS folks to do this now for us, while
> they have a major crisis, would be not only unwise, it cannot
> work...  My crystal ball says.

And my crystal ball says that this is not in our charter.  It
also says that, if we try to do this, the draft will end up in
the hands of the DNS folks during Last Call and will run
significant risk of getting bogged down there as they discuss
whether the change we have chosen was correct -- the same
discussion they would need to have to adjust 1123 themselves.
The problem is ultimately the same as the one that required
getting the ban on "--" in positions 3 and 4 out of the
conformance checking in the protocol -- it just isn't our
problem.

>> redefining the syntax or length of LDH-label (while making
>> it the superset definition while you prefer), specifying
>> its length, etc., or about defining a <top-label> category
>> that is not needed for the IDNA2008 protocol or tables, are,
>> I believe, out of scope and inappropriate.
> 
> Nobody proposed to redefine LDH-label.  It is what it always
> was, LD, optionally followed by up to 62 LDH ending with LD.

See above.  Unless you believe that every term starting in, or
including, "LDH" is equivalent to every other such term, that
term is new and largely unique to IDNA2008 discussions.
 
> You only need this syntax with a reference, in the direction
> of "as specified in RFC 1123 (among others)".  Based on that
> it is a mere clerical task to specify the minimally different
> <top-label> and say that this updates a note in RFC 1123 2.1.

But 1123 doesn't specify anything in terms of "LDH-label".
 
> I'm not going to repeat the details the third time here - or
> rather for the 1001st time since I proposed a "toplabel task
> force" in USEFOR back in 2006.
> 
> We do agree that we want IDN TLDs, or don't we ?  IFF we do
> it is our job to fix RFC 1123, the individual attempts based
> on RFC 3696 with errata didn't fly.  We can't ask say EAI to
> do it for us, IDN TLDs are no E-mail Address I18N experiment.

It is no more appropriate to try to fix 1123 here than it is to
fix it in EAI.  The statement in 1123 is a statement about what
names will be permitted on a policy basis, even though the
policy has very important technical implications.  Not in the
charter (which, by the way, doesn't mention IDN TLDs either).

>...

    john



