Comments on idnabis-rationale-01

Frank Ellermann
Tue Jul 22 20:18:48 CEST 2008

John C Klensin wrote:

> 2821bis doesn't use "LDH-label" at all; it uses 
>    sub-domain     = Let-dig [Ldh-str]
>    Let-dig        = ALPHA / DIGIT
>    Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig

That <sub-domain> is what I'm talking about, and what 
everybody would expect an <ldh-label> is about.  You
could s/*/*61/ in this production.  OTOH 2821bis has a
normative reference to RFC 1035, that already covers
63 = 1 + 61 + 1.
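
For illustration, a minimal Python sketch of that production with
the s/*/*61/ bound applied, so a label has 1 to 63 octets.  The
names LDH_LABEL and is_ldh_label are assumptions made for this
sketch; they appear in neither 2821bis nor RFC 1035.

```python
import re

# Let-dig [Ldh-str], with Ldh-str limited to at most 61 inner
# characters followed by a final Let-dig: 1 + 61 + 1 = 63 octets.
# LDH_LABEL and is_ldh_label are hypothetical names for this sketch.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if label matches the bounded <sub-domain> production."""
    return LDH_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ("a", "xn--bcher-kva", "-a", "a-", "a" * 64):
        print(repr(candidate[:20]), is_ldh_label(candidate))
```

The explicit character ranges keep the match ASCII-only, and
fullmatch rejects a leading or trailing hyphen as well as the
empty string.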

A slightly different <ldh-label> example found in

| id-prefix = alphanum 
| ldh-label = [61* ldh id-prefix] 
| label     = id-prefix ldh-label

<> apparently 
also uses the obvious definition of "LDH label".  

> I suggest that your repeated efforts to turn A-label
> back into a subset of LDH-label are part of what is
> causing the confusion you cite.

I'm not confused about "xn--" being LDH, and punycode
output being LDH.  But I'm confused why you claim that
they're something else, because they obviously are LDH.

> Perhaps we could try "traditional label"?

Then A-labels would be immediately a proper subset of
"traditional labels", as that is the whole purpose of
IDNA(bis).  Like 2047-encoded words are "traditional
atoms".  Funny "traditional atoms" starting with "=?"
(plus a few other syntactical details skipped here).

> I hope not.  As explained earlier, I think that leads
> us into other trouble.

The trouble without <xn--label> is that you get subtle
"valid A-label" vs. "invalid A-label" differences.  You
cannot define "invalid A-label" in a convincing way: is
a U-label an "invalid A-label" ?  Or are 42 adjacent
dots an "invalid A-label" ?  

It is clearer if an A-label is by definition "valid" wrt
IDNAbis, while some <xn--label>s can turn out to be no
A-label.  But at least they are limited to "traditional
labels starting with 'xn--'", i.e. no U-label, no 42 dots,
no "icann", ...
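
A rough Python sketch of that split, under stated assumptions: an
<xn--label> is taken to be any LDH label starting with "xn--" in
any case, and the A-label test is reduced to a Punycode round trip
using Python's built-in "punycode" codec.  That round trip is only
a necessary condition; the real validity rules are whatever IDNAbis
specifies.  The function names are made up for this sketch.

```python
import re

# Hypothetical names; not taken from any IDNAbis draft.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_xn_label(label: str) -> bool:
    """An <xn--label>: an LDH label that starts with 'xn--' (any case)."""
    return (LDH_LABEL.fullmatch(label) is not None
            and label[:4].lower() == "xn--")


def punycode_round_trips(label: str) -> bool:
    """Necessary, not sufficient, test for an <xn--label> being an A-label."""
    if not is_xn_label(label):
        return False
    payload = label[4:].lower()
    try:
        decoded = payload.encode("ascii").decode("punycode")
        reencoded = decoded.encode("punycode").decode("ascii")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    # An A-label has to carry at least one non-ASCII character.
    return reencoded == payload and any(ord(c) > 127 for c in decoded)


if __name__ == "__main__":
    for candidate in ("xn--bcher-kva", "icann", "." * 42):
        print(candidate[:16], is_xn_label(candidate),
              punycode_round_trips(candidate))
```

Everything that fails is_xn_label (a U-label, 42 dots, "icann")
is outside the discussion; only labels that pass it can be
"xn--labels that are no A-label".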

> DNS-label-in-Class-IN = LDH-label-or-some-other-term / 
>   A-label / binary-label / SRV-label /
>   special-form-including-double-hyphen-in-3-and-4 /
>   special-forms-as-yet-undefined

AFAIK we're not interested in SRV-label here, or are we ?
Marcos mentioned existing "--" labels, but as it was not
"xn--" we should leave it alone.

IMO we don't need a name for "any LDH-label not starting
with 'xn--'".  We might need a name for "any <xn--label>
that is no A-label", i.e. for the "invalid A-labels" in
a terminology not using <xn--label>.

But you could as well write "xn--label that is no A-label"
in the few places where you need this.

If you pick the terminology without <xn--label> you would
have (the same) few places with "invalid A-label", but you
get many places with "valid A-label" instead of "A-label".

Please don't mix these approaches, as reported by Marcos.

> you are the only one who believes that <top-label> is a
> protocol matter at all.

My impression is that quite a lot of folks want IDN TLDs.

But RFC 1123 says "no".  RFC 3696 says yes, but you said
it is only informational, and besides it got it wrong, or
rather it wasn't precise enough.  So we have to fix this;
nobody else can do it.

> binary labels and all-ASCII labels that do not conform
> to the LDH rules are no more or less irrelevant to IDNA.

Binary (as in non-ASCII octet) labels *might* be relevant
when talking about U-labels.  All-ASCII labels, e.g., SRV-
labels, are mostly irrelevant for IDNAbis.  

And <top-label> is also mostly irrelevant.  Only the fact
that any (valid) "A-label" matches <top-label> is mildly
interesting, this justifies the introduction of IDN TLDs.

Ditto LDH-labels, mostly irrelevant, apart from the fact
that they are by definition no U-label, and any (valid)
A-label is by definition an LDH-label.  Of course also
any <top-label> is by definition an LDH-label.  

Above all I need syntax, your "disjoint classes" are not
obviously disjoint:

Any DNS label consists of octets and is a "binary label"
in that sense.  Any SRV-label is a binary label.  Likely
any LDH-label is an SRV-label (it just contains no "_").
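
To make the overlap concrete, a small Python sketch with loosely
assumed predicates: "binary label" means any 1 to 63 octets, and
"SRV-label" means LDH characters plus "_" in any position.  Both
definitions and all names are assumptions for illustration, not
taken from any RFC.

```python
import re

# Hypothetical predicates, assumed for this sketch only.
LDH = re.compile(rb"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
SRV = re.compile(rb"[A-Za-z0-9_-]{1,63}")


def classes(label: bytes) -> set:
    """Return every class name whose predicate accepts the label."""
    found = set()
    if 1 <= len(label) <= 63:
        found.add("binary-label")
    if SRV.fullmatch(label):
        found.add("SRV-label")
    if LDH.fullmatch(label):
        found.add("LDH-label")
    return found


if __name__ == "__main__":
    for candidate in (b"example", b"_sip", b"\xfc"):
        print(candidate, sorted(classes(candidate)))
```

Under these assumptions a plain label such as "example" lands in
all three classes at once, so the classes are nested, not disjoint.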

> note that the discussion of "--" in positions 3 and 4
> was removed from rationale-01

Marcos said it wasn't in his 01 review, the message that
started this thread.
There's a rather long discussion of "--" in rationale-01.
Is the published 01 draft perhaps not what it should be ?

> The "overall design principle" is that all A-labels
> conform to the hostname syntax as defined in RFC 952

Interrupting at 952, that would get us only 24 LDH octets,
but we want as much as we can get, 63 starting with "xn--".

I can't even tell if 24 was meant to be the complete limit
(what's now 253).  Please ignore this status "unknown" RFC,
it is not helpful for IDNAbis purposes.   We don't need 952
to define LDH.  RFC 1034 has <ldh-str> with limit "63" as
we need it, and 1034 is "STD", not "unknown".

> It is not the IDNAbis documents that I'm concerned about.
> It is popular usage that then gets reflected into 
> implementations and policies.

Not exactly popular, but I used "A-label" and "LDH-label" 
intuitively in two drafts, details TBD by IDNAbis, because
I don't need any details above letter-digit-hyphen, ASCII,
"xn--", U-label, and an IDNAbis reference for my purposes.

> Formally, I don't think this WG has any responsibility 
> for caring whether there is ever an IDN TLD or not.
> From the standpoint of the IDNA2008 definition, I think
> "we" are agnostic on the subject of "wanting IDN TLDs".

Okay, a clear case of "TINW".  TLDs fascinate me since I
started a whois client (long before draft-sanz-whois-srv,
and now long after it expired - one of those cases where
everything is related in the oddest ways, another status
"unknown" RFC is 1032, as "we" know... ;-)  Referenced in
1123, like 952.

 [your crystal ball]
> if we try to do this, the draft will end up in the hands
> of the DNS folks during Last Call and will run significant
> risk of getting bogged down there as they discuss whether
> the change we have chosen was correct

If they dispute "63" or LDH they have a problem with 1034.
If they insist on "alphabetic" they have a serious problem
with any IDN TLDs.  If they want RFC 3696 in syntax we can
call the "0x" cavalry.  After that they can only insist
on single alpha, and FWIW we could offer to "permit" that,
and pray that ICANN is less lunatic.

>> it is a mere clerical task to specify the minimally different
>> <top-label> and say that this updates a note in RFC 1123 2.1.
> But 1123 doesn't specify anything in terms of "LDH-label".

In 2.1 it says "alphabetic" about the concept of a "highest
level component".  The particular name for this concept in 
syntax, e.g., <toplabel> in several RFCs, is not the point.

For this discussion I picked <top-label> as name, because it
has the same length as <ldh-label> and <xn--label>.  It will
result in nice ABNF syntax, especially <xn--label> is pretty.

OTOH <traditional-label> could push you over the 69 columns
line length limit in RFCs, resulting in a messy syntax with
folded lines.

