A-label definition

John C Klensin klensin at jck.com
Mon Jun 23 14:52:52 CEST 2008


Mark,

I think the issue here is not what is syntactically possible,
but about what makes good sense.  We presumably all know that
virtually anything can be placed into a DNS label, including
arbitrary strings in UTF-8, UTF-32, or random character sets.
RFC 2181 is quite clear about that and, in that area, it really
contained nothing new although there had been enough confusion
to justify writing it.

IDNA was designed around the assumption that we didn't want to
tamper with the host name rules which are, in turn, reflected as
the "existing" rules in 1035, specific syntax in 821, etc.  It
wasn't that we couldn't simply, e.g., store UTF-8 in domain
labels; it is that we decided not to do so (for a lot of perfectly
good reasons, but let's not reprise that discussion here).

The question of what strings should be permitted as TLD labels
has always been more a matter of judgment than of protocol.
Being pedantic about what the protocol permits does not help us
make progress; again, it is clear that the DNS itself permits
almost anything.   Jon's judgment was that we would be better
off with a clear lexical distinction, based on length, between
ccTLDs and gTLDs.  For better or worse, that distinction is now
ancient history.  My recollection and understanding, from the
pre-1591 discussions, is that "alphabetic" meant exactly that --
the intention was that, if new gTLDs were allocated, their names
would contain nothing but alphabetic ASCII characters.  The
reason was to avoid any possible confusion with IP addresses or
other, non-DNS identifiers, either by stupid parsing algorithms
or by careless people.

Now, IDNs change that rule.  One cannot have a full range of
A-labels without digits and cannot have A-labels at all without
hyphens in the third and fourth positions.    My intuition
--again consistent with extrapolation from the 1591
discussions-- is that TLD U-labels (or, more generally, anything
that isn't strictly an A-label) should not include any digits (in
any script) or punctuation (even hyphens), regardless of what is
permitted elsewhere.
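
To illustrate where the hyphens and digits in A-labels come from,
here is a minimal sketch using Python's built-in "punycode" codec.
The helper name to_a_label is made up for this example, and it
deliberately skips the normalization and validity checks that real
IDNA processing requires.

```python
def to_a_label(u_label: str) -> str:
    """Hypothetical helper: raw Punycode output plus the "xn--" prefix.

    Assumes the input is already a valid, normalized, lowercase U-label
    with at least one non-ASCII character. No IDNA checks are performed.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


a_label = to_a_label("m\u00fcnchen")   # "muenchen" written with u-umlaut
print(a_label)                          # xn--mnchen-3ya
print(a_label[2:4] == "--")             # hyphens in positions three and four
print(any(ch.isdigit() for ch in a_label))  # a digit introduced by the coding
```

The U-label here contains neither a hyphen nor a digit; both appear
only in the encoded form.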

How dangerous would it be to be more relaxed than that?  I don't
know.  Certainly it is possible that I'm being too conservative.
But I'm also not really interested in finding out, given the
sweeping consequences of misinterpreting a TLD string and also
given that there is no obvious _need_ for such strings,
regardless of what people might "like" to do.  If we get down to
"like to do", then there are clearly folks who would "like" to
create confusion and attack vectors -- both can be quite
profitable.

Now, all of that said, at least one issue clearly remains:

Should IETF try to impose any requirements or limitations that
would apply strictly to TLD labels, or should we decide that
they are just "policy" and leave them to ICANN?  My personal
view is that the type of restrictions described above are not
"just policy" because they are important to preserving the
ability of older, non-IDNA-aware, applications to continue to
behave smoothly and predictably.  I also don't trust ICANN's
decision-making processes very much and, in particular, do not
trust them to favor conservatism about long-term identifier
integrity over the short-term commercial interests of someone
with a clever idea.  I also believe, based on some small
experience, that the argument will be made there that, if
something wasn't important enough for the IETF to lay down a
firm rule, then there should be no restrictions and commercial
("competitive") interests should prevail.  

YMMV on any or all of those points -- you may, in particular,
believe that the IETF should stay out of the TLD syntax issues
on principle regardless of consequences; you may trust ICANN and
its processes to protect the integrity of the DNS and its
identifiers; or you may believe that the interests of the
Internet are best served by uncontrolled commercialization of
the DNS.  I don't believe that either of our interpretations of
history will help reach conclusions on those issues if, indeed,
we disagree.

But, FWIW, my conclusions from my reasoning about this and the
assumption that IDN TLDs are either a good idea or inevitable
are that:

	* We should continue to restrict ASCII TLD strings (a
	subset of "LDH labels" in IDNA2008-speak) to
	alphabetic-only... no digits or hyphens at all.
	
	* We should apply the same rules to U-labels (native
	character string forms) for TLDs, i.e., no digits, no
	punctuation, and, preferably, at least two or three
	characters long (in the "print position" sense of
	"character", not dependent on however Unicode coding
	happens to work).
	
	* We should permit whatever A-labels fall out from the
	above.  I.e., A-labels contain hyphens by definition and
	often contain digits as a consequence of the coding.
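
As a rough sketch only, the first two rules could be checked along
the following lines.  The function names are hypothetical, the
two-character minimum is just one reading of "two or three", and
counting letters while ignoring combining marks is only a crude
stand-in for "print positions"; none of this is a proposed
specification.

```python
import unicodedata


def is_ascii_alpha_tld(label: str) -> bool:
    """Rule 1 (sketch): ASCII TLD labels are letters only."""
    return label.isascii() and label.isalpha()


def is_conservative_u_label_tld(label: str, min_chars: int = 2) -> bool:
    """Rule 2 (sketch): letters and combining marks only, minimum length.

    Assumption: Unicode general categories L* (letters) and M* (marks)
    are acceptable; digits (N*), punctuation (P*), symbols and anything
    else are rejected.  Only letters count toward the length, as an
    approximation of "print positions".
    """
    letters = 0
    for ch in label:
        major = unicodedata.category(ch)[0]
        if major == "L":
            letters += 1
        elif major != "M":
            return False          # digit, punctuation, symbol, etc.
    return letters >= min_chars


print(is_ascii_alpha_tld("com"))                        # True
print(is_ascii_alpha_tld("co-op"))                      # False
print(is_conservative_u_label_tld("\u4e2d\u56fd"))      # True  (two CJK letters)
print(is_conservative_u_label_tld("\u0663\u0663"))      # False (Arabic-Indic digits)
```

Under the third rule, whatever A-label results from encoding an
accepted U-label would be permitted without further lexical checks.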

But that is just the conclusion I get to by applying my
conclusions as postulates.  If I do, the above rules more or
less fall out.  If one starts with different postulates, one
ends up with different rules.

A few more comments below...



--On Monday, 23 June, 2008 12:51 +1000 Mark Andrews
<Mark_Andrews at isc.org> wrote:

>... 
> 	RFC 1123 also does not preclude alphanumeric.  What it does
> 	say is that all the currently allocated tlds (at the time
> 	of writing) are alphabetic and that because they are
> 	alphabetic there is no possibility of a clash with a dotted
> 	decimal notation for an IPv4 address.

No.  Go back and read it again.  "will be alphabetic" was a
comment about future allocations, not just the circumstances at
the time.
 
> 	At best there is guidance not to allocate a TLD which will
> 	potentially clash with a representation of a IPv4 address.
> 
> 		0xde.0xad.0xbe.0xef
> 		222.173.190.239
> 		0xdeadbeef
> 		0336.0255.0276.0357
> 		033653337357
> 		3735928559
> 
> 	xn--* will never clash with a dotted decimal or any other
> 	representation of a IPv4 address.
> 
> 	xn--* is a legal tld under RFC 952 and it was not made illegal
> 	by RFC 1123.

That is absolutely correct.   There is no "illegal" in any of
this. The question, as noted above, is about strategies and
wisdom in allocating TLD labels given that 1123 eliminated the
"no leading digit" rule for labels -- a rule that was never
enforced by the DNS but by application protocols (such as SMTP).

Thus, a different way to put the question is "should there be
restrictions on TLD labels that are a superset of the
restrictions on labels elsewhere in the tree".  If one's answer
is "no", then my concerns, and most of the discussion above and
earlier, are irrelevant (and Frank's "<toplabel>" production is
trivial).  If the answer is "yes", or even "maybe", then the
questions are what those additional restrictions should be and
who should define them.  

Even if the answer is "yes, but only to prevent confusion with
IPv4 addresses", one could still have an FQDN of 1.2.3.4.5 or
1.2.3, as long as 1.2.3.4 is avoided.  I'd rather not, if only
because I can imagine ways in which parsers based on other
assumptions, DNAMEs, and mappings to or from reverse forms could
lead to trouble, but, again, that is a matter of conservative
preferences, not because there would be serious problems for
very careful applications.

>...

    john


