A-label definition

Frank Ellermann hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com
Tue Jun 24 00:51:35 CEST 2008


John C Klensin wrote:

> Jon's judgment was that we would be better off with a clear 
> lexical distinction, based on length, between ccTLDs and gTLDs.
> For better or worse, that distinction is now ancient history.

As ancient as one minute ago (somehow I ended up in a Wikipedia
dispute about a "proposed top-level domain" QC, the proponent
did not read the "country code top-level domain" article with
the RFC 1591 fine print).

> My intuition --again consistent with extrapolation from the 1591
> discussions-- is that TLD U-labels (or, more generally, anything
> that isn't strictly a U-label) should not include any digits (in
> any script) or punctuation (even hyphens), regardless of what is
> permitted elsewhere.

Dunno, figuring out what is a good, bad, or ugly U-toplabel for
a given valid U-label is something ICANN can do.  If they want
a rule in the IDNAbis RFCs about it, fine.  I'd have vague ideas
why "only non-ASCII digits" in a U-toplabel would be odd, after
all "only ASCII digits" isn't permitted.  But if they don't want
a rule, better not to talk about it.

> Certainly it is possible that I'm being too conservative
> But I'm also not really interested in finding out, given the
> sweeping consequences of misinterpreting a TLD string and also
> given that there is no obvious _need_ for such strings

Maybe somewhere in the world countries are known by strings
containing digits, and want to get TLDs (pure speculation);
IMO this is not "obvious".  Conservative is good, but I like
KISS better... :-)

> Should IETF try to impose any requirements or limitations that
> would apply strictly to TLD labels, or should we decide that
> they are just "policy" and leave them to ICANN?

The proper <toplabel> subset of LDH has to be specified, it's
messy at the moment with at least five versions in the wild.
U-toplabel could be "policy".

> and, in particular, do not trust them to favor conservatism
> about long-term identifier integrity over the short-term
> commercial interests of someone with a clever idea.

That's why I wrote "pick version 1 or 2, but not 1" (SC TLD)
for <toplabel>.  For a single non-ASCII code point U-toplabel
see above.  If you want to do something about the U-toplabels
please keep it simple, I'm more interested in LDH <toplabel>s.

> * We should continue to restrict ASCII TLD strings (a
> subset of "LDH labels" in IDNA2008-speak) to
> alphabetic-only... no digits or hyphens at all.

"xn--" labels have hyphens and digits, for implementations it
means they have to accept this anyway.  MUST start with <let>
is ok (eid=1335), MUST contain non-digit is ok (RFC 3696);
let's not make this more complex.
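
To make the candidate <toplabel> rules concrete, here is a rough
sketch in Python.  The function names are made up for illustration,
and the LDH pattern is only my reading of the usual rule (letters,
digits, hyphens, 1 to 63 characters, no hyphen at either end), not
text from any RFC.

```python
import re

# Assumed LDH label shape: letters, digits, hyphens, 1..63 characters,
# no hyphen at the start or end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    return LDH_LABEL.fullmatch(label) is not None


def toplabel_starts_with_letter(label: str) -> bool:
    """Hypothetical rule A: an LDH label whose first character is a letter."""
    return is_ldh_label(label) and label[0].isalpha()


def toplabel_has_non_digit(label: str) -> bool:
    """Hypothetical rule B: an LDH label that is not made of digits only."""
    return is_ldh_label(label) and not label.isdigit()


def toplabel_alpha_only(label: str) -> bool:
    """The stricter quoted proposal: letters only, no digits or hyphens."""
    return is_ldh_label(label) and label.isalpha()


if __name__ == "__main__":
    for candidate in ("com", "xn--p1ai", "1a", "123"):
        print(candidate,
              toplabel_starts_with_letter(candidate),
              toplabel_has_non_digit(candidate),
              toplabel_alpha_only(candidate))
```

Note that an "xn--" label passes rules A and B but fails the
alphabetic-only rule, which is the point made above.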

Same idea as in MIME, implementations do not need to know
that =?...?.?...?= is magic, they treat it just as a word.
Similarly, implementations don't need to know that xn--... is
magic, it's just a peculiar LDH label.  Only IDNA software
will try to do more with xn--...
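
A small sketch of that split, assuming Python and its standard
"punycode" codec.  The helper names are hypothetical, and the decode
step is only the raw Punycode transformation, not a full IDNA
validation.

```python
import re

# Assumed LDH label shape, as a generic (non-IDNA) application might check it.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Generic software: xn--... is just another LDH label."""
    return LDH_LABEL.fullmatch(label) is not None


def decode_if_ace(label: str) -> str:
    """IDNA-aware software only: decode what follows the xn-- prefix."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    label = "xn--mnchen-3ya"
    print(is_ldh_label(label))   # the generic check needs no IDNA knowledge
    print(decode_if_ace(label))  # only this step treats the prefix as magic
```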

> one could still have an FQDN of 1.2.3.4.5 or 1.2.3, as 
> long as 1.2.3.4 is avoided.

Some stupid software looks for "only dots or digits" to
decide "might be an FQDN or IPv4", and it doesn't count
dots or check that 1.2.3.456 is not IPv4.  Now you could
say "this software is broken, fix it", but it was good
enough to handle IPv4 vs. FQDN implicitly.
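
For illustration, a sketch of the sloppy heuristic next to a stricter
dotted-quad check.  Both functions are hypothetical, written in
Python, and not taken from any particular implementation.

```python
def looks_like_ipv4_naive(host: str) -> bool:
    """The sloppy heuristic: only dots or digits, nothing else is checked."""
    return host != "" and all(c in "0123456789." for c in host)


def is_ipv4_dotted_quad(host: str) -> bool:
    """Stricter check: exactly four decimal parts, each in 0..255.

    Leading zeros are not rejected here; that is a simplification.
    """
    parts = host.split(".")
    if len(parts) != 4:
        return False
    return all(
        p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
        for p in parts
    )


if __name__ == "__main__":
    for host in ("1.2.3.4", "1.2.3", "1.2.3.4.5", "1.2.3.456", "example.com"):
        print(host, looks_like_ipv4_naive(host), is_ipv4_dotted_quad(host))
```

With the naive test, 1.2.3, 1.2.3.4.5 and 1.2.3.456 are all taken
for IPv4; only the stricter test tells them apart from 1.2.3.4.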

> not because there would be serious problems for very 
> careful applications.

Sure, some protocols insist on square brackets for IPs,
after that decision the whole issue doesn't exist.  But
other popular protocols don't for IPv4, notably STD 66.

 Frank


