A-label definition

Frank Ellermann hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com
Sat Jun 21 20:35:55 CEST 2008


John C Klensin wrote:
 
>> (1) LDH label, that's AFAIK 1 to 63 letters, digits,
>>     and hyphens, not starting or ending with a hyphen.
 
> And not having two hyphens in the third or fourth positions,
> according to the current definition in idnabis-rationale.

That would be a major change to what is currently known as
a label in the FQDN of a host.  I hope for one "updates: 1123",
but not twenty "updates: ????" notes for the various RFCs with
their own idea of a host <label>:

|  Domain         = sub-domain *("." sub-domain)
|  sub-domain     = Let-dig [Ldh-str]
|  Let-dig        = ALPHA / DIGIT
|  Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig

That example is from a not-yet-approved RFC about SMTP ;-)
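For testing candidate labels against the definitions quoted so far, here is a minimal Python sketch: the LDH label from (1), a check for hyphens in the third and fourth positions, and a literal regular-expression transcription of the SMTP Domain ABNF. It assumes ASCII input, the function names are hypothetical, and the hyphen check only flags such labels; it does not decide whether one is a valid A-label.

```python
import re

# (1) LDH label: 1 to 63 letters, digits and hyphens,
#     not starting or ending with a hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# SMTP Domain ABNF quoted above, transcribed literally (it has no length limit):
#   Let-dig    = ALPHA / DIGIT
#   Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
#   sub-domain = Let-dig [Ldh-str]
#   Domain     = sub-domain *("." sub-domain)
LET_DIG = r"[A-Za-z0-9]"
SUB_DOMAIN = rf"{LET_DIG}(?:[A-Za-z0-9-]*{LET_DIG})?"
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def is_ldh_label(label: str) -> bool:
    """True if label is an LDH label as described in (1)."""
    return LDH_LABEL.fullmatch(label) is not None


def has_hyphens_in_3_and_4(label: str) -> bool:
    """True if the third and fourth characters are both hyphens, as in "xn--"."""
    return label[2:4] == "--"


def is_smtp_domain(name: str) -> bool:
    """True if name matches the quoted Domain production."""
    return DOMAIN.fullmatch(name) is not None
```

For example, is_ldh_label("fe--2008-11-11") is true while has_hyphens_in_3_and_4 flags it, and is_smtp_domain accepts a 64-character sub-domain because the quoted ABNF, unlike (1), carries no length limit.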

> Note that IIRC 1035 doesn't say "LDH", and 1123 doesn't
> either; they say "host name".

RFC 1035 defines <ldh-str> and <let-dig>, RFC 821 defines
<ldh-str> and <let-dig>, RFC 937 uses <ldh>, and LDH also shows
up in RFCs 819, 882, 883, 1034, 2486, 2645, 2821, 3467, 3490,
3696, 3743, 4185, 4282, 4290, 4408, 4471, 4690, 4713, and 5178.
That's what I found with <http://purl.net/xyzzy/-a9/LDH+RFC>.

 <toplabel> 
> I hope that it is out of scope for this WG, but that is
> certainly subject to debate.  As you know, I've written
> the IESG asking them to give some priority to validating
> that erratum.

I don't understand why you hope that this is out of scope.
It has to be fixed for future IDN TLDs, and your erratum
update killed the happy theory that RFC 3696 is the last
word on <toplabel>, e.g., as used in the following draft:
 
<http://www.icann.org/topics/dns-stability-draft-paper-06feb08.pdf>

Folks are grabbing for anything, informational RFC or even
unverified erratum, just to get any "authoritative" source
about this.

> We probably should extend the 1123 rule to permit those
> hyphens but, IMO, that is as far as we should go.

That is already good enough; there are only two variants:

  toplabel = <let> [0*61<l-d-h> <let-dig>]  ; variant 1
  toplabel = <let> 0*61<l-d-h> <let-dig>    ; variant 2

Let's just pick what you like better, but not variant 1 :-)
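Going by the length ranges given further down in this message (1..63 or 2..63), both variants describe an LDH label that starts with a letter and differ only in the minimum length. A small Python sketch with hypothetical pattern names makes the difference visible by listing which lengths each one accepts:

```python
import re

# Both variants: a letter, then LDH characters ending in a letter or
# digit, at most 63 characters in total. Only the minimum length differs.
TOPLABEL_V1 = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")  # 1..63
TOPLABEL_V2 = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]")  # 2..63


def accepted_lengths(pattern: re.Pattern) -> list[int]:
    """Lengths n (1..69) for which the sample label "a" * n matches."""
    return [n for n in range(1, 70) if pattern.fullmatch("a" * n)]


if __name__ == "__main__":
    for name, pattern in [("variant 1", TOPLABEL_V1), ("variant 2", TOPLABEL_V2)]:
        lengths = accepted_lengths(pattern)
        print(name, min(lengths), max(lengths))
```

The only labels variant 1 adds over variant 2 are single letters such as "x"; "de" and every longer <toplabel> satisfy both.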

> A combination of I-Ds, informational and experimental
> documents, and opinions that don't represent demonstrated
> community consensus.  Sorry if I don't find much
> authority in these.

That is because everybody waits for you to say what you
think is best in a published RFC on the standards track with
an "updates: 1123" note.  The USEFOR RFC is on the standards
track, with the 3696 version of variant 2 (minimum length two).

>> By definition an A-label is also a valid <toplabel>, 
>> and we don't need to talk about this.
 
> By whose definition?

By your definition in either RFC 3696 or Errata ID 1353,
and your definition in idnabis-rationale.  The latter
defines (in prose)...

x-label = "xn--" *<l-d-h> "-" 1*<let-dig> ; length 6..63

...and any valid A-label matches <x-label>.  Because any
<x-label> also matches <ldh-label>, and any <toplabel>
is simply an <ldh-label> starting with a letter (length
1..63 or 2..63 depending on the chosen variant), I get:

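* "x" is a letter
* the shortest <x-label>, "xn--" + "-" + one <let-dig>, has
  length 6, which exceeds both minimums (1 and 2)
* 6..63 has the same maximum length as 1+61+1 = 63
* an <x-label> ends with a <let-dig> and has only <l-d-h>
  in between, as a <toplabel> requires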
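The argument above can be spot-checked mechanically. This Python sketch transcribes <x-label> and <toplabel> (variant 2, the stricter one) as regular expressions and enumerates short candidate labels over a tiny alphabet, collecting any <x-label> that is not a <toplabel>. It is a brute-force sanity check of these particular transcriptions, not a proof, and the helper names are hypothetical.

```python
import itertools
import re

# <x-label> = "xn--" *<l-d-h> "-" 1*<let-dig>, length 6..63
# (ABNF literals are case-insensitive, hence [Xx][Nn]).
X_LABEL = re.compile(r"[Xx][Nn]--[A-Za-z0-9-]*-[A-Za-z0-9]+")
# <toplabel>, variant 2 (length 2..63); variant 1 only lowers the minimum.
TOPLABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,61}[A-Za-z0-9]")


def is_x_label(label: str) -> bool:
    return 6 <= len(label) <= 63 and X_LABEL.fullmatch(label) is not None


def is_toplabel(label: str) -> bool:
    return TOPLABEL.fullmatch(label) is not None


def counterexamples(max_tail: int = 6, alphabet: str = "a1-") -> list[str]:
    """Return every <x-label> "xn--" + tail (tail drawn from alphabet, at most
    max_tail characters) that is NOT a <toplabel>."""
    found = []
    for n in range(max_tail + 1):
        for chars in itertools.product(alphabet, repeat=n):
            label = "xn--" + "".join(chars)
            if is_x_label(label) and not is_toplabel(label):
                found.append(label)
    return found


if __name__ == "__main__":
    print(counterexamples())  # prints [] when no counterexample is found
```

An empty list is what the three-step argument predicts; the upper bound can be checked separately with "xn--" + "a" * 57 + "-b", which is exactly 63 characters long and satisfies both rules.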

> all the ICANN test collection proves is that one can 
> violate 1123 without causing very many problems, at
> least for the mostly-web applications that have been
> used in tests.

Joke: I had to fix my rxwhois client; anything with a
hyphen went into the "guess what NIC handle" procedure.

> not obviously in the WG's charter.

| In particular, IDNs continue to use the "xn--" prefix

The Charter wants "xn--"; it does not say "but not for
TLDs".  Vint or Lisa would tell us if they didn't want
IDN TLDs for some obscure reason.

  <potentially open question: valid U-toplabel>
>> Depending on the script "one code point" can express
>> things that would need several letters in other 
>> scripts.  ICANN can sort this out.

> It is not clear who gets to "sort this out".

What I wrote was a proposal.  Do you want to tackle the
minimal length of a U-toplabel in Unicode code points?

I'm not (yet) aware of technical reasons to do this; a
corresponding A-toplabel has length 6..63.  Is that not
good enough?

> again, I hope that work doesn't belong to this WG.

That matches "ICANN can sort this out".  It would be bad
if we said "two code points", and some language in some
script used a single code point for "motherland".

The Chinese IDN test TLDs use only two code points for
"test".  The Cyrillic RF proposal uses two code points,
and it won't surprise me if somebody wants or needs one.
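To make the relation between code points and A-label length concrete: Python's standard "punycode" codec implements RFC 3492 only (no IDNA processing), so it can show how many octets a short U-label becomes once "xn--" is prepended. The helper a_label is hypothetical and skips all validity checks, and 测试 is assumed here as the two-code-point Chinese word for "test".

```python
def a_label(u_label: str) -> str:
    """Hypothetical helper: "xn--" plus the RFC 3492 Punycode of u_label.

    No IDNA validity checks (normalization, permitted code points, bidi)
    are applied; this only illustrates lengths.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for u in ["рф", "测试"]:
        a = a_label(u)
        print(f"{u}: {len(u)} code points -> {a} ({len(a)} octets)")
```

Both two-code-point examples come out longer than the 6-octet minimum of an <x-label>, so a short U-toplabel does not by itself threaten the A-toplabel length rules.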

> The current rule (banning anything with "--" in positions
> two and three that isn't a valid A-label) in IDNA2008
> is extremely conservative wrt prefix forms as a means
> of avoiding nonsense

Nobody can prevent me from creating a label like fe--2008-11-11:
it is LDH, and it makes sense from my POV.  How could we
find out whether somebody already uses similar labels, and get
them to change them?

The IDNA "xn--" approach used a proper subset of LDH for
its purposes out of necessity, but I see no technical
necessity to say that other LDH subsets are *invalid*.  

IMO figuring out which <x-label>s (see above) are valid
A-labels is interesting enough.  

> That isn't much of a restriction, since no one has
> really demonstrated a need for such strings.

There is no need to have hmdmhdfmhdjmzdtjmzdtzktdkztdjz
as a label; nevertheless I ended up with it after a piece
of software rejected about a dozen less obscure ideas
and I lost my patience.  IIRC I needed a working Jabber
account fast; I wasn't aware that this would become a label
and a local part later.

> If the WG concludes that is excessive and wants to
> drop back all or part of the way to a rule that merely
> says that, if the label starts in "xn--", it must be
> an A-label, I won't lose any sleep over it...

I guess you could say that any <x-label> that is not a
valid A-label MUST NOT be registered as <toplabel>, and
that it also MUST NOT be registered in any "decent" TLD
registry (at any level managed by the TLD registry).

That is already difficult.  A constructed example: what if
a URI scheme xn--foo needs xn--foo.uri.arpa?  Subtle
point: xn--foo is not an <x-label> as defined above.
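The proposed MUST NOT can be sketched as a registry-side check. Python's standard library has no IDNA2008 validator, so roughly_valid_a_label below is a hypothetical and deliberately rough stand-in: it only requires that the Punycode after "xn--" decodes (via the stdlib "punycode" codec, RFC 3492) to a non-ASCII string and re-encodes to exactly the same text. The rule applies only to <x-label>s, so xn--foo falls through to whatever other rules a registry has.

```python
import re

# <x-label> = "xn--" *<l-d-h> "-" 1*<let-dig>, length 6..63
X_LABEL = re.compile(r"[Xx][Nn]--[A-Za-z0-9-]*-[A-Za-z0-9]+")


def is_x_label(label: str) -> bool:
    return 6 <= len(label) <= 63 and X_LABEL.fullmatch(label) is not None


def roughly_valid_a_label(label: str) -> bool:
    """Hypothetical stand-in for "valid A-label".

    Accepts the label only if the part after "xn--" decodes as Punycode to
    a string with non-ASCII characters and re-encodes to the same text.
    Real IDNA2008 validation (permitted code points, NFC, bidi and
    contextual rules) needs much more than this.
    """
    encoded = label[4:].lower()
    try:
        decoded = encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        return False  # not decodable Punycode
    if decoded.isascii():
        return False
    return decoded.encode("punycode").decode("ascii") == encoded


def may_register(label: str) -> bool:
    """Proposed rule: an <x-label> MUST NOT be registered unless it is a
    valid A-label; other labels are left to the registry's other rules."""
    if is_x_label(label):
        return roughly_valid_a_label(label)
    return True
```

With this sketch, may_register("xn--bcher-kva") and may_register("xn--foo") are true, while may_register("xn--abc-9") is false because "abc-9" is not complete Punycode.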

 <xn--cocacola> 
> If one decides that an A-label that cannot satisfy
> those rules is "whatever it is", one ends up with a
> string with two possible interpretations depending
> on the version of Unicode being used

Okay, to eliminate any "it is not even an <x-label>"
argument let's take xn--coca-cola, a valid <x-label>.

The MUSTard would guarantee that xn--coca-cola cannot
be registered if it has no corresponding U-label for
Unicode 5.1 (caveat: maybe it has one; I didn't check).

At other levels folks will do what they want, no matter
what IDNAbis tries to decree.  Applications could not
decode it to a U-label, because there is no U-label.

Isn't that good enough: treat xn--coca-cola "as is"?
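On the application side, treating such a label "as is" could look like this Python sketch: decode an "xn--" label for display only when its Punycode decodes to a non-ASCII string and round-trips exactly, and otherwise return the ASCII label unchanged. The helper names are hypothetical, and the stdlib "punycode" codec does no IDNA2008 checks, so a real application would also validate the decoded U-label before showing it.

```python
def display_label(label: str) -> str:
    """Return the decoded U-label when label is an "xn--" label whose
    Punycode round-trips cleanly; otherwise return label unchanged."""
    if label[:4].lower() != "xn--":
        return label
    encoded = label[4:].lower()
    try:
        decoded = encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        return label  # no U-label: show the ASCII form as is
    if decoded.isascii() or decoded.encode("punycode").decode("ascii") != encoded:
        return label
    return decoded


def display_name(name: str) -> str:
    """Apply display_label to each dot-separated label of a domain name."""
    return ".".join(display_label(part) for part in name.split("."))
```

For example, display_name("xn--bcher-kva.example") returns "bücher.example", while display_label("xn--abc-9") comes back unchanged.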

 Frank


