A-label definition (was: IDN test TLDs)
YAO Jiankang
yaojk at cnnic.cn
Fri Jun 20 03:51:21 CEST 2008
IMO,
It is better if we clarify 3 definitions.
LDH label, which is the domain name label defined in RFC 1034 and RFC 1035
U-label, which contains at least one non-ASCII character
A-label, which is transformed from a U-label with the Punycode algorithm, plus a prefix such as XN--
(a label with the prefix XN-- that cannot be converted to a U-label is not a valid A-label)
LDH labels include A-labels.
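To make the U-label / A-label relation concrete, here is a rough sketch in Python. It assumes the standard library "punycode" codec and a lowercase "xn--" prefix; the function names are made up for this illustration, and no IDNA mapping or validity checking of the code points is attempted.

```python
ACE_PREFIX = "xn--"  # assumption: lowercase spelling of the XN-- prefix


def to_a_label(u_label: str) -> str:
    """Punycode-encode a U-label and prepend the prefix.

    Hypothetical helper: no IDNA mapping or code point checks are done.
    """
    if u_label.isascii():
        raise ValueError("a U-label needs at least one non-ASCII character")
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Strip the prefix and Punycode-decode the remainder.

    Hypothetical helper: may raise UnicodeError if the remainder
    cannot be decoded.
    """
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("missing xn-- prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# "b\u00fccher" is "bücher", written with an escape to avoid charset trouble.
print(to_a_label("b\u00fccher"))
print(to_u_label("xn--bcher-kva"))
```

A label that starts with the prefix but fails in to_u_label would be the case in the parenthesis above: it looks like an A-label but is not a valid one.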
YAO Jiankang
----- Original Message -----
From: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com>
To: <idna-update at alvestrand.no>
Sent: Friday, June 20, 2008 9:22 AM
Subject: A-label definition (was: IDN test TLDs)
Hi, I thought I knew what an "A-label" is, but looking
into draft-ietf-idnabis-rationale-00 I found that this
is not the case:
(1) LDH label, that's AFAIK 1 to 63 letters, digits,
and hyphens, not starting or ending with a hyphen.
All LDH labels are technically valid host name labels,
because that's what the relevant IETF standards say.
(2) Toplabel, that is at the moment a shaky RFC 1123
erratum. IMO it should be the same as LDH label,
but including at least one non-digit. It needs
an "updates 1123" in idnabis-rationale. While at
it we could also say "not only a single letter".
If we do the latter: Folks often need syntax in
the form of STD 68 ABNF in their drafts, and we
can copy <toplabel> from RFC.ietf-usefor-usefor
If we don't do this we can copy <toplabel> from
RFC 4408. You can guess who needed this syntax,
and arrived at a slight difference. <shudder />
JFTR, a USEFOR co-Chair (i.e. Harald) asked IAB
and ICANN (IIRC) about this issue. Somebody
found a simpler <toplabel> version for the "not
only a letter" variant, I can find it if needed.
(3) U-label, the definition should mention that this
is about labels with at least one non-ASCII code
point, otherwise we would get a confusing overlap
with LDH labels.
(4) A-label, that is apparently the proper subset of
valid LDH labels (see 1) starting with "xn--",
and corresponding to valid U-labels (see 3). By
definition an A-label is also a valid <toplabel>,
and we don't need to talk about this.
There's an open question about "valid U-toplabel":
is more than one code point required? I think it is not
required: Depending on the script "one code point"
can express things that would need several letters in
other scripts. ICANN can sort this out.
(5) I-label (making up a new term for this article):
A "I-label" is a U-label in legacy non-Unicode
and non-ASCII charsets, as found in RFC 3987 IRIs,
or more precisely in labels of an <ihost> for a
corresponding registered DNS host name.
The typical example is "bücher", unless I screw up
and send this as UTF-8. Please assume that I want
windows-1252 or iso-8859-1, not UTF-8.
Maybe idnabis-rationale should define I-label with
a reference to RFC 3987. I also don't see why the
U-label is limited to a "standard Unicode encoding
form", that would mean "can be SCSU, but not BOCU,
UTF-7, UTF-1, GB 18030, etc.". IMO the question of
encoding forms misses some points, maybe we should
simply rename U-label to I-label:
"I" as in I18N, IDNAbis, IRI is intuitive and KISS.
Above all I disagree with the proposed decree that all
LDH labels with hyphens in positions 3 and 4 have to
be A-labels. That could require updating hundreds of
RFCs simultaneously, followed by a worldwide upgrade.
Looking at this from the other side: If a worldwide
upgrade would work we could simply decree that host
names can use UTF-8, and be done with it. As this is
obviously wrong we cannot say that certain LDH labels
are "invalid", we can only define valid A-labels, and
anything else is whatever it is, xn--cocacola.
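The distinction can be sketched in Python as two purely syntactic tests. The function names are made up for this example, and neither test says anything about whether a label is a valid A-label; that would need a successful conversion to a valid U-label.

```python
def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    # Positions are counted from 1, so this looks at label[2] and label[3].
    return label[2:4] == "--"


def has_ace_prefix(label: str) -> bool:
    # Case-insensitive check for the "xn--" prefix only.
    return label[:4].lower() == "xn--"


for label in ("ab--cd", "xn--cocacola", "example"):
    print(label,
          has_hyphens_in_positions_3_and_4(label),
          has_ace_prefix(label))
```

Here "ab--cd" has the two hyphens without the prefix, and "xn--cocacola" has the prefix, but the prefix alone does not make it a valid A-label.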
Frank