Comments on idnabis-rationale-01
John C Klensin
klensin at jck.com
Tue Jul 22 19:41:49 CEST 2008
--On Tuesday, 22 July, 2008 18:30 +0200 JFC Morfin
<jefsey at jefsey.com> wrote:
> John,
> I appreciate this is something for me as a user to understand
> better the published text. However, at this time I am quite
> confused. Could you please help with what follows:
> 1) a simple table giving the term, an example, and a
> definition. For U-label, A-label, LDH-Label, traditional ASCII
> label, invalid?
U-label: A valid IDNA label, expressed in native character form,
that contains at least one non-ASCII character. Note that a
U-label must not contain any characters (or code points) that
are invalid under IDNA2008, i.e., a U-label must not contain any
DISALLOWED or UNASSIGNED code points, nor any code points that
require contextual rules without those conditions being met, nor
any right-to-left characters unless the label fulfills the Bidi
conditions. Frank would like to specify length limits for these
strings. I think that is unnecessary but not necessarily
harmful: if it were added, the definition would not change in
any essential way.
Example (with apologies to anyone whose mail system can't
handle UTF-8): "уепуха"
A-label: A valid IDNA label, expressed in punycode-encoded form,
with the "xn--" prefix. Validity for an A-label is determined
by mapping to its U-label form and back. I believe Frank would
like to specify length limits for these strings. I think that
is unnecessary but not necessarily harmful: if it were added,
the definition would not change in any essential way.
Example: "xn--80aj4aocm"
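The two examples above can be checked against each other mechanically. What follows is only a minimal sketch in Python, assuming the standard library's "punycode" codec is an acceptable stand-in for the encoding step. The helper names (to_a_label, to_u_label, round_trips) are made up for illustration, input is assumed to be lowercase already, and none of the IDNA2008 code point checks (DISALLOWED, UNASSIGNED, contextual, Bidi) are attempted, so a successful round trip is a necessary condition for validity, not a sufficient one.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Punycode-encode a string and add the "xn--" prefix (no validity checks)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Strip the "xn--" prefix and punycode-decode the rest (no validity checks)."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("label does not start with xn--")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

def round_trips(a_label: str) -> bool:
    """True if decoding and re-encoding reproduces the same ASCII string."""
    try:
        return to_a_label(to_u_label(a_label)) == a_label
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False

print(to_a_label("уепуха"))         # xn--80aj4aocm
print(round_trips("xn--80aj4aocm"))  # True
```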
LDH-label as I have been trying to use the term: A string
consisting exclusively of ASCII letters, digits, and the hyphen,
with the hyphen in neither the first nor last position _and_
that does not begin with the "xn--" prefix. Note that LDH-label
defines a subset of strings that are valid under the LDH
criterion, and that is the core of what Frank and I are arguing
about.
Example: "nonsense"
IDNA-invalid: A string that is none of a U-label, an A-label, or
an LDH-label.
Examples: "_tcp" (contains a non-LDH character),
"-foo" (violates the LDH rule, because it starts with a
hyphen),
"xn---ghikl" (starts with "xn--", but is not a valid
A-label),
"Уепуха" (contains a character that is DISALLOWED
because it would require mapping).
IDNA-valid: Any of an LDH-label, A-label, or U-label.
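The purely syntactic part of the LDH-label definition above can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference check: the function name is_ldh_label is hypothetical, the "xn--" prefix is matched case-insensitively, an empty string is rejected, and no length limit is applied (in keeping with the definitions above, which do not depend on one).

```python
import string

LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """True if `label` meets the LDH-label definition used in this message."""
    if not label or not set(label) <= LDH_CHARS:
        return False  # empty, or contains a non-LDH character
    if label[0] == "-" or label[-1] == "-":
        return False  # hyphen in first or last position
    # Anything starting with "xn--" is either an A-label or IDNA-invalid.
    return not label.lower().startswith("xn--")

for candidate in ["nonsense", "a--abcdef", "_tcp", "-foo", "xn---ghikl"]:
    print(candidate, is_ldh_label(candidate))
```

Only the first two candidates come out as LDH-labels. Telling a valid A-label from an IDNA-invalid "xn--" string takes the punycode and IDNA2008 checks, which this sketch deliberately leaves out.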
traditional ASCII label: Either a synonym for, or an alternative
to, LDH-label. Still looking for a good term for this if the
group's consensus is that the distinction between "LDH-label"
and such terms or concepts as "ldh-string" or "LDH conforming"
is too confusing. My best proposal right now (as distinct
from a few hours ago, when I proposed "traditional ASCII label")
might be to use something like "IDNA-LDH-label" or, much as I
hate it, "IDNA-ASCII-label". Either one would at least make the
context painfully clear.
(new term)
bad-idea-for-permitting-extensibility-and-future-global-interoperability:
This concept and term are outside the scope of this WG, but
would, in my personal, narrow-minded, opinion, include any label
string that is not given a clear interpretation by some
standard. The category includes any string that has consecutive
hyphens, especially in the third and fourth position, and that
isn't an A-label; any string that is ASCII but not
LDH-conforming for which a special meaning has not yet been
assigned (certain SRV protocol-identifying labels have been
assigned such meanings, despite not being LDH-conforming). It
includes any use of binary labels, or octet-oriented labels
actually stored in the DNS with the high-order bit set, until
and unless a specific protocol or contextual definition is given
to them. It also includes some other constructions, such as
using "0xFF" as a label: technically perfectly valid under the
protocols today but bound to lead to trouble sooner or later.
"Invalid" is not well-defined since, in principle, the DNS can
store any string subject only to length limitations, either one
organized into octets or one that is not. So "invalid" exists
only wrt a protocol or context.
> 2) what are "a--abcdef", xn---ghikl" in your terminology?
The first is an LDH-label even though it makes me cringe. The
second is IDNA-invalid because it looks like an A-label but is
not a valid one.
> 3) is there an objection to "xn--abcd--efgh" and how do you
> name it?
While it is an eyesore and yet another example of why users may
be deceived if they make assumptions about what A-labels look
like, it is a valid A-label. The U-label form has a single
embedded hyphen and consists of the ASCII string "abcd-"
followed by the two characters
Armenian Small Letter Za (U+0566) and
Armenian Small Letter Eh (U+0567)
So, no objection, and I call it an A-label. That string would
be problematic for any zone administrator or application that
had decided to enforce a "no mixed script" rule, but that rule
is outside this WG's scope.
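For anyone who wants to confirm the decoding of that string, here is a small sketch using Python's standard "punycode" codec and the unicodedata module. The function name describe is made up for illustration, and the code only decodes; it does not test IDNA2008 validity.

```python
import unicodedata

def describe(a_label: str) -> list[tuple[str, str]]:
    """Decode an "xn--" label and return (code point, character name) pairs."""
    encoded = a_label[len("xn--"):]
    decoded = encoded.encode("ascii").decode("punycode")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in decoded]

for code_point, name in describe("xn--abcd--efgh"):
    print(code_point, name)
```

The last two lines printed are U+0566 ARMENIAN SMALL LETTER ZA and U+0567 ARMENIAN SMALL LETTER EH, preceded by the four Latin letters and a single HYPHEN-MINUS.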
> 4) how do you name "xn--abcdef" with no punycode conversion to
> Unicode?
IDNA-invalid, since it starts with "xn--" and does not have a
mapping to a U-label.
Hope that helps.
john