Comments on idnabis-rationale-01

John C Klensin klensin at jck.com
Tue Jul 22 19:41:49 CEST 2008



--On Tuesday, 22 July, 2008 18:30 +0200 JFC Morfin
<jefsey at jefsey.com> wrote:

> John,
> I appreciate this is something for me as a user to understand
> better the published text. However, at this time I am quite
> confused. Could you please help with what follows:

> 1) a simple table giving the term, an example, and a
> definition. For U-label, A-label, LDH-Label, traditional ASCII
> label, invalid?

U-label: A valid IDNA label, expressed in native character form
and that contains at least one non-ASCII character.  Note that a
U-label must not contain any characters (or code points) that
are invalid under IDNA2008, i.e., a U-label must not contain any
DISALLOWED or UNASSIGNED code points, nor any code points that
require contextual rules without those conditions being met, nor
any right-to-left characters unless the label fulfills the Bidi
conditions.  Frank would like to specify length limits for these
strings.  I think that is unnecessary but not necessarily
harmful: if it were added, the definition would not change in
any essential way.
  Example (with apologies to anyone whose mail system can't
handle UTF-8): "уепуха"

A-label: A valid IDNA label, expressed in punycode-encoded form,
with the "xn--" prefix.  Validity for an A-label is determined
by mapping to its U-label form and back.  I believe Frank would
like to specify length limits for these strings.  I think that
is unnecessary but not necessarily harmful: if it were added,
the definition would not change in any essential way.
  Example: "xn--80aj4aocm"
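
The pairing between the two examples above can be checked with a few
lines of Python.  This is only a sketch: it uses the standard
library's "punycode" codec, which performs the raw encoding and
nothing else, so none of the IDNA2008 validity conditions (DISALLOWED
or UNASSIGNED code points, contextual rules, Bidi) are tested.  The
helper names to_a_label, to_u_label and round_trips are invented for
illustration.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Punycode-encode a native-character label and add the prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Strip the prefix and Punycode-decode what remains."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("label does not start with xn--")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

def round_trips(a_label: str) -> bool:
    """Necessary (not sufficient) condition: decode, then re-encode."""
    try:
        return to_a_label(to_u_label(a_label)).lower() == a_label.lower()
    except (ValueError, UnicodeError):
        return False
```

A label that fails round_trips cannot be an A-label; one that passes
still has to have its decoded form checked against the IDNA2008 rules.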

LDH-label as I have been trying to use the term: A string
consisting exclusively of ASCII letters, digits, and the hyphen,
with the hyphen in neither the first nor last position _and_
that does not begin with the "xn--" prefix.  Note that LDH-label
defines a subset of strings that are valid under the LDH
criterion, and that is the core of what Frank and I are arguing
about.
  Example: "nonsense"

IDNA-invalid: A string that is none of a U-label, an A-label, or
an LDH-label.
  Examples: "_tcp" (contains a non-LDH character),
	 "-foo" (violates the LDH rule, because it starts with a
	hyphen), 
	 "xn---ghikl" (starts with "xn--", but is not a valid
	A-label), 
	 "Уепуха" (contains a character that is DISALLOWED
	because it would require mapping).

IDNA-valid: Any of an LDH-label, A-label, or U-label.
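
The categories above can be approximated in code.  What follows is a
rough sketch in Python, not a conforming implementation: the function
name classify and the strings it returns are invented here, and the
checks that need the IDNA2008 tables (DISALLOWED and UNASSIGNED code
points, contextual rules, Bidi) are deliberately left out, which is why
two of the results are reported only as "possible".  For strings with
the "xn--" prefix it applies the "map to the U-label form and back"
test using the standard library's "punycode" codec.

```python
import re

# ASCII letters, digits and hyphen, with no hyphen first or last.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def classify(label: str) -> str:
    """Rough, table-free classification of a single label."""
    if not label.isascii():
        # Real U-label validation needs the IDNA2008 tables, the
        # contextual rules and the Bidi checks; none is attempted.
        return "possible U-label"
    if not LDH_RE.fullmatch(label):
        return "IDNA-invalid"
    if not label.lower().startswith("xn--"):
        return "LDH-label"
    rest = label[4:]
    try:
        decoded = rest.encode("ascii").decode("punycode")
        again = decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return "IDNA-invalid"
    # The decoded form must contain a non-ASCII character and must
    # encode back to the same string.
    if decoded.isascii() or again.lower() != rest.lower():
        return "IDNA-invalid"
    return "possible A-label"
```

With this sketch, "nonsense" and "a--abcdef" come out as LDH-labels,
"_tcp", "-foo" and "xn---ghikl" as IDNA-invalid, and "xn--80aj4aocm"
as a possible A-label.  A string such as "Уепуха" is reported only as
a possible U-label, because rejecting it requires the table lookup
that the sketch omits.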
	
traditional ASCII label: Either a synonym for, or an alternative
to, LDH-label.   Still looking for a good term for this if the
group's consensus is that the distinction between "LDH-label"
and such terms or concepts as "ldh-string" or "LDH conforming"
is too confusing.   My best proposal right now (as distinct
from a few hours ago, when I proposed "traditional ASCII label")
might be to use something like "IDNA-LDH-label" or, much as I
hate it, "IDNA-ASCII-label".  Either one would at least make the
context painfully clear.

(new term)
bad-idea-for-permitting-extensibility-and-future-global-interoperability:
This concept and term are outside the scope of this WG, but
would, in my personal, narrow-minded, opinion, include any label
string that is not given a clear interpretation by some
standard.  The category includes any string that has consecutive
hyphens, especially in the third and fourth position, and that
isn't an A-label; any string that is ASCII but not
LDH-conforming for which a special meaning has not yet been
assigned (certain SRV protocol-identifying labels have been
assigned such meanings, despite not being LDH-conforming).   It
includes any use of binary labels, or octet-oriented labels
actually stored in the DNS with the high-order bit set, until
and unless a specific protocol or contextual definition is given
to them.  It also includes some other constructions, such as
using "0xFF" as a label: technically perfectly valid under the
protocols today but bound to lead to trouble sooner or later.

"Invalid" is not well-defined since, in principle, the DNS can
store any string subject only to length limitations, either one
organized into octets or one that is not.  So "invalid" exists
only wrt a protocol or context.


> 2) what are "a--abcdef", "xn---ghikl" in your terminology?

The first is an LDH-label even though it makes me cringe.  The
second is IDNA-invalid because it looks like an A-label but is
not a valid one.
  
> 3) is there an objection to "xn--abcd--efgh" and how do you
> name it?

While it is an eyesore and yet another example of why users may
be deceived if they make assumptions about what A-labels look
like, it is a valid A-label.  The U-label form has a single
embedded hyphen and consists of the ASCII string "abcd-"
followed by the two characters
  Armenian Small Letter Za (U+0566)   and
  Armenian Small Letter Eh (U+0567)
So, no objection, and I call it an A-label.  That string would
be problematic for any zone administrator or application that
had decided to enforce a "no mixed script" rule, but that rule
is outside this WG's scope. 
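
The decoding described above can be reproduced with Python's standard
"punycode" codec and the unicodedata module.  This is a sketch that
shows only the raw Punycode step; it says nothing about whether the
result passes the IDNA2008 checks, and the variable names are
illustrative.

```python
import unicodedata

a_label = "xn--abcd--efgh"

# Drop the "xn--" prefix and Punycode-decode the remainder.
u_label = a_label[4:].encode("ascii").decode("punycode")

# Describe each non-ASCII character by code point and Unicode name.
non_ascii = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch))
    for ch in u_label
    if ord(ch) > 0x7F
]

print(u_label)
for code_point, name in non_ascii:
    print(code_point, name)
```

The last hyphen in the encoded string is the Punycode delimiter, so
everything before it ("abcd-", including its own trailing hyphen) is
copied through literally and only "efgh" is decoded.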

> 4) how do you name "xn--abcdef" with no punycode conversion to
> Unicode?

IDNA-invalid, since it starts with "xn--" and does not have a
mapping to a U-label.

Hope that helps.
    john
