Stupid U-label question [correction]

John C Klensin klensin at jck.com
Tue Aug 19 23:54:02 CEST 2008



--On Tuesday, 19 August, 2008 17:01 -0400 William Tan
<dready at gmail.com> wrote:

>>> | Let u2 = xn--4caä, and use punycode *directly*
>>> | to bypass the RFC 3490 4.1 (5) restriction.
>>> |
>>> | That results in A-label xn--4ca-cxa.  Therefore
>>> | in theory u2 could have A-label xn--xn--4ca-cxa
>>> | IFF that is not prohibited somewhere in IDNA200X.
>> 
>> "xn--xn--4ca-cxa" is a resolvable LDH label. Under IDNA2003,
>> it will not get converted by ToUnicode because it will fail
>> Step 6 of RFC3490 Section 4.2 (running it through ToASCII).
>> 
> 
> Let me clarify:
> 
> ToASCII("xn--4caä") will fail.
> ToUnicode("xn--xn--4ca-cxa") will result in itself
> "xn--xn--4ca-cxa", though we don't know or care how it was
> produced. All we know is that it does not come from a valid
> U-label.

Since we seem to be dragging this out, "xn--xn--4ca-cxa" is a
_valid_ label under IDNA2003.  It is just not a valid IDN
because, as Wil points out, it can't be converted by ToASCII.
For IDNA2008, it is invalid for IDNA-conformant applications.
The difference between the two is that I believe an
IDNA2003-conformant application would be expected to look it up
and that an IDNA2008-conformant application that was being
careful about A-labels would be expected not to.
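
For anyone who wants to reproduce the strings in this thread, here is a minimal sketch. It assumes Python 3 and its built-in "punycode" codec. The names ACE_PREFIX and to_ascii_step5_allows are made up for illustration; the function models only the ACE-prefix check of RFC 3490 Section 4.1 step 5, not the whole ToASCII operation.

```python
ACE_PREFIX = "xn--"

# The pseudo-U-label from the thread: "xn--4ca" followed by U+00E4.
u2 = "xn--4ca\u00e4"

# Run Punycode directly, bypassing ToASCII and its step 5 check.
payload = u2.encode("punycode").decode("ascii")
candidate = ACE_PREFIX + payload


def to_ascii_step5_allows(label: str) -> bool:
    """Model only the RFC 3490 4.1 step 5 check (hypothetical helper).

    ToASCII must fail when a label that needs encoding already
    begins with the ACE prefix.
    """
    return not label.lower().startswith(ACE_PREFIX)


print(payload)
print(candidate)
print(to_ascii_step5_allows(u2))
```

The first two lines printed should be xn--4ca-cxa and xn--xn--4ca-cxa, the values quoted above. The third should be False, which is why ToASCII fails for this input and why ToUnicode leaves "xn--xn--4ca-cxa" unchanged.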

In earlier versions of the IDNA2008 specs, it was banned by the
prohibition on using any label with "--" in the third and
fourth positions unless it is an A-label.  The WG (and
others) objected to that because it apparently imposed
constraints on DNS applications that were not IDN-aware/
IDNA-conformant.  The text in the current working draft of
Rationale is included below.   If people don't like it, comments
and suggestions can still make -02 before it is posted.  But the
prohibition is, I believe, quite clear.

  -----
 as part of the definition of "A-label":

	This means, by definition, that every A-label will begin
	with the IDNA ACE prefix, "xn--", followed by a string
	that is a valid output of the Punycode algorithm and
	hence a maximum of 59 ASCII characters in length. The
	prefix and string together must conform to all
	requirements for a label that can be stored in the DNS
	including conformance to the rules for the preferred
	form described in RFC 1034, RFC 1035, and RFC 1123.

 The language then goes on to say:

	Strings that do not conform to the rules for one of
	these three categories and, in particular, strings that
	contain "--" in the third and fourth character position
	but are:

	o  not A-labels or

	o  cannot be processed as U-labels or A-labels as
	   described in these specifications,

	are invalid in IDNA-conformant applications as labels in
	domain names that identify Internet hosts or similar
	resources.  
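
One way to read the quoted rule is as a rough classifier, sketched here in Python 3 with the built-in "punycode" codec. The function names and the three verdict strings are invented for illustration. The test for a plausible U-label (at least one non-ASCII character, and no "--" in the third and fourth positions of the decoded form) is this sketch's reading of the text, not a full validity check; real IDNA2008 validation covers much more.

```python
def has_reserved_hyphens(label: str) -> bool:
    """True if the label has "--" in the third and fourth positions."""
    return label[2:4] == "--"


def looks_like_a_label(label: str) -> bool:
    """Rough structural test only (hypothetical helper)."""
    label = label.lower()
    if not label.isascii() or not label.startswith("xn--"):
        return False
    payload = label[4:]
    try:
        decoded = payload.encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # Assumption: the decoded form must be a plausible U-label.
    if decoded.isascii() or has_reserved_hyphens(decoded):
        return False
    # Re-encoding must reproduce the same payload.
    return decoded.encode("punycode").decode("ascii") == payload


def idna_slot_verdict(label: str) -> str:
    """Classify a label for an IDNA-aware slot (illustrative only)."""
    if looks_like_a_label(label):
        return "A-label"
    if has_reserved_hyphens(label):
        return "invalid"
    return "other"


for name in ("xn--4ca", "xn--xn--4ca-cxa", "example"):
    print(name, idna_slot_verdict(name))
```

Under these assumptions "xn--4ca" is reported as an A-label, "xn--xn--4ca-cxa" as invalid, and "example" as an ordinary label that the rule does not touch.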

I believe that is painfully clear and that it has no
implications at all for anything other than IDNA-conformant
applications evaluating domain names in IDNA-appropriate "slots".

    john

