Stupid U-label question [correction]
John C Klensin
klensin at jck.com
Tue Aug 19 23:54:02 CEST 2008
--On Tuesday, 19 August, 2008 17:01 -0400 William Tan
<dready at gmail.com> wrote:
>>> | Let u2 = xn--4caä, and use punycode *directly*
>>> | to bypass the RFC 3490 4.1 (5) restriction.
>>> |
>>> | That results in A-label xn--4ca-cxa. Therefore
>>> | in theory u2 could have A-label xn--xn--4ca-cxa
>>> | IFF that is not prohibited somewhere in IDNA200X.
>>
>> "xn--xn--4ca-cxa" is a resolvable LDH label. Under IDNA2003,
>> it will not get converted by ToUnicode because it will fail
>> Step 6 of RFC3490 Section 4.2 (running it through ToASCII).
>>
>
> Let me clarify:
>
> ToASCII("xn--4caä") will fail.
> ToUnicode("xn--xn--4ca-cxa") will result in itself
> "xn--xn--4ca-cxa", though we don't know or care how it was
> produced. All we know is that it does not come from a valid
> U-label.
Since we seem to be dragging this out, "xn--xn--4ca-cxa" is a
_valid_ label under IDNA2003. It is just not a valid IDN
because, as Wil points out, it can't be converted by ToASCII.
For IDNA2008, it is invalid for IDNA-conformant applications.
The difference between the two is that I believe an
IDNA2003-conformant application would be expected to look it up
and that an IDNA2008-conformant application that was being
careful about A-labels would be expected not to.
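
To make the IDNA2003 behaviour above concrete, here is a minimal Python sketch of the two operations. It is not a full RFC 3490 implementation: Nameprep, the STD3 rules, and the length checks are omitted, and the function names to_ascii_sketch and to_unicode_sketch are made up for illustration. Only Python's built-in "punycode" codec is used.

```python
ACE_PREFIX = "xn--"


def to_ascii_sketch(label: str) -> str:
    """Simplified ToASCII (assumption: no Nameprep, STD3 or length checks)."""
    if label.isascii():
        return label
    # RFC 3490 section 4.1 step 5: a non-ASCII label must not already
    # begin with the ACE prefix.
    if label.lower().startswith(ACE_PREFIX):
        raise ValueError("label already begins with the ACE prefix")
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode_sketch(label: str) -> str:
    """Simplified ToUnicode: never fails, returns the input on any error."""
    if not label.isascii() or not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # RFC 3490 section 4.2 steps 6-7: the decoded string must convert
        # back to the original label through ToASCII.
        if to_ascii_sketch(decoded).lower() != label.lower():
            return label
    except ValueError:  # UnicodeError is a subclass of ValueError
        return label
    return decoded


# "\u00e4" is the character a-umlaut used in the thread.
print("xn--4ca\u00e4".encode("punycode"))    # b'xn--4ca-cxa' (bypassing step 5)
print(to_unicode_sketch("xn--4ca"))          # the single character a-umlaut
print(to_unicode_sketch("xn--xn--4ca-cxa"))  # xn--xn--4ca-cxa (unchanged)
```

The last call returns its input because the decoded string starts with the ACE prefix, so the round trip through ToASCII fails and the label is left alone, which matches Wil's description.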
In earlier versions of the IDNA2008 specs, it was banned by the
prohibition on any label containing "--" in the third and
fourth positions that was not an A-label. The WG (and
others) objected to that because it apparently imposed
constraints on DNS applications that were not IDN-aware/
IDNA-conformant. The text in the current working draft of
Rationale is included below. If people don't like it, comments
and suggestions can still make -02 before it is posted. But the
prohibition is, I believe, quite clear.
-----
as part of the definition of "A-label":
This means, by definition, that every A-label will begin
with the IDNA ACE prefix, "xn--", followed by a string
that is a valid output of the Punycode algorithm and
hence a maximum of 59 ASCII characters in length. The
prefix and string together must conform to all
requirements for a label that can be stored in the DNS
including conformance to the rules for the preferred
form described in RFC 1034, RFC 1035, and RFC 1123.
The language then goes on to say:
Strings that do not conform to the rules for one of
these three categories and, in particular, strings that
contain "--" in the third and fourth character position
but are:
   o  not A-labels or
   o  cannot be processed as U-labels or A-labels as
      described in these specifications,
are invalid in IDNA-conformant applications as labels in
domain names that identify Internet hosts or similar
resources.
I believe that is painfully clear and that it has no
implications at all for other than IDNA-conformant applications
evaluating domain names in IDNA-appropriate "slots".
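
As a rough illustration of how an IDNA-conformant application might apply the quoted rule to an ASCII label, here is a Python sketch. The function names are hypothetical, the full U-label validity checks are left out, and rejecting a decoded string that itself has "--" in the third and fourth positions is my assumption about how the rule applies to the decoded form.

```python
import re

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit; leaves 59 characters after "xn--"

# Letter-digit-hyphen syntax: no leading or trailing hyphen.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def has_reserved_hyphens(label: str) -> bool:
    """True if "--" sits in the third and fourth character positions."""
    return label[2:4] == "--"


def is_a_label_sketch(label: str) -> bool:
    """Partial A-label test (assumption: U-label checks are omitted)."""
    if not _LDH.fullmatch(label) or len(label) > MAX_LABEL_LENGTH:
        return False
    if not label.lower().startswith(ACE_PREFIX):
        return False
    payload = label[len(ACE_PREFIX):].lower()
    try:
        decoded = payload.encode("ascii").decode("punycode")
    except ValueError:  # not a valid Punycode string
        return False
    # Assumption: the decoded form must contain non-ASCII characters and
    # must not itself carry "--" in positions three and four.
    if decoded.isascii() or has_reserved_hyphens(decoded):
        return False
    # The payload must be exactly what the Punycode encoder would produce.
    return decoded.encode("punycode").decode("ascii") == payload


def valid_in_idna_slot_sketch(label: str) -> bool:
    """ASCII labels only: "--" in positions 3-4 is allowed only for A-labels."""
    if has_reserved_hyphens(label):
        return is_a_label_sketch(label)
    return bool(_LDH.fullmatch(label)) and len(label) <= MAX_LABEL_LENGTH


print(valid_in_idna_slot_sketch("xn--4ca"))          # True
print(valid_in_idna_slot_sketch("xn--xn--4ca-cxa"))  # False
```

An application that is not IDNA-conformant would skip this check entirely and treat both strings as ordinary LDH labels.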
john