referencing IDNA2008 (and IDNA2003?)
John C Klensin
klensin at jck.com
Sun Oct 24 22:15:35 CEST 2010
There is a bit of the sound of FUD here and I don't think that
is useful, especially when some of the group are complicating
things with their own vocabulary.
(1) We had experimental confirmation that the terminology in
IDNA2003 created a certain amount of confusion. That was
partially due to not making a distinction between a "Unicode"
(really, "native-script", or Unicode encoded in one of the
Unicode-standardized encodings) label that could be encoded by
ToASCII (after Nameprep processing and so on) and a similar
label that could be obtained by applying ToUnicode to such a
valid label encoded by ToASCII. There was also confusion about
the exact status of strings that were encoded using Punycode and
had the "xn--" prefix but were not actual, valid IDN labels.
And, finally, there was confusion about whether strings encoded
by Punycode should be looked up without validating them
(although I personally believe that IDNA2003 was very clear on
that point, there was confusion nonetheless). For exactly those
reasons, IDNA2008 contains much more precise definitions. That
creates a situation in which, depending on how one looks at
things, there is a much broader range of strings that can be
considered valid (particularly those that depend on mapping) in
IDNA2003 than there is in IDNA2008. That difference, however,
has a lot more to do with the validity of different presentation
forms (again, including forms that require mapping to be valid)
than with fundamental validity. And every valid A-label (under
IDNA2008) that represents only Unicode characters that are valid in
Punycode-encoded labels that are valid output of ToASCII under
IDNA2003 is valid under IDNA2003 and has the same interpretation
when converted back to a native character string (a U-label, in
IDNA2008 terms).
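To make the ToASCII/ToUnicode distinction concrete, here is a minimal sketch using Python's built-in `idna` codec, which implements IDNA2003 (RFC 3490 with Nameprep), not IDNA2008. The helper names `idna2003_round_trip` and `idna2003_ace_check` are hypothetical. The sketch shows that a native-script input such as "Bücher" is acceptable to ToASCII but is not what ToUnicode gives back, and that a string carrying the "xn--" prefix is not necessarily a valid IDNA2003 ACE label. The "xn--fa-hia" case is the IDNA2008 A-label for "faß"; it fails under IDNA2003 because Nameprep maps ß to "ss", which is consistent with the qualification above, since ß never appears in IDNA2003 ToASCII output.

```python
from typing import Optional, Tuple


def idna2003_round_trip(label: str) -> Tuple[str, str]:
    """Apply IDNA2003 ToASCII and then ToUnicode to one native-script label.

    Uses Python's built-in "idna" codec (RFC 3490 with Nameprep), not IDNA2008.
    """
    ace = label.encode("idna").decode("ascii")  # ToASCII: Nameprep, Punycode, "xn--"
    back = ace.encode("ascii").decode("idna")   # ToUnicode: decode plus round-trip check
    return ace, back


def idna2003_ace_check(label: str) -> Optional[str]:
    """Return ToUnicode(label) if `label` is a valid IDNA2003 ACE label, else None.

    Hypothetical helper. Having the "xn--" prefix is not enough: the rest must
    Punycode-decode, and the result must survive the ToASCII round trip that
    ToUnicode performs.
    """
    folded = label.lower()
    if not folded.startswith("xn--"):
        return None
    try:
        # Lowercased first so an upper-case "XN--" is still seen as the ACE prefix.
        return folded.encode("ascii").decode("idna")
    except UnicodeError:
        # Non-ASCII input, malformed Punycode, or a failed round trip.
        return None


print(idna2003_round_trip("Bücher"))     # ('xn--bcher-kva', 'bücher')
print(idna2003_ace_check("xn--1-wpn"))   # €1
print(idna2003_ace_check("xn--fa-hia"))  # None
```

The ToUnicode output ("bücher") differs from the ToASCII input ("Bücher"), which is exactly the kind of pair the IDNA2003 terminology did not distinguish.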
(2) Mark is quite correct in pointing out that IDNA2003
considers valid many characters, especially symbols and punctuation,
that IDNA2008 does not. That issue has nothing to do with
mapping and is really an entirely separate issue where validity
checks are concerned. However, those who are concerned about
that difference in terms of lookup compatibility should read
Section 5.3 of RFC 5891 very carefully if they have not done so
already. The case it quite intentionally addresses seems to
me to be precisely the one that Adam ideally has: a pair of
domain names that contain things that look like A-labels which
he wants to compare by a simple octet-by-octet comparison.
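A minimal sketch of that ideal case, under stated assumptions: both names are assumed to be ASCII already, ASCII case is assumed insignificant (as it is in the DNS), and A-label-like labels are treated as opaque strings, which is how this sketch reads the latitude RFC 5891 Section 5.3 gives lookup applications. The function name `same_domain_opaque` is hypothetical, and it deliberately rejects native-character input instead of guessing at a conversion.

```python
def same_domain_opaque(a: str, b: str) -> bool:
    """Compare two ASCII domain names (possibly containing A-labels) octet by octet.

    Hypothetical helper. Labels of the form "xn--..." are treated as opaque:
    no Punycode decoding or IDNA validation is attempted. The only
    normalization is ASCII case-folding and dropping one trailing root dot.
    """
    for name in (a, b):
        if not name.isascii():
            # Native-character input must first be converted to A-label form;
            # an octet comparison against it would be meaningless.
            raise ValueError(f"not an ASCII domain name: {name!r}")

    def norm(name: str) -> bytes:
        if name.endswith("."):
            name = name[:-1]
        # lower() on ASCII-only text is plain ASCII case-folding.
        return name.lower().encode("ascii")

    return norm(a) == norm(b)


print(same_domain_opaque("XN--1-WPN.blogspot.com", "xn--1-wpn.blogspot.com."))  # True
```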
The less ideal case, in which one of the two strings to be
compared is in native-character form rather than Punycode-encoded,
is complicated for another reason. IDNA2003 quite intentionally
left policies about what characters could actually be registered
for use in labels to registries, while IDNA2008 first eliminates
symbols, punctuation, and other non-letter/non-digit characters.
But every set of recommendations I'm aware of for IDNA2003
recommended against registering a label containing any character
that was not a character validly used to write "words" in some
language. Modulo some hand-waving, that is the same rule that
IDNA2008 specified as part of the PVALID/DISALLOWED
distinction. So, for those punctuation and symbol characters
(etc.), the difference between IDNA2003 and IDNA2008 has more to
do with something that one could get away with under IDNA2003 (in
some registries) but that is explicitly prohibited under IDNA2008
than with a new incompatibility.
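As a rough illustration of the letter/digit rule behind PVALID/DISALLOWED, the sketch below applies only one ingredient of the RFC 5892 derivation: the General_Category test for letters, combining marks, and decimal digits (Ll, Lu, Lo, Lm, Mn, Mc, Nd), plus the ASCII LDH characters. It is not an IDNA2008 implementation. It ignores the exception and backward-compatibility lists, the case-folding stability test (so uppercase letters pass here even though IDNA2008 disallows them), contextual rules, and unassigned code points. The helper name `non_letter_digit_chars` is hypothetical.

```python
import unicodedata
from typing import List

# General categories used by the LetterDigits rule in RFC 5892:
# letters, combining marks, and decimal digits.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

# ASCII lowercase letters, digits, and hyphen-minus (the LDH set); these are
# allowed even though hyphen-minus itself is punctuation (category Pd).
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def non_letter_digit_chars(label: str) -> List[str]:
    """Return the characters of `label` that fail a coarse letter/digit screen.

    Hypothetical helper approximating one step of RFC 5892; it is NOT the full
    PVALID/DISALLOWED derivation.
    """
    return [
        ch
        for ch in label
        if ch not in LDH and unicodedata.category(ch) not in LETTER_DIGIT_CATEGORIES
    ]


# Decode the Punycode part of the label "xn--1-wpn" from Mark's example.
euro_label = b"1-wpn".decode("punycode")
print(euro_label)                          # €1
print(non_letter_digit_chars(euro_label))  # ['€']  (U+20AC is category Sc)
print(non_letter_digit_chars("café-1"))    # []
```

The euro sign is a currency symbol (category Sc), so it falls outside the letter/digit categories: valid in an IDNA2003 label, but not a character that writes "words".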
How serious an incompatibility that is depends on how far one
wants to go to support characters whose use has been recognized as
bad practice since 2003 or earlier just because someone,
somewhere, may have successfully registered and used them
(remember that we are talking about domains in cookies here, not
what might have been used in an IRI).
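For the less ideal case discussed above, in which one cookie domain arrives in native-character form, a comparison first has to convert it to lowercase A-label form. The sketch below does that with Python's built-in `idna` codec, which implements IDNA2003 (Nameprep mapping included), not IDNA2008; an IDNA2008-based version would need a different library, such as the third-party `idna` package. The helper name `cookie_domain_key` is hypothetical. The last example shows one place where the protocols diverge: IDNA2003 maps "faß" to "fass", whereas IDNA2008 treats ß as a PVALID character with its own A-label ("xn--fa-hia").

```python
def cookie_domain_key(domain: str) -> str:
    """Map a domain name in native or ASCII form to a lowercase ASCII key.

    Hypothetical helper built on Python's built-in "idna" codec (IDNA2003:
    Nameprep mapping, then Punycode with the "xn--" prefix). Labels that are
    already ASCII pass through with only a length check, so existing "xn--"
    labels are not validated here.
    """
    ascii_form = domain.encode("idna").decode("ascii")
    # The string is ASCII-only at this point, so lower() is ASCII case-folding.
    return ascii_form.lower()


print(cookie_domain_key("Bücher.example"))         # xn--bcher-kva.example
print(cookie_domain_key("XN--BCHER-KVA.example"))  # xn--bcher-kva.example
print(cookie_domain_key("faß.example"))            # fass.example (IDNA2003 mapping)
```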
--On Sunday, October 24, 2010 21:21 +0200 jean-michel bernier de
portzamparc <jmabdp at gmail.com> wrote:
> 2010/10/24 Mark Davis ☕ <mark at macchiato.com>
>> > Whether these A-labels were initially registered under IDNA2003,
>> > IDNA2008 or xn-ascii does not make any difference.
>> There is a bit of a problem in terminology. I used the term
>> "punycode label" to include labels of the form "xn--...",
>> where the ... is a valid Punycode representation (in ASCII) of
>> a Unicode string (with non-ASCII characters).
> This seems to be acceptable under both IDNA2003 and IDNA2008.
>> If we are speaking precisely, the term "A-Label" is defined
>> in IDNA2008, and is more restrictive. It does not include all
>> the punycode labels that are valid in IDNA2003. So
>> "http://xn--1-wpn.blogspot.com/" (= http://€1.blogspot.com/)
>> does not have an A-Label in it, but is a
>> punycode label, and is valid in IDNA2003.
> This is also my understanding. But I understand it is only
> more restrictive because the IDNA2008 conversion terms are
> more restrictive.
>> Because A-Label is defined in IDNA2008 (not in IDNA2003), we
>> should follow the IDNA2008 definition precisely. Otherwise
>> communication becomes difficult -- two people think they are
>> in agreement about a point involving A-Labels, when they mean
>> different things, and are thus not in agreement.
> Agreed. IDNA2003 stated (RFC 3490): "While all ACE labels
> begin with the ACE prefix, not all labels beginning with the
> ACE prefix are necessarily ACE labels." What should I call
> non-ACE/non-A-label "xn--" ASCII domain names? Such domain
> names exist and therefore can be used by Cookies.
> However, the problem I raised was that, for the network DNS,
> A-labels are the reference, while U-labels are the IUser's
> reference, and that no concept has been suggested (*) (on the
> IETF side) and no mechanism has been defined (*) (on the IUse
> Architecture side) to make sure they strictly correspond
> throughout applications. (*) None that I am aware of, except the
> IDNApplication concept (i.e. to centralise punycoding on each
> machine/network) and the ML-DNS JFC is working on a running
> Frankly, my understanding is that IDNA2008 is a vertical
> beginning and real IETF horizontal work is to come, as per
> and their
> committed response .
>> *The best is the enemy of the good.*
>> On Sat, Oct 23, 2010 at 19:05, jean-michel bernier de
>> portzamparc <jmabdp at gmail.com> wrote:
>>> 2010/10/24 Mark Davis ☕ <mark at macchiato.com>
>>>> I'm in agreement about the usefulness of storing the
>>>> punycode form. As to
>>>> what you would like to see, Patrik, I'm in agreement there
>>>> as well; that the goal is IDNA2008. And I think we'll get
>>>> there eventually, when the major registries disallow the
>>>> registrations of non-IDNA2008 names.
>>> Dear Mark,
>>> whatever the policy of the "registries" (their transitions,
>>> their interest in Unicode, their commercial, cultural or
>>> political strategies, etc.), they only use A-labels as far as
>>> the Internet and the Internet DNS are concerned (whether or not
>>> they have "xn--" prefixes; remember that until IDNA2003 every
>>> cookie was A-label only). Whether these A-labels were initially
>>> registered under IDNA2003, IDNA2008 or xn-ascii does not make
>>> any difference. Cookies are not interested in the origin of
>>> the domain name, but in the value of the domain name. Every
>>> IDN has one and only one lowercase A-label value, and this
>>> value is here to stay.
>>> Considering anything else for cookies is to reintroduce the
>>> confusion that IDNA2008 clarified.
>>> Remember the sensible ".su" position: they will not register
>>> U-labels, but only A-labels, whatever the reverse
>>> The problem for implementers is not there. The problem is to
>>> obtain a local user authoritative A-label, something the AD
>>> was told not to ask but the IAB will have to document.