Re: idna folding (was Re: idna-bis and '゜')

Erik van der Poel erikv at google.com
Tue Dec 18 06:59:07 CET 2007


Hi John,

Thanks for filling in the gaps. I also tend to agree with you about
transmitting only strings that are in reduced, final form.

IDNA2003 certainly attempted to be clear about when it was OK to place
IDNs in domain name slots, and about the need to convert to Punycode
and ASCII dots when placing an IDN in an IDN-unaware domain name slot.
See section 3.1 (2) of RFC 3490.
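
As a rough illustration of that conversion (a sketch only, assuming
Python's standard-library "idna" codec, which implements the IDNA2003
ToASCII operation; the helper name to_idn_unaware_slot is made up for
this example):

```python
# Sketch: produce the all-ASCII form required for an IDN-unaware slot.
# Assumption: Python's built-in "idna" codec (IDNA2003 / RFC 3490). It
# applies ToASCII to each label and also accepts the IDNA2003 dot
# variants (U+3002, U+FF0E, U+FF61) as label separators.

def to_idn_unaware_slot(domain: str) -> str:
    """Return ACE labels joined by ASCII dots (U+002E)."""
    return domain.encode("idna").decode("ascii")


print(to_idn_unaware_slot("bücher.example"))       # xn--bcher-kva.example
print(to_idn_unaware_slot("bücher\u3002example"))  # xn--bcher-kva.example
```

Both inputs end up as the same ASCII string; the second one uses the
ideographic full stop as a separator, which the conversion replaces
with an ASCII dot.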

Unfortunately, a number of implementors decided to allow non-final
(variant) forms into HTML URIs/IRIs. Strictly speaking, they were
violating the IDNA2003 spec, because there was no HTML spec at the
time that allowed IDNs in variant forms. In other words, HTML had
only IDN-unaware domain name slots at the time.

So I think it would be a good idea for the email UTF-8 spec to be
crystal clear about the exact forms that are allowed. Your IDNAbis
issues draft introduces the new terms U-label and A-label (thanks for
those). We may need additional terms for FQDNs in various forms. For
example:

V-label = variant label, i.e. one that cannot be obtained by applying
          ToUnicode to any A-label, but can be converted to an A-label
          by ToASCII

FQADN   = fully qualified domain name consisting of A-labels,
          LDH-labels and ASCII dots

FQUDN   = fully qualified domain name consisting of U-labels,
          LDH-labels and ASCII dots

FQVDN   = fully qualified domain name consisting of V-labels,
          U-labels, A-labels, LDH-labels and IDNA2003 dot variants

(These acronyms sound and look terrible. I hope someone comes up with
better ones.)
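
To make the label categories concrete, here is a minimal sketch that
sorts a single label into one of them. It assumes Python's
encodings.idna module, i.e. the IDNA2003 ToASCII/ToUnicode operations
rather than whatever IDNAbis ends up defining, and classify_label is
a hypothetical helper, not anything from the drafts:

```python
import re
from encodings.idna import ToASCII, ToUnicode  # IDNA2003 operations

# Letters, digits, hyphen; no leading or trailing hyphen.
LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Return 'LDH-label', 'A-label', 'U-label', 'V-label' or 'invalid'."""
    try:
        if label.isascii():
            if not LDH.fullmatch(label) or len(label) > 63:
                return "invalid"
            if not label.lower().startswith("xn--"):
                return "LDH-label"
            # Python's ToUnicode raises if the ACE form does not round-trip.
            ToUnicode(label.lower().encode("ascii"))
            return "A-label"
        # Non-ASCII: a U-label is exactly what ToUnicode gives back
        # from its own ACE form; anything else that converts is a variant.
        round_tripped = ToUnicode(ToASCII(label))
        return "U-label" if round_tripped == label else "V-label"
    except UnicodeError:
        return "invalid"


for sample in ["example", "xn--bcher-kva", "bücher", "Bücher", "straße"]:
    print(sample, "->", classify_label(sample))
```

Under IDNA2003 mapping, "Bücher" and "straße" come out as V-labels,
because ToASCII folds them to "bücher" and "strasse" and the original
spelling cannot be regenerated from the result.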

So the email UTF-8 spec may want to specify that the domain parts must
be FQADNs or FQUDNs. This way, we use the final (and near-final) forms
only.
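
A check along these lines could be sketched as follows. This is only
an illustration under assumptions: it uses Python's encodings.idna
(IDNA2003 rules), the function names are invented here, and an
all-LDH name is accepted since it satisfies both definitions:

```python
import re
from encodings.idna import ToASCII, ToUnicode  # IDNA2003 operations

IDNA2003_DOT_VARIANTS = "\u3002\uff0e\uff61"
LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def label_is_final(label: str) -> bool:
    """True for LDH-labels, A-labels and U-labels; False for variants."""
    try:
        if label.isascii():
            if not LDH.fullmatch(label) or len(label) > 63:
                return False
            # Raises if an "xn--" label does not round-trip.
            ToUnicode(label.lower().encode("ascii"))
            return True
        return ToUnicode(ToASCII(label)) == label
    except UnicodeError:
        return False


def is_fqadn_or_fqudn(domain: str) -> bool:
    """True if the domain uses ASCII dots and only final-form labels."""
    if any(dot in domain for dot in IDNA2003_DOT_VARIANTS):
        return False
    labels = domain.split(".")
    if not all(label_is_final(label) for label in labels):
        return False
    has_a = any(label.lower().startswith("xn--") for label in labels)
    has_u = any(not label.isascii() for label in labels)
    # Mixing A-labels and U-labels fits neither definition.
    return not (has_a and has_u)


print(is_fqadn_or_fqudn("bücher.example"))         # True
print(is_fqadn_or_fqudn("xn--bcher-kva.example"))  # True
print(is_fqadn_or_fqudn("Bücher.example"))         # False
print(is_fqadn_or_fqudn("bücher\u3002example"))    # False
```

The sketch does not handle a trailing root dot; an empty label is
simply rejected.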

Erik

On Dec 17, 2007 8:20 PM, John C Klensin <klensin at jck.com> wrote:
> Where we may disagree is about the contexts in which the variant
> forms (those that cannot be regenerated after ToASCII
> conversion) should be permitted to be transferred over the
> network.  With XML and HTML files, there is a gray area because
> the question of what is transferred is a little ambiguous.  But
> with email, all of the precedents and all of the experience
> suggest that one should be transmitting only strings that are in
> reduced, final, form as understood by the destination server.
> And that, in turn, requires or at least strongly suggests that
> domain names be either in ACE form or in a form that can be
> obtained by processing the ACE form back through ToUnicode (or
> its IDNAbis equivalent).
>
> One can certainly reach a different conclusion but I suggest
> that our operational experience implies that one requires much
> stronger justification than "it would be great" or "someone
> would like to do it" to justify sending the non-final forms over
> the network.

