referencing IDNA2008 (and IDNA2003?)

Patrik Fältström patrik at frobbit.se
Sat Oct 23 16:40:02 CEST 2010


On 22 Oct 2010, at 21:35, John C Klensin wrote:

> So, if either
> the domain-attribute or the request-host contain non-ASCII
> characters, it needs to convert those strings to A-labels
> (IDNA2008) or via ToASCII (IDNA2003).

Unfortunately, it is a little more complicated than this. If what you get as "input" (either X or Y) might be an IRI, then, the way I read the IRI spec, there is a set of IRIs that might contain strings that are not IDNA2008 compatible. I have lately started to believe that the only IRIs I would like to see in a context like yours are the ones that a) are in UTF-8 and b) fulfil the requirement that they can be transformed to a URI and back with a 1:1 mapping specified in the IRI spec.

Now there is a new IRI draft out, and I have not checked the details in it, but I think we all would like to have:

- IDNA2008, where there is a 1:1 mapping between A-label and U-label, and no mapping as in IDNA2003 (any mapping really _must_ happen outside of whatever distributed comparison algorithm we are using)

- IRIs and URIs that only contain domain names that are IDNA2008 compatible (U-label or A-label in the domain name part)
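
To make the 1:1 requirement concrete, here is a minimal sketch in Python. It uses the standard library's "punycode" codec for the raw conversion and the "idna" codec (which implements the IDNA2003 ToASCII operation) only for contrast. The helper names u_to_a, a_to_u and needs_no_mapping are hypothetical, and the NFC/lowercase test is only a rough proxy: none of this performs the full IDNA2008 validity checks (code point categories, contextual rules, Bidi).

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks an A-label


def u_to_a(label: str) -> str:
    """Convert one label to its ASCII form with no mapping step (hypothetical helper)."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def a_to_u(label: str) -> str:
    """Inverse of u_to_a for one label (hypothetical helper)."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def needs_no_mapping(label: str) -> bool:
    """Rough proxy only: label is already NFC and lowercase, and round-trips exactly."""
    if unicodedata.normalize("NFC", label) != label:
        return False
    if label.lower() != label:
        return False
    return a_to_u(u_to_a(label)) == label


print(u_to_a("bücher"))            # xn--bcher-kva
print(needs_no_mapping("bücher"))  # True
print(needs_no_mapping("Bücher"))  # False: would need case mapping first

# Contrast: IDNA2003 ToASCII maps silently, so two different inputs
# collapse onto the same ASCII form and the conversion is not 1:1.
print("Bücher".encode("idna") == "bücher".encode("idna"))  # True
```

The last line is the point of the first bullet: with IDNA2003-style mapping inside the conversion, the original string cannot be recovered from the ASCII form.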

If we start with those as base rules, then you can hopefully add "temporary rules" to your spec that might be recommended for backward compatibility reasons. But I think you should really call them that.

If you have these rules, then you can -- modulo the A-label/U-label and URI/IRI transformations, which are both 1:1 -- do a much simpler comparison than you otherwise could if you had to start transforming Unicode strings (regardless of the encoding of the Unicode string).
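
As a rough illustration of that simpler comparison, the following Python sketch brings both names to their ASCII form and compares the results. It assumes both inputs have already been validated as sequences of A-labels, U-labels or plain ASCII labels. The function names are hypothetical, and the standard library "punycode" codec stands in for a real IDNA2008 implementation.

```python
def to_ascii_form(name: str) -> str:
    """Bring every label of a (pre-validated) domain name to its ASCII form."""
    labels = []
    for label in name.split("."):
        if label.isascii():
            # ASCII labels, including A-labels, compare case-insensitively.
            labels.append(label.lower())
        else:
            # U-label: convert as-is, with no case folding or other mapping.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def same_host(x: str, y: str) -> bool:
    """Compare two names after converting both to ASCII form."""
    return to_ascii_form(x) == to_ascii_form(y)


print(same_host("bücher.example", "XN--BCHER-KVA.Example"))  # True
print(same_host("bücher.example", "bucher.example"))         # False
print(same_host("Bücher.example", "bücher.example"))         # False: no mapping applied
```

Because no Unicode normalization or case mapping happens inside the comparison, the third call returns False; under the base rules above, "Bücher" would have to be mapped or rejected before it ever reaches this step.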

What is important, though, is that the security considerations section explicitly notes that there are many, many combinations of octets that are not only invalid when these rules are applied but that, if you are unlucky, can cause buffer overflow issues (at best) when you try to do various things with the strings, such as A-label/U-label transformation.
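
A defensive conversion might look roughly like the Python sketch below. It treats the input as untrusted octets, bounds the length before doing any work, turns decoder errors into a plain rejection, and accepts the result only if it converts back to exactly the same A-label. The name safe_a_to_u is hypothetical, the 63-octet bound is the DNS limit on a single label, and the "punycode" codec is again only a stand-in for a full IDNA2008 implementation.

```python
from typing import Optional

MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label
ACE_PREFIX = "xn--"


def safe_a_to_u(raw: bytes) -> Optional[str]:
    """Return the U-label for an untrusted A-label, or None if it is rejected."""
    # Bound the input before any conversion work is done.
    if not raw or len(raw) > MAX_LABEL_OCTETS:
        return None
    try:
        text = raw.decode("ascii").lower()
    except UnicodeDecodeError:
        return None  # an A-label is pure ASCII by definition
    if not text.startswith(ACE_PREFIX):
        return None
    encoded = text[len(ACE_PREFIX):]
    try:
        u_label = encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        return None  # malformed Punycode
    if not u_label or u_label.isascii():
        return None  # "fake" A-label that hides no non-ASCII character
    # Accept only if the conversion is exactly reversible.
    if u_label.encode("punycode").decode("ascii") != encoded:
        return None
    return u_label


print(safe_a_to_u(b"xn--bcher-kva"))        # bücher
print(safe_a_to_u(b"\xff\xfe"))             # None: not ASCII
print(safe_a_to_u(b"xn--" + b"a" * 100))    # None: longer than 63 octets
```

In a memory-safe language the failure mode is an exception rather than a buffer overflow, but the same checks (length first, then strict decoding, then the round trip) apply to implementations in any language.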

   Patrik



More information about the Idna-update mailing list