Standardizing on IDNA 2003 in the URL Standard

James Mitchell james.mitchell at ausregistry.com.au
Thu Jan 16 23:58:01 CET 2014


The BIDI rule changed in IDNA2008 to allow trailing digits in RTL labels – I use this for determining whether an implementation is based on IDNA2003 or IDNA2008 (+ UTS 46 or another set of mappings).
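
A rough sketch of that probe in Python follows. It checks only the end-of-label condition, as I read it in RFC 3454 section 6 (IDNA2003) and RFC 5893 rule 3 (IDNA2008), not the full Bidi rules. The function names are illustrative, and it assumes the Bidi classes from the running Python's Unicode database rather than the Unicode 3.2 tables that IDNA2003 references.

```python
import unicodedata

RANDAL = {"R", "AL"}  # right-to-left letter classes


def has_rtl(label: str) -> bool:
    """True if the label contains any R or AL character."""
    return any(unicodedata.bidirectional(ch) in RANDAL for ch in label)


def passes_idna2003_ends(label: str) -> bool:
    """RFC 3454 section 6: an RTL string must start and end with R/AL."""
    if not has_rtl(label):
        return True
    return (unicodedata.bidirectional(label[0]) in RANDAL
            and unicodedata.bidirectional(label[-1]) in RANDAL)


def passes_idna2008_end(label: str) -> bool:
    """RFC 5893 rule 3: an RTL label ends in R, AL, EN or AN, then optional NSM."""
    if not has_rtl(label):
        return True
    core = label
    # Trailing non-spacing marks are allowed after the final character.
    while core and unicodedata.bidirectional(core[-1]) == "NSM":
        core = core[:-1]
    return bool(core) and unicodedata.bidirectional(core[-1]) in RANDAL | {"EN", "AN"}


probe = "\u05d0\u05d1" + "1"  # Hebrew alef, bet, then ASCII digit 1
print(passes_idna2003_ends(probe), passes_idna2008_end(probe))  # False True
```

An implementation that accepts a label like the probe above is unlikely to be applying the IDNA2003 Bidi check.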

Regards,
James Mitchell
ARI Registry Services

From: Mark Davis ☕ <mark at macchiato.com>
Date: Friday, 17 January 2014 12:24 am
To: Anne van Kesteren <annevk at annevk.nl>
Cc: Gervase Markham <gerv at mozilla.org>, yaojk <yaojk at cnnic.cn>, Paul Hoffman <paul.hoffman at vpnc.org>, "PUBLIC-IRI at W3.ORG" <public-iri at w3.org>, "uri at w3.org" <uri at w3.org>, John C Klensin <klensin at jck.com>, IDNA update work <idna-update at alvestrand.no>, "www-tag.w3.org" <www-tag at w3.org>
Subject: Re: Standardizing on IDNA 2003 in the URL Standard

> The point is that in practice, it [IDNA2003] isn't fixed to Unicode 3.2.

It is quite possible that an implementation that you think is following IDNA2003 (with a non-standard, larger repertoire) is actually following UTS 46.

If you were reverse-engineering to find out which standard an implementation follows, you'd need to query certain characters to see whether they are supported, and how. UTS 46 also defines two modes, transitional and nontransitional, and you'd have to test both. The table at http://unicode.org/reports/tr46/#Table_IDNA_Comparisons illustrates this. (You'd have to look at the data tables to get a full listing.) And, of course, an implementation can clearly be non-conformant to all of the standards we are talking about (IDNA2003, UTS 46, and IDNA2008).

As previously noted, however, casing differences and the 4 deviation characters take some careful checking, since there is a difference between what the implementation accepts and what goes out 'over the wire'. And the implementation may also not be using the latest version of Unicode, which would make a difference for UTS 46 and IDNA2008.
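
As an illustration of probing one deviation character, here is a minimal Python sketch. It feeds "faß.de" to a conversion function and looks at the ASCII form that would go out over the wire. The names classify_eszett, stdlib_to_ascii and preserving_to_ascii are made up for this example. The standard-library "idna" codec implements IDNA2003, and the "preserving" function is only a stand-in that Punycode-encodes labels without any mapping or validation, so it is not a real IDNA2008 or UTS 46 implementation.

```python
from typing import Callable


def classify_eszett(to_ascii: Callable[[str], str]) -> str:
    """Report how a converter treats the deviation character U+00DF."""
    result = to_ascii("fa\u00df.de").lower()
    if result == "fass.de":
        return "mapped"      # IDNA2003 or UTS 46 transitional behaviour
    if result == "xn--fa-hia.de":
        return "preserved"   # IDNA2008 or UTS 46 nontransitional behaviour
    return "other"


def stdlib_to_ascii(domain: str) -> str:
    """Python's built-in codec, which applies IDNA2003 Nameprep."""
    return domain.encode("idna").decode("ascii")


def preserving_to_ascii(domain: str) -> str:
    """Stand-in only: Punycode each non-ASCII label with no mapping step."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(classify_eszett(stdlib_to_ascii))      # mapped
print(classify_eszett(preserving_to_ascii))  # preserved
```

This only distinguishes the wire form. Telling UTS 46 transitional apart from plain IDNA2003 would need further probes, such as characters outside the Unicode 3.2 repertoire.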

BTW, there's an online demo of Unicode properties that can be used to see differences. The categories are slightly different than what is shown in the above chart, but you can get a sense for the differences:

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{any}&abb=on&g=idna2003+uts46+idna2008

One way to look at UTS 46 is as a migration layer to support client implementations during the transition of registries from IDNA2003 to IDNA2008, plus a mapping layer that can be used with straight IDNA2008.

> I think I did mention earlier on UTS46 might be okay, depending on the
> details. I am hoping to hear from Mark on the matter.

I'm not sure what specific questions you have about UTS 46. Can you reiterate them?




Mark <https://google.com/+MarkDavis>

— Il meglio è l’inimico del bene —


On Thu, Jan 16, 2014 at 12:48 PM, Anne van Kesteren <annevk at annevk.nl> wrote:
On Thu, Jan 16, 2014 at 11:36 AM, Gervase Markham <gerv at mozilla.org> wrote:
> On 16/01/14 11:17, Anne van Kesteren wrote:
>> It's not worse if it's fully backwards compatible and mostly
>> interoperable across all major clients. At that point the standard is
>> just wrong.
>
> And having a standard fixed to Unicode 3.2 is not also "just wrong"?

The point is that in practice, it isn't fixed to Unicode 3.2. I have
yet to encounter an IDNA2003 implementation that does that. It turns
out the setup we have in practice is a compatible evolution.


> And I refer you to my comments above. Problems like lowercasing (for
> better or worse) are punted by IDNA2008 and are labelled as an
> application-level problem. In practice, what everyone should do for best
> interoperability is implement the same application-level mappings, and
> implement ones which are as compatible as possible with IDNA2003.
> Hence.... UTS46.

I think I did mention earlier on UTS46 might be okay, depending on the
details. I am hoping to hear from Mark on the matter.


--
http://annevankesteren.nl/

