Standardizing on IDNA 2003 in the URL Standard

Shawn Steele Shawn.Steele at microsoft.com
Sun Jan 19 19:53:50 CET 2014


Due to the changes in contextual/bidi validation, Windows’ client APIs dropped the additional context/bidi checks with IDNA2008 to future-proof ourselves against other contextual/bidi rule changes.  As our APIs are intended for client-side use, we expect that labels invalid by the rules would fail to be registered, so we’re depending on the existence of a DNS record to confirm whether a label is valid or not (i.e., we’re depending on the registrars to do the contextual/bidi validation when registering domains).

There are, of course, cons to that approach.  And it seems like we’d still pass James’ test of whether we’re IDNA2008 or IDNA2003 (because we did test this previously).

And, just to be clear, our APIs are currently IDNA2008 plus UTS46 (compatibility) minus contextual/bidi validation (and have been for a while and are unlikely to change in the foreseeable future).

-Shawn

From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of James Mitchell
Sent: Thursday, January 16, 2014 2:58 PM
To: Mark Davis ☕; Anne van Kesteren
Cc: Gervase Markham; yaojk; Paul Hoffman; PUBLIC-IRI at W3.ORG; uri at w3.org; John C Klensin; IDNA update work; www-tag.w3.org
Subject: Re: Standardizing on IDNA 2003 in the URL Standard

The BIDI rule changed in IDNA2008 to allow trailing digits in RTL labels – I use this for determining whether an implementation is based on IDNA2003 or IDNA2008 (+ UTS 46 or another set of mappings).
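
A minimal sketch of that probe in Python, assuming only the end-of-label conditions matter (requirement 3 in section 6 of RFC 3454 for IDNA2003, and rule 3 of the Bidi rule in RFC 5893 for IDNA2008). The function names are hypothetical and the remaining bidi rules are deliberately left out:

```python
import unicodedata


def is_rtl_label(label):
    """RFC 5893: an RTL label contains at least one R, AL, or AN character."""
    return any(unicodedata.bidirectional(c) in ("R", "AL", "AN") for c in label)


def ends_ok_idna2003(label):
    """RFC 3454 section 6, requirement 3 (simplified): if any RandALCat
    (bidi class R or AL) character is present, the first and the last
    character must both be RandALCat."""
    randal = [unicodedata.bidirectional(c) in ("R", "AL") for c in label]
    if not any(randal):
        return True
    return randal[0] and randal[-1]


def ends_ok_idna2008(label):
    """RFC 5893 Bidi rule 3 (simplified): an RTL label must end in
    R, AL, EN, or AN, optionally followed by nonspacing marks (NSM)."""
    if not is_rtl_label(label):
        return True  # LTR labels are outside the scope of this sketch
    end = len(label)
    while end > 0 and unicodedata.bidirectional(label[end - 1]) == "NSM":
        end -= 1
    if end == 0:
        return False
    return unicodedata.bidirectional(label[end - 1]) in ("R", "AL", "EN", "AN")


# Probe label: two Hebrew letters followed by an ASCII digit.
probe = "\u05d0\u05d1" + "1"
print("IDNA2003 end rule accepts probe:", ends_ok_idna2003(probe))
print("IDNA2008 end rule accepts probe:", ends_ok_idna2008(probe))
```

An implementation that accepts such a label is more likely to be following the IDNA2008 bidi rule (or, like the Windows APIs described above, to be skipping bidi validation altogether), so the probe alone cannot tell those two cases apart.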

Regards,
James Mitchell
ARI Registry Services

From: Mark Davis ☕ <mark at macchiato.com>
Date: Friday, 17 January 2014 12:24 am
To: Anne van Kesteren <annevk at annevk.nl>
Cc: Gervase Markham <gerv at mozilla.org>, yaojk <yaojk at cnnic.cn>, Paul Hoffman <paul.hoffman at vpnc.org>, "PUBLIC-IRI at W3.ORG" <public-iri at w3.org>, "uri at w3.org" <uri at w3.org>, John C Klensin <klensin at jck.com>, IDNA update work <idna-update at alvestrand.no>, "www-tag.w3.org" <www-tag at w3.org>
Subject: Re: Standardizing on IDNA 2003 in the URL Standard

> The point is that in practice, it [IDNA2003] isn't fixed to Unicode 3.2.

It is not unlikely that an implementation that you think is following IDNA2003 (with a non-standard, larger repertoire) is actually following UTS 46.

If you were reverse-engineering to find out which standard an implementation was following, you'd need to query certain characters to see if they were supported, and how. UTS 46 also allows two 'modes', for transitional and not, that you'd have to test. There is a table in http://unicode.org/reports/tr46/#Table_IDNA_Comparisons that illustrates this. (You'd have to look at the data tables to get a full listing.) And, of course, it is clearly possible for an implementation to be non-conformant to all of the standards we are talking about (IDNA2003, UTS 46, and IDNA2008).

As previously noted, however, casing differences and the 4 deviation characters take some careful checking, since there is a difference between what the implementation accepts and what goes out 'over the wire'. And the implementation may also not be using the latest version of Unicode, which would make a difference for UTS 46 and IDNA2008.
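
As a rough illustration of the "over the wire" difference, the sketch below probes Python's built-in `idna` codec, which implements IDNA2003, with a label containing one of the deviation characters (ß) and compares the result with the unmapped Punycode form that IDNA2008 or non-transitional UTS 46 processing would produce. The helper names are hypothetical, and the unmapped helper assumes the label is already lowercase and normalized:

```python
def wire_form_idna2003(label):
    """Wire form from Python's stdlib IDNA2003 codec (Nameprep + Punycode)."""
    return label.encode("idna")


def wire_form_unmapped(label):
    """Wire form if the label is Punycode-encoded exactly as given,
    with no mapping step (assumes it is already lowercase and NFC)."""
    if label.isascii():
        return label.encode("ascii")
    return b"xn--" + label.encode("punycode")


label = "fa\u00df"  # "faß", containing the deviation character U+00DF
print(wire_form_idna2003(label))  # IDNA2003 maps ß to "ss" before encoding
print(wire_form_unmapped(label))  # ß is kept and the label is Punycode-encoded
```

Both forms are accepted as input by their respective implementations, but they resolve to different names, which is why checking only whether a character is accepted is not enough.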

BTW, there's an online demo of Unicode properties that can be used to see differences. The categories are slightly different than what is shown in the above chart, but you can get a sense for the differences:

http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{any}&abb=on&g=idna2003+uts46+idna2008

One way to look at UTS 46 is as a migration layer to support client implementations during the transition of registries from IDNA2003 to IDNA2008, plus a mapping layer that can be used with straight IDNA2008.

> I think I did mention earlier on UTS46 might be okay, depending on the
> details. I am hoping to hear from Mark on the matter.

I'm not sure what specific questions you have about UTS 46. Can you reiterate them?




Mark <https://google.com/+MarkDavis>

— Il meglio è l’inimico del bene —

On Thu, Jan 16, 2014 at 12:48 PM, Anne van Kesteren <annevk at annevk.nl> wrote:
On Thu, Jan 16, 2014 at 11:36 AM, Gervase Markham <gerv at mozilla.org> wrote:
> On 16/01/14 11:17, Anne van Kesteren wrote:
>> It's not worse if it's fully backwards compatible and mostly
>> interoperable across all major clients. At that point the standard is
>> just wrong.
>
> And having a standard fixed to Unicode 3.2 is not also "just wrong"?
The point is that in practice, it isn't fixed to Unicode 3.2. I have
yet to encounter an IDNA2003 implementation that does that. It turns
out the setup we have in practice is a compatible evolution.


> And I refer you to my comments above. Problems like lowercasing (for
> better or worse) are punted by IDNA2008 and are labelled as an
> application-level problem. In practice, what everyone should do for best
> interoperability is implement the same application-level mappings, and
> implement ones which are as compatible as possible with IDNA2003.
> Hence.... UTS46.
I think I did mention earlier on UTS46 might be okay, depending on the
details. I am hoping to hear from Mark on the matter.


--
http://annevankesteren.nl/
