Standardizing on IDNA 2003 in the URL Standard

Fri Jan 17 19:28:58 CET 2014

On 17 jan 2014, at 17:22, John C Klensin <klensin at jck.com> wrote:

> I agree with your main point, however: IDNA2008 was driven by
> two fundamental design decisions different from those underlying
> IDNA2003:
> 
> 	(i) Reversibility of the two label representations for
> 	the reasons you summarized.
> 	
> 	(ii) Shifting from normative tables tied to a version of
> 	Unicode (i.e., Stringprep/Nameprep) to a rule set
> 	intended to be largely independent of version changes. 

I think the main problem keeping this discussion from moving forward is that too many things are being discussed at the same time. That same entanglement was another reason why IDNA2008 was developed to replace IDNA2003.

Let me try to give my "off the top of my head" perspective on the various issues. Others (Andrew, John, Mark) might add things of course:

1. Algorithmic definition of what status each Unicode code point has

IDNA2003 is defined by an explicit list of code points, based on Unicode 3.2. Because of this, it formally cannot be applied to other versions of Unicode. Sure, it is possible to try to guess what algorithm was behind the tables, and then apply that algorithm to later versions of Unicode, but there is no guarantee the guess matches what the tables were meant to express.

In reality, that is exactly what IDNA2008 is: a set of rules that leads to as much backward compatibility as possible.
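To make the difference between a fixed table and a rule set concrete, here is a rough Python sketch of deriving a status from Unicode properties. It is only loosely modelled on the kind of rules IDNA2008 uses; the names `derive_status`, `is_unstable`, `LETTER_DIGIT`, `LDH` and `EXCEPTIONS` are invented for this illustration, the exception list is deliberately incomplete, and contextual rules are left out entirely. The point is that the answer comes from whatever Unicode version the runtime ships, not from a list frozen at Unicode 3.2.

```python
import unicodedata

# General categories treated as "letters and digits" in this sketch.
LETTER_DIGIT = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

# Traditional hostname characters, always allowed.
LDH = set("-0123456789abcdefghijklmnopqrstuvwxyz")

# Assumption: a tiny, incomplete exception list (sharp s and final sigma).
EXCEPTIONS = {"\u00df": "PVALID", "\u03c2": "PVALID"}


def is_unstable(ch: str) -> bool:
    """True if the code point changes under NFKC + case folding + NFKC."""
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) != ch


def derive_status(ch: str) -> str:
    """Derive a simplified status for one code point from its properties."""
    if ch in EXCEPTIONS:
        return EXCEPTIONS[ch]
    category = unicodedata.category(ch)
    if category == "Cn":
        # Not assigned in the Unicode version this Python was built with.
        return "UNASSIGNED"
    if ch in LDH:
        return "PVALID"
    if is_unstable(ch):
        return "DISALLOWED"
    if category in LETTER_DIGIT:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    for ch in ["a", "A", "\u00f6", "\u00df", "!", "\u0378"]:
        print(f"U+{ord(ch):04X}", derive_status(ch))
```

Running the same function on a newer Python (with newer Unicode data) may classify more code points without any change to the rules, which is the property an explicit table cannot offer.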

2. Mapping, like case folding, NFC etc

IDNA2003 did include some mapping. IDNA2008 does not, for various reasons.

Some people hold the view that it is really important that mapping is uniform across applications, operating systems and cultures. Some think a subset of the mappings must be 1:1. Some think the best mappings are done with the help of a locale (which by definition differs between users).
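A small Python sketch may help show why these views pull in different directions. It compares a locale-independent mapping (case folding plus NFKC) with a hypothetical Turkish-tailored one. Both function names are made up for this example, and the Turkish tailoring is reduced to the dotted/dotless "i" pair, so treat it as an illustration of the disagreement rather than a proposal for either mapping.

```python
import unicodedata


def map_uniform(label: str) -> str:
    """Locale-independent mapping: case fold, then normalize to NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())


def map_turkish(label: str) -> str:
    """Hypothetical locale-aware mapping for Turkish users.

    Assumption: only the I/dotless-i and dotted-I/i pairs are tailored.
    """
    tailored = label.replace("I", "\u0131").replace("\u0130", "i")
    return map_uniform(tailored)


if __name__ == "__main__":
    typed = "ISTANBUL"
    print(map_uniform(typed))
    print(map_turkish(typed))
```

The same typed input ends up as two different labels depending on which mapping the application picked, which is why agreement is more plausible within one context than across all of them.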

3. Backward compatibility

A few code points have changed status, so that applying the IDNA2008 algorithms to them is not backward compatible. Some think it would be preferable if each such code point (character) were handled the same way everywhere.

This should include both information on what to do with them, and how to one day phase out these special rules.

...and as I said, possibly more.

For me personally, I think the most important thing is (1). Having algorithms is good, having 1:1 between A-label and U-label is good. Separating 1, 2 and 3 above is good.
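The 1:1 point can be checked with Python's built-in `idna` codec, which implements the IDNA2003 behaviour (including its mapping step). The helper name `idna2003_round_trip` is invented for this sketch; it converts a single label to its ASCII form and back, so one can see whether the label that comes back is the one that went in.

```python
def idna2003_round_trip(label: str) -> tuple[str, str]:
    """Return (ASCII form, label decoded back from that ASCII form).

    Uses Python's built-in "idna" codec, which follows IDNA2003
    and therefore applies mapping before encoding.
    """
    ascii_form = label.encode("idna")
    return ascii_form.decode("ascii"), ascii_form.decode("idna")


if __name__ == "__main__":
    for label in ["b\u00fccher", "B\u00fccher", "stra\u00dfe"]:
        ascii_form, back = idna2003_round_trip(label)
        print(repr(label), "->", ascii_form, "->", repr(back),
              "(reversible)" if back == label else "(not reversible)")
```

An already lower-case label survives the round trip, while the capitalized one and the one containing sharp s do not come back as typed. The sharp s case is also one of the code points whose status differs between the two IDNA versions, so it touches point 3 as well.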

For 2, I think we will never get the same mappings in all contexts. But within a context (cultural and/or application) we might have the same.

For 3, I think we have not heard enough about how, for example, .DE has taken care of the issue(s). It is much more important to listen to them than to, for example, .COM.

But I am pretty sure we need to keep the things separated to be able to move forward.

   Patrik
