Another round of IDNAv2, and thoughts on IDNA2008 goals
John C Klensin
klensin at jck.com
Fri Mar 6 09:45:17 CET 2009
--On Wednesday, March 04, 2009 11:20 -0800 Paul Hoffman
<phoffman at imc.org> wrote:
> This leaves the primary difference between the two the major
> goal of IDNA2008: to create a framework that is independent of
> Unicode version, that is, that does not need to be changed in
> the future as the Unicode Consortium comes out with new
> versions of the Unicode Standard. IDNAv2 definitely doesn't
> try to achieve this goal, and I have added a few sentences
> about that in the introduction.
> The WG then gets to consider the tradeoff: is the lack of
> compatibility from IDNA2003 to IDNA2008 worth the goal of no
> future updates to IDNA2008? Is that goal even achievable? I
> would have hoped so, but I am far from convinced that the
> Unicode Consortium could continue to add characters without us
> needing to change some of the rules in the IDNA2008 tables.
While the sample is obviously very small, we have some
experience with the transition from 5.0 to 5.1, which did not
require changes of rules. Perhaps someone could run the rules
with whatever the current state of 5.2 is to see what could be
learned from that. I don't know whether going back to 4.0 or
even 3.2, running the rules, and comparing would tell us
anything useful or not: my highly subjective impression is that
Unicode character and property behavior is becoming more stable
wrt IDNA requirements over time. Consequently, even if there
were bad results on those early-version tests, I'm not quite
sure that we would know how to interpret them predictively.
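
A minimal sketch of the kind of cross-version comparison suggested above, in Python. It is an illustration only: the rule used here (a small set of allowed general categories) is a hypothetical, heavily simplified stand-in and is not the actual IDNA2008 table-derivation rule set. The names `ALLOWED_CATEGORIES`, `classify`, and `compare_versions` are invented for this example. It relies on Python's standard `unicodedata` module, which exposes the interpreter's current Unicode database plus a frozen Unicode 3.2 database as `unicodedata.ucd_3_2_0`.

```python
import unicodedata

# ASSUMPTION: a toy rule, not the IDNA2008 rules. Code points whose
# general category is in this set are treated as "PVALID".
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(cp, db):
    """Classify one code point using the given Unicode database object."""
    cat = db.category(chr(cp))
    if cat == "Cn":
        return "UNASSIGNED"
    return "PVALID" if cat in ALLOWED_CATEGORIES else "DISALLOWED"


def compare_versions(old_db, new_db, code_points):
    """Return code points that were already assigned in old_db but whose
    classification differs under new_db. Newly assigned code points are
    skipped, since pure additions are expected between versions."""
    changed = {}
    for cp in code_points:
        old = classify(cp, old_db)
        new = classify(cp, new_db)
        if old != "UNASSIGNED" and old != new:
            changed[cp] = (old, new)
    return changed


# Compare Unicode 3.2 with whatever version this Python build ships.
changes = compare_versions(unicodedata.ucd_3_2_0, unicodedata, range(0x0000, 0x3000))
print(f"{len(changes)} already-assigned code points changed classification "
      f"between 3.2 and {unicodedata.unidata_version}")
```

Any entry in `changes` is a code point whose status moved without any change to the rule itself, which is the kind of instability the comparison is meant to surface. Whether such results from early versions would predict future behavior is a separate question.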
> In preparing IDNAv2, I think that the amount of work that is
> needed might be much, much less than is still needed to
> carefully add appropriate mapping to IDNA2008. We keep hearing
> "we are almost done", and we keep finding language and scripts
> that need attention to deal with the changes in IDNA2008.
We may need to agree to disagree about this, but I don't believe
that statement is correct. We knew on the day the charter was
approved that Eszett, Final Sigma, and the joiner characters
were problematic. We knew that we had significant similarity
issues with Latin, Greek, and Cyrillic. We knew that Arabic
didn't need ZWNJ and that the issues with ZWJ and ZWNJ in the Indic
scripts were different from those with Persian. We knew that
there were issues with Korean soon after we got started even
though it took us much too long to adequately understand what
those issues were. We've known about the Turkish dotless i,
the Catalan middle dot, and the associated problems since before
IDNA2003 was completed. The digit issues with Arabic script got
onto our radar more recently, but at least some of them should
have been no real surprise to anyone who observed the
relationships between the Arabic-Indic digit set and the Eastern
Arabic-Indic digit set or who had spent enough time in Arabic
language countries where European and the local flavor of
Arabic-Indic digits are used more or less interchangeably (and
not distinguished at all when read out loud).
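
To make the digit relationship concrete, here is a small Python sketch. It is not a rule adopted by the WG; the helper names `DIGIT_SETS`, `digit_sets_used`, and `same_numeric_value` are hypothetical. It shows that the three digit sets are distinct code points carrying identical numeric values, and how a label mixing them could be detected.

```python
import unicodedata

# The three digit sets discussed above. Unicode's character names call
# the third set "EXTENDED ARABIC-INDIC DIGIT ...".
DIGIT_SETS = {
    "european": range(0x0030, 0x003A),
    "arabic-indic": range(0x0660, 0x066A),
    "eastern arabic-indic": range(0x06F0, 0x06FA),
}


def digit_sets_used(label):
    """Return the names of the digit sets that appear in the label."""
    used = set()
    for ch in label:
        for name, block in DIGIT_SETS.items():
            if ord(ch) in block:
                used.add(name)
    return used


def same_numeric_value(a, b):
    """True if two digit characters denote the same number."""
    return unicodedata.digit(a) == unicodedata.digit(b)


# U+0661 and U+06F1 are different code points that both mean "one".
print(same_numeric_value("\u0661", "\u06f1"))
print(sorted(digit_sets_used("abc\u0661\u06f1")))
```

A check like `len(digit_sets_used(label)) > 1` flags labels that mix sets. What to do with such labels is one of the tradeoffs under discussion, not something this sketch decides.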
The problems and what to do about them involve complicated
tradeoffs, but our problems have mostly been with how to balance
those tradeoffs and with our seeming inability to reach
conclusions and then move on and not continually revisit them,
not with sudden surprises that require new kinds of rules or
> I do not pretend that IDNAv2 is wonderful. It has all of the
> cut corners and compromises that we made in IDNAv1. I would
> not be surprised if the WG decided that fixing those in
> IDNA2008 is the better way to go, even if it means introducing
> new problems, but I would also not be surprised if the WG said
> "let's just go with IDNAv2 now" and be willing to do IDNAv3 in
> a few years. If the WG goes with IDNAv2, I certainly don't
> think we should wait six years for IDNAv3; it should be much
> sooner than that.
That line of reasoning actually identifies an issue that helps
to further convince me that we should just finish IDNA2008.
There will always be transition issues, even if the only changes
are simple additions of characters. As Cary pointed out, making
new characters or a new script available in a given zone causes
transition problems in practice, even within the bounds of
IDNA2003. What we've been told by the big, top-level (and
similar) registries is that they are willing to deal with up to
one more large transition as long as it is relatively soon (a
limit we may already be pushing) and that it is the last one.
So, if we adopt an "IDNAv2 now, IDNAv3 a few years thereafter"
plan, it implicitly says that every decision made in IDNAv1
(including those cut corners and compromises) is forever.
I think we need to look in the corners and at the compromises,
figure out what we would have done then if we knew what we know
now, and then make decisions about what is worth changing
because we can do better versus what we should preserve purely
for backward compatibility. That is exactly what we've been
trying to do with IDNA2008. And the closer the two approaches
get to converging, the more simply finishing that work on
IDNA2008 makes sense to me.
YMMD, of course.