Charter changes and a possible new

John C Klensin klensin at jck.com
Thu Jan 15 12:29:09 CET 2009


Hi.

I'm going to comment only very selectively on this thread, for
several reasons; please do not construe lack of comments as
lack of conviction.  Among the reasons, in no particular order,
are:

* For reasons Patrik has outlined and a few more, I believe that
starting over with an attempt to define a new charter and a new
base document would set us back at least another year and
probably longer, regardless of the final content of that charter.

* As a corollary to that point, I do not believe that any of us
has unlimited time to invest in IDNs.  I would much prefer to
see that time spent -- and to spend my time -- moving forward,
rather than on rehashing settled issues (or questioning every
phrase as to whether it is settled).

* Also for reasons that Patrik has outlined and some others, I
believe we are actually very close to finished.  I've got new
versions of the three documents I'm editing nearly ready to go
and, while we can certainly quibble about details forever (with
either this set of documents and principles or another one), I
believe there are only a very small number of topics that need
significantly more work (one is finalization of several of the
contextual rules).

* There is one key issue on which I'm going to spend more time,
because it is important and Patrik has not explicitly addressed
it in detail, even though I considered it settled prior to
Paul's note. 

It is clear from considerable community and expert input that
ZWNJ and ZWJ are very significant for several of the Indic
scripts and that ZWNJ is very significant for at least several
of the languages in common use from Iran toward the east (not
that those languages are confined to those areas).
"Significant", in this case, means that, if people use words in
those languages to create labels, those words are keyed in
differently, look visually different, and often have different
pronunciations depending on whether ZWJ or ZWNJ becomes part of
the Unicode coding (I state things that way because there are,
at least theoretically, different ways to code the character
strings involved... but neither the current documents nor
Paul's proposal involves reopening the question of the
appropriateness of Unicode).

From the standpoint of the communities involved, having the IETF
tell them that two strings are equal when centuries of
knowledge, experience, and common sense tell them they are not
is an invitation to serious confusion, compromises the
integrity of identifiers, and, fwiw, is the height of arrogance.
IDNA2003, quite accidentally, does exactly that.   However, if
we are going to permit either ZWJ or ZWNJ as real characters
(i.e., avoiding IDNA2008 terminology, they are mapped into
Punycode-encoded strings and reappear when that mapping is
reversed), then two things follow immediately: (i) we introduce
an incompatibility in which a string that can be treated as a
label under both IDNA2003 and IDNAbis maps to a different
string under the former than under the latter, and (ii) we
either introduce a requirement for some variation on contextual
rules or we introduce the very dangerous problem of invisible
characters -- characters that do not change a string's
appearance when they are embedded in it.
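
To make that incompatibility concrete, here is a small sketch
(Python, standard library only; the label text is a made-up
Persian-style example, and the IDNA2003 side is reduced to just
the one nameprep mapping that matters here, so this illustrates
the behavior rather than implementing either protocol):

    import codecs

    ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
    ZWJ = "\u200D"    # ZERO WIDTH JOINER

    # Hypothetical Persian-style label, with and without ZWNJ
    # between the two parts.
    with_zwnj = "نامه" + ZWNJ + "ای"
    without_zwnj = "نامه" + "ای"

    def ace_idna2003_style(label):
        # IDNA2003's nameprep maps ZWNJ and ZWJ to nothing
        # (RFC 3454, Table B.1), so the joiner silently
        # disappears before the Punycode step.
        mapped = label.replace(ZWNJ, "").replace(ZWJ, "")
        return "xn--" + codecs.encode(mapped, "punycode").decode("ascii")

    def ace_preserving_joiners(label):
        # Treating ZWNJ as a real (contextual) character keeps
        # it, so it survives the round trip through Punycode.
        return "xn--" + codecs.encode(label, "punycode").decode("ascii")

    # Under the IDNA2003-style mapping, two visually and
    # linguistically distinct inputs collapse to the same ACE label...
    print(ace_idna2003_style(with_zwnj) == ace_idna2003_style(without_zwnj))  # True
    # ...while preserving the joiner yields two different labels:
    # the input-string incompatibility described above.
    print(ace_preserving_joiners(with_zwnj) == ace_preserving_joiners(without_zwnj))  # False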

So the simple matter of introducing ZWJ and/or ZWNJ, which seem
vital for at least some of the scripts involved, eliminates
strict input-string compatibility with IDNA2003 and forces
contextual rules or some equivalent mechanism.   Both
consequences were anticipated at the time the charter was
approved.  So, if they are what makes the IDNA2008 model "too
complex" or "too complicated", then I think we are stuck with
that complexity in _any_ IDNA revision, unless the IETF
proposes to tell the relevant language communities that we
don't really care whether they can write a reasonable range of
sensible and predictable mnemonics in their scripts and based
on their languages.
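
As a rough illustration of what a contextual rule can look
like, here is a sketch of one of the rules being discussed for
ZWNJ -- permitting it only immediately after a combining virama
-- again in Python.  It is a deliberate simplification (the
drafts also contemplate a joining-type context for ZWNJ, which
is what the Persian cases need), and the function name is mine,
not anything from the documents:

    import unicodedata

    ZWNJ = "\u200C"

    def zwnj_context_ok(label):
        # Simplified contextual rule: ZWNJ is allowed only when
        # the preceding character has canonical combining class 9
        # (Virama), as in the Indic conjunct-breaking use of ZWNJ.
        for i, ch in enumerate(label):
            if ch == ZWNJ:
                if i == 0 or unicodedata.combining(label[i - 1]) != 9:
                    return False
        return True

    # Devanagari KA + VIRAMA + ZWNJ + SSA: ZWNJ follows the
    # virama, so it passes.
    print(zwnj_context_ok("\u0915\u094D" + ZWNJ + "\u0937"))   # True
    # ZWNJ directly after a plain consonant fails this rule.
    print(zwnj_context_ok("\u0915" + ZWNJ + "\u0937"))         # False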

Once we establish the mechanisms for contextual rules and accept
the conclusion that there are going to be some incompatibilities
from IDNA2003 if one looks at input strings rather than what can
be stored in a zone file (i.e., what comes back from a Punycode
encoding), there is a real question about how far to push those
conceptual changes.  Would we reduce complexity considerably by
allowing ZWJ and ZWNJ but not Geresh and Gershayim?  Does using
a contextual rule to express the restriction on Hyphen-Minus
complicate things or would we be better off writing the
specification with a separate kind of rule for it, as the
Hostname specification does?   Would it be significantly less
incompatible if we tried to draw a line between "fix ZWNJ" and
"fix Sharp-S and final Sigma so that they can be coded"?   Those
are all reasonable questions, but I have a great deal of
difficulty concluding that they are as fundamental as the one
about whether the new mechanism or the input-string
incompatibilities are to be permitted at all.
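
For what it is worth, the Hyphen-Minus case is easy to state
either way.  A sketch of the traditional restriction written as
a single label-level check, rather than as a per-character
contextual rule, might look like this (the function name and
the ACE-prefixed test string are placeholders of my own):

    def hyphen_restrictions_ok(label):
        # Hostname-style rule (RFC 952/1123): no leading or
        # trailing hyphen.
        if not label or label.startswith("-") or label.endswith("-"):
            return False
        # IDNA-style rule: "--" in the third and fourth positions
        # is reserved for ACE labels carrying the "xn--" prefix.
        if label[2:4] == "--" and not label.startswith("xn--"):
            return False
        return True

    print(hyphen_restrictions_ok("example"))       # True
    print(hyphen_restrictions_ok("-example"))      # False
    print(hyphen_restrictions_ok("ab--cd"))        # False
    print(hyphen_restrictions_ok("xn--example"))   # True (ACE prefix allowed)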

I haven't attended nearly as many of the relevant meetings as
Patrik has, but I share his impression that telling the
communities involved that they cannot use their scripts
properly to represent their mnemonics in a clear and consistent
way, and cannot do so because of decisions we made with
IDNA2003 -- without consulting them directly and without
thinking much about it at the time -- would be a really bad
idea.  Worse than just a bad idea, it would be an abdication of
our responsibility to the global Internet community.


* For better or worse, Paul and I have a history of different
styles of approach to problems and problem-solving.  The
differences tend to irritate both of us and get in the way of
understanding, even in areas where we basically agree on the
substance of an issue (which does, periodically, happen).  To
the degree possible, I don't want that history to confuse these
discussions; I'm sure he doesn't either.

Just my opinion, of course.
   john


