Table-building

Thu Feb 1 01:24:36 CET 2007

Harald wrote:

> --On 14. desember 2006 18:24 -0800 Kenneth Whistler <kenw at sybase.com> wrote:
> 
> > The *table* itself should unambiguously be defined as
> > the list of characters appropriate for inclusion in
> > IDNA. IDNAInclusion.txt (or whatever name you like).
> 
> And how often do you believe this table would change?
> 
> Once a month?
> Once a year?
> Once a decade?
> 
> I think I disagree violently with you, but that is because I understand you 
> as saying that the table would change "once a decade".

Well, that understanding is wrong, I am afraid. I neither said
that nor implied it.

My objection to the way Patrik was constructing the table is
that by making it multi-state, the table itself is more
complicated, more difficult to implement, and its status becomes
more ambiguous and problematical for people attempting to
understand and implement it.

The statement of the IDNA nameprep, however it gets worked out
in detail, is going to need an inclusion table. Both the
statement of the algorithm and its implementations are easier
if the table is constructed simply as a binary property (each
character is either in or out), rather than trying to build in
anticipation of future decisions that *might* be made to add some
characters to the table (though we don't yet know for sure which
ones). That just makes for head-scratching in implementation.
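
To make the contrast concrete, here is a minimal Python sketch of
what implementing a binary inclusion table amounts to: a sorted
list of code point ranges and a yes/no lookup. The ranges are
illustrative placeholders, not the contents of any proposed
table, and the names IDN_PERMITTED_RANGES, is_idn_permitted and
label_is_permitted are hypothetical.

```python
import bisect

# Hypothetical, illustrative ranges only (inclusive start, inclusive end).
# These are NOT the real contents of any IDN inclusion table.
IDN_PERMITTED_RANGES = [
    (0x002D, 0x002D),  # HYPHEN-MINUS
    (0x0030, 0x0039),  # DIGIT ZERO..DIGIT NINE
    (0x0061, 0x007A),  # LATIN SMALL LETTER A..Z
    (0x00DF, 0x00F6),
    (0x00F8, 0x00FF),
]

# Precompute the range starts once so lookups can use binary search.
_STARTS = [start for start, _ in IDN_PERMITTED_RANGES]


def is_idn_permitted(ch: str) -> bool:
    """Binary property lookup: a character is either in the table or not."""
    cp = ord(ch)
    # Index of the last range whose start is <= cp (or -1 if none).
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= IDN_PERMITTED_RANGES[i][1]


def label_is_permitted(label: str) -> bool:
    """A label passes the inclusion check only if every character does."""
    return all(is_idn_permitted(ch) for ch in label)
```

There is nothing to decide at lookup time beyond in or out. A
multi-state table would add further values (such as "pending")
whose handling every implementation would have to agree on.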

> If it changes once a 
> month, the "unambiguous" table is a thin illusion papered over a tri-state 
> model.

First, it isn't going to change once a month. You know that,
so I don't see the point in raising it as a red herring.

Mark has provided the relevant timing history regarding changes
to the *repertoire* of Unicode versions which could, in principle,
impact the list of characters appropriate for the IDNA inclusion
table.

Second, I thought one of the points here was to get the IDNA
nameprep spec out of the business of having to be updated
every time the UTC and WG2 add some characters to Unicode
and ISO/IEC 10646. You accomplish that by publishing a specification
that defines the inclusion table by reference to a specific
character property for that purpose -- as we have discussed
at some length now.

I just pushed up IDNPermitted.txt to demonstrate what the
documentation for such a binary character property could (and
probably would) look like, if published as part of the Unicode
Character Database. The property is then easy to refer to
and easy to implement.
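
As an illustration of "easy to implement", the sketch below
parses a binary-property data file into code point ranges. It
ASSUMES that IDNPermitted.txt would use the same line layout as
existing UCD binary-property files such as PropList.txt
("XXXX..YYYY ; Property_Name # comment"); the function name and
the sample data are hypothetical.

```python
import re

# ASSUMPTION: same layout as UCD binary-property files like PropList.txt:
#   "XXXX..YYYY ; Property_Name # comment"  or  "XXXX ; Property_Name # ..."
_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*([A-Za-z_]+)"
)


def parse_binary_property(text: str, prop: str) -> list[tuple[int, int]]:
    """Return sorted, inclusive code point ranges that carry `prop`."""
    ranges = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # strip trailing comments
        m = _LINE.match(line)
        if not m or m.group(3) != prop:
            continue  # blank, comment-only, or some other property
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        ranges.append((start, end))
    ranges.sort()
    return ranges


# Illustrative excerpt only; not real data.
SAMPLE = """\
# IDNPermitted.txt (hypothetical excerpt)
002D          ; IDN_Permitted # Pd       HYPHEN-MINUS
0030..0039    ; IDN_Permitted # Nd  [10] DIGIT ZERO..DIGIT NINE
0061..007A    ; IDN_Permitted # Ll  [26] LATIN SMALL LETTER A..Z
"""
```

Calling parse_binary_property(SAMPLE, "IDN_Permitted") gives the
three ranges back as integer pairs, ready for a range lookup.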

The eventual RFC for IDNAbis, rather than including some
long table definition that has to be maintained by
periodic revisions, can say something more or less like
the following, in toto:

   The inclusion table referred to in Step X of nameprep
   is defined as all Unicode characters having the
   property IDN_Permitted, as defined by the Unicode
   Character Database. [UCD]

   Note: Some characters may be added to the repertoire
   of characters with the IDN_Permitted property in the
   future, as additional characters are added to the
   Unicode Standard. This would be the case, for example,
   when additional minority scripts are added to the
   standard. However, the maintenance of the IDN_Permitted
   property is bound by the stability guarantee that
   once a character is assigned that property, the property
   can never be removed from the character. In other
   words, the inclusion table may grow, but once a
   character is in the table, it can never be removed.

Or words to that effect. Wordsmith as required, but basically
that is all that the specification would need. You get
your stability, your flexibility, and your definition by
reference, and you never have to go back and revise and
version the RFC for IDNA nameprep again to deal with
Unicode versioning.
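
The stability guarantee in that note is also mechanically
checkable: whenever a new version of the property is published,
everything in the old table must still be in the new one. A
minimal Python sketch, with hypothetical function names and
inclusive (start, end) ranges as input; it expands ranges into
sets, which is simple rather than efficient.

```python
def expand(ranges):
    """Expand inclusive (start, end) code point ranges into a set."""
    return {cp for start, end in ranges for cp in range(start, end + 1)}


def removed_code_points(old_ranges, new_ranges):
    """Code points in the old table that are missing from the new one.

    Under a "may grow, never shrink" guarantee, this must be empty.
    """
    return sorted(expand(old_ranges) - expand(new_ranges))


# Hypothetical versions of the table, for illustration only.
OLD = [(0x0030, 0x0039), (0x0061, 0x007A)]
GROWN = [(0x002D, 0x002D), (0x0030, 0x0039), (0x0061, 0x007A)]
SHRUNK = [(0x0030, 0x0039), (0x0061, 0x0079)]

assert removed_code_points(OLD, GROWN) == []       # growth is allowed
assert removed_code_points(OLD, SHRUNK) == [0x7A]  # dropping "z" is not
```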

Now you or other people in the IETF may not believe in a
stability guarantee. But that is a matter of trust,
personalities, policies and procedures. We'll just have
to get on with working on those issues, I guess. 

The fundamental issue seems to be that some in the IETF are very
uncomfortable dealing with a character encoding standard
like the Unicode Standard (and ISO/IEC 10646) that keeps
changing and expanding over time -- and doesn't stay
conveniently pinned down like ISO 8859-1 has done.
That is, however, an inconvenient but unavoidable fact of
life. Unicode is here, it is the backbone of systems and
the internet now, and it *isn't* going to stop changing
for at least another decade yet. For everyone digging in
their heels and trying to prevent it from changing right
now, there is another community out there desperately trying
to ensure that *their* characters get added to the universal
character set before the people trying to freeze it get their
way.

And from my point of view, trying to encapsulate and
control that unease about change in the RFC for
IDNAbis, with a multi-state table that worries about
defining the set of "pending" characters, is just a
diversion from coming to closure on a working specification
for IDNAbis.

--Ken
