Table-building

Mark Davis mark.davis at icu-project.org
Thu Feb 1 03:01:21 CET 2007


I think this does expose an issue that needs discussion. There are two types
of stability that could be guaranteed.

1. Once a character is encoded, the property value (true or false) MUST
never change.
2. Once a character is given the property value of true, its value MUST
never change to false. An encoded character SHOULD NOT change from false to
true, unless a strong case can be made for it.

Both of them meet the key requirement for stability: once a string
qualifies as being valid, it stays valid forever.
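
To make the difference between the two guarantees concrete, here is a
minimal sketch in Python. It is only an illustration, not a proposed
implementation: the table representation (a dict from code point to
boolean), the function names, and the sample characters are all
assumptions made up for this example.

```python
from typing import Dict

# Hypothetical representation: one table version maps each encoded
# code point to its boolean property value.
Table = Dict[int, bool]


def satisfies_policy_1(old: Table, new: Table) -> bool:
    """Guarantee #1: no already-encoded character ever changes value."""
    return all(new.get(cp) == value for cp, value in old.items())


def satisfies_policy_2(old: Table, new: Table) -> bool:
    """Guarantee #2: a true value never becomes false (false -> true is allowed)."""
    return all(new.get(cp, False) for cp, value in old.items() if value)


def is_valid(s: str, table: Table) -> bool:
    """A string is valid if every character has the property value true."""
    return all(table.get(ord(ch), False) for ch in s)


# Made-up sample data: "x" stands in for a character whose value is
# later changed from false to true; "y" is a newly encoded character.
OLD: Table = {ord("a"): True, ord("b"): True, ord("x"): False}
NEW: Table = {ord("a"): True, ord("b"): True, ord("x"): True, ord("y"): True}

if __name__ == "__main__":
    print("meets #1:", satisfies_policy_1(OLD, NEW))
    print("meets #2:", satisfies_policy_2(OLD, NEW))
    print('"ab" valid before and after:', is_valid("ab", OLD), is_valid("ab", NEW))
```

In this sample the update from OLD to NEW breaks #1 but meets #2, and
every string that was valid under OLD is still valid under NEW, which is
the requirement that matters for stability.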

However, #1 might be a bit too restrictive. Suppose we currently say that
character X has the value false, and then for some reason we find out that
the character is needed for some orthography of a language in, say, the
Congo. Under #1 we could never change it. People who think that #2 is not
sufficient might present some scenarios where it could cause a problem
(I can't think of any myself).

Mark

On 1/31/07, Kenneth Whistler <kenw at sybase.com> wrote:
>
> Harald wrote:
>
> > --On 14. desember 2006 18:24 -0800 Kenneth Whistler <kenw at sybase.com> wrote:
> >
> > > The *table* itself should unambiguously be defined as
> > > the list of characters appropriate for inclusion in
> > > IDNA. IDNAInclusion.txt (or whatever name you like).
> >
> > And how often do you believe this table would change?
> >
> > Once a month?
> > Once a year?
> > Once a decade?
> >
> > I think I disagree violently with you, but that is because I understand
> > you as saying that the table would change "once a decade".
>
> Well, that understanding is wrong, I am afraid. I neither said
> that nor implied it.
>
> My objection to the way Patrik was constructing the table is
> that by making it multi-state, the table itself is more
> complicated, more difficult to implement, and its status becomes
> more ambiguous and problematical for people attempting to
> understand and implement it.
>
> The statement of the IDNA nameprep, however it gets worked out
> in detail, is going to need an inclusion table. Both the
> statement of the algorithm and the implementations of it
> are easier, if the table is simply constructed as a binary
> property representation -- rather than trying to build anticipation
> of future decisions that *might* be made about some characters
> (but we don't know for sure yet which ones) being added into the
> table. That just makes for head-scratching in implementation.
>
> > If it changes once a month, the "unambiguous" table is a thin illusion
> > papered over a tri-state model.
>
> First, it isn't going to change once a month. You know that,
> so I don't see the point in raising it as a red herring.
>
> Mark has provided the relevant timing history regarding changes
> to the *repertoire* of Unicode versions which could, in principle,
> impact the list of characters appropriate for the IDNA inclusion
> table.
>
> Second, I thought one of the points here was to get the IDNA
> nameprep spec out of the business of having to be updated
> every time the UTC and WG2 add some characters to Unicode
> and ISO/IEC 10646. You accomplish that by publishing a specification
> that defines the inclusion table by reference to a specific
> character property for that purpose -- as we have discussed
> at some length now.
>
> I just pushed up IDNPermitted.txt to demonstrate what the
> documentation for such a binary character property could (and
> probably would) look like, if published as part of the Unicode
> Character Database. The property is then easy to refer to
> and easy to implement.
>
> The eventual RFC for IDNAbis, rather than including some
> long table definition that has to be maintained by
> periodic revisions, can say something more or less like
> the following, in toto:
>
>
>    The inclusion table referred to in Step X of nameprep
>    is defined as all Unicode characters having the
>    property IDN_Permitted, as defined by the Unicode
>    Character Database. [UCD]
>
>    Note: Some characters may be added to the repertoire
>    of characters with the IDN_Permitted property in the
>    future, as additional characters are added to the
>    Unicode Standard. This would be the case, for example,
>    when additional minority scripts are added to the
>    standard. However, the maintenance of the IDN_Permitted
>    property is bound by the stability guarantee that
>    once a character is assigned that property, the property
>    can never be removed from the character. In other
>    words, the inclusion table may grow, but once a
>    character is in the table, it can never be removed.
>
> Or words to that effect. Wordsmith as required, but basically
> that is all that the specification would need. You get
> your stability, your flexibility, and your definition by
> reference, and you never have to go back and revise and
> version the RFC for IDNA nameprep again to deal with
> Unicode versioning.
>
> Now you or other people in the IETF may not believe in a
> stability guarantee. But that is a matter of trust,
> personalities, policies and procedures. We'll just have
> to get on with working on those issues, I guess.
>
> The fundamental issue seems to be that some in the IETF are very
> uncomfortable dealing with a character encoding standard
> like the Unicode Standard (and ISO/IEC 10646) that keeps
> changing and expanding over time -- and doesn't stay
> conveniently pinned down like ISO 8859-1 has done.
> That is, however, an inconvenient but unavoidable fact of
> life. Unicode is here, it is the backbone of systems and
> the internet now, and it *isn't* going to stop changing
> for at least another decade yet. For everyone digging in
> their heels and trying to prevent it from changing right
> now, there is another community out there desperately trying
> to ensure that *their* characters get added to the universal
> character set before the people trying to freeze it get their
> way.
>
> And from my point of view, trying to encapsulate and
> control that unease about change in the RFC for
> IDNAbis, with a multi-state table that worries about
> defining the set of "pending" characters, is just a
> diversion from coming to closure on a working specification
> for IDNAbis.
>
> --Ken
>
> >
> >                Harald
> >
> >
> >
>



-- 
Mark