New version, draft-faltstrom-idnabis-tables-02.txt, available

Thu Jun 14 01:06:51 CEST 2007

Harald,

> I can't tell the difference between an algorithm specified in terms of 
> selection criteria and an algorithm specified in terms of set operations.
> 
> Either they achieve the same result, or they don't.

You are missing my point.

If you are creating a *data* specification that consists of
some list of characters, then effectively you are (or should be)
doing a specification of a set.

http://en.wikipedia.org/wiki/Naive_set_theory#Specifying_sets

In this particular case, the domain is the set of Unicode
code points -- let's call it U.

Then what we are specifying, for each value in the data table,
is some subset:

{x : x in U AND P(x) AND Q(x) AND R(x) AND...}

That is *not* an algorithm. It is a set description.

An algorithm would be involved if you wanted a computer to
turn that set description into an explicit list, i.e., what
is done for Section 4.1:

{-, A, B, C, ...}
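To make the distinction concrete, here is a minimal Python sketch (not anything from the draft) in which the set description is nothing more than a list of criteria, and `evaluate` is just one of many interchangeable ways to expand it into an explicit list. The criteria names `in_ascii` and `alphabetic` are hypothetical stand-ins for P(x), Q(x), ..., using the simple {x : x in ASCII AND alphabetic(x)} case rather than the draft's Rules A-G.

```python
from typing import Callable, Iterable

# A criterion is a predicate over code points: P(x), Q(x), R(x), ...
Criterion = Callable[[int], bool]

# The domain U: every Unicode code point, 0 through 0x10FFFF.
UNICODE = range(0x110000)


def evaluate(domain: Iterable[int], criteria: list[Criterion]) -> list[int]:
    """Expand {x : x in domain AND P(x) AND Q(x) ...} into an explicit list.

    This is one evaluation strategy among many; the specification is the
    list of criteria, not this function.
    """
    return sorted(cp for cp in domain if all(p(cp) for p in criteria))


# Hypothetical criteria for illustration only; not the draft's Rules A-G.
def in_ascii(cp: int) -> bool:
    return cp < 0x80


def alphabetic(cp: int) -> bool:
    return chr(cp).isalpha()


if __name__ == "__main__":
    members = evaluate(UNICODE, [in_ascii, alphabetic])
    print("".join(chr(cp) for cp in members))
```

Any other evaluation strategy that yields the same list is equally correct; only the criteria passed to it belong in the data specification.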

But I don't *care* about the details of that algorithm, nor
should you, nor should this document try to spell it (or them)
out, because that isn't the point of the *data* specification.

> That's a basic difference in approach that the two of us have disagreed on 
> before.
> If you specify the algorithm, and allow discussion of the algorithm, new 
> data can be evaluated according to the algorithm, or it can be shown that 
> the algorithm is incomplete, which may lead to a reevaluation of data 
> generated erroneously earlier (which is a problem for stability, of course. 
> You don't get something for nothing.)

The *algorithm* is irrelevant, except for Patrik (for *his* algorithm)
to verify that he didn't goof up in evaluating the set. (In fact
he did -- see my earlier long analysis of the details of the table --
but that is neither here nor there at the moment.)

What *is* relevant is the collection of criteria for the set
description: the P(x), Q(x), etc., listed above.

Reworded:

If you specify the criteria for the set description, and allow discussion
of the criteria for the set description, new characters can be
evaluated (as being in or outside the set) according to those
criteria, and new data regarding the appropriateness of the criteria
can be evaluated. Determination that some characters should or
should not be in the set can lead to the definition of new criteria
for inclusion in the set. If you publish an Internet Standard with
one set of criteria, and then later change your mind and change
the criteria for inclusion in the set, that is a problem for stability,
of course. You don't get something for nothing.

> The tables, unsupported by the algorithm that generated them,

...unsupported by a clear statement of the criteria used for
inclusion or exclusion of characters,

> have no way 
> in which people can distinguish a systematic application of rational rules 
> from "we just felt like picking this value" - an accusation that the UTC is 
> far more familiar with than it wants to be, I think.

I have no problem whatsoever with a set definition being done
with a set of "rational rules". In fact the document has made
great strides towards clarity in Rules A-G. And, contrary
to what Mark just said, I don't disagree at all with spelling
out the criteria quite explicitly. Or, in Mark's words:

> I think having the method used to
> derive the property in the document is fine.

Just don't call this an algorithm, and don't write the document
as if it were documenting an algorithm. It muddies the
whole issue.

I think you've been mixing levels here (as well as terminology).
What I think is needed are:

1. Logical criteria for set specification.

That is what belongs in the document. Evaluation of those
logical criteria results in the equivalent representation of
the set as a list (in Section 4.1). Yes, you need an algorithm
to actually do such evaluations, but I don't *care* about the
details, nor do I expect them to be spelled out in this
data specification. It would be as irrelevant as spelling
out the details of how to verify that {x : x in ASCII AND alphabetic(x)}
evaluates to {A,B,C,...Z,a,b,c,...z}.

2. An evaluation matrix for the criteria themselves.

This is what the working group needs. It is the means by
which you can have a rational argument about whether Rule H
(for example) belongs as part of the set specification,
other than "we just felt like picking this value".

3. Criteria for allowance of change of the logical criteria
for the set specification.

This is what all the handwaving about MAYBE and stability is
about. If the logical criteria for the set specification
are going to change in the future, then there needs to be
some determination of what could justify such changes:
which values could change, which characters could change,
and which impacts on the consuming algorithm (IDNAbis'
determination of which Unicode strings are valid U-labels,
and its matching against registered domain names) are
allowed and which are not. The document takes a small step
towards that by implying that ALWAYS and NEVER values
cannot be changed, but everything else is vague.

4. An evaluation matrix for the criteria for allowance
of change of the logical criteria for the set specification.

This is what the working group needs. It is the means by
which you can have a rational argument about whether,
for example, the criteria (and the corresponding table)
can be changed in the future in such a way that an IDN
valid using Version n of the table can become invalid
using Version n+1 of the table -- or not.
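A minimal Python sketch of the kind of mechanical check that points 3 and 4 would make possible, assuming a table is simply a mapping from code point to one of the draft's values (ALWAYS, NEVER, MAYBE). The function names, the toy tables, and the rule that a label is valid only if every code point is ALWAYS are all hypothetical choices for illustration, not anything the draft specifies.

```python
# Hypothetical model: a table maps code points to one of the draft's
# values. Names and the validity rule below are illustrative assumptions.
Table = dict[int, str]

# The draft implies these two values can never change once assigned.
IMMUTABLE = frozenset({"ALWAYS", "NEVER"})


def immutable_violations(old: Table, new: Table) -> list[int]:
    """Code points whose ALWAYS or NEVER value differs in the new table."""
    return sorted(cp for cp, value in old.items()
                  if value in IMMUTABLE and new.get(cp) != value)


def is_valid(label: str, table: Table,
             permitted: frozenset[str] = frozenset({"ALWAYS"})) -> bool:
    """Assumed rule: a label is valid iff every code point has a permitted value."""
    return all(table.get(ord(ch)) in permitted for ch in label)


def newly_invalid_labels(labels: list[str], old: Table, new: Table) -> list[str]:
    """Labels that are valid under Version n but invalid under Version n+1."""
    return [lab for lab in labels if is_valid(lab, old) and not is_valid(lab, new)]


if __name__ == "__main__":
    # Toy tables standing in for "Version n" and "Version n+1".
    v_n = {0x61: "ALWAYS", 0x62: "MAYBE", 0x2D: "ALWAYS"}
    v_n1 = {0x61: "ALWAYS", 0x62: "NEVER", 0x2D: "MAYBE"}
    print(immutable_violations(v_n, v_n1))
    print(newly_invalid_labels(["a", "a-a", "b"], v_n, v_n1))
```

Note that the MAYBE to NEVER change for U+0062 passes the immutability check; whether such a change should be allowed is exactly the kind of question the criteria in point 3 would have to settle.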

--Ken