New version, draft-faltstrom-idnabis-tables-02.txt, available

Wed Jun 13 03:50:54 CEST 2007

> http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-02.txt

While the rule structure has improved in this version, there are a number of
problems remaining. Rule H stands out as one of them.

> 2.8.  Rule H - Stable scripts

On 6/12/07, John C Klensin <klensin at jck.com> wrote:
> I intensely dislike having Rule H.  I think that dislike is
> shared by Patrik, Harald, Cary, Tina and others.  I also don't
> think we have so far explained it, and the reasons for it, very
> well, and I'd appreciate the help of others in coming up with a
> better explanation.  But we have concluded, sadly and painfully,
> that it is necessary, at least for the short term.

On 6/12/07, Harald Alvestrand <harald at alvestrand.no> wrote:
>
>
> - We (as in "the community", not "the editing team") have experienced
> that a number of scripts have issues that are not resolved, or not
> completely resolved, at this time.
> - For some scripts, we're pretty certain there are no issues - or,
> rather, that the community's settled down to a specific set of tradeoffs
> that are unlikely to change.
>

Rule H has no justification in the document; not only that, as Ken points
out, Latin, Greek, and Cyrillic are some of the *tougher* cases regarding
security, not the easy cases. If one were to pick the scripts to start with
in terms of reducing possible security problems, these would not actually be
the ones to start with.

I have said this before, but the way this process is being handled is
not what I am used to in good engineering design. If you have a set of
problems, and are proposing a number of steps to address those
problems, you should be able to state, for each of those steps, an example of
the problem that it solves and how it solves that problem. We just don't see
this in the document, nor on this list. There might be private discussions,
but I hadn't thought that was how the IETF was supposed to work...

No reason is given for the focus on only European scripts; and that focus
will surely raise suspicions in many circles. While I'm sure that the
restriction to European languages is just because those are the ones the
small group of authors is familiar with, it will not be received well. If
"we the community" have "experienced that a number of scripts have issues
that are not resolved", then those problems should be enumerated
*explicitly*, not hidden away.

The situation might be different if we were starting from zero; but we are
not. We already have an IDNA system that works for a great many people. And
while there are security problems with it, those are well known and vendors
are dealing with them. Moreover, the problems that IDNAbis solves are just
the easy ones -- the harder ones are cases like "paypay.com", which the
current suggestion for IDNAbis doesn't touch. So it feels
like we are looking at a proposal that

   1. doesn't actually help much with the practical problems that people
   face
   2. solves the easy problems, but not the hard ones; so people have to
   essentially do the work anyway
   3. and removes much of the functionality, except for some favored
   groups: Europe and the Americas

It feels a bit like some Federal agency's finding that there are some roads
without side rails. It decides that because of that security problem, we
need to forbid people from using any roads, except of course in New England
-- because we know what the roads are like there.

> 2.4.  Rule D - Ignorables

   property(cp) is in {Other_Default_Ignorable_Code_Point,
                       Noncharacter_Code_Point}

Noncharacter_Code_Point is never in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, so this
addition is not necessary, any more than many other properties that are also
definitionally never in the set (controls, etc.)
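
A quick way to check this claim, as a minimal sketch: it assumes Python's
standard unicodedata module, and the helper name noncharacters() is made up
for illustration. It lists the noncharacter code points (U+FDD0..U+FDEF plus
the last two code points of each plane) and collects their general
categories, which can then be compared against the set used in Rule A.

```python
import unicodedata

# General categories admitted by the inclusion rule quoted in this message.
LMN = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

def noncharacters():
    """Yield the Unicode noncharacter code points (hypothetical helper)."""
    # The contiguous range U+FDD0..U+FDEF.
    yield from range(0xFDD0, 0xFDF0)
    # The last two code points of each of the 17 planes.
    for plane in range(0x11):
        yield plane * 0x10000 + 0xFFFE
        yield plane * 0x10000 + 0xFFFF

# Every general category that occurs among the noncharacters.
categories = {unicodedata.category(chr(cp)) for cp in noncharacters()}

# Any overlap here would mean Rule D's extra property actually removes something.
overlap = categories & LMN
```

If overlap comes out empty, listing Noncharacter_Code_Point in Rule D changes
nothing in the resulting set.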

> 3.  Calculation of the derived property

The rules A-G look fine - if we go back to my message of 12/14/06, we see
that they match what was there. (I added the correspondence to tables-02
rules in [..] below)

0. Start with the empty set. For each code point cp from 0 to 0x10FFFF:
[A] 1. If generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, add cp
[B] 2. If NFKC(cp) != cp, remove cp
[C] 3. If casefold(cp) != cp, remove cp
[D] 4. If defaultIgnorableCodePoint(cp), remove cp
[E] 5. If script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb, Phnx,
Khar, Phag, Glag, Shaw, Dsrt, Runr}, remove cp
[F] 6. If block(cp) in {Combining_Diacritical_Marks_for_Symbols,
Musical_Symbols, Ancient_Greek_Musical_Notation}, remove cp
[G] N. If cp is in [-A-Z0-9], add cp

The numbers 1-6,N correspond to A-G in
draft-faltstrom-idnabis-tables-02.txt (with the exception of the change
in D as noted above).
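
To make the steps above concrete, here is a rough Python sketch, not a
reference implementation. Steps A-C and G use the standard unicodedata module.
Python's standard library does not expose Default_Ignorable_Code_Point,
script, or block, so DEFAULT_IGNORABLE_SAMPLE and EXCLUDED_RANGES_SAMPLE are
small, deliberately incomplete placeholder tables (and block ranges stand in
for the script property); a real run would need the full Unicode data files.
All names here are made up for illustration.

```python
import unicodedata

LMN = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}
LDH = {ord(c) for c in "-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"}

# ASSUMPTION: tiny, incomplete stand-in for Default_Ignorable_Code_Point
# (combining grapheme joiner and the variation selectors U+FE00..U+FE0F).
DEFAULT_IGNORABLE_SAMPLE = {0x034F} | set(range(0xFE00, 0xFE10))

# ASSUMPTION: a few block ranges standing in for the script/block tests
# of steps 5 and 6; the real rules list many more scripts.
EXCLUDED_RANGES_SAMPLE = [
    (0x16A0, 0x16FF),    # Runic block (approximates script Runr)
    (0x10330, 0x1034F),  # Gothic block (approximates script Goth)
    (0x20D0, 0x20FF),    # Combining Diacritical Marks for Symbols
    (0x1D100, 0x1D1FF),  # Musical Symbols
    (0x1D200, 0x1D24F),  # Ancient Greek Musical Notation
]

def in_excluded_sample(cp):
    return any(first <= cp <= last for first, last in EXCLUDED_RANGES_SAMPLE)

def derive_permitted(code_points=range(0x110000)):
    """Build the set by one inclusion, several removals, and a final add."""
    # [A] start from the letters, marks, and digits
    s = {cp for cp in code_points if unicodedata.category(chr(cp)) in LMN}
    # [B] remove anything NFKC would change
    s -= {cp for cp in s if unicodedata.normalize("NFKC", chr(cp)) != chr(cp)}
    # [C] remove anything case folding would change
    s -= {cp for cp in s if chr(cp).casefold() != chr(cp)}
    # [D] remove default ignorables (sample table only)
    s -= {cp for cp in s if cp in DEFAULT_IGNORABLE_SAMPLE}
    # [E], [F] remove excluded scripts and blocks (sample ranges only)
    s -= {cp for cp in s if in_excluded_sample(cp)}
    # [G] add back the grandfathered ASCII
    s |= {cp for cp in LDH if cp in code_points}
    return s
```

Calling derive_permitted() with no argument walks the whole code space;
passing a smaller range is handy for spot checks.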

While written functionally, this is simply a way of forming a set through a
sequence of Boolean operations. It is simple, because we start with one set,
then remove items. At the very end we add back the grandfathered ASCII. When
we expanded the values to be Always, Maybe, and Never, it basically had the
effect of rewriting to the following (notice also the reversal of A/1 and
G/N to be more in line with what is in Patrik's document):

0. Start with the empty set. For each code point cp from 0 to 0x10FFFF:

Grandfathered
[G] N. If cp is in [-A-Z0-9], put cp in Always

Functional Exclusions
[B] 2. Else if NFKC(cp) != cp, put cp in Never and stop
[C] 3. Else if casefold(cp) != cp, put cp in Never and stop
[D] 4. Else if defaultIgnorableCodePoint(cp), put cp in Never and stop

Usage Exclusions
[E] 5. Else if script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb,
Phnx, Khar, Phag, Glag, Shaw, Dsrt, Runr}, put cp in Maybe and stop
[F] 6. Else if block(cp) in {Combining_Diacritical_Marks_for_Symbols,
Musical_Symbols, Ancient_Greek_Musical_Notation}, put cp in Maybe and stop

LMN Inclusion
[A] 1. Else if generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, put
cp in Always and stop

Exclude everything else
     Z. Else put cp in Never
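
The three-valued version can be sketched the same way, again only as an
illustration. The Status enum, classify(), and the two predicate parameters
are hypothetical names; the default predicates use small, incomplete sample
tables (block ranges standing in for scripts), because Python's standard
library does not expose Default_Ignorable_Code_Point, script, or block.

```python
import unicodedata
from enum import Enum

class Status(Enum):
    ALWAYS = "Always"
    MAYBE = "Maybe"
    NEVER = "Never"

LMN = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}
LDH = {ord(c) for c in "-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"}

def sample_default_ignorable(cp):
    # ASSUMPTION: incomplete stand-in for Default_Ignorable_Code_Point.
    return cp == 0x034F or 0xFE00 <= cp <= 0xFE0F

def sample_usage_excluded(cp):
    # ASSUMPTION: a few block ranges standing in for rules E and F.
    ranges = [(0x16A0, 0x16FF), (0x10330, 0x1034F), (0x20D0, 0x20FF),
              (0x1D100, 0x1D1FF), (0x1D200, 0x1D24F)]
    return any(first <= cp <= last for first, last in ranges)

def classify(cp,
             is_default_ignorable=sample_default_ignorable,
             is_usage_excluded=sample_usage_excluded):
    """Apply the rules in the order G, B, C, D, E/F, A, Z; first match wins."""
    ch = chr(cp)
    if cp in LDH:                                   # [G] grandfathered
        return Status.ALWAYS
    if unicodedata.normalize("NFKC", ch) != ch:     # [B] functional exclusion
        return Status.NEVER
    if ch.casefold() != ch:                         # [C] functional exclusion
        return Status.NEVER
    if is_default_ignorable(cp):                    # [D] functional exclusion
        return Status.NEVER
    if is_usage_excluded(cp):                       # [E], [F] usage exclusion
        return Status.MAYBE
    if unicodedata.category(ch) in LMN:             # [A] LMN inclusion
        return Status.ALWAYS
    return Status.NEVER                             # [Z] everything else
```

For example, classify(ord("a")) falls through to rule A, while
classify(0x16A0) (a Runic letter) stops at the usage exclusions under the
sample tables.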

But section 3, "Calculation of the derived property", is very hard to
make out. Moreover, it is impossible to assess what it is supposed to be
doing until the difference between Maybe Yes and Maybe No is completely
spelled out operationally, and the goal is made clear.

And from what I can make out, it looks unpleasant. Many characters are not
subject to conditions B-D, which should put them into the Never category.

Mark