Protocol-08 (and status of Defs-04 and Rationale-06)

Kenneth Whistler kenw at sybase.com
Tue Dec 9 00:32:47 CET 2008


O.k., time for some renumbering and analysis here.

The logical possibilities for "forbidding at protocol level"
one or another set of the digits in question, either
singly or in contextual combinations, are:

==============================================================

a. European digits (0030..0039) alone

b. Arabic-Indic digits (0660..0669) alone

c. Extended Arabic-Indic digits (06F0..06F9) alone

d. European + Arabic-Indic digits together

e. European + Extended Arabic-Indic digits together

f. Arabic-Indic + Extended Arabic-Indic digits together

g. European + Arabic-Indic + Extended Arabic-Indic digits together

================================================================

I think we can safely assume that (a) is off the table, for
legacy ASCII labels.

I think we can safely assume that (b) and (c) are off the table,
as well, since nobody, as far as I can recall here, has been
calling for an across-the-board prohibition (i.e. DISALLOWED
categorization) of the Arabic-Indic digits (Arab usage) on their
own, or of the Extended Arabic-Indic digits (Perso-Arabic usage)
on their own.

That leaves us with (d)..(g), which are the *contextual*
prohibitions, to prevent mixing of certain combinations of
these digits together in a single label.
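
To make the case letters concrete, here is a minimal sketch that
reports which of (a)..(g) a given label falls under. The code point
ranges are the ones listed above; the names DIGIT_SETS, CASES and
classify are made up for illustration and come from no draft.

```python
# Hypothetical helper: map a label to case (a)..(g) from the list above.
DIGIT_SETS = {
    "European": range(0x0030, 0x003A),               # 0030..0039
    "Arabic-Indic": range(0x0660, 0x066A),           # 0660..0669
    "Extended Arabic-Indic": range(0x06F0, 0x06FA),  # 06F0..06F9
}

CASES = {
    frozenset({"European"}): "a",
    frozenset({"Arabic-Indic"}): "b",
    frozenset({"Extended Arabic-Indic"}): "c",
    frozenset({"European", "Arabic-Indic"}): "d",
    frozenset({"European", "Extended Arabic-Indic"}): "e",
    frozenset({"Arabic-Indic", "Extended Arabic-Indic"}): "f",
    frozenset({"European", "Arabic-Indic", "Extended Arabic-Indic"}): "g",
}


def classify(label):
    """Return the case letter for the digit sets present, or None if no digits."""
    present = frozenset(
        name
        for name, code_points in DIGIT_SETS.items()
        if any(ord(ch) in code_points for ch in label)
    )
    return CASES.get(present)
```

For example, classify("abc123") gives "a", and a label containing
both U+0031 and U+0661 gives "d".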

Now, returning to Mark's and Eric's numbering, the options
they had, restated, are:

Mark #1:  Forbid (d), (e), (f), and (g). [Actually forbidding
          (g) would be a corollary, if (d), (e), and (f) were
          forbidden.]
         
Mark #2:  Forbid (d) and (e) [and (g) by corollary]. Allow (f).
          (= Eric #5)

Mark #3:  No prohibitions by protocol. Handle by registry filter.

Mark #4:  Forbid (f) [and (g) by corollary]. Allow (d) and (e).

Eric #4a: Forbid (f) [and (g) by corollary] except for the
          Arabic four, five, six (because not confusable).
          Allow (d) and (e). [Actually "five" is confusable,
          but this will be moot -- see below.]
          
Alright, that is what has been proposed so far. *But* we now need
to take into account Harald's reminder that some combinations
are already disallowed separately by the bidi rules on label
well-formedness, quite independently of any consideration of
CONTEXTO categorization. What the bidi rules require of label
formation is:

Bidi:     Forbid (d) and (f) [and (g) by corollary]. Allow (e).
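
The asymmetry in that line follows from the Unicode bidi classes of
the three digit sets: European and Extended Arabic-Indic digits are
both EN, while Arabic-Indic digits are AN. The sketch below checks
only for EN/AN mixing in a label, on the assumption that this is the
part of the bidi rules at issue here; it is not a full implementation
of the bidi well-formedness test, and the function name is made up.

```python
import unicodedata


def mixes_en_and_an(label):
    """True if the label contains both EN and AN characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return "EN" in classes and "AN" in classes


# Bidi classes of one digit from each set:
#   U+0031 (European)              -> EN
#   U+0661 (Arabic-Indic)          -> AN
#   U+06F1 (Extended Arabic-Indic) -> EN
```

So (d) and (f) each mix EN with AN and are caught, while (e) is
EN with EN and passes.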

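This changes the options entirely, in my assessment. If you
check carefully now, this takes Mark's #2 off the table, since
#2 would allow (f), which the bidi rules forbid anyway. It also
takes Mark's #4 (and Eric's variant #4a) off the table, since
those would allow (d), which the bidi rules likewise forbid.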
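
One way to check this, offered as a reading of the argument rather
than anything from the drafts: treat each option as the set of cases
it forbids, and union it with what the bidi rules forbid regardless.
Eric #4a is left out because its four/five/six exception does not
reduce to whole cases. The names below are made up for illustration.

```python
# Cases forbidden by the bidi rules regardless of the protocol choice.
BIDI = frozenset("dfg")

# Cases each option forbids at protocol level, as restated above.
OPTIONS = {
    "Mark #1": frozenset("defg"),
    "Mark #2": frozenset("deg"),
    "Mark #3": frozenset(),
    "Mark #4": frozenset("fg"),
}


def effective(option):
    """Cases actually forbidden once the bidi rules are also applied."""
    return OPTIONS[option] | BIDI


for name in OPTIONS:
    print(name, "".join(sorted(effective(name))))
```

Under this reading #2 ends up forbidding the same cases as #1, and
#4 the same cases as #3, so only two distinct outcomes remain.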

What we really need to decide between is:

Mark #1:  Forbid (d), (e), (f), and (g).

Mark #3:  No prohibitions by protocol. Handle by registry filter.

And for Mark #1, since the bidi rules *already* forbid (d), (f),
and (g), operationally what this boils down to is deciding
whether to:

    Option alpha: Add Extended Arabic-Indic digits (06F0..06F9)
                  to CONTEXTO in tables.txt and add a context
                  rule in Appendix A prohibiting those from
                  cooccurring in a label with European digits
                  (0030..0039).
                  
    Option beta:  Not do option alpha.
    
Doing anything else, in my opinion, would over-engineer and
needlessly complicate the specification, with no net improvement
in the end result.
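
For a sense of scale, the context rule in option alpha amounts to
something like the following. This is only a sketch of the check as
described above, not proposed text for Appendix A, and the function
name is hypothetical.

```python
def violates_option_alpha(label):
    """True if the label mixes European and Extended Arabic-Indic digits."""
    has_european = any("\u0030" <= ch <= "\u0039" for ch in label)
    has_extended = any("\u06f0" <= ch <= "\u06f9" for ch in label)
    return has_european and has_extended
```

Labels using either digit set on its own pass; only case (e) is
rejected by this check, since (d), (f) and (g) are left to the bidi
rules.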

The advantage I see in choosing option alpha is that it
would add a symmetry to the handling of Arabic digits, making
the mixing of either set of them with European digits
prohibited in labels, irrespective of bidi arcana. That is
easier for implementers and end users to understand than
the somewhat odd conclusion that comes simply from application
of the bidi rules. I think option alpha is also closer to
what the (Arab) Arabic script input has been on the topic.

The advantage I see in choosing option beta is that it
keeps the tables document a little simpler, with one less
abstruse context rule to check for. I think option beta
is also closer to what the (Iranian) Arabic script input has been
on the topic, unless I have misunderstood what Alireza has
been saying.

--Ken


