tables-06b.txt: A.5, A.6, A.9

Kenneth Whistler kenw at sybase.com
Fri Jul 24 02:21:40 CEST 2009


Patrik,

First, my apologies for continuing to comment so close
to the Stockholm meeting, but I wanted to make sure
that I closed out the input I had regarding the
other three CONTEXTO rules that I believe still have
open issues.

I'll make this a little briefer than my last notes. I hope the
justifications will be clear enough.

--Ken

======================================================

A.5. GREEK LOWER NUMERAL SIGN (KERAIA)

First, if an annotation for U+0375 is to be used,
this character is the "aristeri keraia" or "left keraia",
as opposed to U+0374, which is the "dexia keraia",
or "right keraia", the one more commonly used.

This has a bearing on the Rule Set for A.5. The current
rule set focusses on the wrong thing, in that it
implements the notion that U+0375 can only occur in
a label which is entirely in the Greek script. As for
other cases, the issue of enforcing labels to be
single script seems out of scope here and we have
generally not focussed on that as an issue for the Tables
document, in particular.

The *appropriate* context for the left keraia is
immediately to the left of a Greek letter, where it
indicates the use of that letter as a thousands counter,
instead of a units counter. Because of this, I think
the appropriate Overview and Rule Set for A.5 is:

Overview:
   The script of the following character MUST be Greek.
   
Rule Set:
   False;
   If Script(After(cp)) .eq. Greek Then True;
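
As a rough illustration only, the proposed rule could be sketched
in Python as below. The names keraia_context_ok and is_greek are
hypothetical, and because the Python standard library exposes no
Script property, the check on the character name is a heuristic
stand-in for Script(After(cp)) .eq. Greek, not the real property
lookup.

```python
import unicodedata

KERAIA = "\u0375"  # GREEK LOWER NUMERAL SIGN (left keraia)


def is_greek(ch: str) -> bool:
    # ASSUMPTION: approximates Script(ch) == Greek via the character
    # name, since the stdlib has no Script property lookup.
    return unicodedata.name(ch, "").startswith("GREEK ")


def keraia_context_ok(label: str, i: int) -> bool:
    """Evaluate the proposed A.5 rule for the U+0375 at label[i]."""
    if label[i] != KERAIA:
        raise ValueError("label[i] is not U+0375")
    # "False;" is the default: no following character means no match.
    if i + 1 >= len(label):
        return False
    # "If Script(After(cp)) .eq. Greek Then True;"
    return is_greek(label[i + 1])
```

Under this sketch a keraia at the end of a label, or one followed
by a non-Greek character, fails the rule, while nothing is
required of the rest of the label.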
   
=========================================================

A.6. COMBINING CYRILLIC TITLO

This is the one character left in the CONTEXTO list which
is clearly a mistake. It is the only combining character
in the set of exceptions. There are other Cyrillic script-specific
combining marks, including the very next character in the
Unicode code chart, U+0484 COMBINING CYRILLIC PALATALIZATION,
and there are dozens of other combining marks that are
script-specific for various other scripts, all of which are
simply PVALID, with no attempt to ensure by CONTEXTO rule
that they only occur in labels of that same, single script.

I believe the only reason the combining titlo got into the
Exceptions list and categorized as CONTEXTO is because it
was mentioned in passing early on when the Greek numeral
signs were discussed. Why? Because one of the functions of
a titlo in historic Cyrillic (but not its only function) is
to mark a letter as having a numeric value. That, however,
seems to have no rational connection to a need to make that
single historic combining mark a formal exception in the
Tables document, and to write a CONTEXTO rule for it.

Note that in the last couple weeks, as people have been
speaking up for retaining the CONTEXTO rules, justifications
have been made for keeping the dots, the hyphen, the Arabic-Indic
digits, the quote and apostrophe lookalikes, but *nobody* has
explicitly defended retaining combining titlo in the exceptions
list. For good reason: it is indefensible as an exception! ;-)

Suggestion: Remove U+0483 from the 2.6 Exceptions list
and delete Appendix A.6.

==========================================================

A.9. KATAKANA MIDDLE DOT

On this one, there is a long thread from April 3 - 7
entitled "Tables and contextual rule for Katakana middle dot"
that started with John Klensin's observation:

<quote>
Just so this doesn't accidentally fall through the cracks...

It is clear from the discussion last week that I simply got the
contextual rule for Katakana Middle Dot (U+30FB) wrong in what
is rule/Appendix A.12 in Tables-05.  I had understood that I had
been told it was used only with Katakana; the JET I-D and
Monday's presentation make it clear to me (and I assume others)
that it can be used between any pair of Japanese characters.
The overview now reads:

   Adjacent characters MUST be Katakana.

It should be:

   Adjacent characters MUST be Hiragana, Katakana, or Han.

The associated Rule Set will, of course, have to be updated to
match.
</quote>

I concur with that general assessment, although Yoneya-san
noted that the Katakana middle dot also occurs in
other (Japanese) contexts, including before or after
([a-zA-Z0-9]). Yoneya-san's assessment was:

    (KATAKANA MIDDLEDOT) MUST be used in Japanese context.

And the thread then foundered and moved on to other topics,
because nobody really knows how to specify that in a rule.

At any rate I would like to reiterate that this should not fall through
the cracks, and the Overview and Rule Set for A.9. still need
updating. The options are:

1. Update as John Klensin suggested.

2. Add ([a-zA-Z0-9]) to the allowed contexts, to get closer
   to Japanese usage.
   
3. Give up on attempting to write a formal Rule Set for
   "MUST be used in Japanese context", make the character
   PVALID instead of CONTEXTO in the Exceptions list,
   and leave it up to registrars to allow or disallow for
   country-specific registrations.
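
Options 1 and 2 can be compared with a small Python sketch. This
is only an illustration under stated assumptions: the function
names are hypothetical, "adjacent" is read here as requiring both
the preceding and the following character to qualify, and the
script test is a heuristic on character names because the Python
standard library has no Script property.

```python
import unicodedata

MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# ASSUMPTION: name prefixes used as a stand-in for the Hiragana,
# Katakana and Han script values.
_JAPANESE_PREFIXES = (
    "HIRAGANA ",
    "KATAKANA ",
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
)


def is_allowed_neighbor(ch: str, allow_ascii_alnum: bool = False) -> bool:
    # The middle dot itself has a KATAKANA name but is not a letter.
    if ch == MIDDLE_DOT:
        return False
    if unicodedata.name(ch, "").startswith(_JAPANESE_PREFIXES):
        return True
    # Option 2 only: also accept [a-zA-Z0-9].
    return allow_ascii_alnum and ch.isascii() and ch.isalnum()


def middle_dot_context_ok(label: str, i: int,
                          allow_ascii_alnum: bool = False) -> bool:
    """Check the U+30FB at label[i] (option 1, or option 2 if flagged)."""
    if label[i] != MIDDLE_DOT:
        raise ValueError("label[i] is not U+30FB")
    # A middle dot at either edge of the label has no adjacent pair.
    if i == 0 or i + 1 >= len(label):
        return False
    return (is_allowed_neighbor(label[i - 1], allow_ascii_alnum)
            and is_allowed_neighbor(label[i + 1], allow_ascii_alnum))
```

With allow_ascii_alnum left False this corresponds to option 1;
setting it True gives option 2. Neither captures "MUST be used in
Japanese context" in full, which is the difficulty behind option 3.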
   
I don't think we have the option to leave the A.9. Rule Set
as it is currently stated, as that is not even minimally
acceptable in a Japanese context.

==========================================================



