tables-06b.txt: A.5, A.6, A.9

Patrik Fältström patrik at frobbit.se
Fri Jul 24 17:56:09 CEST 2009


On 24 jul 2009, at 02.21, Kenneth Whistler wrote:

> ======================================================
>
> A.5. GREEK LOWER NUMERAL SIGN (KERAIA)
>
> First, if an annotation for U+0375 is to be used,
> this character is the "aristeri keraia" or "left keraia",
> as opposed to U+0374, which is the "dexia keraia",
> or "right keraia", the one more commonly used.
>
> This has a bearing on the Rule Set for A.5. The current
> rule set focusses on the wrong thing, in that it
> implements the notion that U+0375 can only occur in
> a label which is entirely in the Greek script. As for
> other cases, the issue of enforcing labels to be
> single script seems out of scope here and we have
> generally not focussed on that as an issue for the Tables
> document, in particular.
>
> The context for *appropriate* use of the left keraia is
> immediately to the left of a Greek letter, where it
> indicates the use of that letter as a thousands counter,
> instead of a units counter. Because of this, I think
> the appropriate Overview and Rule Set for A.5 is:
>
> Overview:
>   The script of the following character MUST be Greek.
>
> Rule Set:
>   False;
>   If Script(After(cp)) .eq. Greek Then True;

Ok.
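
As a rough illustration of the proposed A.5 Rule Set, here is a
minimal Python sketch. The helper names (is_greek, keraia_rule,
label_ok) are hypothetical and not part of the Tables document.
Python's standard library does not expose the Unicode Script
property, so is_greek approximates it by checking whether the
character name starts with "GREEK "; that is an assumption, not
the real Script() lookup.

```python
import unicodedata

KERAIA = "\u0375"  # GREEK LOWER NUMERAL SIGN (left keraia)


def is_greek(ch):
    # Assumption: approximate Script(ch) == Greek via the character name,
    # because the stdlib has no Script property lookup.
    return unicodedata.name(ch, "").startswith("GREEK ")


def keraia_rule(label, i):
    # Rule Set: False; If Script(After(cp)) .eq. Greek Then True;
    if i + 1 >= len(label):
        return False  # nothing follows the keraia
    return is_greek(label[i + 1])


def label_ok(label):
    # Every U+0375 in the label must satisfy the contextual rule.
    return all(
        keraia_rule(label, i)
        for i, ch in enumerate(label)
        if ch == KERAIA
    )
```

Under these assumptions, a keraia followed by GREEK SMALL LETTER
ALPHA passes, while a keraia at the end of a label or followed by
a Latin letter fails.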

> =========================================================
>
> A.6. COMBINING CYRILLIC TITLO
>
> This is the one character left in the CONTEXTO list which
> is clearly a mistake. It is the only combining character
> in the set of exceptions. There are other Cyrillic script-specific
> combining marks, including the very next character in the
> Unicode code chart, U+0484 COMBINING CYRILLIC PALATALIZATION,
> and there are dozens of other combining marks that are
> script-specific for various other scripts, all of which are
> simply PVALID, with no attempted constraint on ensuring by
> CONTEXTO rule that they only occur in labels with that same,
> single script.
>
> I believe the only reason the combining titlo got into the
> Exceptions list and categorized as CONTEXTO is because it
> was mentioned in passing early on when the Greek numeral
> signs were discussed. Why? Because one of the functions of
> a titlo in historic Cyrillic (but not its only function) is
> to mark a letter as having a numeric value. That, however,
> seems to have no rational connection to a need to make that
> single historic combining mark a formal exception in the
> Tables document, and to write a CONTEXTO rule for it.
>
> Note that in the last couple weeks, as people have been
> speaking up for retaining the CONTEXTO rules, justifications
> have been made for keeping the dots, the hyphen, the Arabic-Indic
> digits, the quote and apostrophe lookalikes, but *nobody* has
> explicitly defended retaining combining titlo in the exceptions
> list. For good reason: it is indefensible as an exception! ;-)
>
> Suggestion: Remove U+0483 from the 2.6 Exceptions list
> and delete Appendix A.6.

Done.

> ==========================================================
>
> A.9. KATAKANA MIDDLE DOT
>
> On this one, there is a long thread from April 3 - 7
> entitled "Tables and contextual rule for Katakana middle dot"
> that started with John Klensin's observation:
>
> <quote>
> Just so this doesn't accidentally fall through the cracks...
>
> It is clear from the discussion last week that I simply got the
> contextual rule for Katakana Middle Dot (U+30FB) wrong in what
> is rule/Appendix A.12 in Tables-05.  I had understood that I had
> been told it was used only with Katakana; the JET I-D and
> Monday's presentation make it clear to me (and I assume others)
> that it can be used between any pair of Japanese characters.
> The overview now reads:
>
>   Adjacent characters MUST be Katakana.
>
> It should be:
>
>   Adjacent characters MUST be Hiragana, Katakana, or Han.
>
> The associated Rule Set will, of course, have to be updated to
> match.
> </quote>
>
> I concur with that general assessment, although Yoneya-san
> noted that the Katakana middle dot also occurs in
> other (Japanese) contexts, including before or after
> ([a-zA-Z0-9]). Yoneya-san's assessment was:
>
>    (KATAKANA MIDDLEDOT) MUST be used in Japanese context.
>
> And the thread then foundered and moved on to other topics,
> because nobody really knows how to specify that in a rule.
>
> At any rate I would like to reiterate that this should not fall
> through the cracks, and the Overview and Rule Set for A.9. still
> need updating. The options are:
>
> 1. Update as John Klensin suggested.
>
> 2. Add ([a-zA-Z0-9]) to the allowed contexts, to get closer
>   to Japanese usage.
>
> 3. Give up on attempting to write a formal Rule Set for
>   "MUST be used in Japanese context", make the character
>   PVALID instead of CONTEXTO in the Exceptions list,
>   and leave it up to registrars to allow or disallow for
>   country-specific registrations.
>
> I don't think we have the option to leave the A.9. Rule Set
> as it is currently stated, as that is not even minimally
> acceptable in a Japanese context.

For now, I have chosen (1).
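
A minimal Python sketch of how option (1) might be checked follows.
It reads "Adjacent characters MUST be Hiragana, Katakana, or Han" as
requiring both the preceding and the following character to be in
one of those scripts; that reading is an assumption, as are the
helper names (is_hkh, middle_dot_rule, label_ok). The code point
ranges are a rough stand-in for the Script property and cover only
the basic Hiragana and Katakana letters and the main CJK Unified
Ideographs block.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumption: rough code point ranges standing in for Script().
_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters (excludes U+30FB itself)
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (Han)
)


def is_hkh(ch):
    # True if ch falls in one of the Hiragana/Katakana/Han ranges above.
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _RANGES)


def middle_dot_rule(label, i):
    # Both neighbours must exist and be Hiragana, Katakana, or Han.
    if i == 0 or i + 1 >= len(label):
        return False
    return is_hkh(label[i - 1]) and is_hkh(label[i + 1])


def label_ok(label):
    # Every U+30FB in the label must satisfy the contextual rule.
    return all(
        middle_dot_rule(label, i)
        for i, ch in enumerate(label)
        if ch == MIDDLE_DOT
    )
```

With this reading, a middle dot between two Katakana letters passes,
while one between ASCII letters fails, which is exactly the case
that option (2) would additionally allow.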

    Patrik

> ==========================================================
>
>
>
