Tables: Context Rules

Mark Davis mark at macchiato.com
Wed Nov 19 17:22:04 CET 2008


*Context Rules*


This section still needs a lot of work, and the problems noted in
http://www.alvestrand.no/pipermail/idna-update/2008-November/002964.html
have not been addressed.
Other items:


*Location*


*Since this will not be part of the final document: the text will be moved
to the IANA registry and be maintained there -- there needs to be a note to
the readers and editor to that effect at the top of the section. There
should also be an ed note there (in John's style) indicating that the
following rules still require much work.*


*Pseudocode*

*There should be some explanation of the syntax and functions, even if not
precise. The syntax needs to be a bit more extended to be useful.*


*I'd suggest defining P to be the current position of the character being
tested, F to be the position of the first character, and L to be the
position of the last character. Then we don't need constructs such as
LastChar, and can be more expressive, because we have to be able to look at
more than one character before/after; eg we can then use
Script(Character[P-2]) to get the script of the previous to last character.
(Note: I include F just so we don't have to decide between zero-based or
one-based, but it would be even simpler to do zero-based.)*
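
A minimal sketch of the P/F/L idea in Python, assuming zero-based positions. The class name `Context` and its method names are hypothetical, made up for illustration; they are not part of the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical helper: a label plus the position P under test."""
    label: str
    p: int  # zero-based position of the character being tested

    @property
    def first(self) -> int:
        # F: position of the first character (zero-based assumption)
        return 0

    @property
    def last(self) -> int:
        # L: position of the last character
        return len(self.label) - 1

    def char(self, offset: int) -> Optional[str]:
        """Return Character[P+offset], or None when that position is outside the label."""
        i = self.p + offset
        if self.first <= i <= self.last:
            return self.label[i]
        return None
```

With this shape, Character[P-2] becomes `ctx.char(-2)`, and a lookup past either end of the label returns None instead of being undefined.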


*I'd also prefer just using = instead of .eq., but that's just a preference.*


*The rules need to be carefully reviewed for clarity and consistency with
the text (and vice versa). For example, even for a simple case like Geresh
there are many problems.*


Overview:
The script of the preceding character and the subsequent character, if
any, MUST be Hebrew.
// The scope of "if any" must be clear. Does it apply to both the preceding
// and the subsequent character, or just the subsequent one?
// And the rule must not require the second, because the character can be
// final in a word, which means it is fine for it to be followed by "-" or
// other non-Hebrew.

Rule Set:
If FirstChar .eq. True then False;
Else If BeforeScript .eq. Hebrew Then
    If AfterScript .eq. Hebrew Then True;
    Else False;

// This is missing a trailing Else (made clear by my block indentation).
// While it shouldn't require an AfterScript, even the syntax is ill-defined:
//    What is the value of AfterScript if there is no character after? There
//    is no check to make sure that it isn't LastChar.
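
A minimal sketch of how the rule reads once those comments are applied: the preceding character must exist and be Hebrew, and nothing is required of the following character. This is one reading of the comments, not the draft's rule. Python's standard library has no Script property, so `is_hebrew` below approximates it by character name; that approximation and both function names are assumptions.

```python
import unicodedata


def is_hebrew(ch: str) -> bool:
    # Assumption: approximate Script=Hebrew by the Unicode character name,
    # since unicodedata does not expose the Script property.
    return unicodedata.name(ch, "").startswith("HEBREW ")


def geresh_allowed(label: str, p: int) -> bool:
    """Check the context of the character at zero-based position p."""
    if p == 0:
        # No preceding character, so the rule cannot be satisfied.
        return False
    # Only the preceding character is constrained; the character may be
    # final in a word and be followed by "-" or other non-Hebrew.
    return is_hebrew(label[p - 1])
```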


*9. HYPHEN-MINUS*
Overview: Must appear at the beginning or end of a label.
...
Rule Set:
If FirstChar .eq. True Then False;
If LastChar .eq. True Then False;
Else True;
=>
Overview: Must appear neither at the beginning nor at the end of a label,
and must not be in both the third and fourth positions in the string.
Rule Set:
If P = F OR P = L Then False;
Else if P = F+2 And Character[P+1] = "-" Then False;
Else if P = F+3 And Character[P-1] = "-" Then False;
Else True;
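
The revised rule set above can be sketched in Python as follows, assuming zero-based positions (so F = 0). The function name `hyphen_allowed` is hypothetical.

```python
def hyphen_allowed(label: str, p: int) -> bool:
    """Apply the revised HYPHEN-MINUS rule to the hyphen at position p."""
    if label[p] != "-":
        raise ValueError("position p does not hold a HYPHEN-MINUS")
    f, l = 0, len(label) - 1
    # Neither at the beginning nor at the end of the label.
    if p == f or p == l:
        return False
    # Not in both the third and fourth positions: test from the third...
    if p == f + 2 and label[p + 1] == "-":
        return False
    # ...and from the fourth.
    if p == f + 3 and label[p - 1] == "-":
        return False
    return True
```

Because the first test already excludes P = L, the lookup at P+1 in the second test is always inside the label.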

Hyphen-Minus is quite unlike the rest of the rules in that the above 3
conditions can NEVER change. We should just remove it from the CONTEXTO
rules, since the conditions for its use are in Protocol as a separate
condition (Hyphen - P4.3.2.1, although this needs fleshing out, see previous
note) from the CONTEXT conditions (P4.3.2.3).


*10. ZERO WIDTH NON-JOINER*
For the rule sets I suggest the following. Rationale: As long as it is
pseudocode -- it is made up for this purpose and matches no real programming
language -- we should use a pseudocode that actually works to give the same
meaning as the prose. And the conditions need to be tighter, as per
http://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters
===

The script must be one in which the use of this character causes
significant visual transformation of one or both of the adjacent
characters.
=>
The script must be one in which the use of this character causes
visual transformation of one or both of the adjacent characters that are
required for significant semantic distinctions in at least some cases. This
includes ZWNJ after certain Virama characters, and between particular
joining characters in cursive scripts like Arabic.
[[anchor9a: The script list for this character is _not_ complete and,
in particular, more Indic scripts certainly need to be listed.]]

RuleSet

If BeforeScript .eq. ( Deva | Tamil |... ) Then
  If P = F OR P = L Then False;
  Else if Canonical_Combining_Class(Character[P-1]) != Virama Then False;
  Else if Not IsLetter(Character[P-2]) Then False;
  Else if Not ScriptCount(Character[P-2] + Character[P-1]) > 1 Then False;
  Else True;
Else if BeforeScript != Arabic Then False;
Else if Not MatchesBefore([[:jt=D:][:jt=L:]][:jt=T:]*) Then False;
Else if Not MatchesAfter([:jt=T:]*[[:jt=D:][:jt=R:]]) Then False;
Else True;
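
A minimal sketch of the two joining-type tests in the Arabic branch, in Python. Each character is mapped to a one-letter joining type and the two patterns are then run as ordinary regular expressions over that string. The standard library does not expose Joining_Type, so the tiny `JOINING_TYPE` table is an illustrative assumption covering only a few characters, and the function names are hypothetical. The BeforeScript test is left out.

```python
import re

# Assumption: a hand-picked excerpt of Joining_Type values, for illustration only.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH, dual-joining
    "\u0646": "D",  # ARABIC LETTER NOON, dual-joining
    "\u0627": "R",  # ARABIC LETTER ALEF, right-joining
    "\u062F": "R",  # ARABIC LETTER DAL, right-joining
    "\u064E": "T",  # ARABIC FATHA, transparent
}

# MatchesBefore([[:jt=D:][:jt=L:]][:jt=T:]*), anchored at the tested position.
BEFORE = re.compile(r"[DL]T*$")
# MatchesAfter([:jt=T:]*[[:jt=D:][:jt=R:]]), anchored at the tested position.
AFTER = re.compile(r"T*[DR]")


def jt_string(text: str) -> str:
    # Characters missing from the table are treated as non-joining ("U").
    return "".join(JOINING_TYPE.get(ch, "U") for ch in text)


def zwnj_joining_context_ok(label: str, p: int) -> bool:
    """Check the joining context around the character at zero-based position p."""
    before = jt_string(label[:p])
    after = jt_string(label[p + 1:])
    return bool(BEFORE.search(before)) and bool(AFTER.match(after))
```

Transparent characters on either side are skipped by the `T*` parts, so a combining mark between the joining letter and the tested position does not break the match.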

For more information see Section 2.3 Layout and Format Control Characters
in [UAX31].


*11. ZERO WIDTH JOINER*

The script must be one in which the use of this character causes
significant visual transformation of one or both of the adjacent
characters.
=>
The script must be one in which the use of this character causes
visual transformation of one or both of the adjacent characters that are
required for significant semantic distinctions in at least some cases. This
includes ZWJ after certain Virama characters, and between particular
joining characters in cursive scripts like Arabic.
[[anchor9a: The script list for this character is _not_ complete and,
in particular, more Indic scripts certainly need to be listed.]]

RuleSet
If BeforeScript .eq. ( Deva | Tamil |... ) Then
  If P = F OR P = L Then False;
  Else if Canonical_Combining_Class(Character[P-1]) != Virama Then False;
  Else if Not IsLetter(Character[P-2]) Then False;
  Else if Not ScriptCount(Character[P-2] + Character[P-1]) > 1 Then False;
  Else True;
Else False;


*14. MODIFIER LETTER PRIME*

Add a description: also used in Cyrillic transcription, where it must be
after a consonant.

If BeforeScript .eq. Greek Then
...
=>
If IsLetter(Character[P-1]) And BeforeScript = Cyrillic Then True;
...
Mark