tables-06b.txt Pseudo-code clarification

Patrik Fältström patrik at frobbit.se
Fri Jul 24 17:47:40 CEST 2009


I used this.

    paf

On 22 jul 2009, at 03.46, Kenneth Whistler wrote:

> Patrik,
>
> Before delving into the syntax issues remaining for
> the rule sets for A.5, A.6, A.8, and A.9, I want to
> back up and consider the pseudo-code conventions
> described at the top of Appendix A. Part of the
> concerns raised about the syntax for the Hebrew
> gershayim results, I think, from the fact that the
> pseudo-code conventions aren't as clear as they
> could be.
>
> In particular, the meanings of the constructs
> Before(cp) and After(cp) are unclear enough that they
> will likely lead to misunderstandings and inconsistent
> attempts at implementation of the rules.
>
> I suggest that this be addressed by rewriting the
> paragraph which explains the pseudo-code conventions.
> And among other things, for a set of conventions like
> these, breaking them out from a paragraph form into
> bullet-like sections will also make it easier for people
> to read and understand them.
>
> In a set of rules like this, I think conciseness is less
> important than clarity, so my suggested rewrite will be
> a little more long-winded than the current draft, but I
> hope much clearer in the long run.
>
> Also, because of the way this pseudo-code is trying to
> mix property functions and string position functions,
> I think it is important to introduce an explicit
> "Undefined" term which can be then used consistently
> to deal with invalid string positions or property functions
> involving invalid codepoints.
>
> So here is my attempt at a rewrite for clarity. I'm not
> trying to change the intent of any of this pseudo-code,
> as developed to express the rule sets for CONTEXTO --
> just to make it clearer and more rigorous.
>
> --Ken
>
> =========================================================
>
> The grammatical rules are expressed in pseudo code. The
> conventions used for that pseudo code are explained here.
>
> Each rule is constructed as a Boolean expression that
> evaluates to either True or False. A simple "True;" or
> "False;" rule sets the default result value for the rule set.
> Subsequent conditional rules that evaluate to True or
> False may re-set the result value.
>
> A special value "Undefined" is used to deal with any
> error conditions, such as an attempt to test a character
> before the start of a label or after the end of a label.
> If any term of a rule evaluates to Undefined, further
> evaluation of the rule immediately terminates, as the
> result value of the rule will itself be Undefined.
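
A minimal Python sketch of the Undefined convention described above. The names UNDEFINED and evaluate_rule, and the idea of passing the terms as zero-argument callables joined by a "combine" function, are assumptions made for illustration; they are not part of the draft.

```python
class _Undefined:
    """Sentinel standing in for the pseudo-code's Undefined value."""

    def __repr__(self):
        return "Undefined"


UNDEFINED = _Undefined()


def evaluate_rule(terms, combine=all):
    """Evaluate the terms of one rule from left to right.

    `terms` is a sequence of zero-argument callables (hypothetical
    representation). As soon as one term yields UNDEFINED, evaluation
    stops and the rule itself is UNDEFINED. Otherwise the collected
    Boolean values are joined with `combine` (all = AND, any = OR).
    """
    values = []
    for term in terms:
        value = term()
        if value is UNDEFINED:
            return UNDEFINED  # later terms are never evaluated
        values.append(value)
    return combine(values)
```

In this sketch a term placed after an Undefined term is never called, which mirrors the "immediately terminates" wording.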
>
> cp represents the codepoint to be tested.
>
> FirstChar is a special term which denotes the first codepoint
> in a label.
>
> LastChar is a special term which denotes the last codepoint
> in a label.
>
> .eq. represents the equality relation.
>
>     A .eq. B evaluates to True if A equals B.
>
> .ne. represents the non-equality relation.
>
>     A .ne. B evaluates to True if A is not equal to B.
>
> .in. represents the set inclusion relation.
>
>     A .in. B evaluates to True if A is a member of the set B.
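
The three relations could be sketched in Python roughly as follows. The function names eq, ne and member_of are hypothetical, and letting an Undefined operand make the whole relation Undefined is an assumption that follows the Undefined convention above rather than anything stated for the relations themselves.

```python
UNDEFINED = object()  # stand-in for the pseudo-code's Undefined value


def eq(a, b):
    """A .eq. B: True if A equals B."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED  # assumption: Undefined operands propagate
    return a == b


def ne(a, b):
    """A .ne. B: True if A is not equal to B."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a != b


def member_of(a, b):
    """A .in. B: True if A is a member of the set B."""
    if a is UNDEFINED:
        return UNDEFINED
    return a in b
```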
>
> A functional notation, Function_Name(cp), is used to express
> string positions within a label, Boolean character property
> tests of a codepoint, or a regular expression match. When such
> function names refer to Boolean character property tests, they
> use the exact Unicode character property name for the property
> in question, and "cp" is evaluated as the Unicode value
> of the codepoint to be tested, rather than as its position
> in the label. When such function names refer to string positions
> within a label, "cp" is evaluated as its position in the label.
>
> RegExpMatch(X) takes as its parameter X a schematic regular
> expression consisting of a mix of Unicode character property
> values and literal Unicode codepoints.
>
> Script(cp) returns the value of the Unicode Script property,
> as defined in Scripts.txt in the Unicode Character Database.
>
> Canonical_Combining_Class(cp) returns the value of the
> Unicode Canonical_Combining_Class property, as defined in
> UnicodeData.txt in the Unicode Character Database.
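
For Canonical_Combining_Class, a small Python sketch using the standard unicodedata module, whose combining() function returns the canonical combining class of a character as an integer. The wrapper name canonical_combining_class is hypothetical, and the values returned depend on the Unicode version bundled with the Python build. The standard library has no equivalent lookup for the Script property, so Script(cp) would need data taken from Scripts.txt and is not shown here.

```python
import unicodedata


def canonical_combining_class(cp: int) -> int:
    """Return the Canonical_Combining_Class of the codepoint `cp`.

    `cp` is the Unicode value of the codepoint (an integer), not its
    position in a label.
    """
    return unicodedata.combining(chr(cp))
```

For example, canonical_combining_class(0x094D) gives 9, the class that the Unicode Character Database names Virama.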
>
> Before(cp) returns the codepoint of the character
> immediately preceding cp in logical order in the string
> representing the label. Before(FirstChar) evaluates to
> Undefined.
>
> After(cp) returns the codepoint of the character
> immediately following cp in logical order in the string
> representing the label. After(LastChar) evaluates to
> Undefined.
>
> Note that "Before" and "After" do not refer
> to the visual display order of the character in a label,
> which may be reversed or otherwise modified by the
> bidirectional algorithm for labels including characters
> from scripts written right-to-left.
>
> Repeated evaluation for all characters in a label makes
> use of the special construct:
>
>   For All Characters:
>      Expression;
>   End For;
>
> This construct requires repeated evaluation of "Expression"
> for each codepoint in the label, starting from FirstChar
> and proceeding to LastChar.
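
The "For All Characters" construct amounts to a loop over positions from FirstChar to LastChar. A possible Python sketch follows; the name for_all_characters and the choice to hand back the list of per-position results are assumptions, since the text above only requires repeated evaluation and does not say how the results are combined.

```python
def for_all_characters(label: str, expression):
    """Evaluate `expression` once per codepoint, first to last.

    `expression` is a callable taking (label, pos), where pos is the
    zero-based position of the codepoint under test. The results are
    returned in evaluation order; combining them is left to the caller.
    """
    return [expression(label, pos) for pos in range(len(label))]


def is_ascii_digit(label: str, pos: int) -> bool:
    """Illustrative expression (hypothetical): is cp one of 0-9?"""
    return "0" <= label[pos] <= "9"
```

A caller could then write for_all_characters("a1b", is_ascii_digit) and fold the resulting list with whatever logic the enclosing rule calls for.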
>
> ===============================================================
>
>
