I think the simplest course of action is just to require the character before to be Hebrew; that is a sufficient limitation on usage.<br><br clear="all">Mark<br>
<br><br><div class="gmail_quote">On Thu, Jul 23, 2009 at 02:41, Matitiahu Allouche <span dir="ltr">&lt;<a href="mailto:matial@il.ibm.com">matial@il.ibm.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I totally agree with Ken's analysis of Gershayim usage, and with his<br>
simplified pseudo-code.<br>
<br>
However, I seem to remember somebody mentioning using Gershayim at the<br>
boundary between preceding Hebrew letters and succeeding letters from<br>
another script. Personally, I see no need for this, and such a label<br>
would probably be disallowed anyway by the rules for Bidi domain names.<br>
Still, if anybody thinks there is such a use case, he/she should speak<br>
now.<br>
<br>
Shalom (Regards), Mati<br>
Bidi Architect<br>
Globalization Center Of Competency - Bidirectional Scripts<br>
IBM Israel<br>
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 2554160<br>
<br>
<br>
<br>
<br>
Kenneth Whistler &lt;<a href="mailto:kenw@sybase.com">kenw@sybase.com</a>&gt;<br>
Sent by: <a href="mailto:idna-update-bounces@alvestrand.no">idna-update-bounces@alvestrand.no</a><br>
22/07/2009 05:07<br>
Please respond to<br>
Kenneth Whistler &lt;<a href="mailto:kenw@sybase.com">kenw@sybase.com</a>&gt;<br>
<br>
<br>
To<br>
<a href="mailto:patrik@frobbit.se">patrik@frobbit.se</a><br>
cc<br>
<a href="mailto:idna-update@alvestrand.no">idna-update@alvestrand.no</a>, <a href="mailto:kenw@sybase.com">kenw@sybase.com</a><br>
Subject<br>
tables-06b.txt: A.8 Gershayim<br>
<div><div></div><div class="h5"><br>
<br>
<br>
<br>
<br>
<br>
Patrik,<br>
<br>
With my general concerns about the pseudo-code<br>
out of the way, I'll now take up the issue of<br>
how to express the rule set for A.8. HEBREW PUNCTUATION<br>
GERSHAYIM.<br>
<br>
Currently, the relevant parts of the Appendix state:<br>
<br>
Overview:<br>
The script of the preceding character and the subsequent<br>
character, if any, MUST be Hebrew.<br>
...<br>
Rule Set:<br>
False;<br>
If Script(Before(cp)) .eq. Hebrew And<br>
LastChar .eq. cp Then True;<br>
If Script(Before(cp)) .eq. Hebrew And<br>
Script(After(cp)) .eq. Hebrew Then True;<br>
<br>
First let's consider what the appropriate context for<br>
the gershayim is in ordinary Hebrew text usage.<br>
<br>
The gershayim are used to indicate that a word is to<br>
be read as an acronym, rather than as a regular word.<br>
Its position in the acronym is between the next-to-last<br>
and the last letters of the non-inflected form of the<br>
acronym. What that means is that it will be preceded<br>
by one or more letters, and will be followed by at<br>
least one letter (and possibly more, if the acronym is<br>
inflected). But it shouldn't occur at the beginning or<br>
end of a word.<br>
<br>
The gershayim are also used to mark numerical usage of<br>
Hebrew letters, but only in the case where a number is<br>
represented by two or more Hebrew numerals. So again,<br>
in that case, it would be internal to the numeral,<br>
and not at the beginning or end.<br>
<br>
Then there is a usage to indicate transliteration of<br>
a foreign word -- but again the position is word-internal,<br>
between the next-to-last and the last character of the<br>
word.<br>
<br>
From this summary, it would seem that in *normal* usage,<br>
gershayim should always occur internal to a word. If<br>
Mati agrees with that general characterization, then I<br>
believe the context we need to summarize in the Overview<br>
is more constrained:<br>
<br>
Overview:<br>
The script of the preceding character and the subsequent<br>
character MUST be Hebrew.<br>
<br>
And I think *more* constrained is good in this case, as<br>
internal to a Hebrew word is much less likely to cause<br>
either confusion with quotation marks or any bidi quirks.<br>
<br>
If we agree on that more constrained statement of the<br>
intended context, then the Rule Set itself can be<br>
simplified to:<br>
<br>
Rule Set:<br>
False;<br>
If Script(Before(cp)) .eq. Hebrew And<br>
Script(After(cp)) .eq. Hebrew Then True;<br>
<br>
Note that with my restatement of the pseudo-code, the<br>
edge cases of gershayim at the beginning or end of a label<br>
will automatically be excluded, because Before(cp)<br>
would evaluate to Undefined at the start of a label<br>
and After(cp) would evaluate to Undefined at the end of<br>
a label.<br>
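<br>
As a rough illustration only, the simplified rule could be sketched in<br>
Python as below. The helper names (is_hebrew, before, after,<br>
gershayim_allowed) are made up for this sketch and are not taken from the<br>
draft. Python's standard library has no Script property lookup, so the<br>
Script(...) .eq. Hebrew test is approximated here by checking whether the<br>
Unicode character name starts with "HEBREW "; that approximation is an<br>
assumption, not what the tables document specifies.<br>

```python
import unicodedata

GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM


def is_hebrew(ch):
    """Approximate Script(ch) .eq. Hebrew (assumption: name-prefix heuristic)."""
    if ch is None:  # Before/After evaluated to Undefined
        return False
    return unicodedata.name(ch, "").startswith("HEBREW ")


def before(label, i):
    """Character preceding position i, or None (Undefined) at the label start."""
    return label[i - 1] if i > 0 else None


def after(label, i):
    """Character following position i, or None (Undefined) at the label end."""
    return label[i + 1] if i + 1 < len(label) else None


def gershayim_allowed(label, i):
    """Simplified A.8 rule: default False, True only if both neighbours are Hebrew."""
    if label[i] != GERSHAYIM:
        raise ValueError("position i does not hold GERSHAYIM")
    return is_hebrew(before(label, i)) and is_hebrew(after(label, i))
```

With this sketch, a gershayim at index 0 or at the last index of a label is<br>
rejected without a separate LastChar test, because the missing neighbour is<br>
treated as Undefined and therefore never compares equal to Hebrew.<br>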
<br>
I believe this restatement and simplification of A.8<br>
would be of service to the IDNA2008 users.<br>
<br>
--Ken<br>
<br>
_______________________________________________<br>
Idna-update mailing list<br>
<a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
</div></div></blockquote></div><br>