my comments on draft-ietf-idnabis-bidi-05
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Wed Sep 9 03:46:55 CEST 2009
On 2009/09/09 4:32, Harald Alvestrand wrote:
> Omitting BN from LTR labels was a mistake that crept in between -03 and
> -04. In order to allow the use of ZWNJ/ZWJ with Indic scripts, BN should
> definitely be allowed in LTR labels.
> I'll fix it in -05.
> Hm.... interestingly, my tests never tested what happens to a BN in the
> algorithm; since paragraph X9 of the BIDI algorithm specification said
> to ignore
> BN, I simply omitted it from my test strings. So I have no idea whether
> a BN at the
> end of a label will jump over delimiters or not - or even if the
> question is meaningful in the context of the Unicode BIDI algorithm. (am
> writing this on a plane, so can't check).
I think the answer is at the end of X9:
The zero width joiner and non-joiner affect the shaping of the adjacent
characters—those that are adjacent in the original backing-store order,
even though those characters may end up being rearranged to be
non-adjacent by the Bidirectional Algorithm. For more information, see
Section 5.3, Joiners. (http://www.unicode.org/reports/tr9/#Joiners)
So the characters don't jump, they affect whatever characters are
adjacent to them in logical order.
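As a quick check of the above, here is a minimal Python sketch using the standard unicodedata module. It only shows that ZWNJ (U+200C) and ZWJ (U+200D) carry the bidi class BN, and that dropping BN characters (the part of X9 relevant here) leaves the logical order of the remaining characters untouched. The helper names bidi_classes and strip_bn are my own and do not come from any draft.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

def bidi_classes(label):
    """Return the Unicode bidi class of each character, in logical order."""
    return [unicodedata.bidirectional(ch) for ch in label]

def strip_bn(label):
    """Drop BN characters only (hypothetical helper; X9 also removes
    the explicit embedding and override codes, which are ignored here)."""
    return "".join(ch for ch in label if unicodedata.bidirectional(ch) != "BN")

# Arabic NOON + ZWNJ + Arabic HEH, in logical (backing-store) order.
sample = "\u0646" + ZWNJ + "\u0647"
print(bidi_classes(sample))
print(strip_bn(sample) == "\u0646\u0647")
```

The joiner sits between the two letters in logical order, and that adjacency is what matters for shaping, whatever the reordering does afterwards.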
> The present formulation has the interesting effect that BN is now forbidden
> at the beginning and end of strings, which was not true in -03. I think
> that is an improvement (it outlaws strings like "BN EN", which seems to
> have been permitted by the -03 rule), but is one that I didn't make
We have to carefully check whether we need ZWJ or ZWNJ at the start or
at the end of a label.
For the Arabic script, ZWNJ at the start or at the end of a label
doesn't have any effect. But ZWJ has an effect. However, as I
understand it, we allow ZWJ inside a label for cases such as Persian,
where only the cases inside a label are relevant.
For Indic, the ZWJ and ZWNJ have to follow a virama, so there is no
issue at the start. I don't know whether there are cases where they are
needed at the end.
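To make the Indic case concrete, here is a rough Python sketch of the "joiner must follow a virama" condition. It is a simplification under stated assumptions: it tests only whether the preceding character has canonical combining class 9 (Virama), and it ignores any other contextual rule that Tables may define for these characters. The function name joiners_follow_virama is hypothetical.

```python
import unicodedata

JOINERS = {"\u200c", "\u200d"}  # ZWNJ, ZWJ
VIRAMA_CCC = 9                  # canonical combining class "Virama"

def joiners_follow_virama(label):
    """True if every ZWJ/ZWNJ in the label directly follows a virama.

    Simplified, assumed check: it does not implement the full
    contextual rules of the Tables document.
    """
    for i, ch in enumerate(label):
        if ch in JOINERS:
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True

# Devanagari KA + VIRAMA + ZWJ + SSA: joiner inside the label.
print(joiners_follow_virama("\u0915\u094d\u200d\u0937"))
# ZWJ at the very start: nothing precedes it, so the check fails.
print(joiners_follow_virama("\u200d\u0915"))
# KA + VIRAMA + ZWJ: joiner at the end of the label.
print(joiners_follow_virama("\u0915\u094d\u200d"))
```

A joiner at the start can never satisfy this condition, while one at the end can, which is why only the end-of-label case remains open.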
> What does the group think?
> Martin J. Dürst wrote:
>> On 2009/09/08 0:12, John C Klensin wrote:
>>> --On Monday, September 07, 2009 4:11 PM +0900 "\"Martin J.
>>> Dürst\""<duerst at it.aoyama.ac.jp> wrote:
>>>> Hello Mati,
>>>> On 2009/09/07 15:47, Matitiahu Allouche wrote:
>>>>> On October first, Martin J. Dürst asked:
>>>>> conditions 2/4: Why are BN (control characters) allowed in
>>>>> RTL but not in LTR?
>>>>> BN characters are invisible and should be banned as allowing
>>>>> phishing and violating the Label Uniqueness requirement.
>>>>> However, ZWJ and ZWNJ are classified as BN, and ZWNJ is
>>>>> required for the proper orthography of Persian which is
>>>>> written with the Arabic script, hence BNs are allowed in RTL
>>>> That makes a lot of sense. But then shouldn't BN also be
>>>> allowed for LTR, because some of these characters are needed
>>>> in Indic scripts?
>>> Remember that ZWJ and ZWNJ are allowed by exception, not because
>>> they are BN, and that they are classified as CONTEXTJ, not as
>>> DISALLOWED. If we continue with that model --and no one has
>>> argued recently that we should not-- then the relevant question
>>> for ZWJ/ZWNJ is whether the contextual rules are correctly
>>> applied to the scripts in which they are needed
>> This is the question for Tables. I haven't had time to read Tables
>> during last call, but I'm assuming it's doing the right things on this
>>> and not about their membership in BN.
>> Yes, what we want, ideally, is that all the exceptions "just work" (in
>> the sense that they pass the bidi tests) in those contexts where they
>> are allowed.
>> The current Bidi document is written in terms of bidi categories, and
>> so to get ZWJ/ZWNJ to "just work", we have to include their bidi
>> category, namely BN, where relevant. The current Bidi document gets
>> there half-way (or you can say three-fourths) by allowing BN in RTL
>> labels. I proposed (and continue to propose!) that we fix this
>> "half-way" state by allowing BN also in LTR labels. This will
>> eliminate some strange edge cases (currently, any Arabic script label
>> can be combined with any Indic script label, *except if the latter
>> contains a ZWJ immediately after a virama* (see
>> Allowing BN also in LTR labels is the easiest fix for the current
>> situation. Other fixes, which potentially fix larger problems, are
>> also possible. One of them is to not mention BN at all in the Bidi
>> document, and just refer to "exceptionally allowed characters in the
>> tables document". This would cover the case where in the future we
>> need some exception from another bidi category. But it would mean that
>> we have to carefully vet that exception also for bidi issues. That's
>> just a 'todo' item on somebody's todo list (whoever will take care of
>> exceptions when they occur), but it's something not to forget.
>>> If anything in Bidi confuses that, or
>>> confuses the more general principle that it does not override
>>> Tables, I would think it needs to be fixed... but I haven't seen
>>> anything that I read as such confusion.
>> I definitely never have concluded such a thing.
>> Regards, Martin.
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp