my comments on draft-ietf-idnabis-bidi-05

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Tue Sep 8 09:05:25 CEST 2009



On 2009/09/08 0:12, John C Klensin wrote:
>
> --On Monday, September 07, 2009 4:11 PM +0900 "Martin J.
> Dürst" <duerst at it.aoyama.ac.jp> wrote:
>
>> Hello Mati,
>>
>> On 2009/09/07 15:47, Matitiahu Allouche wrote:
>>> On October first, Martin J. Dürst asked:
>>> conditions 2/4: Why are BN (control characters) allowed in
>>> RTL but not in LTR?
>>>
>>> BN characters are invisible and should be banned as allowing
>>> phishing and violating the Label Uniqueness requirement.
>>> However, ZWJ and ZWNJ are classified as BN, and ZWNJ is
>>> required for the proper orthography of Persian which is
>>> written with the Arabic script, hence BNs are allowed in RTL
>>> labels.
>> That makes a lot of sense. But then shouldn't BN also be
>> allowed for  LTR, because some of these characters are needed
>> in Indic scripts?
>
> Remember that ZWJ and ZWNJ are allowed by exception, not because
> they are BN, and that they are classified as CONTEXTJ, not as
> DISALLOWED.  If we continue with that model --and no one has
> argued recently that we should not-- then the relevant question
> for ZWJ/ZWNJ is whether the contextual rules are correctly
> applied to the scripts in which they are needed

This is the question for Tables. I haven't had time to read Tables 
during last call, but I'm assuming it's doing the right things on this 
issue.

> and not about their membership in BN.

Yes, what we want, ideally, is that all the exceptions "just work" (in 
the sense that they pass the bidi tests) in those contexts where they 
are allowed.

The current Bidi document is written in terms of bidi categories, and so 
to get ZWJ/ZWNJ to "just work", we have to include their bidi category, 
namely BN, where relevant. The current Bidi document gets halfway there 
(or, you could say, three-quarters of the way) by allowing BN in RTL 
labels. I proposed (and continue to propose!) that we fix this "halfway" 
state by allowing BN in LTR labels as well. This will eliminate some 
strange edge cases: currently, any Arabic script label can be combined 
with any Indic script label, *except if the latter contains a ZWJ 
immediately after a virama* (see 
http://tools.ietf.org/html/draft-ietf-idnabis-tables-06#appendix-A.2).
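
To make the edge case concrete, here is a minimal Python sketch. It 
relies only on the standard unicodedata module to look up bidi 
categories. The set LTR_BASE and the function ltr_label_ok are my own 
illustrative names, and the category set is an assumption chosen for 
the example, not text copied from the Bidi draft.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Illustrative category set for LTR labels (an assumption for this sketch,
# NOT copied from the Bidi draft). Note that BN is absent.
LTR_BASE = {"L", "EN", "ES", "CS", "ET", "ON", "NSM"}


def bidi_classes(label):
    """Return the Unicode bidi category of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def ltr_label_ok(label, allow_bn):
    """Check every character's bidi category against the allowed set."""
    allowed = (LTR_BASE | {"BN"}) if allow_bn else LTR_BASE
    return all(cls in allowed for cls in bidi_classes(label))


# Devanagari KA + VIRAMA + ZWJ + SSA: a ZWJ immediately after a virama.
indic_label = "\u0915\u094d" + ZWJ + "\u0937"

print(bidi_classes(indic_label))
print(ltr_label_ok(indic_label, allow_bn=False))
print(ltr_label_ok(indic_label, allow_bn=True))
```

Under these assumptions, the label fails a category-based LTR check only 
because of the ZWJ, and passes as soon as BN is added to the allowed set.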


Allowing BN also in LTR labels is the easiest fix for the current 
situation. Other fixes, which potentially fix larger problems, are also 
possible. One of them is to not mention BN at all in the Bidi document, 
and just refer to "exceptionally allowed characters in the tables 
document". This would cover the case where in the future we need some 
exception from another bidi category. But it would mean that any such 
exception also has to be carefully vetted for bidi issues. That's just 
one more item on somebody's todo list (whoever takes care of exceptions 
when they occur), but it's something we must not forget.
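
As a rough sketch of this second option, the check could skip the 
exceptionally allowed characters and apply the bidi category test only 
to the rest. Everything named here is hypothetical: EXCEPTION_CHARS 
stands in for whatever the Tables document allows by exception, and 
LTR_CATEGORIES is an illustrative set, not the one in the draft.

```python
import unicodedata

# Hypothetical stand-in for "exceptionally allowed characters in the
# Tables document"; here just ZWNJ and ZWJ.
EXCEPTION_CHARS = {"\u200c", "\u200d"}

# Illustrative category set for LTR labels (an assumption, not from the
# draft). BN is deliberately not mentioned at all.
LTR_CATEGORIES = {"L", "EN", "ES", "CS", "ET", "ON", "NSM"}


def ltr_categories_ok(label):
    """Apply the category test only to characters not allowed by exception."""
    return all(
        ch in EXCEPTION_CHARS
        or unicodedata.bidirectional(ch) in LTR_CATEGORIES
        for ch in label
    )


# Devanagari KA + VIRAMA + ZWJ + SSA passes, because ZWJ is an exception.
with_zwj = "\u0915\u094d\u200d\u0937"
# SOFT HYPHEN (U+00AD) is also BN but is not an exception, so it fails.
with_soft_hyphen = "a\u00adb"

print(ltr_categories_ok(with_zwj))
print(ltr_categories_ok(with_soft_hyphen))
```

The point of this variant is that other BN characters stay excluded, 
while the contextual rules for each exception (and its bidi vetting) 
remain the job of whoever maintains the exception list.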


> If anything in Bidi confuses that, or
> confuses the more general principle that it does not override
> Tables, I would think it needs to be fixed... but I haven't seen
> anything that I read as such confusion.

I have certainly never concluded any such thing.

Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp

