Tatweel

Fri Mar 20 15:56:31 CET 2009

Vint,

I'm really in substantive agreement with you. I would add that the
IETF should assume that registries are competent enough to make wise
decisions on their own; it should not legislate on the assumption that
some registries are not competent enough to know what is good and safe
for them. As you say, there should be compelling reasons for making
exclusions in IDNA2008. Perhaps we should delineate what constitutes
a 'compelling reason'.

Best
Alireza

Vint Cerf wrote:
> Alireza,
>
> your note makes me think a bit about what I believe to be the 
> difference in philosophy between IDNA2003 and IDNA2008. Under 
> IDNA2008, effort has been made to be fairly cautious about what is 
> included by using Unicode's characterizations of the role of 
> characters. Appearance has less to do with this than function in 
> expression. Generally, punctuation is excluded except for special 
> cases such as ZWJ/ZWNJ, for example. I am neither a speaker nor a 
> reader of Arabic script, so I have to be guided by others who are 
> expert, but it sounds on the surface as if the proposal is related to 
> the function of U+0640. Ideally, inclusion or exclusion should be the product of the 
> Rules that generate the tables of the Tables document (editor Patrik 
> Faltstrom). If it is not ruled out (literally) but there is a 
> compelling argument for exclusion, it would need to become an 
> exception I believe.
>
> Mark,
>
> One of the many concerns I have heard raised on this list relates to 
> character-by-character assessment of Unicode as it applies to IDNs. I 
> think few people wish to produce IDNA tables that way. I don't dispute 
> your reasoning to exclude (I don't know enough about Arabic to do so) 
> but I am wondering whether there is a way to do this that is 
> rule-based or context-based, or something that exercises the 
> mechanisms of IDNA2008?
>
> vint
>
>
>
> Vint Cerf
> Google
> 1818 Library Street, Suite 400
> Reston, VA 20190
> 202-370-5637
> vint at google.com
>
>
>
>
> On Mar 20, 2009, at 7:00 AM, Alireza Saleh wrote:
>
>> Why should we not just let the registry have the authority to do
>> this? If you want to disallow this at the protocol level, you should
>> also consider disallowing Low line (U+005F) and Hyphen-minus (U+002D),
>> because these also have the same shape as Tatweel, especially when
>> they come between non-joining characters. My opinion is to limit
>> protocol prohibitions to absolutely necessary cases.
>>
>> Alireza
>>
>> Mark Davis wrote:
>>> I propose that we make U+0640 ( ‎ـ‎ ) ARABIC TATWEEL (aka kashida) be
>>> DISALLOWED, adding it to
>>> http://tools.ietf.org/html/draft-ietf-idnabis-tables-05#section-2.6.
>>> Currently it is PVALID, but it carries no semantics in any
>>> Arabic-script orthography, and its only value is for spoofing.
>>>
>>> For example: جوجل can be written with extra kashidas as جـوجل or as
>>> جوجـل by inserting a kashida after the first or third character. This
>>> is very hard for users to detect. We added it to Unicode for use in
>>> manual justification, but it has no place in IDNA.
>>>
>>> (http://en.wikipedia.org/wiki/Kashida,
>>> http://unicode.org/cldr/utility/character.jsp?a=0640)
>>>
>>> Mark
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>
>>
>
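
The جوجل example quoted above can be reproduced with a short Python sketch. It only illustrates the spoofing concern raised in the thread; it is not part of any IDNA2008 table or tool. The helper names (insert_tatweel, strip_tatweel, code_points) are made up for this illustration, and stripping U+0640 before comparison is an assumed registry-side check, not something the thread specifies.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida)


def insert_tatweel(label: str, after: int) -> str:
    """Return label with a tatweel inserted after the first `after` characters."""
    return label[:after] + TATWEEL + label[after:]


def strip_tatweel(label: str) -> str:
    """Hypothetical registry-side check: drop every tatweel before comparing."""
    return label.replace(TATWEEL, "")


def code_points(label: str) -> list:
    """List the code points of a label in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in label]


base = "\u062C\u0648\u062C\u0644"    # the four-letter label from the example
spoof_1 = insert_tatweel(base, 1)    # kashida after the first character
spoof_3 = insert_tatweel(base, 3)    # kashida after the third character

for label in (base, spoof_1, spoof_3):
    print(
        " ".join(code_points(label)),
        "| equal to base:", label == base,
        "| equal after stripping:", strip_tatweel(label) == base,
    )

# Unicode properties of U+0640 as reported by the local Unicode database.
print(unicodedata.name(TATWEEL), unicodedata.category(TATWEEL))
```

The two variants differ from the base label as code point sequences, which is why they would register as distinct names, while a comparison that ignores U+0640 treats all three as the same label.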