Tatweel

Vint Cerf vint at google.com
Fri Mar 20 13:19:02 CET 2009


Alireza,

Your note makes me think a bit about what I believe to be the
difference in philosophy between IDNA2003 and IDNA2008. Under
IDNA2008, an effort has been made to be fairly cautious about what is
included, by relying on Unicode's characterizations of the role of
characters. Appearance has less to do with this than function in
expression. Generally, punctuation is excluded except for special
cases such as ZWJ/ZWNJ. I am neither a speaker nor a reader of
Arabic script, so I have to be guided by others who are expert, but on
the surface it sounds as if the proposal is related to the function of
U+0640. Ideally, inclusion or exclusion should be the product of the
rules that generate the tables in the Tables document (editor Patrik
Faltstrom). If U+0640 is not ruled out (literally) by those rules but
there is a compelling argument for exclusion, it would need to become
an exception, I believe.
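
As a rough illustration of the difference between a rule-derived status
and an explicit exception, here is a minimal sketch in Python. It is
not the algorithm of the Tables document: the names derive_status,
EXAMPLE_CATEGORIES and the exceptions mapping are hypothetical, and the
category set is a deliberately simplified stand-in for the real rules,
which have more steps. It relies only on the standard unicodedata
module.

```python
import unicodedata

# Simplified, hypothetical stand-in for a category-based rule: code points
# in these general categories are treated as PVALID. The real Tables rules
# contain more steps than this.
EXAMPLE_CATEGORIES = {"Ll", "Lo", "Lm", "Nd", "Mn", "Mc"}


def derive_status(ch, exceptions=None):
    """Return 'PVALID' or 'DISALLOWED' for a single character.

    `exceptions` is a hypothetical hand-maintained mapping that is
    consulted before the category-based rule.
    """
    if exceptions and ch in exceptions:
        return exceptions[ch]
    if unicodedata.category(ch) in EXAMPLE_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


TATWEEL = "\u0640"

# The category-based rule alone does not rule out U+0640,
# because its general category is Lm (modifier letter).
print(unicodedata.name(TATWEEL), unicodedata.category(TATWEEL))
print(derive_status(TATWEEL))

# Excluding it therefore takes an explicit exception entry.
print(derive_status(TATWEEL, exceptions={TATWEEL: "DISALLOWED"}))
```

In this sketch the only way to change the outcome for U+0640 without
touching the general rule is the exceptions mapping, which is the
character-by-character mechanism discussed in this thread.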

Mark,

One of the many concerns I have heard raised on this list relates to
character-by-character assessment of Unicode as it applies to IDNs. I
think few people wish to produce IDNA tables that way. I don't dispute
your reasoning to exclude (I don't know enough about Arabic to do so),
but I am wondering whether there is a way to do this that is rule-based
or context-based, or that otherwise exercises the mechanisms of
IDNA2008?

vint



Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com




On Mar 20, 2009, at 7:00 AM, Alireza Saleh wrote:

> I don't see why we should not just let the registry have the authority
> to do this. If you want to disallow this at the protocol level, you
> should also consider disallowing Low Line (U+005F) and
> Hyphen-minus (U+002D), because these also have the same shape as
> Tatweel, especially when they come between non-joining characters. My
> opinion is to limit protocol prohibitions to absolutely necessary
> cases.
>
> Alireza
>
> Mark Davis wrote:
>> I propose that we make U+0640 ( ‎ـ‎ ) ARABIC TATWEEL (aka kashida) be
>> DISALLOWED, adding it to
>> http://tools.ietf.org/html/draft-ietf-idnabis-tables-05#section-2.6.
>> Currently it is PVALID, but it does not carry semantics in any
>> Arabic-script orthography, and its only value is for spoofing.
>>
>> For example: جوجل can be written with extra kashidas as جـوجل or as
>> جوجـل by inserting a kashida after the first or third character. This
>> is very hard for users to detect. We added it to Unicode for use in
>> manual justification, but it has no place in IDNA.
>>
>> (http://en.wikipedia.org/wiki/Kashida,
>> http://unicode.org/cldr/utility/character.jsp?a=0640)
>>
>> Mark
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>


