Tatweel
Vint Cerf
vint at google.com
Fri Mar 20 13:19:02 CET 2009
Alireza,
your note makes me think a bit about what I believe to be the
difference in philosophy between IDNA2003 and IDNA2008. Under
IDNA2008, effort has been made to be fairly cautious about what is
included by using Unicode's characterizations of the role of
characters. Appearance has less to do with this than function in
expression. Generally, punctuation is excluded except for special
cases such as ZWJ/ZWNJ. I am neither a speaker nor a reader of
Arabic script, so I have to be guided by others who are expert, but it
sounds on the surface as if the proposal is related to the function of
U+0640. Ideally, inclusion or exclusion should be the product of the
Rules that generate the tables of the Tables document (editor Patrik
Faltstrom). If it is not ruled out (literally) but there is a
compelling argument for exclusion, it would need to become an
exception, I believe.
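
To make the property-based approach concrete, here is a minimal sketch that looks up how Unicode characterizes U+0640, using only Python's standard `unicodedata` module. The helper name `describe` is made up for illustration. The sketch shows the lookup only; the actual derivation rules live in the Tables document and are not reproduced here.

```python
import unicodedata

TATWEEL = "\u0640"

def describe(ch):
    """Return the basic Unicode properties of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # Two-letter General Category, e.g. "Lo" (other letter) or "Lm" (modifier letter)
        "general_category": unicodedata.category(ch),
    }

info = describe(TATWEEL)
print(info)
```

Unicode classifies the character as a modifier letter (Lm) rather than as punctuation. That would be consistent with its current PVALID status as described in the proposal quoted below, assuming the rules treat letter categories as valid.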
Mark,
One of the many concerns I have heard raised on this list relates to
character-by-character assessment of Unicode as it applies to IDNs. I
think few people wish to produce IDNA tables that way. I don't dispute
your reasoning for exclusion (I don't know enough about Arabic to do so),
but I am wondering whether there is a way to do this that is
rule-based or context-based, or that otherwise exercises the mechanisms
of IDNA2008?
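
As a rough sketch of the "rules first, exceptions only where needed" idea, the following Python shows a per-character status function that consults a small exception table before falling back to a category-based rule. Everything here is an assumption made for illustration: `EXCEPTIONS`, `rule_based_status`, `status` and `label_ok` are hypothetical names, and the toy rule is far simpler than the real derivation in the Tables document.

```python
import unicodedata

# Hypothetical exception table (code point -> status). The single entry
# mirrors the proposal under discussion; it is not copied from the Tables document.
EXCEPTIONS = {0x0640: "DISALLOWED"}

def rule_based_status(ch):
    """Toy stand-in for the real rules: letters, marks and decimal digits pass."""
    cat = unicodedata.category(ch)
    return "PVALID" if cat[0] in "LM" or cat == "Nd" else "DISALLOWED"

def status(ch):
    """Exceptions override whatever the general rule would produce."""
    return EXCEPTIONS.get(ord(ch), rule_based_status(ch))

def label_ok(label):
    """A label is acceptable only if every character is PVALID."""
    return all(status(ch) == "PVALID" for ch in label)

plain = "\u062C\u0648\u062C\u0644"        # jeem, waw, jeem, lam
spoof = "\u062C\u0640\u0648\u062C\u0644"  # same, with a tatweel after the first letter

print(rule_based_status("\u0640"), status("\u0640"))
print(label_ok(plain), label_ok(spoof))
```

Under the toy rule alone, U+0640 passes because it is a letter by category; only the exception entry rejects it. The two labels differ only by that one code point, which is the case described in the quoted proposal.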
vint
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com
On Mar 20, 2009, at 7:00 AM, Alireza Saleh wrote:
> I don't see why we should not just let the registry have the authority
> to do this. If you want to disallow this at the protocol level, you
> should also consider disallowing the Low Line (U+005F) and
> Hyphen-minus (U+002D), because these also have the same shape as Tatweel,
> especially when they come between non-joining characters. My
> opinion is to limit protocol prohibitions to absolutely necessary
> cases.
>
> Alireza
>
> Mark Davis wrote:
>> I propose that we make U+0640 ( ـ ) ARABIC TATWEEL (aka kashida) be
>> DISALLOWED, adding it to
>> http://tools.ietf.org/html/draft-ietf-idnabis-tables-05#section-2.6.
>> Currently it is PVALID, but it does not carry semantics in any
>> Arabic-script orthography, and its only value is for spoofing.
>>
>> For example: جوجل can be written with extra kashidas as جـوجل or as
>> جوجـل by inserting a kashida after the first or third character. This
>> is very hard for users to detect. We added it to Unicode for use in
>> manual justification, but it has no place in IDNA.
>>
>> (http://en.wikipedia.org/wiki/Kashida,
>> http://unicode.org/cldr/utility/character.jsp?a=0640)
>>
>> Mark
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>