comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic block

Sarmad Hussain sarmad.hussain at nu.edu.pk
Fri Feb 15 17:46:00 CET 2008


 

Dear All,

 

Here are some comments on the draft posted at http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt.  These observations are based mostly on the perspective of Urdu.  They refer to pages 21-22 of the draft: each comment follows the relevant line quoted from the draft and is prefixed by the >>>> symbol.  Please note especially the comments on 06D4 and on the "space" character (at the end).

 

 

0600..0603  ; CONTEXTO   # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA

 

>>>> should be DISALLOWED

 

 
 
060B..060F  ; DISALLOWED # AFGHANI SIGN..ARABIC SIGN MISRA

 

>>>>agreed

 

 
 
0610..0615  ; PVALID     # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARAB

 

>>>> 0610..0614  agreed

>>>> 0615 should be DISALLOWED as it is a punctuation mark, used to mark a pause

 

 
 
061B        ; DISALLOWED # ARABIC SEMICOLON
061E..061F  ; DISALLOWED # ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC QUE
 
>>>>agreed
 
 
 
0621..063A  ; PVALID     # ARABIC LETTER HAMZA..ARABIC LETTER GHAIN
 
>>>>agreed 
 
 
 
0640..065E  ; PVALID     # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
 
 
>>>>0640 should be DISALLOWED as it will create significant security problems (kashida only causes stylistic (not shape) variation of characters)
>>>>0641..064A  agreed as PVALID
>>>>064B..0652  agreed; would like to hear the argument for including them, as there was initially discussion about not including them
>>>>0653..0655 agreed as PVALID
>>>>0656..0658 agreed; would like to hear the argument for including them, as there was initially discussion about not including them
>>>>0659..065E  do not know enough about them to comment
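The concern about 0640 above can be illustrated with a small sketch. It is not part of the draft: the helper name strip_tatweel and the sample label are assumptions chosen only for illustration. It shows that two strings which differ only by a kashida are distinct code point sequences, even though the kashida only stretches the joining line.

```python
# Illustrative sketch only; strip_tatweel and the sample label are hypothetical.
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida)


def strip_tatweel(label: str) -> str:
    """Return the label with every ARABIC TATWEEL removed."""
    return label.replace(TATWEEL, "")


# Four Arabic letters: SEEN, REH, MEEM, DAL.
plain = "\u0633\u0631\u0645\u062f"
# The same letters with a tatweel inserted after SEEN.
stretched = "\u0633\u0640\u0631\u0645\u062f"

print(plain == stretched)                                # False
print(strip_tatweel(plain) == strip_tatweel(stretched))  # True
```

If 0640 were PVALID, both strings could be registered as separate labels while looking nearly the same to a reader.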
 
 
 
0660..0669  ; PVALID     # ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NI
 
>>>>agreed
 
 
 
066A..066D  ; DISALLOWED # ARABIC PERCENT SIGN..ARABIC FIVE POINTED STAR
 
>>>>agreed
 
 
 
066E..0674  ; PVALID     # ARABIC LETTER DOTLESS BEH..ARABIC LETTER HIGH
 
>>>>agreed, though with reservations about 066E..066F (the Unicode Standard does not mention whether they are actually used in any language; if they are not part of any language, their inclusion may only contribute to security problems)
 
 
 
0675..0678  ; DISALLOWED # ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER H
 
>>>>0675..0678  do not know enough about them to comment
 
 
 
0679..06D3  ; PVALID     # ARABIC LETTER TTEH..ARABIC LETTER YEH BARREE W
 
>>>>agreed
 
 
 
06D4        ; DISALLOWED # ARABIC FULL STOP
 
>>>>should be allowed as a delimiter for Urdu, like the dot in a domain name (it should be mapped onto a dot automatically at the client layer).  As internationalized domain names deal with the end-user layer (application layer), they need to be a bit more sensitive to user needs.  This delimiter, as specified in Unicode, is only required for Urdu.  However, Urdu writing does not have a dot, and the dot is not present on Urdu keyboards.  If the delimiter is not allowed (and then mapped to a dot), the user will get confused, and will also be unable to type the dot without having an English keyboard installed and switching to it 2-3 times while writing a single domain name in Urdu (once to English and back to Urdu between each level of the domain name).  The standard should include this as a recommendation for applications.
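The client-layer mapping suggested above could look roughly like the following sketch. The function names normalize_separators and split_labels are hypothetical and are not taken from the draft or from any library; the sketch only shows the idea of replacing 06D4 with the ordinary dot before a name is split into labels.

```python
# Illustrative sketch only; function names are hypothetical.
ARABIC_FULL_STOP = "\u06d4"  # ARABIC FULL STOP, as typed on an Urdu keyboard
FULL_STOP = "."              # U+002E, the usual label separator


def normalize_separators(name: str) -> str:
    """Map every ARABIC FULL STOP to U+002E before further processing."""
    return name.replace(ARABIC_FULL_STOP, FULL_STOP)


def split_labels(name: str) -> list[str]:
    """Split a user-typed name into labels after separator mapping."""
    return normalize_separators(name).split(FULL_STOP)


# A hypothetical two-level name typed entirely on an Urdu keyboard.
typed = "\u0627\u0631\u062f\u0648\u06d4\u067e\u0627\u06a9"
print(len(split_labels(typed)))  # 2
```

With such a mapping in the application, the user never has to switch keyboards to enter the separator.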
 
 
 
06D5..06DC  ; PVALID     # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN
 
>>>>agreed; would like to hear the argument for including combining marks, as there was initially discussion about not including them
 
 
 
06DD        ; CONTEXTO   # ARABIC END OF AYAH
 
>>>> should be DISALLOWED
 
 
 
06DE        ; DISALLOWED # ARABIC START OF RUB EL HIZB
 
>>>>agreed
 
 
06DF..06E8  ; PVALID     # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL H
 
>>>>agreed; would like to hear the argument for including combining marks, as there was initially discussion about not including them
 
 
 
06E9        ; DISALLOWED # ARABIC PLACE OF SAJDAH
 
>>>>agreed
 
 
06EA..06FC  ; PVALID     # ARABIC EMPTY CENTRE LOW STOP..ARABIC LETTER GH
 
>>>>agreed; would like to hear the argument for including combining marks, as there was initially discussion about not including them
 
 
 
 
06FD..06FE  ; DISALLOWED # ARABIC SIGN SINDHI AMPERSAND..ARABIC SIGN SIND
 
>>>>need time to consult and comment on this.
 
 
06FF        ; PVALID     # ARABIC LETTER HEH WITH INVERTED V

 

>>>>agreed

 

 

FDF0..FDFD  ; DISALLOWED # ARABIC LIGATURE SALLA USED AS KORANIC STOP SIG

 

>>>>agreed

 

 

 

In addition, Urdu would also have a problem if space is not allowed, as ZWNJ is not used in Pakistan.  Urdu users in Pakistan type a space whether it is required to shape a letter within a word or at the end of a word.  It is not possible to train all users to distinguish between space and ZWNJ (especially as the latter is not a linguistic entity in the language but, from the perspective of Urdu, a computational engineering solution, and users are never taught the concept).  As the domain name standard has to deal with applications with which users interact directly, it may also be included as a recommendation (at least for Urdu) that users be allowed to type a space and that it be automatically converted to ZWNJ (and then follow the same rules as ZWNJ after such conversion).
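As a rough sketch of the conversion described above, an application could replace each typed space with ZWNJ before handing the label to IDNA processing. The function name space_to_zwnj is hypothetical, and the sketch deliberately does not apply any contextual rules for ZWNJ; those would still have to be checked after the conversion.

```python
# Illustrative sketch only; space_to_zwnj is a hypothetical helper.
SPACE = "\u0020"
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def space_to_zwnj(label: str) -> str:
    """Replace every typed space in a label with ZWNJ.

    No contextual validation is done here; the result is assumed to be
    checked afterwards under the same rules as any other ZWNJ.
    """
    return label.replace(SPACE, ZWNJ)


# Two hypothetical Urdu words typed with a space between them.
typed = "\u0627\u0631\u062f\u0648 \u0645\u0631\u06a9\u0632"
converted = space_to_zwnj(typed)
print(SPACE in converted)  # False
```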

 

 

Best regards,
Sarmad

 

--------------------------------------------------------


Dr. Sarmad Hussain

سرمد حسین


Professor and Head

پروفیسر اور نگراں


Center for Research in Urdu Language Processing

مرکز تحقیقات اردو     


National University of Computer and Emerging Sciences

نیشنل یونی ورسٹی


B Block, Faisal Town

بی بلاک، فیصل ٹاؤن


Lahore, PAKISTAN

لاہور، پاکستان


 

 


Ph: (+9242) 111 128 128 ext. 241

فون :۲۴۱ ۔ ۱۲۸ ۱۲۸ ۱۱۱ ۔ ۹۲۴۲


Fax: (+9242) 516 5232

فیکس: ۵۲۳۲  ۵۱۶ ۔ ۹۲۴۲


Email: sarmad.hussain at nu.edu.pk

ای میل: 


URL: www.crulp.org    www.nu.edu.pk

ویب پتہ: 

 

                                                             

                                                

 

                                          

                                                   

 

 

 

 
