comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic block
Sarmad Hussain
sarmad.hussain at nu.edu.pk
Fri Feb 15 17:46:00 CET 2008
Dear All,
Here are some comments on the draft posted at http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt. These observations are based (mostly) on the perspective of Urdu. The comments refer to pages 21-22 of the draft; each comment follows the line it quotes from the draft and is prefixed by the >>>> symbol. Please note especially the comments on 06D4 and on the “space” character (at the end).
0600..0603 ; CONTEXTO # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
>>>> should be DISALLOWED
060B..060F ; DISALLOWED # AFGHANI SIGN..ARABIC SIGN MISRA
>>>>agreed
0610..0615 ; PVALID # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARAB
>>>> 0610..0614 agreed
>>>> 0615 should be DISALLOWED as it is a punctuation mark, used to mark a pause
061B ; DISALLOWED # ARABIC SEMICOLON
061E..061F ; DISALLOWED # ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC QUE
>>>>agreed
0621..063A ; PVALID # ARABIC LETTER HAMZA..ARABIC LETTER GHAIN
>>>>agreed
0640..065E ; PVALID # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
>>>>0640 should be DISALLOWED as it will create significant security problems (kashida only causes stylistic (not shape) variation of characters)
>>>>0641..064A agreed as PVALID
>>>>064B..0652 agreed; would love to hear the argument for including them as there was initially discussion for not including them
>>>>0653..0655 agreed as PVALID
>>>>0656..0658 agreed; would love to hear the argument for including them as there was initially discussion for not including them
>>>>0659..065E do not know enough about them to comment
0660..0669 ; PVALID # ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NI
>>>>agreed
066A..066D ; DISALLOWED # ARABIC PERCENT SIGN..ARABIC FIVE POINTED STAR
>>>>agreed
066E..0674 ; PVALID # ARABIC LETTER DOTLESS BEH..ARABIC LETTER HIGH
>>>>agreed, though with reservations about 066E..066F (the Unicode standard does not mention whether they are actually used in any language; if they are not part of any language, their inclusion may only contribute to security problems)
0675..0678 ; DISALLOWED # ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER H
>>>>0675..0678 do not know enough about them to comment
0679..06D3 ; PVALID # ARABIC LETTER TTEH..ARABIC LETTER YEH BARREE W
>>>>agreed
06D4 ; DISALLOWED # ARABIC FULL STOP
>>>>should be allowed as a delimiter for Urdu, like the dot in a domain name (it should be mapped onto a dot automatically at the client layer). As internationalized domain names deal with the end-user layer (application layer), they need to be a bit more sensitive to user needs. This delimiter, as specified in Unicode, is only required for Urdu. However, Urdu writing does not have a dot, and the dot is not present on Urdu keyboards. If the delimiter is not allowed (and then mapped to a dot), users will get confused, and they will not be able to type the dot without having an English keyboard installed and switching to it 2-3 times while writing a single domain name in Urdu (once to English and back to Urdu between each level of the domain name). The standard should include this as a recommendation for applications.
06D5..06DC ; PVALID # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN
>>>>agreed; would love to hear the argument for including combining marks as there was initially discussion for not including them
06DD ; CONTEXTO # ARABIC END OF AYAH
>>>> should be DISALLOWED
06DE ; DISALLOWED # ARABIC START OF RUB EL HIZB
>>>>agreed
06DF..06E8 ; PVALID # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL H
>>>>agreed; would love to hear the argument for including combining marks as there was initially discussion for not including them
06E9 ; DISALLOWED # ARABIC PLACE OF SAJDAH
>>>>agreed
06EA..06FC ; PVALID # ARABIC EMPTY CENTRE LOW STOP..ARABIC LETTER GH
>>>>agreed; would love to hear the argument for including combining marks as there was initially discussion for not including them
06FD..06FE ; DISALLOWED # ARABIC SIGN SINDHI AMPERSAND..ARABIC SIGN SIND
>>>>need time to consult and comment on this.
06FF ; PVALID # ARABIC LETTER HEH WITH INVERTED V
>>>>agreed
FDF0..FDFD ; DISALLOWED # ARABIC LIGATURE SALLA USED AS KORANIC STOP SIG
>>>>agreed
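The client-layer mapping suggested above for 06D4 could look roughly like the following. This is only a minimal sketch in Python, not something taken from the draft: the function names are hypothetical, and it assumes the mapping happens in the application before the typed name is split into labels.

```python
# Sketch only: map ARABIC FULL STOP (U+06D4) onto the ASCII dot before
# splitting a typed name into labels. All names here are hypothetical.
ARABIC_FULL_STOP = "\u06D4"
ASCII_DOT = "\u002E"


def map_urdu_delimiter(name):
    """Return the name with every U+06D4 replaced by an ASCII dot."""
    return name.replace(ARABIC_FULL_STOP, ASCII_DOT)


def split_labels(name):
    """Split a typed name into labels, accepting either delimiter."""
    return map_urdu_delimiter(name).split(ASCII_DOT)


# Two Urdu-script labels separated by U+06D4, written with escapes so the
# code points are unambiguous regardless of how the text is displayed.
typed = "\u0627\u0631\u062F\u0648\u06D4\u067E\u0627\u06A9"
labels = split_labels(typed)
```

With such a mapping the user never has to switch keyboards: the name is typed entirely on the Urdu layout and the application sees ordinary dot-separated labels.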
In addition, in Urdu we would also have a problem if space is not allowed, as ZWNJ is not used in Pakistan. Urdu users in Pakistan type a space whether it is required to shape a letter within a word or at the end of it. It is not possible to train all users to distinguish between space and ZWNJ (especially as the latter is not a linguistic entity in the language, and users are never taught the concept; from the perspective of Urdu it is a computational engineering solution). As the domain name standard has to deal with applications with which users interact directly, it may also be included as a recommendation (at least for Urdu) that users be allowed to type a space, which would automatically be converted to ZWNJ (and could follow the same rules as ZWNJ after such conversion).
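As an illustration of the conversion proposed here, a minimal Python sketch follows. The function name is hypothetical, and two choices in it are my assumptions rather than part of any draft: leading and trailing spaces are dropped, and a run of several spaces becomes a single ZWNJ. Whether the resulting ZWNJ is acceptable in a given position would still be decided by the ZWNJ rules themselves, which this sketch does not check.

```python
import re

# ZERO WIDTH NON-JOINER
ZWNJ = "\u200C"


def space_to_zwnj(label):
    """Convert spaces typed inside a label to ZWNJ (sketch only).

    Assumptions: spaces at either end of the label are dropped, and
    consecutive spaces collapse into one ZWNJ.
    """
    trimmed = label.strip(" ")
    return re.sub(r" +", ZWNJ, trimmed)


# A label typed with a space between two parts.
converted = space_to_zwnj("\u0627\u0631\u062F\u0648 \u067E\u0627\u06A9")
```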
Best regards,
Sarmad
--------------------------------------------------------
Dr. Sarmad Hussain
سرمد حسین
Professor and Head
پروفیسر اور نگراں
Center for Research in Urdu Language Processing
مرکز تحقیقات اردو
National University of Computer and Emerging Sciences
نیشنل یونی ورسٹی
B Block, Faisal Town
بی بلاک، فیصل ٹاؤن
Lahore, PAKISTAN
لاہور، پاکستان
Ph: (+9242) 111 128 128 ext. 241
فون :۲۴۱ ۔ ۱۲۸ ۱۲۸ ۱۱۱ ۔ ۹۲۴۲
Fax: (+9242) 516 5232
فیکس: ۵۲۳۲ ۵۱۶ ۔ ۹۲۴۲
Email: sarmad.hussain at nu.edu.pk
ای میل:
URL: www.crulp.org www.nu.edu.pk
ویب پتہ: