comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic block

Sarmad Hussain sarmad.hussain at nu.edu.pk
Mon Feb 18 18:21:29 CET 2008


Dear Patrik and all,

> >
> > 0600..0603  ; CONTEXTO   # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
> >
> >>>>> should be DISALLOWED
> 
> We are (see mail from Mark) talking about whether Cf should be
> DISALLOWED and not CONTEXT as only two codepoints actually are needed,
> and can be taken care of with an exception.
> 


What is the definition of Cf?  I could not find it in the document.  Could
you please elaborate?  0600..0603 are symbols, neither letters nor digits,
and thus should be DISALLOWED.
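
For reference, Cf is the Unicode General_Category value "Other, Format".
The following is only a minimal sketch of how the category values quoted in
this thread can be looked up, assuming Python's standard unicodedata module
(whose Unicode version depends on the Python build).  The helper name
general_category is illustrative and does not come from the draft.

```python
import unicodedata

# Code points discussed in this thread (hex values from the quoted table).
CODE_POINTS = [0x0600, 0x0615, 0x0640, 0x066E, 0x06D4, 0x06DD, 0x06FD]


def general_category(cp: int) -> str:
    """Return the two-letter Unicode General_Category of a code point."""
    return unicodedata.category(chr(cp))


for cp in CODE_POINTS:
    # Print the code point, its category, and its character name.
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"{cp:04X} {general_category(cp)} {name}")
```

The category alone says nothing about linguistic use, which is why the
per-character comments in this message still matter.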


> >
> >>>>> 0615 should be DISALLOWED as it is a punctuation, to mark a pause
> 
> 0615;ARABIC SMALL HIGH TAH;Mn;230;NSM;;;;;N;;;;;
> 
> It is of GeneralCategory Mn, and those are allowed. I.e. not
> classified as a punctuation. Is what you are saying that you suggest
> this should be added as an exception and DISALLOWED?
> 

Yes, if that is the case, 0615 should be added to the exception list and
DISALLOWED.  Please note that 0615 + 066E may be confused with 0679.  Thus,
066E should also be DISALLOWED (if it is not a character in any language), as
it may cause security problems.  Also see the comment on 066E below.



> > 0640..065E  ; PVALID     # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
> >
> >
> >>>>> 0640 should be DISALLOWED as it will create significant security
> >>>>> problems (kashida only causes stylistic (not shape) variation of
> >>>>> characters)
> 
> 0640;ARABIC TATWEEL;Lm;0;AL;;;;;N;;;;;
> 
> Codepoints of GeneralCategory Lm are allowed (matches category A). Same
> question as for 0615.
> 

Yes, 0640 should be DISALLOWED by adding it to the exception list.  It has no
linguistic significance and may cause security problems if allowed.
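
The exception mechanism requested here for 0615 and 0640 could be sketched
roughly as follows.  This is not the algorithm from the draft: the category
sets contain only the categories mentioned in this thread, and the names
EXCEPTIONS, PVALID_CATEGORIES and derive_status are hypothetical.

```python
import unicodedata

# Assumption: only the categories mentioned in this thread are modeled;
# the actual draft defines more rules than this.
PVALID_CATEGORIES = {"Lo", "Lm", "Mn"}
CONTEXT_CATEGORIES = {"Cf"}

# Hypothetical exception list holding the two overrides proposed above.
EXCEPTIONS = {
    0x0615: "DISALLOWED",  # ARABIC SMALL HIGH TAH
    0x0640: "DISALLOWED",  # ARABIC TATWEEL
}


def derive_status(cp: int) -> str:
    """Derive a status for a code point, checking exceptions first."""
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    category = unicodedata.category(chr(cp))
    if category in PVALID_CATEGORIES:
        return "PVALID"
    if category in CONTEXT_CATEGORIES:
        return "CONTEXTO"
    return "DISALLOWED"


for cp in (0x0615, 0x0640, 0x066E, 0x06DD, 0x06D4):
    print(f"{cp:04X} {derive_status(cp)}")
```

Checking the exception list before the category rule is what lets a single
code point differ from the rest of its category.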



> 
> > 066E..0674  ; PVALID     # ARABIC LETTER DOTLESS BEH..ARABIC LETTER
> > HIGH
> >
> >>>>> agreed; though reservations with 066E..066F (as Unicode standard
> >>>>> does not mention if they are actually used in any language; if
> >>>>> not part of any language, their inclusion may only contribute to
> >>>>> security problems)
> 
> 066E;ARABIC LETTER DOTLESS BEH;Lo;0;AL;;;;;N;;;;;
> 066F;ARABIC LETTER DOTLESS QAF;Lo;0;AL;;;;;N;;;;;
> 
> GeneralCategory Lo, and because of that PVALID.
> 


If they do not belong to any language, they should be DISALLOWED, as they may
cause security problems.  For example, see the comment on 0615 above; in
addition, 066E + 065C may be confused with 0628.
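
As a rough illustration of why these sequences are a concern, the sketch
below checks whether Unicode NFC normalization unifies the sequences with
the single letters they may resemble.  It assumes Python's standard
unicodedata module and writes each sequence as base letter followed by
combining mark; the helper name unified_by_nfc is hypothetical.  Visual
similarity itself depends on the font and cannot be tested this way.

```python
import unicodedata

# (sequence, single letter) pairs taken from the comments in this message.
PAIRS = [
    ("\u066E\u065C", "\u0628"),  # DOTLESS BEH + VOWEL SIGN DOT BELOW vs BEH
    ("\u066E\u0615", "\u0679"),  # DOTLESS BEH + SMALL HIGH TAH vs TTEH
]


def unified_by_nfc(sequence: str, single: str) -> bool:
    """Return True if NFC normalization maps both strings to the same text."""
    nfc = unicodedata.normalize
    return nfc("NFC", sequence) == nfc("NFC", single)


for sequence, single in PAIRS:
    print(ascii(sequence), ascii(single), unified_by_nfc(sequence, single))
```

Because the strings stay distinct after normalization, two labels that may
look alike would still be registered as different names.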




> > 06D4        ; DISALLOWED # ARABIC FULL STOP
> >
> >>>>> should be allowed as a delimiter for Urdu, like the dot in the
> >>>>> domain name (should be mapped onto a dot automatically at the
> >>>>> client layer).  As internationalized domain names deal with the
> >>>>> end user layer (application layer), they need to be a bit more
> >>>>> sensitive to user needs.  This delimiter, as specified in
> >>>>> Unicode, is only required for Urdu.  However, Urdu writing does
> >>>>> not have a dot and the dot is also not present on Urdu keyboards.
> >>>>> If the delimiter is not allowed (and then mapped to dot), the
> >>>>> user will get confused and also will not be able to type the dot
> >>>>> without having an English keyboard installed and without
> >>>>> switching to the English keyboard 2-3 times while writing a single
> >>>>> domain name in Urdu (once to English and back to Urdu between
> >>>>> each level of TLD).  The standard should include this as a
> >>>>> recommendation for applications.
> 
> 06D4;ARABIC FULL STOP;Po;0;AL;;;;;N;ARABIC PERIOD;;;;
> 
> This is DISALLOWED as it is of GeneralCategory Po. I let others
> discuss the issues with full stop.
> 


If 06D4 is not otherwise allowed, it would help for the standard to include a
recommendation that the application layer map it to the dot/full stop.
Again, this is a requirement for Urdu as described earlier.
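
A client-side mapping of this kind might look like the sketch below.  It
is only an illustration of the recommendation, not text from any draft; the
function names map_separators and split_labels are hypothetical, and it
assumes the mapping is applied to the whole name before it is split into
labels.

```python
ARABIC_FULL_STOP = "\u06D4"  # the Urdu sentence delimiter discussed above
FULL_STOP = "."              # the label separator used in domain names


def map_separators(name: str) -> str:
    """Replace every ARABIC FULL STOP in a typed name with an ASCII dot."""
    return name.replace(ARABIC_FULL_STOP, FULL_STOP)


def split_labels(name: str) -> list[str]:
    """Map the separators, then split the name into its labels."""
    return map_separators(name).split(FULL_STOP)


# The user types the delimiter available on the Urdu keyboard.
typed = "\u0645\u062B\u0627\u0644\u06D4\u067E\u0627\u06A9"
print([ascii(label) for label in split_labels(typed)])
```

With such a mapping the user never has to switch keyboards, and the name
sent onward contains only the ordinary dot.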



> > 06DD        ; CONTEXTO   # ARABIC END OF AYAH
> >
> >>>>> should be DISALLOWED
> 
> 06DD;ARABIC END OF AYAH;Cf;0;AL;;;;;N;;;;;
> 
> GeneralCategory Cf. See above on 0600..0603.
> 

06DD should be DISALLOWED, as it marks the end of a phrase or sentence, like
the full stop in English.



> 
> > 06FD..06FE  ; DISALLOWED # ARABIC SIGN SINDHI AMPERSAND..ARABIC SIGN
> > SIND
> >
> >>>>> need time to consult and comment on this.
> 
> 06FD;ARABIC SIGN SINDHI AMPERSAND;So;0;AL;;;;;N;;;;;
> 06FE;ARABIC SIGN SINDHI POSTPOSITION MEN;So;0;AL;;;;;N;;;;;
> 
> >>>>>
> 
> DISALLOWED because GeneralCategory So.
> 

I could not find So in the document.  Please elaborate.

I have just consulted Dr. Qasim Bughio by telephone.  He was the Chairman of
the Sindhi Language Authority in Pakistan and is currently Professor and Dean
of the Faculty of Arts at the University of Sindh in Jamshoro, Pakistan (see
http://arts.usindh.edu.pk/).  According to him, 06FD is the word "and" in
Sindhi and has no replacement.  Similarly, 06FE is the word "in" in Sindhi,
which also has no replacement.  Both are used very frequently in the
language.

Thus, both 06FD and 06FE MUST be considered PVALID and allowed in
internationalized domain names for Sindhi.



> > In addition, in Urdu we also would have a problem for not allowing
> > space as we do not have use of ZWNJ in Pakistan.  Urdu users in
> > Pakistan type space whether it is required to shape letter within a
> > word or at the end of it.  It is not possible to train all users to
> > distinguish between space and ZWNJ (especially as the latter is not
> > a linguistic entity in the language and users are never taught its
> > concept, but a computational engineering solution from the
> > perspective of Urdu).  As the domain name standard has to deal with
> > applications with which users will be directly interacting, it may
> > also be included as a recommendation (at least for Urdu) that the
> > users may be allowed to type it and it may automatically be
> > converted to ZWNJ (and could follow same rules as ZWNJ after such
> > conversion).
> 
> There is a separate discussion on ZWJ and ZWNJ and space.
> 


Space should be allowed in end-user applications and converted to ZWNJ
during pre-processing, at least for Urdu and some other languages spoken in
Pakistan.  Such recommendations could be added to these drafts.
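
Such a pre-processing step could be sketched as below.  The function name
spaces_to_zwnj is hypothetical, and two choices are assumptions of this
sketch rather than part of the proposal: runs of spaces collapse to a
single ZWNJ, and spaces at the start or end of a label are dropped.  The
result would still have to pass whatever rules apply to ZWNJ itself.

```python
import re

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def spaces_to_zwnj(label: str) -> str:
    """Convert spaces typed inside a label to ZWNJ.

    Assumptions of this sketch: leading and trailing spaces are removed,
    and each run of one or more spaces becomes exactly one ZWNJ.
    """
    trimmed = label.strip(" ")
    return re.sub(r" +", ZWNJ, trimmed)


# The user types a space where the letters should not join.
typed = "\u0627\u0631\u062F\u0648 \u0646\u0627\u0645"
print(ascii(spaces_to_zwnj(typed)))
```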


> Thank you for the input.
> 
>      Patrik


Thanks for considering these requirements and for your response.

Best regards,
Sarmad


