comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic block

Patrik Fältström patrik at frobbit.se
Sat Feb 16 23:32:26 CET 2008


On 15 feb 2008, at 17.46, Sarmad Hussain wrote:

> Here are some comments on the draft posted at
> <http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt>.
> These observations are based (mostly) on the perspective of Urdu.
> Referring to pages 21-22 of the report, my comments follow each
> quoted line from the report (prefixed by the >>>> symbol).  Please
> especially note the comments on 06D4 and the “space” character (at
> the end).
>
> 0600..0603  ; CONTEXTO   # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
>
>>>>> should be DISALLOWED

We are discussing (see the mail from Mark) whether Cf codepoints should
be DISALLOWED rather than CONTEXT, as only two codepoints are actually
needed, and those can be taken care of with an exception.

> 0610..0615  ; PVALID     # ARABIC SIGN SALLALLAHOU ALAYHE  
> WASSALLAM..ARAB
>
>>>>> 0610..0614  agreed
>
>>>>> 0615 should be DISALLOWED as it is a punctuation, to mark a pause

0615;ARABIC SMALL HIGH TAH;Mn;230;NSM;;;;;N;;;;;

It is of GeneralCategory Mn, and those are allowed; it is not
classified as punctuation. Are you suggesting that it should be added
as an exception and made DISALLOWED?

> 0640..065E  ; PVALID     # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
>
>
>>>>> 0640 should be DISALLOWED as it will create significant security  
>>>>> problems (kashida only causes stylistic (not shape) variation of  
>>>>> characters)

0640;ARABIC TATWEEL;Lm;0;AL;;;;;N;;;;;

Codepoints of GeneralCategory Lm are allowed (they match category A).
Same question as for 0615.
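
The records quoted in this mail follow the semicolon-separated
UnicodeData.txt layout, where the third field is the General Category
and the fifth field is the bidirectional class. A minimal sketch of
reading those fields, assuming that layout (the helper name
parse_unicodedata_line is hypothetical and not part of the draft):

```python
def parse_unicodedata_line(line):
    """Split one UnicodeData.txt record into the fields discussed here."""
    fields = line.strip().split(";")
    return {
        "codepoint": int(fields[0], 16),     # field 0: code point in hex
        "name": fields[1],                   # field 1: character name
        "general_category": fields[2],       # field 2: General Category (Lm, Mn, ...)
        "combining_class": int(fields[3]),   # field 3: canonical combining class
        "bidi_class": fields[4],             # field 4: bidirectional class (AL, NSM, ...)
    }


record = parse_unicodedata_line("0640;ARABIC TATWEEL;Lm;0;AL;;;;;N;;;;;")
print(record["general_category"], record["bidi_class"])
```

For the 0640 record this yields Lm as the General Category and AL as
the bidirectional class; only the former decides the status here.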

>>>>> 064B..0652  agreed; would love to hear the argument for  
>>>>> including them as there was initially discussion for not  
>>>>> including them

064B;ARABIC FATHATAN;Mn;27;NSM;;;;;N;;;;;
064C;ARABIC DAMMATAN;Mn;28;NSM;;;;;N;;;;;
064D;ARABIC KASRATAN;Mn;29;NSM;;;;;N;;;;;
064E;ARABIC FATHA;Mn;30;NSM;;;;;N;ARABIC FATHAH;;;;
064F;ARABIC DAMMA;Mn;31;NSM;;;;;N;ARABIC DAMMAH;;;;
0650;ARABIC KASRA;Mn;32;NSM;;;;;N;ARABIC KASRAH;;;;
0651;ARABIC SHADDA;Mn;33;NSM;;;;;N;ARABIC SHADDAH;;;;
0652;ARABIC SUKUN;Mn;34;NSM;;;;;N;;;;;
0653;ARABIC MADDAH ABOVE;Mn;230;NSM;;;;;N;;;;;

All of them are Mn. And there are no exclusions for them.

>>>>> 0656..0658 agreed; would love to hear the argument for including  
>>>>> them as there was initially discussion for not including them

0656;ARABIC SUBSCRIPT ALEF;Mn;220;NSM;;;;;N;;;;;
0657;ARABIC INVERTED DAMMA;Mn;230;NSM;;;;;N;;;;;
0658;ARABIC MARK NOON GHUNNA;Mn;230;NSM;;;;;N;;;;;

Also GeneralCategory Mn.

>>>>> 0659..065E  do not know enough about them to comment

0659;ARABIC ZWARAKAY;Mn;230;NSM;;;;;N;;;;;
065A;ARABIC VOWEL SIGN SMALL V ABOVE;Mn;230;NSM;;;;;N;;;;;
065B;ARABIC VOWEL SIGN INVERTED SMALL V ABOVE;Mn;230;NSM;;;;;N;;;;;
065C;ARABIC VOWEL SIGN DOT BELOW;Mn;220;NSM;;;;;N;;;;;
065D;ARABIC REVERSED DAMMA;Mn;230;NSM;;;;;N;;;;;
065E;ARABIC FATHA WITH TWO DOTS;Mn;230;NSM;;;;;N;;;;;

All of GeneralCategory Mn. That is why they are PVALID.
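
One way to double-check a range like this is sketched below. It
assumes Python's standard unicodedata module, which bundles a newer
Unicode version than the one the draft was generated from, so results
for other codepoints may differ from the draft's tables:

```python
import unicodedata

# Codepoints 0659..065E, as listed in the records above.
categories = {
    f"{cp:04X}": unicodedata.category(chr(cp))
    for cp in range(0x0659, 0x065F)
}

# True only if every codepoint in the range has General Category Mn.
all_mn = all(gc == "Mn" for gc in categories.values())

print(categories)
print(all_mn)
```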

> 066E..0674  ; PVALID     # ARABIC LETTER DOTLESS BEH..ARABIC LETTER  
> HIGH
>
>>>>> agreed; though reservations with 066E..066F (as Unicode standard  
>>>>> does not mention if they are actually used in any language; if  
>>>>> not part of any language, their inclusion may only contribute to  
>>>>> security problems)

066E;ARABIC LETTER DOTLESS BEH;Lo;0;AL;;;;;N;;;;;
066F;ARABIC LETTER DOTLESS QAF;Lo;0;AL;;;;;N;;;;;

GeneralCategory Lo, and because of that PVALID.

> 06D4        ; DISALLOWED # ARABIC FULL STOP
>
>>>>> should be allowed as a delimiter for Urdu, like the dot in the  
>>>>> domain name (should be mapped onto a dot automatically at client  
>>>>> layer);  As internationalized domain names deal with the end  
>>>>> user layer (application layer), they need to be a bit more  
>>>>> sensitive to user needs.  This delimiter, as specified in  
>>>>> Unicode, is only required for Urdu.  However, Urdu writing does  
>>>>> not have a dot and dot is also not present on Urdu keyboards.   
>>>>> If the delimiter is not allowed (and then mapped to dot), the  
>>>>> user will get confused and also will not be able to type the dot  
>>>>> without having an English keyboard installed and without  
>>>>> switching to English keyboard 2-3 times within writing a single  
>>>>> domain name in Urdu (once to-english-and-back-to-Urdu between  
>>>>> each level of TLD).  Standard should include this as a  
>>>>> recommendation for applications.

06D4;ARABIC FULL STOP;Po;0;AL;;;;;N;ARABIC PERIOD;;;;

This is DISALLOWED as it is of GeneralCategory Po (AL is its
bidirectional class). I will let others discuss the issues with the
full stop.

> 06D5..06DC  ; PVALID     # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN
>
>>>>> agreed; would love to hear the argument for including combining  
>>>>> marks as there was initially discussion for not including them

06D5;ARABIC LETTER AE;Lo;0;AL;;;;;N;;;;;

GeneralCategory Lo.

06D6;ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA;Mn;230;NSM;;;;;N;;;;;
06D7;ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA;Mn;230;NSM;;;;;N;;;;;
06D8;ARABIC SMALL HIGH MEEM INITIAL FORM;Mn;230;NSM;;;;;N;;;;;
06D9;ARABIC SMALL HIGH LAM ALEF;Mn;230;NSM;;;;;N;;;;;
06DA;ARABIC SMALL HIGH JEEM;Mn;230;NSM;;;;;N;;;;;
06DB;ARABIC SMALL HIGH THREE DOTS;Mn;230;NSM;;;;;N;;;;;
06DC;ARABIC SMALL HIGH SEEN;Mn;230;NSM;;;;;N;;;;;

GeneralCategory Mn.

> 06DD        ; CONTEXTO   # ARABIC END OF AYAH
>
>>>>> should be DISALLOWED

06DD;ARABIC END OF AYAH;Cf;0;AL;;;;;N;;;;;

GeneralCategory Cf. See above on 0600..0603.

> 06DF..06E8  ; PVALID     # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC  
> SMALL H
>
>>>>> agreed; would love to hear the argument for including combining  
>>>>> marks as there was initially discussion for not including them

06DF;ARABIC SMALL HIGH ROUNDED ZERO;Mn;230;NSM;;;;;N;;;;;
06E0;ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO;Mn;230;NSM;;;;;N;;;;;
06E1;ARABIC SMALL HIGH DOTLESS HEAD OF KHAH;Mn;230;NSM;;;;;N;;;;;
06E2;ARABIC SMALL HIGH MEEM ISOLATED FORM;Mn;230;NSM;;;;;N;;;;;
06E3;ARABIC SMALL LOW SEEN;Mn;220;NSM;;;;;N;;;;;
06E4;ARABIC SMALL HIGH MADDA;Mn;230;NSM;;;;;N;;;;;
06E7;ARABIC SMALL HIGH YEH;Mn;230;NSM;;;;;N;;;;;
06E8;ARABIC SMALL HIGH NOON;Mn;230;NSM;;;;;N;;;;;

GeneralCategory Mn.

06E5;ARABIC SMALL WAW;Lm;0;AL;;;;;N;;;;;
06E6;ARABIC SMALL YEH;Lm;0;AL;;;;;N;;;;;

GeneralCategory Lm.

> 06EA..06FC  ; PVALID     # ARABIC EMPTY CENTRE LOW STOP..ARABIC  
> LETTER GH
>
>>>>> agreed; would love to hear the argument for including  
>>>>> combining marks as there was initially discussion for not  
>>>>> including them

06EA;ARABIC EMPTY CENTRE LOW STOP;Mn;220;NSM;;;;;N;;;;;
06EB;ARABIC EMPTY CENTRE HIGH STOP;Mn;230;NSM;;;;;N;;;;;
06EC;ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE;Mn;230;NSM;;;;;N;;;;;
06ED;ARABIC SMALL LOW MEEM;Mn;220;NSM;;;;;N;;;;;

GeneralCategory Mn.

06EE;ARABIC LETTER DAL WITH INVERTED V;Lo;0;AL;;;;;N;;;;;
06EF;ARABIC LETTER REH WITH INVERTED V;Lo;0;AL;;;;;N;;;;;
06FA;ARABIC LETTER SHEEN WITH DOT BELOW;Lo;0;AL;;;;;N;;;;;
06FB;ARABIC LETTER DAD WITH DOT BELOW;Lo;0;AL;;;;;N;;;;;
06FC;ARABIC LETTER GHAIN WITH DOT BELOW;Lo;0;AL;;;;;N;;;;;

GeneralCategory Lo.

06F0;EXTENDED ARABIC-INDIC DIGIT ZERO;Nd;0;EN;;0;0;0;N;EASTERN ARABIC-INDIC DIGIT ZERO;;;;
06F1;EXTENDED ARABIC-INDIC DIGIT ONE;Nd;0;EN;;1;1;1;N;EASTERN ARABIC-INDIC DIGIT ONE;;;;
06F2;EXTENDED ARABIC-INDIC DIGIT TWO;Nd;0;EN;;2;2;2;N;EASTERN ARABIC-INDIC DIGIT TWO;;;;
06F3;EXTENDED ARABIC-INDIC DIGIT THREE;Nd;0;EN;;3;3;3;N;EASTERN ARABIC-INDIC DIGIT THREE;;;;
06F4;EXTENDED ARABIC-INDIC DIGIT FOUR;Nd;0;EN;;4;4;4;N;EASTERN ARABIC-INDIC DIGIT FOUR;;;;
06F5;EXTENDED ARABIC-INDIC DIGIT FIVE;Nd;0;EN;;5;5;5;N;EASTERN ARABIC-INDIC DIGIT FIVE;;;;
06F6;EXTENDED ARABIC-INDIC DIGIT SIX;Nd;0;EN;;6;6;6;N;EASTERN ARABIC-INDIC DIGIT SIX;;;;
06F7;EXTENDED ARABIC-INDIC DIGIT SEVEN;Nd;0;EN;;7;7;7;N;EASTERN ARABIC-INDIC DIGIT SEVEN;;;;
06F8;EXTENDED ARABIC-INDIC DIGIT EIGHT;Nd;0;EN;;8;8;8;N;EASTERN ARABIC-INDIC DIGIT EIGHT;;;;
06F9;EXTENDED ARABIC-INDIC DIGIT NINE;Nd;0;EN;;9;9;9;N;EASTERN ARABIC-INDIC DIGIT NINE;;;;

GeneralCategory Nd.

> 06FD..06FE  ; DISALLOWED # ARABIC SIGN SINDHI AMPERSAND..ARABIC SIGN  
> SIND
>
>>>>> need time to consult and comment on this.

06FD;ARABIC SIGN SINDHI AMPERSAND;So;0;AL;;;;;N;;;;;
06FE;ARABIC SIGN SINDHI POSTPOSITION MEN;So;0;AL;;;;;N;;;;;

DISALLOWED because GeneralCategory So.
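
The category-based reasoning used throughout this reply can be
summarised in a deliberately simplified sketch. It is an assumption
for illustration only, not the algorithm in the draft: it ignores
exceptions, unassigned codepoints and every other rule, and the names
PVALID_CATEGORIES and status_by_category are hypothetical. It also
relies on the Unicode version bundled with Python, not the one used
for the draft:

```python
import unicodedata

# Categories that the replies above cite as the reason for PVALID.
PVALID_CATEGORIES = {"Lo", "Lm", "Mn", "Nd"}


def status_by_category(cp):
    """Return a rough status for codepoint cp based only on its General Category."""
    gc = unicodedata.category(chr(cp))
    if gc in PVALID_CATEGORIES:
        return "PVALID"
    if gc == "Cf":
        # Listed as CONTEXTO in the -04 tables; whether Cf should be
        # DISALLOWED instead is under discussion.
        return "CONTEXTO"
    # Everything else seen in this thread (Po, So) ends up DISALLOWED.
    return "DISALLOWED"


for cp in (0x0640, 0x0615, 0x06F0, 0x06D4, 0x06DD, 0x06FD):
    print(f"{cp:04X}", unicodedata.category(chr(cp)), status_by_category(cp))
```

An exception list, as mentioned for 0615 and 0640, would have to be
checked before this category lookup.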

> In addition, in Urdu we also would have a problem for not allowing  
> space as we do not have use of ZWNJ in Pakistan.  Urdu users in  
> Pakistan type space whether it is required to shape letter within a  
> word or at the end of it.  It is not possible to train all users to  
> distinguish between space and ZWNJ (especially as the latter is not  
> a linguistic entity in the language and users are never taught its  
> concept, but a computational engineering solution from the  
> perspective of Urdu).  As the domain name standard has to deal with  
> applications with which users will be directly interacting, it may  
> also be included as a recommendation (at least for Urdu) that the  
> users may be allowed to type it and it may automatically be  
> converted to ZWNJ (and could follow same rules as ZWNJ after such  
> conversion).

There is a separate discussion on ZWJ and ZWNJ and space.

Thank you for the input.

     Patrik


