Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Alireza Saleh saleh at nic.ir
Thu Dec 11 22:48:06 CET 2008


Dear Harald,

My examples were not intended to say that the BIDI rule fails. In my
email, I tried to give some examples that require the registry to impose
some restrictions. So the registry still needs good knowledge of the
languages that it is supposed to support, and it still needs to apply
some internal mappings or restrictions, such as what Vint suggested for
'Sharp S'. I think the problems that may arise from having both
Arabic-Indic and Extended-Arabic digits in a label are somewhat similar
to my examples. So what I can't understand is why the protocol leaves
some things to be resolved at the registry and others not. When there
is no intra-label check for a domain, users may create many domains
that pass the protocol rules, but when you look at them it is not
possible to tell which characters have been used, or in which sequence.
I think in this case it is better to leave visual confusion to be
handled outside the protocol. For example, in IDNA2003,
there was a rule in -bidi that did not allow bidi labels to have digits
at either end. In IDNA2008, this rule is relaxed and a bidi label can
have a digit at the end. This change is very useful, but I think it
causes many problems for the registries. For example, in dotIR we have
rejected many requests that did not meet the IDNA2003 requirements but
do meet IDNA2008. Because those requested labels were against the
protocol, we did not keep them in our database. Therefore, not only can
we not offer them a sunrise period if the policy changes in the future,
but we also have to keep our current policy, which denies digits at
both ends.
Fortunately, we predicted that the use of ZWNJ might be permitted later
in the protocol, so we kept the original domains that were entered with
ZWNJ. Now the situation is confusing. I don't know whether modifications
to the Unicode BIDI algorithm might later enable IDNA to permit digits
at the beginning of a label, or the mixing of Arabic-Indic with
Extended-Arabic digits.
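
To make the two conditions discussed above concrete, here is a minimal
sketch of how a registry might test a single label for a leading digit
and for a mix of Arabic-Indic (bidi class AN) and Extended Arabic-Indic
(bidi class EN) digits. It is not the -bidi rule set itself: the
function names are hypothetical, and it relies only on the bidi classes
reported by Python's standard unicodedata module. The sample label is
the one mentioned further down in this thread.

```python
import unicodedata


def bidi_classes(label):
    """Return the Unicode bidi class of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def mixes_an_and_en(label):
    """True if the label contains both AN and EN characters."""
    classes = set(bidi_classes(label))
    return "AN" in classes and "EN" in classes


def starts_with_digit(label):
    """True if the first character of the label has bidi class AN or EN."""
    return bool(label) and bidi_classes(label)[0] in {"AN", "EN"}


# JEEM, EXTENDED ARABIC-INDIC DIGIT FIVE, ARABIC-INDIC DIGIT FIVE
label = "\u062C\u06F5\u0665"
print(bidi_classes(label))       # ['AL', 'EN', 'AN']
print(mixes_an_and_en(label))    # True
print(starts_with_digit("3"))    # True
```

A check like this only looks inside one label; it says nothing about
neighbouring labels or about visually similar letters, which would still
be a matter for registry policy.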

Dear Mark, you know about the bug that my friend and I reported to
Unicode (rule N1 of the TR9 report). My concern is that the UTC may
come to an agreement that some modifications to TR9 are required.

Best Regards
Alireza

Harald Alvestrand wrote:
> Changing the subject when changing the subject is usually a good 
> idea.....
>
> Alireza Saleh wrote:
>> I would sincerely like to see someone out there answer the following
>> question:
>>
>>
>> Why has the co-occurrence of AN and EN been forbidden by -bidi ? I
>> read that part of the document but didn’t see anything other than
>> visual confusion or possible re-arrangement of the label as the
>> reason. If all visual confusions and character sequencing problems
>> were solved by setting this rule, then it would make sense. However,
>> note the following cases:
>>
>> 1. <ALEF>.3.com (as I stated before)
>>   
> The current version of -bidi tries to say clearly that:
> - use of a label that begins with a digit will cause confusion
> - because of the interdiction against inter-label test, there is no 
> rule against it
>
> Wise people will put these two things together and choose to not use 
> <ALEF>.3.com.
>> 2. <U+064A><U+0627>.com ( 
>> http://www.nic.ir/Show_Text?c=%D9%8A%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans 
>> )    <U+06CC><U+0627>.com ( 
>> http://www.nic.ir/Show_Text?c=%DB%8C%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans 
>> ) (visual confusion problem).
>>   
> The first example is ARABIC LETTER YEH (BIDI class AL) and ARABIC 
> LETTER ALEF (bidi class AL).
>
> The second example is ARABIC LETTER FARSI YEH (BIDI class AL) and 
> ARABIC LETTER ALEF.
>
> This definitely has nothing to do with BIDI rules, since all the 
> letters are in class AL.
> But this
> How is this different from CYRILLIC LETTER A and LATIN LETTER A?
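
As an illustration of the point above, the following sketch decodes the
two percent-encoded strings from the nic.ir URLs quoted earlier and
lists the code point, name, and bidi class of each character. The
helper name describe is hypothetical; only Python's standard
urllib.parse and unicodedata modules are used.

```python
import unicodedata
from urllib.parse import unquote


def describe(label):
    """List (code point, Unicode name, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in label
    ]


# The "c=" parameter values from the two URLs in the quoted message.
first = unquote("%D9%8A%D8%A7")    # U+064A U+0627
second = unquote("%DB%8C%D8%A7")   # U+06CC U+0627

for label in (first, second):
    for row in describe(label):
        print(*row)

# Same bidi classes, different code points: the labels are distinct
# strings even though they can look alike when rendered.
print(first == second)  # False
```

Every character in both labels is reported as class AL, so a test based
on bidi classes cannot tell the two labels apart.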
>> Will the rules solve these? Either -bidi or Context rules? Or should
>> the registry still add further restrictions? Obviously the registry
>> should. For these reasons, we believe that the case of numerals should
>> not be treated any differently by -bidi. I think it is better to let
>> the registry decide how to deal with these kinds of problems. dotIR
>> considers the possibility of having domains like
>> <U+062C><U+06F5><U+0665>.ir . Why should such a domain be banned by
>> the protocol?
> ARABIC LETTER JEEM (AL), EXTENDED ARABIC-INDIC DIGIT FIVE (EN), 
> ARABIC-INDIC DIGIT FIVE (AN).
>
> From section 1.3.2 of "Rationale":
>
>   This distinction is important because the reasonable goal of an IDN
>   effort is not to be able to write the great Klingon (or language of
>   one's choice) novel in DNS labels but to be able to form a usefully
>   broad range of mnemonics in ways that are as natural as possible in a
>   very broad range of scripts.
>
> I know of no language / script where allowing this particular example 
> is necessary to "form a usefully broad range of mnemonics".
>
> We know what's wrong with it (it causes problems). I have not heard a 
> compelling argument for its inclusion.
>
>                        Harald
>
>
>
>


