Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Harald Alvestrand harald at alvestrand.no
Thu Dec 11 14:27:04 CET 2008


Changing the subject when changing the subject is usually a good idea.....

Alireza Saleh wrote:
> I would sincerely like to see someone out there answer the following
> question:
>
>
> Why has the co-occurrence of AN and EN been forbidden by -bidi ? I
> read that part of the document but didn’t see anything other than
> visual confusion or possible re-arrangement of the label as the reason. If
> all visual confusions and character sequencing problems were solved by
> setting this rule, then it would make sense. However, note the following
> cases:
>
> 1. <ALEF>.3.com (as I stated before)
>   
The current version of -bidi tries to say clearly that:
- use of a label that begins with a digit will cause confusion
- because of the interdiction against inter-label tests, there is no rule 
against it

Wise people will put these two things together and choose to not use 
<ALEF>.3.com.
> 2. <U+064A><U+0627>.com ( http://www.nic.ir/Show_Text?c=%D9%8A%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans ) 
>    <U+06CC><U+0627>.com ( http://www.nic.ir/Show_Text?c=%DB%8C%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans ) (visual confusion problem).
>   
The first example is ARABIC LETTER YEH (BIDI class AL) and ARABIC LETTER 
ALEF (bidi class AL).

The second example is ARABIC LETTER FARSI YEH (BIDI class AL) and ARABIC 
LETTER ALEF.

This definitely has nothing to do with BIDI rules, since all the letters 
are in class AL.
How is this different from CYRILLIC LETTER A and LATIN LETTER A?
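
For anyone who wants to check the classes for themselves, here is a minimal 
sketch using Python's standard unicodedata module. The helper name 
bidi_classes is mine, made up for illustration; it is not taken from -bidi 
or any other draft.

```python
import unicodedata

# The two labels from the quoted message, without the ".com" suffix.
label_yeh = "\u064A\u0627"        # ARABIC LETTER YEH + ARABIC LETTER ALEF
label_farsi_yeh = "\u06CC\u0627"  # ARABIC LETTER FARSI YEH + ARABIC LETTER ALEF


def bidi_classes(label):
    """Return the Unicode bidi class of each code point in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


if __name__ == "__main__":
    for label in (label_yeh, label_farsi_yeh):
        for ch in label:
            # Print code point, character name and bidi class.
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
                  f"{unicodedata.bidirectional(ch)}")
        print()
```

The two labels are different strings, but every code point in both reports 
the same bidi class, so no test based on bidi classes can tell them apart.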
> Will the rules solve these ? Either -bidi or Context rules? Or should the
> registry still add further restrictions? Obviously the registry should.
> For these reasons, we believe that the case of numerals should not be
> treated any differently by -bidi. I think it is better to let
> registry decide how to deal with these kinds of problems. dotIR considers
> the possibility of having domains like <U+062C><U+06F5><U+0665>.ir . Why
> should such a domain be banned by the protocol?
ARABIC LETTER JEEM (AL), EXTENDED ARABIC-INDIC DIGIT FIVE (EN), 
ARABIC-INDIC DIGIT FIVE (AN).
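
A minimal sketch of the single condition under discussion (AN and EN in the 
same label), again with Python's unicodedata module. The function name 
mixes_an_and_en is an assumption for illustration, and this is only the one 
check, not the full set of rules in -bidi.

```python
import unicodedata


def mixes_an_and_en(label):
    """True if the label contains both an AN and an EN code point."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return "AN" in classes and "EN" in classes


if __name__ == "__main__":
    # JEEM (AL), EXTENDED ARABIC-INDIC DIGIT FIVE (EN),
    # ARABIC-INDIC DIGIT FIVE (AN): the label from the quoted message.
    label = "\u062C\u06F5\u0665"
    for ch in label:
        print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
    print("mixes AN and EN:", mixes_an_and_en(label))
```

A label using only one of the two digit sets, such as JEEM followed by 
U+06F5 alone, does not trip this check.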

From section 1.3.2 of "Rationale":

   This distinction is important because the reasonable goal of an IDN
   effort is not to be able to write the great Klingon (or language of
   one's choice) novel in DNS labels but to be able to form a usefully
   broad range of mnemonics in ways that are as natural as possible in a
   very broad range of scripts.

I know of no language / script where allowing this particular example is 
necessary to "form a usefully broad range of mnemonics".

We know what's wrong with it (it causes problems). I have not heard a 
compelling argument for its inclusion.

                        Harald





