Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Thu Dec 11 22:59:37 CET 2008

Hi Alireza,

When did you report the N1 issue to Unicode? Did they say when they
might have an answer for you?

The IDNA2008 bidi spec is based heavily on the Unicode bidi algorithm,
so I'm a bit surprised to hear that there is an attempt to change the
Unicode bidi algorithm, just as we are trying to finalize IDNA2008.

Unicode folks, do we have a timeline for a response to the N1 report?
How likely is it that the bidi algorithm will shift under our feet?

Erik

On Thu, Dec 11, 2008 at 1:48 PM, Alireza Saleh <saleh at nic.ir> wrote:
> Dear Harald,
>
> In my examples, I didn't intend to say that the BIDI rule fails. In my
> email, I tried to give some examples that require the registry to
> impose restrictions of its own. So the registry still needs good
> knowledge of the languages it is supposed to support, and it still
> needs internal mappings or restrictions such as what Vint suggested
> for 'Sharp S'. I think the problems that may arise from having both
> Arabic-Indic and Extended Arabic-Indic characters in a label are
> similar to my examples. So what I can't understand is why the protocol
> leaves some things to be resolved at the registry and others not. When
> there is no intra-label check for a domain, users may create many
> sample domains that pass the protocol rules, but when you look at
> them, it is not possible to tell which characters have been used and
> in which sequence. I think in this case it is better to leave visual
> confusion to be handled outside the protocol. For example, in
> IDNA2003 there was a rule in -bidi that prohibited bidi labels from
> having digits at either end. In IDNA2008 this rule is relaxed, and a
> bidi label can have a digit at the end. This change is very useful,
> but I think it causes many problems for registries. For example, in
> dotIR we have rejected many requests that didn't meet the IDNA2003
> requirements but do meet IDNA2008. Because those requested labels
> were against the protocol, we didn't keep them in our database.
> Therefore, not only can we not give them any sunrise if the policy
> changes in the future, but we also have to keep our current policy
> that denies digits at either end. Fortunately, we predicted that ZWNJ
> might be permitted later in the protocol, so we kept the original
> domains that were entered with ZWNJ. Now it is confusing. I don't know
> whether there is any guarantee that modifications to the Unicode BIDI
> algorithm would enable IDNA to permit digits at the beginning, or
> mixing of Arabic-Indic with Extended Arabic-Indic digits.
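For readers unfamiliar with the IDNA2003-era restriction described above, here is a minimal sketch. It assumes the rule in question is the stringprep requirement (RFC 3454, section 6) that a string containing any right-to-left character must both start and end with one. The helper names are hypothetical, and Python's `unicodedata` bidi classes R and AL are used as an approximation of the stringprep RandALCat tables, so this is an illustration rather than a conformance check.

```python
import unicodedata


def is_rtl_letter(ch):
    # Approximation (assumption): treat bidi classes R and AL as "RandALCat".
    return unicodedata.bidirectional(ch) in ("R", "AL")


def passes_idna2003_style_ends_check(label):
    # Labels without any right-to-left character are not affected by this rule.
    if not any(is_rtl_letter(ch) for ch in label):
        return True
    # Otherwise the first and last characters must both be right-to-left,
    # which rules out a digit at either end.
    return is_rtl_letter(label[0]) and is_rtl_letter(label[-1])


# JEEM followed by a digit fails; JEEM, digit, JEEM passes.
print(passes_idna2003_style_ends_check("\u062C\u06F5"))
print(passes_idna2003_style_ends_check("\u062C\u06F5\u062C"))
```

Under this sketch a label that ends in a digit is rejected, which matches the kind of request dotIR describes having turned away.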
>
> Dear Mark, as you know, my friend and I reported a bug to Unicode
> (the N1 rule of the TR9 report). My concern is that the UTC may come
> to an agreement that some modifications to TR9 are required.
>
> Best Regards
> Alireza
>
> Harald Alvestrand wrote:
>> Changing the subject when changing the subject is usually a good
>> idea.....
>>
>> Alireza Saleh wrote:
>>> I would sincerely like to see someone out there answer the following
>>> question:
>>>
>>>
>>> Why has the co-occurrence of AN and EN been forbidden by -bidi? I
>>> read that part of the document but didn't see anything other than
>>> visual confusion or possible re-arrangement of the label as the
>>> reason. If all visual confusion and character sequencing problems
>>> were solved by setting this rule, then it would make sense. However,
>>> note the following cases:
>>>
>>> 1. <ALEF>.3.com (as I stated before)
>>>
>> The current version of -bidi tries to say clearly that:
>> - use of a label that begins with a digit will cause confusion
>> - because of the interdiction against inter-label test, there is no
>> rule against it
>>
>> Wise people will put these two things together and choose to not use
>> <ALEF>.3.com.
>>> 2. <U+064A><U+0627>.com (
>>> http://www.nic.ir/Show_Text?c=%D9%8A%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans
>>> )    <U+06CC><U+0627>.com (
>>> http://www.nic.ir/Show_Text?c=%DB%8C%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans
>>> ) (visual confusion problem).
>>>
>> The first example is ARABIC LETTER YEH (BIDI class AL) and ARABIC
>> LETTER ALEF (bidi class AL).
>>
>> The second example is ARABIC LETTER FARSI YEH (BIDI class AL) and
>> ARABIC LETTER ALEF.
>>
>> This definitely has nothing to do with BIDI rules, since all the
>> letters are in class AL.
>> But how is this different from CYRILLIC LETTER A and LATIN LETTER A?
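The point about the two YEH labels can be checked directly. The sketch below uses Python's standard `unicodedata` module to list the character names and bidi classes for both labels; the helper name `bidi_classes` is hypothetical. Both labels come out with identical bidi classes, so a rule based on bidi class alone cannot tell them apart.

```python
import unicodedata


def bidi_classes(label):
    # Return the Unicode bidi class of each character in the label.
    return [unicodedata.bidirectional(ch) for ch in label]


yeh_alef = "\u064A\u0627"        # ARABIC LETTER YEH + ARABIC LETTER ALEF
farsi_yeh_alef = "\u06CC\u0627"  # ARABIC LETTER FARSI YEH + ARABIC LETTER ALEF

for label in (yeh_alef, farsi_yeh_alef):
    names = [unicodedata.name(ch) for ch in label]
    print(names, bidi_classes(label))

# The Cyrillic/Latin comparison behaves the same way: both letters are class L.
print(unicodedata.bidirectional("\u0430"), unicodedata.bidirectional("a"))
```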
>>> Will the rules solve these? Either -bidi or the context rules? Or
>>> should the registry still add further restrictions? Obviously the
>>> registry should. For these reasons, we believe that the case of
>>> numerals should not be treated any differently by -bidi. I think it
>>> is better to let the registry decide how to deal with these kinds of
>>> problems. dotIR is considering the possibility of having domains
>>> like <U+062C><U+06F5><U+0665>.ir . Why should such a domain be
>>> banned by the protocol?
>> ARABIC LETTER JEEM (AL), EXTENDED ARABIC-INDIC DIGIT FIVE (EN),
>> ARABIC-INDIC DIGIT FIVE (AN).
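To make the AN/EN mixing concrete, here is a minimal sketch of a check that flags a label containing both bidi classes. It relies only on Python's standard `unicodedata` module; the function name `mixes_an_and_en` is hypothetical, and this is not the full -bidi rule set, just the single co-occurrence test under discussion.

```python
import unicodedata


def mixes_an_and_en(label):
    # Collect the set of bidi classes present in the label.
    classes = {unicodedata.bidirectional(ch) for ch in label}
    # Flag the label only when both Arabic Number and European Number occur.
    return "AN" in classes and "EN" in classes


# JEEM, EXTENDED ARABIC-INDIC DIGIT FIVE, ARABIC-INDIC DIGIT FIVE
example = "\u062C\u06F5\u0665"
print([unicodedata.bidirectional(ch) for ch in example])
print(mixes_an_and_en(example))
```

A label that uses only one of the two digit sets, such as JEEM followed by two Extended Arabic-Indic digits, is not flagged by this check.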
>>
>> From section 1.3.2 of "Rationale":
>>
>>   This distinction is important because the reasonable goal of an IDN
>>   effort is not to be able to write the great Klingon (or language of
>>   one's choice) novel in DNS labels but to be able to form a usefully
>>   broad range of mnemonics in ways that are as natural as possible in a
>>   very broad range of scripts.
>>
>> I know of no language / script where allowing this particular example
>> is necessary to "form a usefully broad range of mnemonics".
>>
>> We know what's wrong with it (it causes problems). I have not heard a
>> compelling argument for its inclusion.
>>
>>                        Harald
>>
>>
>>
>>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>