Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))
Alireza Saleh
saleh at nic.ir
Thu Dec 11 22:48:06 CET 2008
Dear Harald,
In my examples, I did not intend to show that the BIDI rule fails.
In my email, I tried to give some examples that still require the
registry to impose its own restrictions. So the registry must still have
good knowledge of the languages it intends to support, and it still needs
internal mappings or restrictions, such as what Vint suggested for 'Sharp
S'. I think the problems that may arise from having both Arabic-Indic and
Extended Arabic-Indic digits in a label are similar to my examples. So
what I cannot understand is why the protocol leaves some problems to be
resolved by the registry but not others. When there is no intra-label
check for a domain, users may create many domains that pass the protocol
rules, yet when you look at them it is not possible to tell which
characters, in which order, have been used. I think in this case it is
better to leave visual confusion to be handled outside the protocol.

For example, IDNA2003 had a bidi rule that prevented bidi labels from
having a digit at either end. In IDNA2008 this rule is relaxed, and a
bidi label can end with a digit. This change is very useful, but I think
it causes problems for registries. For example, in dotIR we have rejected
many requests that did not meet the IDNA2003 requirements but do meet
IDNA2008. Because those requested labels violated the protocol, we did
not keep them in our database. Therefore, we not only cannot give them
any sunrise priority if the policy changes in the future, but we also
have to keep our current policy, which denies digits at either end.
Fortunately, we anticipated that ZWNJ might be permitted later by the
protocol, so we kept the original domains that were submitted with ZWNJ.

Now it is confusing. I do not know whether there is any guarantee that
modifications to the Unicode BIDI algorithm will enable IDNA to permit
digits at the beginning of a label, or the mixing of Arabic-Indic with
Extended Arabic-Indic digits.
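A minimal Python sketch of the digit-position difference described above follows. It assumes Python's `unicodedata` tables, which track a newer Unicode version than the 3.2 tables IDNA2003 was defined against, and it checks only the first/last-character conditions, not the full rule sets. The IDNA2003 side follows RFC 3454 section 6; the IDNA2008 side follows the RTL-label end conditions as later published in RFC 5893 (rules 1 and 3). The function names are hypothetical.

```python
import unicodedata

RTL = ("R", "AL")

def bidi_classes(label: str) -> list[str]:
    """Unicode Bidi_Class of each code point (Python's current UCD tables)."""
    return [unicodedata.bidirectional(ch) for ch in label]

def idna2003_bidi_ok(label: str) -> bool:
    """RFC 3454 section 6 sketch: if the label contains any R/AL character,
    it must contain no L character, and it must start and end with R/AL."""
    classes = bidi_classes(label)
    if not any(c in RTL for c in classes):
        return True  # not an RTL string, so the requirement does not apply
    if "L" in classes:
        return False
    return classes[0] in RTL and classes[-1] in RTL

def idna2008_rtl_ends_ok(label: str) -> bool:
    """Sketch of the end-position conditions for an RTL label: start with
    R/AL; end with R, AL, EN or AN, optionally followed by nonspacing marks."""
    classes = bidi_classes(label)
    if not classes or classes[0] not in RTL:
        return False
    while classes[-1] == "NSM":  # cannot empty the list: classes[0] is R/AL
        classes.pop()
    return classes[-1] in RTL + ("EN", "AN")

# JEEM ALEF ARABIC-INDIC-FIVE (digit at the end), then the digit moved to the front.
for label in ("\u062c\u0627\u0665", "\u0665\u062c\u0627"):
    print(ascii(label), idna2003_bidi_ok(label), idna2008_rtl_ends_ok(label))
```

With these definitions, the label ending in ARABIC-INDIC DIGIT FIVE is rejected by the IDNA2003 check but accepted by the IDNA2008 one, while the label starting with that digit is rejected by both; the first case is the kind of request dotIR had to refuse under IDNA2003.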
Dear Mark, as you know from the bug that my friend and I reported to
Unicode (rule N1 of TR9), my concern is that the UTC may reach an
agreement that some modifications to TR9 are required.
Best Regards
Alireza
Harald Alvestrand wrote:
> Changing the subject when changing the subject is usually a good
> idea.....
>
> Alireza Saleh wrote:
>> I would sincerely like to see someone out there answer the following
>> question:
>>
>>
>> Why has the co-occurrence of AN and EN been forbidden by -bidi? I
>> read that part of the document but did not see any reason other than
>> visual confusion or possible re-arrangement of the label. If all
>> visual confusions and character sequencing problems were solved by
>> this rule, then it would make sense. However, note the following
>> cases:
>> 1. <ALEF>.3.com (as I stated before)
>>
> The current version of -bidi tries to say clearly that:
> - use of a label that begins with a digit will cause confusion
> - because of the interdiction against inter-label tests, there is no
> rule against it
>
> Wise people will put these two things together and choose to not use
> <ALEF>.3.com.
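Since the protocol performs no inter-label test, a registry (or application) that wants to flag names such as <ALEF>.3.com would need its own check. The Python sketch below shows one hypothetical policy, not anything -bidi defines: when a domain name contains a right-to-left label, report every label that starts with a European (EN) or Arabic (AN) digit. The function names are illustrative only.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}
DIGIT_CLASSES = {"EN", "AN"}

def has_rtl(label: str) -> bool:
    """True if any code point in the label is right-to-left (R or AL)."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)

def digit_initial_labels(domain: str) -> list[str]:
    """Hypothetical registry-side inter-label check: if the domain name
    contains an RTL label, list every label that begins with a digit."""
    labels = domain.split(".")
    if not any(has_rtl(label) for label in labels):
        return []  # purely left-to-right name: nothing to flag
    return [
        label for label in labels
        if label and unicodedata.bidirectional(label[0]) in DIGIT_CLASSES
    ]

print(digit_initial_labels("\u0627.3.com"))   # ['3']
print(digit_initial_labels("3.example.com"))  # []
```

Each label passes a per-label test on its own; only looking across labels reveals the combination Harald advises against.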
>> 2. <U+064A><U+0627>.com
>>    (http://www.nic.ir/Show_Text?c=%D9%8A%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans)
>>    and <U+06CC><U+0627>.com
>>    (http://www.nic.ir/Show_Text?c=%DB%8C%D8%A7&s=14&b=ffffff7f&f=01292200&t=DejaVuSans)
>>    (visual confusion problem).
>>
> The first example is ARABIC LETTER YEH (BIDI class AL) and ARABIC
> LETTER ALEF (bidi class AL).
>
> The second example is ARABIC LETTER FARSI YEH (BIDI class AL) and
> ARABIC LETTER ALEF.
>
> This definitely has nothing to do with BIDI rules, since all the
> letters are in class AL.
> How is this different from CYRILLIC LETTER A and LATIN LETTER A?
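Because both labels in example 2 consist only of AL characters, no bidi rule can separate them; the usual remedy is a registry-side variant table, similar in spirit to the 'Sharp S' mapping mentioned earlier. The Python sketch below assumes a hypothetical one-entry table that folds ARABIC LETTER YEH to ARABIC LETTER FARSI YEH and refuses a new label whose folded form is already taken; a real registry table would be larger and policy-specific.

```python
import unicodedata

# Hypothetical variant table: confusable code points folded to one representative.
VARIANT_FOLD = {
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}

def fold(label: str) -> str:
    """Map every code point through the variant table."""
    return "".join(VARIANT_FOLD.get(ch, ch) for ch in label)

class VariantRegistry:
    """Toy registry that blocks labels colliding with an existing variant."""

    def __init__(self) -> None:
        self._by_folded: dict[str, str] = {}

    def register(self, label: str) -> bool:
        key = fold(label)
        if key in self._by_folded:
            return False  # a confusable variant is already registered
        self._by_folded[key] = label
        return True

arabic_yeh_alef = "\u064a\u0627"
farsi_yeh_alef = "\u06cc\u0627"
# All four code points are Bidi class AL, so no bidi rule distinguishes them.
print([unicodedata.bidirectional(ch) for ch in arabic_yeh_alef + farsi_yeh_alef])

registry = VariantRegistry()
print(registry.register(arabic_yeh_alef))  # True
print(registry.register(farsi_yeh_alef))   # False
```

The same mechanism handles CYRILLIC LETTER A versus LATIN LETTER A, which is why this class of confusion is usually left to registry policy rather than to the protocol.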
>> Will the rules, either -bidi or the contextual rules, solve these? Or
>> should the registry still add further restrictions? Obviously the
>> registry should. For these reasons, we believe that numerals should
>> not be treated any differently by -bidi. I think it is better to let
>> the registry decide how to deal with these kinds of problems. dotIR
>> considers the possibility of having domains like
>> <U+062C><U+06F5><U+0665>.ir. Why should such a domain be banned by
>> the protocol?
> ARABIC LETTER JEEM (AL), EXTENDED ARABIC-INDIC DIGIT FIVE (EN),
> ARABIC-INDIC DIGIT FIVE (AN).
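The restriction under discussion is a simple per-label test. The Python sketch below assumes the rule as later published in RFC 5893 (rule 4: in an RTL label, if an EN is present, no AN may be present, and vice versa); the helper name is hypothetical. Applied to the dotIR example, it prints the classes listed above (AL, EN, AN) and reports the mix.

```python
import unicodedata

def mixes_en_and_an(label: str) -> bool:
    """True if the label contains both European (EN) and Arabic (AN) digits,
    which the Bidi Rule forbids in an RTL label."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return "EN" in classes and "AN" in classes

label = "\u062c\u06f5\u0665"  # the <U+062C><U+06F5><U+0665> label from the example
for ch in label:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
print(mixes_en_and_an(label))                 # True
print(mixes_en_and_an("\u062c\u06f5\u06f5"))  # False: EN digits only
```

Replacing either digit so that both come from the same set makes the check pass; whether such a mix should be banned by the protocol or left to registry policy is exactly what the thread is debating.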
>
> From section 1.3.2 of "Rationale":
>
> This distinction is important because the reasonable goal of an IDN
> effort is not to be able to write the great Klingon (or language of
> one's choice) novel in DNS labels but to be able to form a usefully
> broad range of mnemonics in ways that are as natural as possible in a
> very broad range of scripts.
>
> I know of no language or script where allowing this particular example
> is necessary to "form a usefully broad range of mnemonics".
>
> We know what's wrong with it (it causes problems). I have not heard a
> compelling argument for its inclusion.
>
> Harald
More information about the Idna-update mailing list