comments on draft-ietf-idnabis-bidi

Thu Feb 12 21:53:14 CET 2009

I didn't mean to say that your rules are not sufficient. What I'm
trying to say is that both your suggested restrictions and the current
bidi restrictions are much more than what is required for the limited
number of scripts that a single TLD is going to support, and yet even
with very restrictive rules, visual confusion is still possible. I'm
suggesting that the rules for preventing visual confusion can be
defined by the TLD, like its other policies, because any IDN TLD that
is introduced is going to support a limited number of scripts and
languages. For example, if you see الف۱۲۳.ایران and الف١٢٣.ایران you
cannot tell which set of Arabic digits has been used; the protocol
alone does not resolve this, so the TLD has to clarify it.
So, since these clarifications are required from the TLD anyway, why
can't we trust the registry to define all the clarifications for the
scripts and languages it supports?
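A small Python sketch, using only the standard unicodedata module, makes the digit example concrete. It assumes the first label uses EXTENDED ARABIC-INDIC DIGITs (U+06F1..U+06F3, as typed for Persian) and the second uses ARABIC-INDIC DIGITs (U+0661..U+0663); the strings are written with escapes so the two digit sets cannot be mixed up. The labels compare unequal and even carry different Bidi classes (EN versus AN), although the digits have the same values and usually render identically.

```python
import unicodedata

# The two labels from the example, written with escapes so the digit sets
# cannot be confused: U+06F1..U+06F3 are EXTENDED ARABIC-INDIC DIGITs,
# U+0661..U+0663 are ARABIC-INDIC DIGITs. U+0627 U+0644 U+0641 spell "الف".
label_extended = "\u0627\u0644\u0641\u06f1\u06f2\u06f3"
label_arabic = "\u0627\u0644\u0641\u0661\u0662\u0663"


def bidi_classes(label):
    """Return the Unicode Bidi_Class of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def digit_values(label):
    """Return the decimal value of every digit in the label."""
    return [unicodedata.decimal(ch) for ch in label
            if unicodedata.decimal(ch, None) is not None]


print(label_extended == label_arabic)   # different code points
print(bidi_classes(label_extended))     # letters are AL, digits are EN
print(bidi_classes(label_arabic))       # letters are AL, digits are AN
print(digit_values(label_extended), digit_values(label_arabic))
```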

Best
Alireza

Matitiahu Allouche wrote:
>
>    Hello, Alireza!
>
> Thank you for your input. However I am afraid that I did not fully 
> grasp what you meant.
>
> Can you give examples where there can be visual confusion while 
> passing the bidi rules that I proposed?
>
> Can you suggest alternate rules instead of those that I proposed?
>
> I am looking forward to better understand your point of view.
>
> Shalom (Regards),  Mati
>           Bidi Architect
>           Globalization Center Of Competency - Bidirectional Scripts
>           IBM Israel
>           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52 2554160
>
>
>
> *Alireza Saleh <saleh at nic.ir>*
>
> 10/02/2009 19:27
>
> To: Matitiahu Allouche/Israel/IBM at IBMIL
> cc: Vint Cerf <vint at google.com>, idna-update at alvestrand.no
> Subject: Re: comments on draft-ietf-idnabis-bidi
>
>
> Dear Mati,
>
>
> Thanks for your comments. Your suggestion would lead the BIDI draft to
> impose even more restrictive rules on languages that use bidi
> characters. As long as there are no intra-label checks in the protocol
> documents, character reordering and visual confusion are possible.
> Consider that we have L, R, AN, EN and N character properties and some
> rules intended to make the world safe in this situation. What happens
> if someone applies these rules when some of these character properties
> are absent? I think the rules then look very restrictive and hard to
> use. I think it is very rare for a TLD to support characters with all
> of these properties. For instance, I think that an Arabic-script label
> under an ASCII TLD or a Hebrew TLD would look strange enough to make
> users more careful about what they are browsing. What I suggest as an
> approach for the protocol documents is to keep some basic requirements
> and let the registries decide about the details.
>
>
> >   3 variant.  The last position must be a character with Bidi property R,
> >       AL, EN or AN, followed by zero or more NSM.
>
>
> There are a number of examples, as I stated earlier, that can cause
> visual confusion and still pass the current -bidi rules.
>
>
>
> Best
>
> Alireza
>
>
> Vint Cerf wrote:
>
> > thanks for these precise comments, Mati.
> >
> > Harald, I hope you can assess and incorporate as appropriate into a
> > revised draft.
> >
> > vint
> >
> >
> > Vint Cerf
> > Google
> > 1818 Library Street, Suite 400
> > Reston, VA 20190
> > 202-370-5637
> > vint at google.com
> >
> >
> >
> >
> > On Feb 10, 2009, at 3:28 AM, Matitiahu Allouche wrote:
> >
> >>
> >> My attention was recently drawn to the subject document (version 03)
> >> and I have a number of comments.  Some of them are very minor (typos,
> >> editorial) and reflect my pedantic mind, but I thought that I could
> >> as well help improve the form of the document.  Other comments go
> >> more to the essence, and I would appreciate your considering them
> >> seriously.
> >>
> >> 1) In section 2, first paragraph, "satisifes" should be "satisfies".
> >>
> >> 2) Section 2, rule 1 mentions the "Character Grouping requirement"
> >> for the first time in the document.  Either there should be a forward
> >> reference to section 3 where it will be explained, or (better, in my
> >> opinion), the content of the current section 3 should precede the
> >> content of the current section 2.
> >>
> >> 3) In the sentence "ET is excluded because the string L ET does not
> >> satisfy the Character Grouping requirement.", "L" seems to represent
> >> a label, but can easily be confused with the L Bidi property (all the
> >> more since it is adjacent to ET which surely represents a character
> >> with the ET Bidi property).
> >>
> >> 4) In the sentence "CS is excluded because the string L CS does not
> >> satisfy the Character Grouping requirement.", "L" seems to represent
> >> a label, but can easily be confused with the L Bidi property (all the
> >> more since it is adjacent to CS which surely represents a character
> >> with the CS Bidi property).
> >>
> >> 5) I see no reason why CS is excluded while ES is allowed.  Both can
> >> be the source of the same kind of violation of the Character
> >> Grouping requirement.  ES characters are excluded from the first and
> >> last positions by rules 2 and 3.  With the same restrictions
> >> (exclusion from the first and last positions), CS and ET characters
> >> can be allowed and will not violate the Character Grouping
> >> requirement any more than ES characters.
> >>
> >> 6) In section 1.1, there appears the following statement: "This
> >> specification is not intended to place any requirements on domain
> >> names that do not contain right-to-left characters."
> >> Also the title of section 2 is "A replacement for the RFC 3454 BIDI
> >> rule" which implies that the text only deals with "Bidi" labels.
> >> If that means that the specification applies only to labels which
> >> contain at least one character with Bidi property R, AL or AN, and we
> >> combine that with rule 4 "If an R, AL or AN is present, no L may be
> >> present.", then an L character can never be part of a Bidi label, and
> >> the L should be removed from the list of allowed Bidi properties in
> >> rule 1.
> >>
> >> 7) In [UAX9], rule X9 says that BN characters must be removed from
> >> the displayed text.  Any such invisible character violates the Label
> >> Uniqueness requirement.  BN characters must not be allowed by rule 1.
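A minimal Python sketch of the check implied by comment 7: reject any label that contains a character whose Bidi class is BN. The Hebrew letters (class R) and U+200B ZERO WIDTH SPACE (class BN) are chosen purely for illustration; the two labels differ as strings but, since rule X9 removes the BN character before display, they would render the same.

```python
import unicodedata


def bn_positions(label):
    """Return the indices of characters whose Bidi class is BN."""
    return [i for i, ch in enumerate(label)
            if unicodedata.bidirectional(ch) == "BN"]


# Illustrative labels: HEBREW LETTERs ALEF, BET, GIMEL, once plain and once
# with a ZERO WIDTH SPACE (U+200B, Bidi class BN) after the first letter.
plain = "\u05d0\u05d1\u05d2"
with_zwsp = "\u05d0\u200b\u05d1\u05d2"

for label in (plain, with_zwsp):
    bad = bn_positions(label)
    verdict = "rejected" if bad else "accepted"
    print([hex(ord(ch)) for ch in label], verdict, bad)
```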
> >>
> >> 8) From rules 1, 2, 4, 6 and 7, plus comments 6 and 7 above, it
> >> follows that the first character of a Bidi label can only be of type
> >> R or AL.  Such a statement can advantageously replace rules 2, 6 and 7.
> >>
> >> 9) Rule 5 includes no justification.  While a mixture of AN and EN
> >> characters in the same label seems odd and not required in real life
> >> situations, it is not clear what requirement would be violated by
> >> such a combination.
> >>
> >> 10) The rules allow AN or EN digits to appear in the last position of
> >> a label (contrary to RFC 3454).  Let us consider the following
> >> examples (where lower case letters represent L characters and upper
> >> case letters represent R or AL characters):
> >>
> >>    a. network order = "ABC123.456xyz"
> >>       display order (LTR) = "123.456CBAxyz"
> >>       display order (RTL) = "123.456xyzCBA"
> >>
> >>    b. network order = "ABC.456-xyz"
> >>       display order (LTR) = "456.CBA-xyz"
> >>       display order (RTL) = "xyz-456.CBA"
> >>
> >>    c. network order = "ABC123.456.xyz"
> >>       display order (LTR) = "123.456CBA.xyz"
> >>       display order (RTL) = "xyz.123.456CBA"
> >>
> >>    d. network order = "ABC.456.xyz"
> >>       display order (LTR) = "456.CBA.xyz"
> >>       display order (RTL) = "xyz.456.CBA"
> >>
> >> Examples a, b and c show very ugly violations of the Character
> >> Grouping requirement.  Since the document does not place requirements
> >> on non-Bidi labels, any non-Bidi label starting with digits following
> >> a Bidi label will cause a Character Grouping violation.  If Bidi
> >> labels are restricted from ending with digits (optionally followed by
> >> NSMs), then non-Bidi labels which contain only digits (example d)
> >> following a Bidi label will not cause a Character Grouping violation.
> >> Whether this modest benefit justifies imposing such a restriction is
> >> subject to discussion.
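The reasoning of comment 10 can be sketched in Python as a rough heuristic; it does not run the full Unicode Bidirectional Algorithm. Assumptions: a label counts as a Bidi label if it contains an R, AL or AN character (the reading of section 1.1 in comment 6); Hebrew letters stand in for the upper-case "ABC"; and an all-digit label is treated as safe only when the preceding Bidi label does not end in a digit, which is the restriction discussed for example d. The function name grouping_risks is hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL", "AN"}


def classes(label):
    """Bidi class of each character in a label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_bidi_label(label):
    # Assumed reading of section 1.1 (see comment 6): a Bidi label
    # contains at least one R, AL or AN character.
    return any(c in RTL_CLASSES for c in classes(label))


def ends_with_digit(label):
    # Skip trailing NSMs, then test whether the last character is EN or AN.
    cs = classes(label)
    while cs and cs[-1] == "NSM":
        cs.pop()
    return bool(cs) and cs[-1] in ("EN", "AN")


def grouping_risks(domain):
    """Hypothetical heuristic: index pairs (i, i+1) where a Bidi label is
    followed by a non-Bidi label that starts with a digit."""
    labels = domain.split(".")
    risks = []
    for i in range(len(labels) - 1):
        left, right = labels[i], labels[i + 1]
        if not is_bidi_label(left) or is_bidi_label(right):
            continue
        right_cs = classes(right)
        if not right_cs or right_cs[0] != "EN":
            continue
        # Example d: an all-digit label is safe if the Bidi label before it
        # does not itself end in a digit.
        if all(c == "EN" for c in right_cs) and not ends_with_digit(left):
            continue
        risks.append((i, i + 1))
    return risks


ABC = "\u05d0\u05d1\u05d2"  # HEBREW LETTERs ALEF, BET, GIMEL (class R) for "ABC"
examples = {
    "a": ABC + "123.456xyz",
    "b": ABC + ".456-xyz",
    "c": ABC + "123.456.xyz",
    "d": ABC + ".456.xyz",
}
for name, domain in examples.items():
    print(name, grouping_risks(domain))
```

Run on the four examples, this heuristic flags the first label pair in a, b and c and accepts d, in line with the analysis above.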
> >>
> >> 11) Towards the end of section 2, there appears the following
> >> sentence: "In a domain name consisting of only labels that pass the
> >> test, the requirements of Section 3 are satisfied."
> >> This is not true for domain names like in the examples above, unless
> >> non-Bidi labels are excluded, which is a very hard constraint.
> >>
> >> 12) The next sentence says: "In a domain name consisting of only
> >> LDH-labels and labels that pass the test, the requirements of Section
> >> 3 are satisfied as long as a label that starts with an ASCII digit
> >> does not come after a right-to-left label that ends in a digit."
> >> This is not true.  See example b above.
> >>
> >> 13) In section 3, there appears the sentence: "the label "123-456"
> >> will have a different display order in an RTL context than in a LTR
> >> context."
> >> This is not true, IMHO.  If the last letter before the label is not
> >> an Arabic Letter, it will be displayed as "123-456" both in LTR and
> >> RTL context.  If it is an Arabic Letter, it will be displayed as
> >> "456-123".
> >>
> >> 14) In section 3, there appears the sentence: "The Label Uniqueness
> >> property should hold true between LTR paragraphs and RTL paragraphs.
> >> This was shown to be unsound."
> >> In fact, in all cases where Character Grouping and Label Uniqueness
> >> are satisfied for each paragraph direction separately, there will be
> >> Label Uniqueness between LTR and RTL paragraphs.
> >>
> >> 15) In section 3, since an "unproblematic label" can be a label which
> >> satisfies the requirements, the clause "any label S1 and S2 that is
> >> either a label satisfying the requirements or an unproblematic label"
> >> can be shortened to "any label S1 and S2 that is an unproblematic
> >> label".
> >>
> >> 16) In the formal statement of the Label Uniqueness requirement,
> >> there is no provision (or exclusion) for the case where L and L' are
> >> identical.
> >>
> >> 17) In summary I suggest that the rules in section 2 should be
> >> reformulated as below.
> >>
> >>    1.  Only characters with the BIDI properties R, AL, AN, EN, ES,
> >>        CS, ET, ON and NSM are allowed in RTL labels.
> >>
> >>    2.  The first position must be a character with Bidi property R or AL.
> >>
> >>    3.  The last position must be a character with Bidi property R or AL,
> >>        followed by zero or more NSM.
> >>
> >>    3 variant.  The last position must be a character with Bidi
> >>        property R, AL, EN or AN, followed by zero or more NSM.
> >>
> >>    4 (debatable).  If an EN is present, no AN may be present, and vice
> >>        versa.
> >>
> >> It can be seen that this formulation is quite close to that in RFC
> >> 3454, while solving all the problems that the subject document aims
> >> to solve.
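A minimal Python sketch of the reformulated rules, using unicodedata.bidirectional for the Bidi property. The function name check_rtl_label and its two flags are illustrative, not part of the proposal: allow_trailing_digits selects "3 variant" instead of rule 3, and forbid_en_an_mix turns the debatable rule 4 on or off. It returns the list of violated rules, empty when the label passes.

```python
import unicodedata

# Rule 1: Bidi classes allowed in an RTL label.
ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "NSM"}


def check_rtl_label(label, allow_trailing_digits=False, forbid_en_an_mix=True):
    """Return the list of rules the label violates (empty if it passes)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return ["empty label"]
    problems = []

    disallowed = sorted(set(classes) - ALLOWED)
    if disallowed:
        problems.append("rule 1: disallowed classes " + ", ".join(disallowed))

    # Rule 2: the first character must be R or AL.
    if classes[0] not in ("R", "AL"):
        problems.append("rule 2: first character is " + classes[0])

    # Rule 3 / 3 variant: skip trailing NSMs, then check the last character.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    last_ok = ("R", "AL", "EN", "AN") if allow_trailing_digits else ("R", "AL")
    last = classes[end - 1] if end > 0 else "none (only NSM)"
    if last not in last_ok:
        problems.append("rule 3: last non-NSM character is " + last)

    # Rule 4 (debatable): EN and AN must not both occur.
    if forbid_en_an_mix and "EN" in classes and "AN" in classes:
        problems.append("rule 4: both EN and AN present")
    return problems


hebrew = "\u05d0\u05d1\u05d2"  # ALEF BET GIMEL, all class R
print(check_rtl_label(hebrew))
print(check_rtl_label(hebrew + "123"))
print(check_rtl_label(hebrew + "123", allow_trailing_digits=True))
print(check_rtl_label("\u0627\u0644\u06f1\u0661", allow_trailing_digits=True))
```

With the default flags, a label shaped like "ABC123" from example a fails rule 3; with allow_trailing_digits=True it passes, which is the trade-off that comment 10 leaves open for discussion.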
> >>
> >>
> >> Shalom (Regards),  Mati
> >>           Bidi Architect
> >>           Globalization Center Of Competency - Bidirectional Scripts
> >>           IBM Israel
> >>           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52 2554160
> >> _______________________________________________
> >> Idna-update mailing list
> >> Idna-update at alvestrand.no
> >> http://www.alvestrand.no/mailman/listinfo/idna-update
> >
>
>