comments on draft-ietf-idnabis-bidi

Erik van der Poel erikv at google.com
Tue Aug 4 15:09:34 CEST 2009


CS and ET certainly are two of the more noticeable differences between
Mati's proposal and the expired draft, but I don't know whether they
are so beneficial, since they are mostly (all?) punctuation and symbol
characters that are not allowed anyway.

Similarly, the rule that allows EN followed by ET at the end of a
label may not be so beneficial due to the prohibition of punctuation
and symbol characters.

Apart from these minor issues, I think Mati's proposal deserves
serious consideration, even though it has been made at this very late
stage.

I did notice that the new rules appear to place more restrictions on
LTR labels than RTL labels when there is at least one RTL label in the
domain name, but I can't tell whether that would be a problem in
practice. If you like, I can check some real-world domain names
against these rules, but I have to caution everyone that the number of
existing bidi domain names may not be very large.
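
For anyone who wants to try such a check, here is a minimal Python
sketch of the rules quoted below. It is only an illustration, not the
code that Harald or I ran. It assumes that the Bidi classes reported by
Python's unicodedata module are the ones the rules refer to, and it
checks only the Bidi rules (not which characters are otherwise allowed,
and not the Label Uniqueness or Character Grouping requirements). The
function names are made up for this example.

```python
import unicodedata

# Bidi classes named in the quoted rules.
RTL_MARKERS = {"R", "AL", "AN"}  # presence of any of these makes a label RTL
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
NON_RTL_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "NSM"}


def bidi_types(label):
    """Return the Bidi class of every character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label):
    """Definition 2: an RTL label contains at least one R, AL or AN."""
    return any(t in RTL_MARKERS for t in bidi_types(label))


def check_label(label):
    """Apply the RTL or non-RTL rules to one label of a Bidi domain name."""
    types = bidi_types(label)
    # Drop trailing NSM to find the character that counts as "last".
    core = list(types)
    while core and core[-1] == "NSM":
        core.pop()
    if not core:
        return False  # empty label, or nothing but NSM

    if any(t in RTL_MARKERS for t in types):
        return (
            all(t in RTL_ALLOWED for t in types)          # RTL rule 1
            and types[0] in {"R", "AL"}                   # RTL rule 2
            and core[-1] in {"R", "AL", "EN", "AN"}       # RTL rule 3
            and not ("EN" in types and "AN" in types)     # RTL rule 4
        )

    ends_ok = core[-1] in {"L", "EN"} or types[-2:] == ["EN", "ET"]
    return (
        all(t in NON_RTL_ALLOWED for t in types)          # non-RTL rule 1
        and types[0] == "L"                               # non-RTL rule 2
        and ends_ok                                       # non-RTL rule 3
    )


def check_domain(name):
    """Definition 1: the rules apply only if some label is an RTL label."""
    labels = name.split(".")
    if not any(is_rtl_label(label) for label in labels):
        return True  # not a Bidi domain name, so these rules do not apply
    return all(check_label(label) for label in labels)
```

With this sketch, check_domain("1abc.com") is accepted because no label
is RTL, while the same label "1abc" next to a Hebrew label is rejected
for starting with EN. That is the kind of extra restriction on LTR
labels I mentioned above.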

Erik

On Tue, Aug 4, 2009 at 3:23 AM, Vint Cerf<vint at google.com> wrote:
> thanks erik that is most helpful.
>
> any comments on the EN/CS interior question that Harald raised?
>
> v
>
> On Aug 3, 2009, at 8:25 PM, Erik van der Poel wrote:
>
>> I have tested this new set of rules with domain names up to 9
>> characters and they work for both the Label Uniqueness and Character
>> Grouping requirements.
>>
>> Erik
>>
>> On Mon, Aug 3, 2009 at 4:34 AM, Harald Tveit Alvestrand<harald at alvestrand.no> wrote:
>>>
>>> Matitiahu Allouche skrev:
>>>>
>>>> In my previous suggestions, I did not take into consideration that
>>>> the rules are also meant to cover labels which do not contain any
>>>> RTL characters.
>>>> Having understood that, here is an updated version of my suggestions:
>>>>
>>>> Definitions:
>>>>
>>>> 1. Bidi domain names are domain names which include at least one
>>>> RTL label.
>>>>
>>>> 2. A RTL label is a label which contains at least one character of
>>>> type R or AL or AN.
>>>>
>>>> Rules for RTL labels in Bidi domain names:
>>>>
>>>>   1.  Only characters with the BIDI properties R, AL, AN, EN, ES,
>>>>       CS, ET, ON, BN and NSM are allowed in RTL labels.
>>>>
>>>>   2.  The first position must be a character with Bidi property R or AL.
>>>>
>>>>   3.  The last position must be a character with Bidi property R, AL, EN
>>>>       or AN, followed by zero or more NSM.
>>>>
>>>>   4.  If an EN is present, no AN may be present, and vice versa.
>>>>
>>>>
>>>> Rules for non-RTL labels in Bidi domain names:
>>>>
>>>>   1.  Only characters with the BIDI properties L, EN, ES,
>>>>       CS, ET, ON and NSM are allowed in non-RTL labels.
>>>>
>>>>   2.  The first position must be a character with Bidi property L.
>>>>
>>>>   3.  The last position must be a character with Bidi property L or EN,
>>>>       followed by zero or more NSM, or the two last positions must be
>>>>       EN followed by ET.
>>>>
>>>>
>>>>
>>> Thank you again - I have now implemented this algorithm and compared the
>>> result for the "Character Grouping Requirement" up to a length of 3
>>> characters (my perl code is chugging on longer strings as we speak).
>>>
>>> I hope Erik can take a look at the "Label Uniqueness Requirement", which
>>> I don't have code to test for.
>>>
>>> The difference between the two algorithms seems to be that your proposal
>>> allows CS and ET within a label, but not at the ends. Was this an
>>> intentional difference?
>>>
>>>                 Harald
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>
>
>

