IDNAbis spec

Erik van der Poel erikv at google.com
Wed Nov 4 14:31:49 CET 2009


There are several different operations that you can perform on the
labels of a domain name, and these operations occur at different
times. Here are just a few examples:

(1) single-label registration time
(2) multi-label DNAME definition time
(3) multi-label domain name lookup time
(4) multi-label domain name display time

idnabis-protocol-17 focuses on (1) and (3). For (1), it says:

"If the proposed label contains any characters that are written from
right to left it MUST meet the BIDI criteria [IDNA2008-BIDI]."

Note that the above is talking about a single label. For (3), the protocol says:

"Verification that the string is compliant with the requirements for
right to left characters, specified in [IDNA2008-BIDI]."

Note that the above is talking about a "string", which presumably
might contain more than one label. As far as I can tell, IDNAbis does
not say much about operations (2) and (4) for bidi.

idnabis-bidi-06 says:

'A "BIDI domain name" is a domain name that contains at least one RTL
label.' and

'The following rule, consisting of six conditions, applies to labels
in BIDI domain names.'

Clearly, the above is talking about multi-label domain names. However,
the rule itself tells you how to test a single label, so that part of
the spec can be used at registration time (1).
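
To make the single-label versus multi-label distinction concrete, here
is a minimal sketch, not a full implementation of the rule. It assumes
the definitions as published in RFC 5893 (the draft wording may differ
slightly): an RTL label contains at least one character of Bidi class
R, AL, or AN, and the first condition of the rule requires a label to
start with a character of class L, R, or AL. The function names are my
own, and the other five conditions are deliberately left out.

```python
import unicodedata

# Bidi classes that make a label an RTL label (assumed from RFC 5893).
RTL_CLASSES = {"R", "AL", "AN"}
# Bidi classes allowed for the first character of a label in a BIDI
# domain name (assumed from RFC 5893, condition 1 of the rule).
FIRST_CHAR_CLASSES = {"L", "R", "AL"}


def is_rtl_label(label: str) -> bool:
    """True if the label contains at least one R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def is_bidi_domain_name(name: str) -> bool:
    """True if at least one label of the name is an RTL label."""
    return any(is_rtl_label(label) for label in name.split("."))


def labels_failing_first_char_rule(name: str) -> list[str]:
    """Labels whose first character is not of class L, R, or AL.

    The rule only applies to BIDI domain names, so any other name
    yields an empty list.
    """
    if not is_bidi_domain_name(name):
        return []
    return [
        label
        for label in name.split(".")
        if label and unicodedata.bidirectional(label[0]) not in FIRST_CHAR_CLASSES
    ]


hebrew = "\u05d0\u05d1\u05d2"  # three Hebrew letters standing in for HEBREW
print(is_bidi_domain_name("3com.com"))                       # False
print(labels_failing_first_char_rule("3com.com"))            # []
print(labels_failing_first_char_rule(hebrew + ".3com.com"))  # ['3com']
```

The per-label test is the same in both calls; what changes is whether
the surrounding name is a BIDI domain name, and that is something a
single-label registration cannot know.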

Let's take an example. One of our favorite examples is 3com.com. At
registration time, when we are registering the label "3com", there is
no way of knowing that someone may, at some point in the future,
define a DNAME that breaks the IDNAbis bidi rules. Since there is no
way of knowing that, the registration is simply allowed.

Later, someone tries to define a DNAME, say, HEBREW.3com.com where
HEBREW is a string of right-to-left Hebrew characters. At this point,
the implementation might choose to check the IDNAbis bidi rules and
either reject the DNAME or emit a warning about it if it breaks the
rules.

Even later, someone tries to look up HEBREW.3com.com. The
implementation can check the entire domain name against the IDNAbis
bidi rules. It does not have to check since the protocol says
"SHOULD".

Yet later again, someone tries to display HEBREW.3com.com. The
implementation probably should check against the IDNAbis bidi rules.
If the domain name breaks the rules, the implementation can refuse to
display it in Unicode form (choosing Punycode instead), or produce a
warning of some kind.
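
As a rough illustration of that display-time fallback, here is a sketch
in Python. The names to_a_label and display_form are hypothetical, the
bidi check is passed in as a callable rather than implemented, and the
conversion uses Python's built-in "punycode" codec with no mapping,
normalization, or validity checks, so it is not a complete IDNA
conversion.

```python
from typing import Callable


def to_a_label(label: str) -> str:
    """Return the Punycode (xn--) form of one label.

    ASCII-only labels are returned unchanged.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def display_form(name: str, passes_bidi_check: Callable[[str], bool]) -> str:
    """Show the Unicode form only if the whole name passes the check.

    passes_bidi_check is an assumed, caller-supplied test of the
    multi-label name against the bidi rules.
    """
    if passes_bidi_check(name):
        return name
    return ".".join(to_a_label(label) for label in name.split("."))


hebrew = "\u05d0\u05d1\u05d2"  # three Hebrew letters standing in for HEBREW
name = hebrew + ".3com.com"

# Pretend the check failed: every non-ASCII label is shown in xn-- form.
print(display_form(name, lambda n: False))
# Pretend the check passed: the name is shown unchanged.
print(display_form(name, lambda n: True))
```

Producing a warning instead of the Punycode form would be the other
option mentioned above; the sketch only shows the first one.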

So, the IDNAbis drafts are in some sense incomplete, since they don't
fully address DNAME time (2) and display time (4). But if the WG does
start to discuss these, you can imagine what my position is going to
be.

Erik

On Wed, Nov 4, 2009 at 2:11 AM, Vint Cerf <vint at google.com> wrote:
> the question of inter-label or cross-label testing was extensively
> discussed on the WG list and rejected as overly complex at the
> protocol level. As with a number of cases, the WG concluded that the
> registry or registrar had to be cognizant of this kind of anomaly and
> reject problematic registration requests.
>
> v
>
> On Nov 3, 2009, at 11:27 PM, Abdulrahman I. ALGhadir wrote:
>
>> "Which of the above examples represent L1.R1.R2.L2? These cases
>> require inter-label checking, and the working group came to the
>> consensus not to perform such tests."
>>
>> Well, don't take my words literally: L1.R1.R2.L2 was only an example
>> of a case where mixing labels with different directions scrambles
>> their displayed order;
>> e.g. "حسني.computer.شركة" is such a case.
>>
>> AbdulRahman,
>>
>> -----Original Message-----
>> From: Alireza Saleh [mailto:saleh at nic.ir]
>> Sent: 3/Nov/2009 7:09 PM
>> To: Abdulrahman I. ALGhadir
>> Cc: Lisa Dusseault; idna-update at alvestrand.no; muhtaseb at kfupm.edu.sa
>> Subject: Re: IDNAbis spec
>>
>> Abdulrahman I. ALGhadir wrote:
>>> Hey,
>>>
>>>
>>>
>>> [Quote]: “what if we allow diacritics on the domain name then a
>>> domain name like
>>>
>>> مايكروسوفت.شركة
>>>
>>> Will be different than the
>>>
>>> مَايكروسوفت.شركة
>>>
>>> Because in the second one there is a diacritic on the first letter.
>>>
>>> Although this diacritic is implicit in the first one.
>>>
>>> So this might cause a lot of problems in the domain names
>>> registration and owner claims.” [/Quote]
>>>
>>>
>> I think it is the registry's (zone owner's) decision to allow or deny
>> the use of certain characters, including diacritics; however,
>> diacritics are part of some languages. A language may have characters
>> (not necessarily diacritics) whose use causes problems; in these
>> cases the registry can decide to remove those characters from the
>> character repertoire for that language.
>>>
>>>
>>> Well this has been answered in “NSM flow?”
>>>
>>>
>>>
>>> [Quote]
>>>
>>>  “Moreover, for the displaying order of the labels of a domain
>>> name I have tried the following hypothetical domain names:
>>>
>>>
>>> Husni.حاسب.شركة
>>> حسني.حاسب.شركة
>>> husni.حاسب.com
>>> حسني.computer.شركة
>>> حسني.حاسب.com
>>> husni.computer.شركة
>>> husni.computer.com
>>>
>>> The following is an image of the network order from right to left
>>> for Arabic of the above:
>>>
>>>
>>> It is clear that when we use two consecutive RTL labels separated
>>> by dots and followed by one LTR label, the display order does not
>>> look as it should. The same is true when we use two consecutive LTR
>>> labels separated by dots and followed by one RTL label. The question
>>> is: should we allow such confusion?”[/Quote]
>>>
>>> from draft-ietf-idnabis-bidi-06
>>>
>>> [Quote]
>>>
>>> “   o  The sequence of labels should be consistent with network
>>>      order.  This proved impossible - a domain name consisting of
>>>      the labels (in network order) L1.R1.R2.L2 will be displayed as
>>>      L1.R2.R1.L2 in an LTR context.  (In an RTL context, it will be
>>>      displayed as L2.R2.R1.L1).”
>>>
>>> [/Quote]
>>>
>>>
>>>
>>> Well, this problem was expected: IDNA uses a variant of the UAX#9
>>> Bidi algorithm in which some rules have been removed.
>>>
>> Which of the above examples represent L1.R1.R2.L2? These cases
>> require inter-label checking, and the working group came to the
>> consensus not to perform such tests.
>>
>> -- Alireza
>>
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>

