draft-liman-tld-names-00.txt and bidi

Alireza Saleh saleh at nic.ir
Sat Mar 7 13:44:18 CET 2009


I think the TLD should include restrictive bidi rules, such as those in
the IDNA2003 bidi document.

Alireza

Alireza Saleh wrote:
> In the TLD only? I thought we had such text for each label.
>
> Alireza
>
> Patrik wrote:
>   
>> Is it not the case that the safe bet is to say that "every character  
>> in the TLD of a U-label must have codepoints that are alphabetic, have  
>> the same directionality, either Left_to_Right or Right_to_Left, and be  
>> valid according to IDNA". And then anything we add to that makes  
>> things more and more scary?
>>
>>     Patrik
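
[Editorial sketch, not part of the original message.] The "safe bet" rule quoted above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, "alphabetic" is approximated by the Unicode general category "letter" (which is not identical to the Unicode Alphabetic property), and the "valid according to IDNA" part of the rule is not checked at all.

```python
import unicodedata


def tld_label_is_conservative(label: str) -> bool:
    """Hypothetical check: every character is a letter and all share one direction.

    Assumption: IDNA validity is checked elsewhere; this only covers the
    "alphabetic" and "same directionality" parts of the rule.
    """
    if not label:
        return False
    directions = set()
    for ch in label:
        # General categories Lu, Ll, Lt, Lm and Lo all start with "L" (letters).
        if not unicodedata.category(ch).startswith("L"):
            return False
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            directions.add("LTR")
        elif bidi in ("R", "AL"):
            directions.add("RTL")
        else:
            # Any other bidi class (digits, neutrals, ...) is rejected.
            return False
    # Exactly one direction must be present across the whole label.
    return len(directions) == 1
```

Under these assumptions, an all-letter label such as "com" passes, while "123", "abc1", or a label mixing Latin and Arabic letters is rejected.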
>>
>> On 7 mar 2009, at 12.02, Vint Cerf wrote:
>>
>>   
>>     
>>> Martin, et al,
>>>
>>> I would have thought that any notion of all-digit labels would be
>>> hazardous in the event they lead to confusion with dotted IP address
>>> notations and would therefore be forbidden?
>>>
>>> v
>>>
>>>
>>> Vint Cerf
>>> Google
>>> 1818 Library Street, Suite 400
>>> Reston, VA 20190
>>> 202-370-5637
>>> vint at google.com
>>>
>>>
>>>
>>>
>>> On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:
>>>
>>>     
>>>       
>>>> Hello Andrew,
>>>>
>>>> I'm replying to the list because I think there might be enough
>>>> people interested, at least tangentially.
>>>>
>>>> At 07:11 09/03/07, Andrew Sullivan wrote:
>>>>       
>>>>         
>>>>> Dear colleagues,
>>>>>
>>>>> This is slightly off-topic (although related), but I know some experts
>>>>> who have thought about this issue are here so I thought I'd better ask.
>>>>>
>>>>> Over on the DNSOP list, we're discussing draft-liman-tld-names-00.txt.
>>>>> One of the interesting arguments that has cropped up has to do with
>>>>> leading or ending digits on a label.
>>>>>
>>>>> Now, we have some recommendations (and restrictions) on labels in the
>>>>> bidi document, but of course that is something that restricts IDNs,
>>>>> and not A-labels.
>>>>>
>>>>> The question that I have is whether there is a similar bidi issue for
>>>>> A-labels (or, more importantly, non-IDN LDH labels: think of a label
>>>>> "123abc", for instance) in a bidi display or entry context.  I've been
>>>>> assuming "no" because we already have these sorts of labels today and
>>>>> I imagined whatever is happening now would apply.  But it strikes me
>>>>> that we wouldn't be introducing bidi restrictions if there weren't
>>>>> already a problem.  So is there an issue here that might be relevant
>>>>> to the I-D in question?
>>>>>         
>>>>>           
>>>> You are right that there is a bidi issue. For a very specific
>>>> example, please see Example 11 at
>>>> http://www.w3.org/International/iri-edit/BidiExamples
>>>> (please read the legends or tooltips carefully).
>>>>
>>>> The reason why there are bidi issues is:
>>>> - Non-IDN labels turn up in IDNs
>>>> - Digits get close to RTL characters, maybe only separated by dots
>>>> - In the bidi algorithm, numbers and dots get associated with nearby
>>>> text and thrown around
>>>>
>>>> Digits between letters of the same directionality get insulated
>>>> from their surroundings; that's why IDNA2003 required RTL letters
>>>> at either end of a label containing RTL characters.
>>>> IDNA2003 did not require labels with LTR characters to have LTR
>>>> characters at either end, simply because such labels were already
>>>> out there, and because IDNA wasn't in charge of ASCII-only labels.
>>>> However, that doesn't mean that we wouldn't have wanted to prohibit
>>>> them if we had been able to.
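
[Editorial sketch, not part of the original message.] The IDNA2003 end-of-label requirement described above can be approximated with Python's standard `stringprep` module, whose `in_table_d1` function tests for the RFC 3454 "RandALCat" characters (bidi class R or AL). The function name below is hypothetical, and the sketch covers only the "RTL letter at either end" requirement mentioned in the message, not the rest of the stringprep bidi check.

```python
import stringprep


def satisfies_rtl_end_rule(label: str) -> bool:
    """Hypothetical check of the end-of-label rule described in the message.

    If the label contains any RTL character (RFC 3454 table D.1), its first
    and last characters must both be RTL characters. Labels without RTL
    characters are not constrained by this rule.
    """
    has_rtl = any(stringprep.in_table_d1(ch) for ch in label)
    if not has_rtl:
        return True
    return stringprep.in_table_d1(label[0]) and stringprep.in_table_d1(label[-1])
```

Note that an ASCII label such as "1abc" passes this check, which matches the point made above: the rule places no restriction on labels that contain only LTR characters.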
>>>>
>>>> For IDNA2008, the situation is slightly different. As far as I
>>>> understand, it doesn't prohibit specific labels, just combinations
>>>> of labels that can cause visual havoc. That means that in some
>>>> situations, a digit at the end of an RTL label may be allowed.
>>>> Turned the other way, it would mean that non-IDN TLDs could be
>>>> created quite freely, but some of them (e.g. those starting with
>>>> a digit) may not allow a second-level RTL label. The details of
>>>> what the restrictions are would have to be calculated using
>>>> Harald's approach. The question of how this is enforced would
>>>> have to be sorted out by people who engage in this kind of thing.
>>>>
>>>>
>>>>       
>>>>         
>>>>> Note that the I-D in question as it stands will not allow all-numeric
>>>>> labels.  But there is a thread of argument that all-numeric labels
>>>>> such as "666" ought to be allowed, on the grounds that such a label
>>>>> could never be part of an IPv4 address anyway.
>>>>>         
>>>>>           
>>>> As far as my experience with Bidi goes, all-numeric labels
>>>> won't be significantly worse than labels with digits at
>>>> either or both ends. What happens in detail may be slightly
>>>> different, but bad things will happen either way.
>>>> There may even be cases where all-digit labels 'perform'
>>>> better, because the digits will stay together and so there
>>>> will be no "jumping the dot" phenomenon for parts of a label.
>>>> But there may still be visual havoc for the overall order
>>>> of labels, and, very importantly, different domain names may
>>>> still lead to the same visual representation (because of
>>>> the bidi reordering). For that, please see Example 10
>>>> in the above page; a logical
>>>>  http://ab.123.CDEFGH/kl/mn/op.html
>>>> will be displayed also as
>>>>  http://ab.123.HGFEDC/kl/mn/op.html,
>>>> same as the logical
>>>>  http://ab.CDEFGH.123/kl/mn/op.html
>>>> in the example.
>>>>
>>>>       
>>>>         
>>>>> If there are bidi issues that are important, then the "no leading
>>>>> digit" rule in the I-D is strengthened.
>>>>>         
>>>>>           
>>>> It's not only leading digits. It's also trailing digits.
>>>> Trailing digits don't affect standalone domain names
>>>> (or so I think), but domain names often appear in context,
>>>> the most frequent of which is an IRI/URI. The issues here
>>>> are then very much the same as for domain names alone; you
>>>> can read about them in the IRI spec (RFC 3987), Section 4.
>>>> The approach taken there is the same as for IDNA2003, but
>>>> instead of 'label', the term 'component' is used in order
>>>> to be more generic. Also, there are no MUSTs, only SHOULDs,
>>>> because it's impossible for IRIs to dictate how their
>>>> components are formed.
>>>>
>>>> We plan to adapt the bidi section of RFC 3987 once IDNA2008
>>>> is more stable.
>>>>
>>>>
>>>> As a summary, from a bidi viewpoint, digits at either end
>>>> of a TLD label should be prohibited while they still can be.
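
[Editorial sketch, not part of the original message.] The restriction suggested in the summary above could be tested along these lines. The function name is hypothetical, and only ASCII digits are considered, on the assumption that the labels in question are non-IDN LDH labels; this is not taken from the I-D or from any registry policy.

```python
ASCII_DIGITS = "0123456789"


def has_digit_at_either_end(label: str) -> bool:
    """Hypothetical check: True if the label starts or ends with an ASCII digit."""
    if not label:
        return False
    return label[0] in ASCII_DIGITS or label[-1] in ASCII_DIGITS
```

Under this sketch, "123abc", "abc123" and the all-numeric "666" would all be flagged, while "a1b" and "com" would not.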
>>>>
>>>> Regards,    Martin.
>>>>
>>>>       
>>>>         
>>>>> Since this isn't strictly on-topic, please send replies off-list.
>>>>> Thanks, and sorry for the diversion.
>>>>>
>>>>> A
>>>>>
>>>>> -- 
>>>>> Andrew Sullivan
>>>>> ajs at shinkuro.com
>>>>> Shinkuro, Inc.
>>>>> _______________________________________________
>>>>> Idna-update mailing list
>>>>> Idna-update at alvestrand.no
>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>>         
>>>>>           
>>>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>>>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>>>


