draft-liman-tld-names-00.txt and bidi

Alireza Saleh saleh at nic.ir
Sat Mar 7 13:32:29 CET 2009


In the TLD only? I thought we had such text for every label.

Alireza

Patrik Fältström wrote:
> Is it not the case that the safe bet is to say that "every character
> in a TLD U-label must be alphabetic, all characters must have the
> same directionality (either Left_to_Right or Right_to_Left), and the
> label must be valid according to IDNA"? And then anything we add to
> that makes things more and more scary?
>
>     Patrik
>
> On 7 mar 2009, at 12.02, Vint Cerf wrote:
>
>   
>> Martin, et al,
>>
>> I would have thought that any notion of all-digit labels would be
>> hazardous in the event they lead to confusion with dotted IP address
>> notations and would therefore be forbidden?
>>
>> v
>>
>>
>> Vint Cerf
>> Google
>> 1818 Library Street, Suite 400
>> Reston, VA 20190
>> 202-370-5637
>> vint at google.com
>>
>>
>>
>>
>> On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:
>>
>>     
>>> Hello Andrew,
>>>
>>> I'm replying to the list because I think there might be enough
>>> people interested, at least tangentially.
>>>
>>> At 07:11 09/03/07, Andrew Sullivan wrote:
>>>       
>>>> Dear colleagues,
>>>>
>>>> This is slightly off-topic (although related), but I know some experts
>>>> who have thought about this issue are here so I thought I'd better ask.
>>>>
>>>> Over on the DNSOP list, we're discussing draft-liman-tld-names-00.txt.
>>>> One of the interesting arguments that has cropped up has to do with
>>>> leading or ending digits on a label.
>>>>
>>>> Now, we have some recommendations (and restrictions) on labels in the
>>>> bidi document, but of course that is something that restricts IDNs,
>>>> and not A-labels.
>>>>
>>>> The question that I have is whether there is a similar bidi issue for
>>>> A-labels (or, more importantly, non-IDN LDH labels: think of a label
>>>> "123abc", for instance) in a bidi display or entry context.  I've been
>>>> assuming "no" because we already have these sorts of labels today and
>>>> I imagined whatever is happening now would apply.  But it strikes me
>>>> that we wouldn't be introducing bidi restrictions if there weren't
>>>> already a problem.  So is there an issue here that might be relevant
>>>> to the I-D in question?
>>>>         
>>> You are right that there is a bidi issue. For a very specific
>>> example, please see Example 11 at
>>> http://www.w3.org/International/iri-edit/BidiExamples
>>> (please read the legends or tooltips carefully).
>>>
>>> The reason why there are bidi issues is:
>>> - Non-IDN labels turn up in IDNs
>>> - Digits get close to RTL characters, maybe only separated by dots
>>> - In the bidi algorithm, numbers and dots get associated with nearby
>>> text and thrown around
>>>
>>> Digits between letters of the same directionality get insulated
>>> from their surroundings; that's why IDNA2003 required RTL letters
>>> at both ends of a label containing RTL characters.
>>> IDNA2003 did not require labels with LTR characters to have LTR
>>> characters at both ends, simply because such labels were already
>>> out there, and because IDNA wasn't in charge of ASCII-only labels.
>>> However, that doesn't mean that we wouldn't have wanted to prohibit
>>> them if we had been able to.
>>>
>>> For IDNA2008, the situation is slightly different. As far as I
>>> understand, it doesn't prohibit specific labels, just combinations
>>> of labels that can cause visual havoc. That means that in some
>>> situations, a digit at the end of an RTL label may be allowed.
>>> Turned the other way, it would mean that non-IDN TLDs could be
>>> created quite freely, but some of them (e.g. those starting with
>>> a digit) may not allow a second-level RTL label. The details of
>>> what the restrictions are would have to be calculated using
>>> Harald's approach. The question of how this is enforced would
>>> have to be sorted out by people who engage in this kind of thing.
>>>
>>>
>>>       
>>>> Note that the I-D in question as it stands will not allow all-numeric
>>>> labels.  But there is a thread of argument that all-numeric labels
>>>> such as "666" ought to be allowed, on the grounds that such a label
>>>> could never be part of an IPv4 address anyway.
>>>>         
>>> As far as my experience with Bidi goes, all-numeric labels
>>> won't be significantly worse than labels with digits at
>>> either or both ends. What happens in detail may be slightly
>>> different, but bad things will happen either way.
>>> There may even be cases where all-digit labels 'perform'
>>> better, because the digits will stay together and so there
>>> will be no "jumping the dot" phenomenon for parts of a label.
>>> But there may still be visual havoc for the overall order
>>> of labels, and, very importantly, different domain names may
>>> still lead to the same visual representation (because of
>>> the bidi reordering). For that, please see Example 10
>>> in the above page; a logical
>>>  http://ab.123.CDEFGH/kl/mn/op.html
>>> will also be displayed as
>>>  http://ab.123.HGFEDC/kl/mn/op.html,
>>> same as the logical
>>>  http://ab.CDEFGH.123/kl/mn/op.html
>>> in the example.
>>>
>>>       
>>>> If there are bidi
>>>> issues that are important, then the "no leading digit" rule in the
>>>> I-D is strengthened.
>>>>         
>>> It's not only leading digits. It's also trailing digits.
>>> Trailing digits don't affect standalone domain names
>>> (or so I think), but domain names often appear in context,
>>> the most frequent of which is an IRI/URI. The issues here
>>> are then very much the same as for domain names alone; you
>>> can read about them in the IRI spec (RFC 3987), Section 4.
>>> The approach taken there is the same as for IDNA2003, but
>>> instead of 'label', the term 'component' is used in order
>>> to be more generic. Also, there are no MUSTs, only SHOULDs,
>>> because it's impossible for IRIs to dictate how their
>>> components are formed.
>>>
>>> We plan to adapt the bidi section of RFC 3987 once IDNA2008
>>> is more stable.
>>>
>>>
>>> As a summary, from a bidi viewpoint, digits at either end
>>> of a TLD label should be prohibited while that is still possible.
>>>
>>> Regards,    Martin.
>>>
>>>       
>>>> Since this isn't strictly on-topic, please send replies off-list.
>>>> Thanks, and sorry for the diversion.
>>>>
>>>> A
>>>>
>>>> -- 
>>>> Andrew Sullivan
>>>> ajs at shinkuro.com
>>>> Shinkuro, Inc.
>>>> _______________________________________________
>>>> Idna-update mailing list
>>>> Idna-update at alvestrand.no
>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>         
>>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>>
>   
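
The rules discussed in the quoted thread can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not text from
the I-D or from IDNA: the function names are hypothetical, "alphabetic" is
approximated by the Unicode general categories starting with "L" (so combining
marks are rejected, which is stricter than IDNA), and IDNA validity itself is
not checked.

```python
import unicodedata

# Bidi classes treated as right-to-left letters (Hebrew-type and Arabic-type).
RTL_CLASSES = {"R", "AL"}


def label_direction(label):
    """Return "LTR" or "RTL" if every character is a letter and all letters
    share one directionality; return None otherwise (sketch of Patrik's rule,
    without the IDNA validity check)."""
    directions = set()
    for ch in label:
        # Assumption: "alphabetic" means general category L* (Lu, Ll, Lo, ...).
        if not unicodedata.category(ch).startswith("L"):
            return None
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            directions.add("LTR")
        elif bidi in RTL_CLASSES:
            directions.add("RTL")
        else:
            return None
    return directions.pop() if len(directions) == 1 else None


def is_decimal_digit(ch):
    """True for any Unicode decimal digit (general category Nd)."""
    return unicodedata.category(ch) == "Nd"


def has_edge_digit(label):
    """True if the label starts or ends with a digit (Martin's concern)."""
    return bool(label) and (is_decimal_digit(label[0]) or is_decimal_digit(label[-1]))


def is_all_digits(label):
    """True if the label consists only of digits (Vint's concern)."""
    return bool(label) and all(is_decimal_digit(ch) for ch in label)
```

Under these assumptions, a label such as "123abc" fails the letters-only rule
and is also flagged by has_edge_digit, while "666" is flagged by both digit
checks.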
