draft-liman-tld-names-00.txt and bidi

Lyman Chapin lyman at acm.org
Sat Mar 7 16:43:17 CET 2009


At this point, what you say is precisely what the ICANN "applicant  
guidebook" and the explanatory memorandum that deals with string  
criteria will say about new gTLDs (assuming that what you meant was  
"every character in the U-label of a TLD..."). The only remaining  
uncertainty is what "IDNA" will mean in the criterion "valid  
according to IDNA" :-)

- Lyman

On Mar 7, 2009, at 6:39 AM, Patrik Fältström wrote:

> Is it not the case that the safe bet is to say that "every character
> in the TLD of a U-label must have codepoints that are alphabetic, have
> the same directionality, either Left_to_Right or Right_to_Left, and be
> valid according to IDNA". And then anything we add to that makes
> things more and more scary?
>     Patrik
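
[Editorial sketch] The criterion quoted above can be approximated in a few lines of Python. This is only an illustration under stated assumptions: the function name tld_label_ok is hypothetical, "alphabetic" is approximated by Unicode general category L* (so combining marks are rejected), and the "valid according to IDNA" part is deliberately not checked, since the thread itself notes that its meaning is still open.

```python
import unicodedata

def tld_label_ok(u_label: str) -> bool:
    """Hypothetical check: every character is a letter and all share one direction.

    Assumption: "alphabetic" is approximated by general category L*.
    IDNA validity is NOT checked here.
    """
    if not u_label:
        return False
    directions = set()
    for ch in u_label:
        # Reject anything that is not a letter (digits, hyphens, marks, ...).
        if not unicodedata.category(ch).startswith("L"):
            return False
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            directions.add("LTR")
        elif bidi in ("R", "AL"):
            directions.add("RTL")
        else:
            return False
    # Exactly one directionality across the whole label.
    return len(directions) == 1

print(tld_label_ok("com"))                             # letters only, LTR
print(tld_label_ok("123"))                             # all digits
print(tld_label_ok("\u05d9\u05e9\u05e8\u05d0\u05dc"))  # Hebrew letters only, RTL
```

Under this reading, any label containing a digit fails, which is the conservative position the message argues for.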
> On 7 mar 2009, at 12.02, Vint Cerf wrote:
>> Martin, et al,
>> I would have thought that any notion of all-digit labels would be
>> hazardous in the event they lead to confusion with dotted IP address
>> notations and would therefore be forbidden?
>> v
>> Vint Cerf
>> Google
>> 1818 Library Street, Suite 400
>> Reston, VA 20190
>> 202-370-5637
>> vint at google.com
>> On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:
>>> Hello Andrew,
>>> I'm replying to the list because I think there might be enough
>>> people interested, at least tangentially.
>>> At 07:11 09/03/07, Andrew Sullivan wrote:
>>>> Dear colleagues,
>>>> This is slightly off-topic (although related), but I know some experts
>>>> who have thought about this issue are here so I thought I'd better ask.
>>>> Over on the DNSOP list, we're discussing draft-liman-tld-names-00.txt.
>>>> One of the interesting arguments that has cropped up has to do with
>>>> leading or ending digits on a label.
>>>> Now, we have some recommendations (and restrictions) on labels in the
>>>> bidi document, but of course that is something that restricts IDNs,
>>>> and not A-labels.
>>>> The question that I have is whether there is a similar bidi issue for
>>>> A-labels (or, more importantly, non-IDN LDH labels: think of a label
>>>> "123abc", for instance) in a bidi display or entry context.  I've been
>>>> assuming "no" because we already have these sorts of labels today and
>>>> I imagined whatever is happening now would apply.  But it strikes me
>>>> that we wouldn't be introducing bidi restrictions if there weren't
>>>> already a problem.  So is there an issue here that might be relevant
>>>> to the I-D in question?
>>> You are right that there is a bidi issue. For a very specific
>>> example, please see Example 11 at
>>> http://www.w3.org/International/iri-edit/BidiExamples
>>> (please read the legends or tooltips carefully).
>>> The reason why there are bidi issues is:
>>> - Non-IDN labels turn up in IDNs
>>> - Digits get close to RTL characters, maybe only separated by dots
>>> - In the bidi algorithm, numbers and dots get associated with nearby
>>> text and thrown around
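
[Editorial sketch] The three points above rest on the bidi classes that Unicode assigns to each character. A small illustration, assuming Python's standard unicodedata module and a made-up sample name (two Hebrew letters stand in for an RTL label): digits are class EN (European Number) and the dot is class CS (Common Separator), neither of which has a strong direction of its own, so the bidi algorithm resolves them from their neighbours.

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hypothetical sample: LTR label, all-digit label, RTL label (Hebrew alef, bet).
sample = "ab.123.\u05d0\u05d1"
for ch, cls in bidi_classes(sample):
    # Print code points rather than glyphs so the output itself is not reordered.
    print(f"U+{ord(ch):04X} {cls}")
```

Only the letters carry a strong class (L or R); everything between them is weak and takes its display position from context.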
>>> Digits between letters of the same directionality get insulated
>>> from their surroundings, that's why IDNA2003 required RTL letters
>>> at either end of a label containing RTL characters.
>>> IDNA2003 did not require labels with LTR characters to have LTR
>>> characters at either end, simply because such labels were already
>>> out there, and because IDNA wasn't in charge of ASCII-only labels.
>>> However, that doesn't mean that we wouldn't have wanted to prohibit
>>> them if we had been able to.
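
[Editorial sketch] The IDNA2003 requirement described above can be sketched as follows. Assumptions: the function name is hypothetical; the rule is taken from the stringprep bidi requirement (RFC 3454, Section 6), which also forbids mixing RTL characters with strong LTR characters; and Python's current Unicode tables are used rather than the Unicode 3.2 tables that IDNA2003 specifies.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # right-to-left letters, including Arabic letters

def satisfies_idna2003_bidi(label: str) -> bool:
    """Hypothetical check of the IDNA2003-style bidi rule for one label."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not any(c in RTL_CLASSES for c in classes):
        # The rule only constrains labels that contain RTL characters,
        # so ASCII labels such as "123abc" pass untouched.
        return True
    if "L" in classes:
        # RTL characters must not be mixed with strong LTR characters.
        return False
    # RTL letters are required at both ends of the label.
    return classes[0] in RTL_CLASSES and classes[-1] in RTL_CLASSES

print(satisfies_idna2003_bidi("\u05d01\u05d1"))  # digit enclosed by RTL letters
print(satisfies_idna2003_bidi("\u05d0\u05d11"))  # trailing digit
print(satisfies_idna2003_bidi("123abc"))         # no RTL characters at all
```

The last call shows the gap discussed in the message: a leading digit in an ASCII-only label is never examined by this rule.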
>>> For IDNA2008, the situation is slightly different. As far as I
>>> understand, it doesn't prohibit specific labels, just combinations
>>> of labels that can cause visual havoc. That means that in some
>>> situations, a digit at the end of an RTL label may be allowed.
>>> Turned the other way, it would mean that non-IDN TLDs could be
>>> created quite freely, but some of them (e.g. those starting with
>>> a digit) may not allow a second-level RTL label. The details of
>>> what the restrictions are would have to be calculated using
>>> Harald's approach. The question of how this is enforced would
>>> have to be sorted out by people who engage in this kind of thing.
>>>> Note that the I-D in question as it stands will not allow all-numeric
>>>> labels.  But there is a thread of argument that all-numeric labels
>>>> such as "666" ought to be allowed, on the grounds that such a label
>>>> could never be part of an IPv4 address anyway.
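
[Editorial sketch] The "666" argument can be illustrated with Python's standard ipaddress module. Assumptions: the helper names are hypothetical, the addresses use the 192.0.2.0/24 documentation range, and "IPv4 address" means strict four-part dotted-decimal notation as that module parses it. Other parsers may be more lenient, so this shows the argument rather than settling it.

```python
import ipaddress

def is_all_digit_label(label: str) -> bool:
    """True if the label consists only of ASCII digits."""
    return label != "" and all(c in "0123456789" for c in label)

def fits_ipv4_octet(label: str) -> bool:
    """True if the label could be one part of a strict dotted quad (0..255)."""
    return is_all_digit_label(label) and int(label) <= 255

def parses_as_ipv4(name: str) -> bool:
    """True if the whole name is accepted as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(name)
        return True
    except ValueError:
        return False

print(fits_ipv4_octet("666"), parses_as_ipv4("192.0.2.666"))
print(fits_ipv4_octet("123"), parses_as_ipv4("192.0.2.123"))
```

A label such as "123" is the hazardous case: a name ending in it can be indistinguishable from an address, whereas "666" is rejected by a strict parser.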
>>> As far as my experience with Bidi goes, all-numeric labels
>>> won't be significantly worse than labels with digits at
>>> either or both ends. What happens in detail may be slightly
>>> different, but bad things will happen either way.
>>> There may even be cases where all-digit labels 'perform'
>>> better, because the digits will stay together and so there
>>> will be no "jumping the dot" phenomenon for parts of a label.
>>> But there may still be visual havoc for the overall order
>>> of labels, and, very importantly, different domain names may
>>> still lead to the same visual representation (because of
>>> the bidi reordering). For that, please see Example 10
>>> in the above page; a logical
>>>  http://ab.123.CDEFGH/kl/mn/op.html
>>> will be displayed also as
>>>  http://ab.123.HGFEDC/kl/mn/op.html,
>>> same as the logical
>>>  http://ab.CDEFGH.123/kl/mn/op.html
>>> in the example.
>>>> If there are bidi issues that are important, then the "no leading
>>>> digit" rule in the I-D is strengthened.
>>> It's not only leading digits. It's also trailing digits.
>>> Trailing digits don't affect standalone domain names
>>> (or so I think), but domain names often appear in context,
>>> the most frequent of which is an IRI/URI. The issues there
>>> are very much the same as for standalone domain names; you
>>> can read about them in the IRI spec (RFC 3987), Section 4.
>>> The approach taken there is the same as for IDNA2003, but
>>> instead of 'label', the term 'component' is used in order
>>> to be more generic. Also, there are no MUSTs, only SHOULDs,
>>> because it's impossible for IRIs to dictate how their
>>> components are formed.
>>> We plan to adapt the bidi section of RFC 3987 once IDNA2008
>>> is more stable.
>>> As a summary, from a bidi viewpoint, digits at either end
>>> of a TLD label should be prohibited while that is still possible.
>>> Regards,    Martin.
>>>> Since this isn't strictly on-topic, please send replies off-list.
>>>> Thanks, and sorry for the diversion.
>>>> A
>>>> -- 
>>>> Andrew Sullivan
>>>> ajs at shinkuro.com
>>>> Shinkuro, Inc.
>>>> _______________________________________________
>>>> Idna-update mailing list
>>>> Idna-update at alvestrand.no
>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>>> #-#-#  http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
