draft-liman-tld-names-00.txt and bidi
patrik at frobbit.se
Sat Mar 7 12:39:50 CET 2009
Isn't the safe bet to say that "every character in the TLD of a U-label
must be alphabetic, all of its characters must have the same
directionality (either Left_to_Right or Right_to_Left), and the label
must be valid according to IDNA"? And doesn't anything we add to that
make things more and more scary?
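To make the rule concrete, here is a minimal Python sketch of it. It is
an illustration under assumptions, not a proposal: the function name
tld_is_conservative is made up for this message, "alphabetic" is
approximated by the Unicode general categories L*, and IDNA validity is
not checked at all.

```python
import unicodedata

LTR_CLASSES = {"L"}         # Left_to_Right
RTL_CLASSES = {"R", "AL"}   # Right_to_Left, including Arabic letters


def tld_is_conservative(u_label: str) -> bool:
    """Hypothetical check: letters only, one directionality.

    Assumption: 'alphabetic' means general category Lu/Ll/Lt/Lm/Lo.
    IDNA validity is NOT checked here.
    """
    if not u_label:
        return False
    # Every code point must be a letter.
    if not all(unicodedata.category(ch).startswith("L") for ch in u_label):
        return False
    # All code points must share one directionality.
    classes = {unicodedata.bidirectional(ch) for ch in u_label}
    return classes <= LTR_CLASSES or classes <= RTL_CLASSES
```

With this sketch, "com" and an all-Hebrew label pass, while "123",
"abc1" and a label mixing Latin and Hebrew letters are rejected.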
On 7 mar 2009, at 12.02, Vint Cerf wrote:
> Martin, et al,
> I would have thought that any notion of all-digit labels would be
> hazardous in the event they lead to confusion with dotted IP address
> notations and would therefore be forbidden?
> Vint Cerf
> 1818 Library Street, Suite 400
> Reston, VA 20190
> vint at google.com
> On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:
>> Hello Andrew,
>> I'm replying to the list because I think there might be enough
>> people interested, at least tangentially.
>> At 07:11 09/03/07, Andrew Sullivan wrote:
>>> Dear colleagues,
>>> This is slightly off-topic (although related), but I know some
>>> who have thought about this issue are here so I thought I'd better
>>> Over on the DNSOP list, we're discussing draft-liman-tld-
>>> One of the interesting arguments that has cropped up has to do with
>>> leading or ending digits on a label.
>>> Now, we have some recommendations (and restrictions) on labels in the
>>> bidi document, but of course that is something that restricts IDNs,
>>> and not A-labels.
>>> The question that I have is whether there is a similar bidi issue for
>>> A-labels (or, more importantly, non-IDN LDH labels: think of a label
>>> "123abc", for instance) in a bidi display or entry context. I've been
>>> assuming "no" because we already have these sorts of labels today, and
>>> I imagined whatever is happening now would apply. But it strikes me
>>> that we wouldn't be introducing bidi restrictions if there weren't
>>> already a problem. So is there an issue here that might be relevant
>>> to the I-D in question?
>> You are right that there is a bidi issue. For a very specific
>> example, please see Example 11 at
>> (please read the legends or tooltips carefully).
>> The reason why there are bidi issues is:
>> - Non-IDN labels turn up in IDNs
>> - Digits get close to RTL characters, maybe only separated by dots
>> - In the bidi algorithm, numbers and dots get associated with nearby
>> text and thrown around
>> Digits between letters of the same directionality get insulated
>> from their surroundings; that's why IDNA2003 required RTL letters
>> at both ends of a label containing RTL characters.
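For reference, the IDNA2003 requirement Martin mentions comes from
RFC 3454, Section 6. A rough Python sketch of its requirements 2 and 3,
using the standard stringprep tables, could look like the following.
The function name is made up for illustration, and requirement 1 (the
prohibited characters of table C.8) is deliberately left out.

```python
import stringprep


def satisfies_idna2003_bidi(label: str) -> bool:
    """Sketch of RFC 3454 section 6, requirements 2 and 3 only."""
    # Table D.1 holds the RandALCat (right-to-left) characters.
    has_rtl = any(stringprep.in_table_d1(ch) for ch in label)
    if not has_rtl:
        # No RTL characters: the rule imposes nothing, so ASCII
        # labels such as "123abc" pass untouched.
        return True
    # Requirement 2: no LCat (table D.2) characters alongside RTL ones.
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False
    # Requirement 3: first and last characters must both be RandALCat.
    return (stringprep.in_table_d1(label[0])
            and stringprep.in_table_d1(label[-1]))
```

A digit between two Hebrew letters passes, a Hebrew label that starts
or ends with a digit fails, and "123abc" passes because the rule only
applies to labels that contain RTL characters.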
>> IDNA2003 did not require labels with LTR characters to have LTR
>> characters at either end, simply because such labels were already
>> out there, and because IDNA wasn't in charge of ASCII-only labels.
>> However, that doesn't mean that we wouldn't have wanted to prohibit
>> them if we had been able to.
>> For IDNA2008, the situation is slightly different. As far as I
>> understand, it doesn't prohibit specific labels, just combinations
>> of labels that can cause visual havoc. That means that in some
>> situations, a digit at the end of an RTL label may be allowed.
>> Turned the other way, it would mean that non-IDN TLDs could be
>> created quite freely, but some of them (e.g. those starting with
>> a digit) may not allow a second-level RTL label. The details of
>> what the restrictions are would have to be calculated using
>> Harald's approach. The question of how this is enforced would
>> have to be sorted out by people who engage in this kind of thing.
>>> Note that the I-D in question as it stands will not allow all-numeric
>>> labels. But there is a thread of argument that all-numeric labels
>>> such as "666" ought to be allowed, on the grounds that such a label
>>> could never be part of an IPv4 address anyway.
>> As far as my experience with Bidi goes, all-numeric labels
>> won't be significantly worse than labels with digits at
>> either or both ends. What happens in detail may be slightly
>> different, but bad things will happen either way.
>> There may even be cases where all-digit labels 'perform'
>> better, because the digits will stay together and so there
>> will be no "jumping the dot" phenomenon for parts of a label.
>> But there may still be visual havoc for the overall order
>> of labels and, very importantly, different domain names may
>> still lead to the same visual representation (because of
>> the bidi reordering). For that, please see Example 10
>> in the above page; a logical
>> will be displayed also as
>> same as the logical
>> in the example.
>>> If there are bidi
>>> issues that are important, then the "no leading digit" rule in the
>>> I-D is strengthened.
>> It's not only leading digits. It's also trailing digits.
>> Trailing digits don't affect standalone domain names
>> (or so I think), but domain names often appear in context,
>> most frequently in an IRI/URI. The issues there are
>> very much the same as for standalone domain names; you
>> can read about them in the IRI spec (RFC 3987), Section 4.
>> The approach taken there is the same as for IDNA2003, but
>> instead of 'label', the term 'component' is used in order
>> to be more generic. Also, there are no MUSTs, only SHOULDs,
>> because it's impossible for IRIs to dictate how their
>> components are formed.
>> We plan to adapt the bidi section of RFC 3987 once IDNA2008
>> is more stable.
>> In summary, from a bidi viewpoint, digits at either end
>> of a TLD label should be prohibited while that is still possible.
>> Regards, Martin.
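Martin's summary translates into a very small check. The sketch below
is only an illustration: the function name is made up, and it looks
only at ASCII digits, which is an assumption of mine. Digits from
other scripts would need separate treatment.

```python
ASCII_DIGITS = frozenset("0123456789")


def has_edge_digit(label: str) -> bool:
    """True if the label starts or ends with an ASCII digit.

    Hypothetical helper; only ASCII digits are considered.
    """
    if not label:
        return False
    return label[0] in ASCII_DIGITS or label[-1] in ASCII_DIGITS
```

Under that check "123abc", "abc123" and "666" would all be refused as
TLD labels, while "a1b" and "com" would be accepted.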
>>> Since this isn't strictly on-topic, please send replies off-list.
>>> Thanks, and sorry for the diversion.
>>> Andrew Sullivan
>>> ajs at shinkuro.com
>>> Shinkuro, Inc.
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>> #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>> #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp