draft-liman-tld-names-00.txt and bidi

Vint Cerf vint at google.com
Sat Mar 7 12:02:20 CET 2009

Martin, et al,

I would have thought that any notion of all-digit labels would be  
hazardous in the event they lead to confusion with dotted IP address  
notations and would therefore be forbidden?
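
To make the concern concrete, here is a minimal sketch of how a name built from all-digit labels can collide with dotted IPv4 notation. The helper names `is_all_digit_label` and `parses_as_ipv4` are hypothetical, and the strict parser in Python's standard `ipaddress` module is only an assumed stand-in for whatever an application actually uses; more lenient parsers may accept more forms.

```python
import ipaddress

ASCII_DIGITS = "0123456789"


def is_all_digit_label(label: str) -> bool:
    """Return True if the label is non-empty and made only of ASCII digits."""
    return label != "" and all(ch in ASCII_DIGITS for ch in label)


def parses_as_ipv4(name: str) -> bool:
    """Return True if Python's strict parser accepts the name as dotted IPv4.

    Assumption: ipaddress.IPv4Address stands in for the application's parser.
    """
    try:
        ipaddress.IPv4Address(name)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for name in ("10.0.0.1", "10.0.0.666", "example.123"):
        labels = name.split(".")
        print(name,
              "all-digit labels:", all(is_all_digit_label(l) for l in labels),
              "parses as IPv4:", parses_as_ipv4(name))
```

Under this strict parser, a final label such as "666" is rejected as an octet, which is the basis of the argument that such a label could never be part of an IPv4 address.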


Vint Cerf
1818 Library Street, Suite 400
Reston, VA 20190
vint at google.com

On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:

> Hello Andrew,
> I'm replying to the list because I think there might be enough
> people interested, at least tangentially.
> At 07:11 09/03/07, Andrew Sullivan wrote:
>> Dear colleagues,
>> This is slightly off-topic (although related), but I know some
>> experts who have thought about this issue are here so I thought
>> I'd better ask.
>> Over on the DNSOP list, we're discussing
>> draft-liman-tld-names-00.txt.
>> One of the interesting arguments that has cropped up has to do with
>> leading or ending digits on a label.
>> Now, we have some recommendations (and restrictions) on labels in the
>> bidi document, but of course that is something that restricts IDNs,
>> and not A-labels.
>> The question that I have is whether there is a similar bidi issue for
>> A-labels (or, more importantly, non-IDN LDH labels: think of a label
>> "123abc", for instance) in a bidi display or entry context.  I've
>> been assuming "no" because we already have these sorts of labels
>> today and I imagined whatever is happening now would apply.  But it
>> strikes me that we wouldn't be introducing bidi restrictions if
>> there weren't already a problem.  So is there an issue here that
>> might be relevant to the I-D in question?
> You are right that there is a bidi issue. For a very specific
> example, please see Example 11 at
> http://www.w3.org/International/iri-edit/BidiExamples
> (please read the legends or tooltips carefully).
> The reasons for the bidi issues are:
> - Non-IDN labels turn up in IDNs
> - Digits get close to RTL characters, maybe only separated by dots
> - In the bidi algorithm, numbers and dots get associated with nearby
>  text and thrown around
> Digits between letters of the same directionality get insulated
> from their surroundings; that's why IDNA2003 required RTL letters
> at either end of a label containing RTL characters.
> IDNA2003 did not require labels with LTR characters to have LTR
> characters at either end, simply because such labels were already
> out there, and because IDNA wasn't in charge of ASCII-only labels.
> However, that doesn't mean that we wouldn't have wanted to prohibit
> them if we had been able to.
> For IDNA2008, the situation is slightly different. As far as I
> understand, it doesn't prohibit specific labels, just combinations
> of labels that can cause visual havoc. That means that in some
> situations, a digit at the end of an RTL label may be allowed.
> Turned the other way, it would mean that non-IDN TLDs could be
> created quite freely, but some of them (e.g. those starting with
> a digit) may not allow a second-level RTL label. The details of
> what the restrictions are would have to be calculated using
> Harald's approach. The question of how this is enforced would
> have to be sorted out by people who engage in this kind of thing.
>> Note that the I-D in question as it stands will not allow all-numeric
>> labels.  But there is a thread of argument that all-numeric labels
>> such as "666" ought to be allowed, on the grounds that such a label
>> could never be part of an IPv4 address anyway.
> As far as my experience with Bidi goes, all-numeric labels
> won't be significantly worse than labels with digits at
> either or both ends. What happens in detail may be slightly
> different, but bad things will happen either way.
> There may even be cases where all-digit labels 'perform'
> better, because the digits will stay together and so there
> will be no "jumping the dot" phenomenon for parts of a label.
> But there may still be visual havoc for the overall order
> of labels and, very importantly, different domain names may
> still lead to the same visual representation (because of
> the bidi reordering). For that, please see Example 10
> in the above page; a logical
>   http://ab.123.CDEFGH/kl/mn/op.html
> will be displayed also as
>   http://ab.123.HGFEDC/kl/mn/op.html,
> same as the logical
>   http://ab.CDEFGH.123/kl/mn/op.html
> in the example.
>> If there are bidi issues that are important, then the "no leading
>> digit" rule in the I-D is strengthened.
> It's not only leading digits. It's also trailing digits.
> Trailing digits don't affect standalone domain names
> (or so I think), but domain names often appear in context,
> the most frequent of which is an IRI/URI. The issues here
> are then very much the same as for domain names alone; you
> can read about them in the IRI spec (RFC 3987), Section 4.
> The approach taken there is the same as for IDNA2003, but
> instead of 'label', the term 'component' is used in order
> to be more generic. Also, there are no MUSTs, only SHOULDs,
> because it's impossible for IRIs to dictate how their
> components are formed.
> We plan to adapt the bidi section of RFC 3987 once IDNA2008
> is more stable.
> As a summary, from a bidi viewpoint, digits at either end
> of a TLD label should be prohibited while that is still possible.
> Regards,    Martin.
>> Since this isn't strictly on-topic, please send replies off-list.
>> Thanks, and sorry for the diversion.
>> A
>> -- 
>> Andrew Sullivan
>> ajs at shinkuro.com
>> Shinkuro, Inc.
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
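
The two label-level checks discussed in the quoted message can be sketched as follows: the IDNA2003 requirement that a label containing RTL characters has RTL letters at both ends, and the recommendation that a TLD label carry no digit at either end. The function names are hypothetical, and using `unicodedata.bidirectional` with the classes "R" and "AL" to decide what counts as an RTL letter is an assumption of this sketch, not a full implementation of either IDNA2003 or the IDNA2008 bidi document.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}      # assumed: right-to-left letter classes
ASCII_DIGITS = "0123456789"


def is_rtl_char(ch: str) -> bool:
    """Return True if the character's Unicode bidi class is R or AL."""
    return unicodedata.bidirectional(ch) in RTL_CLASSES


def rtl_label_has_rtl_ends(label: str) -> bool:
    """Check the end rule described for IDNA2003.

    A label without any RTL character passes trivially; a label with
    one must begin and end with an RTL character.
    """
    if not any(is_rtl_char(ch) for ch in label):
        return True
    return is_rtl_char(label[0]) and is_rtl_char(label[-1])


def has_digit_at_either_end(label: str) -> bool:
    """Return True if the label starts or ends with an ASCII digit."""
    return label != "" and (label[0] in ASCII_DIGITS or label[-1] in ASCII_DIGITS)


if __name__ == "__main__":
    hebrew = "\u05d0\u05d1"    # two Hebrew letters, bidi class R
    for label in (hebrew, hebrew + "1", "123abc", "abc", "666"):
        print(repr(label),
              "RTL end rule ok:", rtl_label_has_rtl_ends(label),
              "digit at an end:", has_digit_at_either_end(label))
```

Note that "123abc" passes the first check only because it contains no RTL character at all, which is the gap for ASCII-only labels that the quoted message describes.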
