draft-liman-tld-names-00.txt and bidi

Patrik Fältström patrik at frobbit.se
Sat Mar 7 12:39:50 CET 2009


Isn't the safe bet to say that "every character in the TLD of a
U-label must be an alphabetic code point, all characters must have
the same directionality (either Left_to_Right or Right_to_Left), and
the label must be valid according to IDNA"? And then anything we add
to that makes things more and more scary?
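
To make the rule concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name tld_direction is hypothetical, "alphabetic" is approximated by Unicode general category L* (the standard library does not expose the Alphabetic property, so combining marks are rejected), bidi classes R and AL are both treated as Right_to_Left, and the "valid according to IDNA" part is not checked at all.

```python
import unicodedata

# Assumption: bidi classes R and AL both count as "Right_to_Left" here.
RTL_CLASSES = {"R", "AL"}


def tld_direction(label):
    """Return "LTR" or "RTL" if every character of the label is a letter
    and all characters share one directionality; otherwise return None.

    "Letter" is approximated by general category L* (an assumption),
    and IDNA validity is deliberately not checked.
    """
    if not label:
        return None
    directions = set()
    for ch in label:
        # Reject digits, hyphens, marks and anything else that is not a letter.
        if not unicodedata.category(ch).startswith("L"):
            return None
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            directions.add("LTR")
        elif bidi in RTL_CLASSES:
            directions.add("RTL")
        else:
            return None
    # Exactly one direction must have been seen.
    return directions.pop() if len(directions) == 1 else None
```

With this sketch, a label such as "com" is accepted as LTR, while "123", "123abc" and a label mixing Latin and Hebrew letters are all rejected.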

    Patrik

On 7 mar 2009, at 12.02, Vint Cerf wrote:

> Martin, et al,
>
> I would have thought that any notion of all-digit labels would be
> hazardous in the event they lead to confusion with dotted IP address
> notations and would therefore be forbidden?
>
> v
>
>
> Vint Cerf
> Google
> 1818 Library Street, Suite 400
> Reston, VA 20190
> 202-370-5637
> vint at google.com
>
>
>
>
> On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:
>
>> Hello Andrew,
>>
>> I'm replying to the list because I think there might be enough
>> people interested, at least tangentially.
>>
>> At 07:11 09/03/07, Andrew Sullivan wrote:
>>> Dear colleagues,
>>>
>>> This is slightly off-topic (although related), but I know some experts
>>> who have thought about this issue are here so I thought I'd better ask.
>>>
>>> Over on the DNSOP list, we're discussing draft-liman-tld-names-00.txt.
>>> One of the interesting arguments that has cropped up has to do with
>>> leading or trailing digits on a label.
>>>
>>> Now, we have some recommendations (and restrictions) on labels in the
>>> bidi document, but of course that is something that restricts IDNs,
>>> and not A-labels.
>>>
>>> The question that I have is whether there is a similar bidi issue for
>>> A-labels (or, more importantly, non-IDN LDH labels: think of a label
>>> "123abc", for instance) in a bidi display or entry context.  I've been
>>> assuming "no" because we already have these sorts of labels today and
>>> I imagined whatever is happening now would apply.  But it strikes me
>>> that we wouldn't be introducing bidi restrictions if there weren't
>>> already a problem.  So is there an issue here that might be relevant
>>> to the I-D in question?
>>
>> You are right that there is a bidi issue. For a very specific
>> example, please see Example 11 at
>> http://www.w3.org/International/iri-edit/BidiExamples
>> (please read the legends or tooltips carefully).
>>
>> The reason why there are bidi issues is:
>> - Non-IDN labels turn up in IDNs
>> - Digits get close to RTL characters, maybe only separated by dots
>> - In the bidi algorithm, numbers and dots get associated with nearby
>> text and thrown around
>>
>> Digits between letters of the same directionality get insulated
>> from their surroundings; that's why IDNA2003 required RTL letters
>> at either end of a label containing RTL characters.
>> IDNA2003 did not require labels with LTR characters to have LTR
>> characters at either end, simply because such labels were already
>> out there, and because IDNA wasn't in charge of ASCII-only labels.
>> However, that doesn't mean that we wouldn't have wanted to prohibit
>> them if we had been able to.
>>
>> For IDNA2008, the situation is slightly different. As far as I
>> understand, it doesn't prohibit specific labels, just combinations
>> of labels that can cause visual havoc. That means that in some
>> situations, a digit at the end of an RTL label may be allowed.
>> Turned the other way, it would mean that non-IDN TLDs could be
>> created quite freely, but some of them (e.g. those starting with
>> a digit) may not allow a second-level RTL label. The details of
>> what the restrictions are would have to be calculated using
>> Harald's approach. The question of how this is enforced would
>> have to be sorted out by people who engage in this kind of thing.
>>
>>
>>> Note that the I-D in question as it stands will not allow all-numeric
>>> labels.  But there is a thread of argument that all-numeric labels
>>> such as "666" ought to be allowed, on the grounds that such a label
>>> could never be part of an IPv4 address anyway.
>>
>> As far as my experience with Bidi goes, all-numeric labels
>> won't be significantly worse than labels with digits at
>> either or both ends. What happens in detail may be slightly
>> different, but bad things will happen either way.
>> There may even be cases where all-digit labels 'perform'
>> better, because the digits will stay together and so there
>> will be no "jumping the dot" phenomenon for parts of a label.
>> But there may still be visual havoc for the overall order
>> of labels and, very importantly, different domain names may
>> still lead to the same visual representation (because of
>> the bidi reordering). For that, please see Example 10
>> in the above page; a logical
>>  http://ab.123.CDEFGH/kl/mn/op.html
>> will also be displayed as
>>  http://ab.123.HGFEDC/kl/mn/op.html,
>> the same as the logical
>>  http://ab.CDEFGH.123/kl/mn/op.html
>> in the example.
>>
>>> If there are bidi
>>> issues that are important, then the "no leading digit" rule in the
>>> I-D is strengthened.
>>
>> It's not only leading digits. It's also trailing digits.
>> Trailing digits don't affect standalone domain names
>> (or so I think), but domain names often appear in context,
>> the most frequent of which is an IRI/URI. The issues there
>> are very much the same as for domain names alone; you
>> can read about them in the IRI spec (RFC 3987), Section 4.
>> The approach taken there is the same as for IDNA2003, but
>> instead of 'label', the term 'component' is used in order
>> to be more generic. Also, there are no MUSTs, only SHOULDs,
>> because it's impossible for IRIs to dictate how their
>> components are formed.
>>
>> We plan to adapt the bidi section of RFC 3987 once IDNA2008
>> is more stable.
>>
>>
>> As a summary, from a bidi viewpoint, digits at either end
>> of a TLD label should be prohibited while they still can be.
>>
>> Regards,    Martin.
>>
>>> Since this isn't strictly on-topic, please send replies off-list.
>>> Thanks, and sorry for the diversion.
>>>
>>> A
>>>
>>> -- 
>>> Andrew Sullivan
>>> ajs at shinkuro.com
>>> Shinkuro, Inc.
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>>
>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>