draft-liman-tld-names-00.txt and bidi

Patrik Fältström patrik at frobbit.se
Sat Mar 7 17:43:07 CET 2009


On 7 mar 2009, at 16.43, Lyman Chapin wrote:

> At this point, what you say is precisely what the ICANN "applicant  
> guidebook" and the explanatory memorandum that deals with string  
> criteria will say about new gTLDs (assuming that what you meant was  
> "every character in the U-label of a TLD...").

Yes, like many others I have helped a lot in the ICANN process, so I
am not surprised that I and many others who have been involved are
saying similar things, which are indeed similar to the process ICANN
has adopted.

> The only remaining uncertainty is what "IDNA" will mean in the  
> criterion "valid according to IDNA" :-)

Correct.

    Patrik

> - Lyman
>
> On Mar 7, 2009, at 6:39 AM, Patrik Fältström wrote:
>
>> Is it not the case that the safe bet is to say that "every character
>> in the TLD of a U-label must have codepoints that are alphabetic,
>> have the same directionality, either Left_to_Right or Right_to_Left,
>> and be valid according to IDNA". And then anything we add to that
>> makes things more and more scary?
>>
>>    Patrik
>>
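A minimal Python sketch of the criterion in the quoted message, reading it (as Lyman does) as applying to every character in the U-label of a TLD. It assumes that `str.isalpha()` (Unicode general category L*) is an acceptable stand-in for "alphabetic" and that the bidi classes from the standard `unicodedata` module capture "directionality". The function name `meets_safe_bet` is hypothetical, and the "valid according to IDNA" part is deliberately not checked, since what "IDNA" means there is still open.

```python
import unicodedata


def meets_safe_bet(u_label: str) -> bool:
    """Hypothetical check: every code point is a letter, and all of them share
    one direction, either Left_to_Right (bidi class L) or Right_to_Left (R/AL).
    IDNA validity is NOT checked here."""
    if not u_label:
        return False
    directions = set()
    for ch in u_label:
        if not ch.isalpha():  # approximation of "alphabetic": general category L*
            return False
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            directions.add("LTR")
        elif bidi in ("R", "AL"):
            directions.add("RTL")
        else:
            return False  # a letter with some other bidi class
    return len(directions) == 1  # mixing LTR and RTL letters fails


print(meets_safe_bet("com"))                 # True
print(meets_safe_bet("123abc"))              # False: digits are not letters
print(meets_safe_bet("\u0645\u0635\u0631"))  # True: three Arabic letters (AL)
print(meets_safe_bet("ab\u0645"))            # False: mixed directions
```

The `isalpha()` approximation is stricter than the Unicode Alphabetic property: labels that need combining vowel signs (category Mn/Mc, common in Indic scripts) would be rejected by this sketch even though they are "alphabetic" in the Unicode sense.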
>> On 7 mar 2009, at 12.02, Vint Cerf wrote:
>>
>>> Martin, et al,
>>>
>>> I would have thought that any notion of all-digit labels would be
>>> hazardous in the event they lead to confusion with dotted IP address
>>> notations and would therefore be forbidden?
>>>
>>> v
>>>
>>>
>>> Vint Cerf
>>> Google
>>> 1818 Library Street, Suite 400
>>> Reston, VA 20190
>>> 202-370-5637
>>> vint at google.com
>>>
>>>
>>>
>>>
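To make the IPv4 concern concrete, here is a rough Python sketch using the standard `ipaddress` module, which accepts only the strict four-part dotted-decimal form. It asks whether an all-digit TLD could end a name that is also a valid IPv4 literal: true for values up to 255, false for "666". The helper names are hypothetical, and `192.0.2.` is just a documentation prefix chosen for illustration. Note that the classic `inet_aton(3)` parser also accepts shorter forms (for example `a.b`, with a 24-bit last part), under which a last part such as 666 can still be valid, so the strict check here is the optimistic case.

```python
import ipaddress


def parses_as_ipv4(name: str) -> bool:
    """True if `name` is a strict dotted-quad IPv4 literal such as 192.0.2.1."""
    try:
        ipaddress.IPv4Address(name)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


def all_digit_tld_collides(tld: str) -> bool:
    """Hypothetical check: is `tld` all ASCII digits, and could a four-label
    name ending in it also be read as an IPv4 address?"""
    if not tld or not all(c in "0123456789" for c in tld):
        return False
    return parses_as_ipv4("192.0.2." + tld)


print(all_digit_tld_collides("123"))  # True: 192.0.2.123 is an address
print(all_digit_tld_collides("666"))  # False: 666 exceeds 255
print(all_digit_tld_collides("com"))  # False: not all-digit
```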
>>> On Mar 7, 2009, at 1:43 AM, Martin Duerst wrote:
>>>
>>>> Hello Andrew,
>>>>
>>>> I'm replying to the list because I think there might be enough
>>>> people interested, at least tangentially.
>>>>
>>>> At 07:11 09/03/07, Andrew Sullivan wrote:
>>>>> Dear colleagues,
>>>>>
>>>>> This is slightly off-topic (although related), but I know some
>>>>> experts who have thought about this issue are here so I thought
>>>>> I'd better ask.
>>>>>
>>>>> Over on the DNSOP list, we're discussing draft-liman-tld-names-00.txt.
>>>>> One of the interesting arguments that has cropped up has to do with
>>>>> leading or ending digits on a label.
>>>>>
>>>>> Now, we have some recommendations (and restrictions) on labels in
>>>>> the bidi document, but of course that is something that restricts
>>>>> IDNs, and not A-labels.
>>>>>
>>>>> The question that I have is whether there is a similar bidi issue
>>>>> for A-labels (or, more importantly, non-IDN LDH labels: think of a
>>>>> label "123abc", for instance) in a bidi display or entry context.
>>>>> I've been assuming "no" because we already have these sorts of
>>>>> labels today and I imagined whatever is happening now would apply.
>>>>> But it strikes me that we wouldn't be introducing bidi restrictions
>>>>> if there weren't already a problem.  So is there an issue here that
>>>>> might be relevant to the I-D in question?
>>>>
>>>> You are right that there is a bidi issue. For a very specific
>>>> example, please see Example 11 at
>>>> http://www.w3.org/International/iri-edit/BidiExamples
>>>> (please read the legends or tooltips carefully).
>>>>
>>>> The reason why there are bidi issues is:
>>>> - Non-IDN labels turn up in IDNs
>>>> - Digits get close to RTL characters, maybe only separated by dots
>>>> - In the bidi algorithm, numbers and dots get associated with
>>>>   nearby text and thrown around
>>>>
>>>> Digits between letters of the same directionality get insulated
>>>> from their surroundings; that's why IDNA2003 required RTL letters
>>>> at either end of a label containing RTL characters.
>>>> IDNA2003 did not require labels with LTR characters to have LTR
>>>> characters at either end, simply because such labels were already
>>>> out there, and because IDNA wasn't in charge of ASCII-only labels.
>>>> However, that doesn't mean that we wouldn't have wanted to prohibit
>>>> them if we had been able to.
>>>>
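The IDNA2003 rule mentioned above comes from the bidi section of Stringprep (RFC 3454, Section 6): if a string contains any RandALCat character (bidi class R or AL), it must contain no LCat character (class L), and its first and last characters must be RandALCat. A minimal Python sketch of rules 2 and 3 of that section, using the standard `stringprep` module, which exposes RFC 3454 Tables D.1 and D.2 as `in_table_d1` and `in_table_d2`. The function name is hypothetical, and rule 1 (prohibited directional control characters) is left out.

```python
import stringprep


def idna2003_bidi_ok(label: str) -> bool:
    """Rules 2 and 3 of RFC 3454, Section 6, applied to a single label."""
    rtl = [stringprep.in_table_d1(ch) for ch in label]  # RandALCat (Table D.1)
    if not any(rtl):
        return True  # no RTL characters: the rule does not restrict the label
    if any(stringprep.in_table_d2(ch) for ch in label):  # LCat (Table D.2)
        return False  # RTL and LTR letters may not be mixed
    # An RTL character must sit at each end, so a digit cannot start or end it.
    return rtl[0] and rtl[-1]


print(idna2003_bidi_ok("123abc"))                    # True: no RTL characters
print(idna2003_bidi_ok("\u0645\u0635\u0631"))        # True
print(idna2003_bidi_ok("\u0645\u0635\u0631" + "1"))  # False: ends with a digit
```

An ASCII label such as "123abc" passes trivially, because the rule only constrains labels that contain RTL characters; that is exactly the gap Andrew asks about for non-IDN LDH labels.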
>>>> For IDNA2008, the situation is slightly different. As far as I
>>>> understand, it doesn't prohibit specific labels, just combinations
>>>> of labels that can cause visual havoc. That means that in some
>>>> situations, a digit at the end of an RTL label may be allowed.
>>>> Turned the other way, it would mean that non-IDN TLDs could be
>>>> created quite freely, but some of them (e.g. those starting with
>>>> a digit) may not allow a second-level RTL label. The details of
>>>> what the restrictions are would have to be calculated using
>>>> Harald's approach. The question of how this is enforced would
>>>> have to be sorted out by people who engage in this kind of thing.
>>>>
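The actual IDNA2008 restrictions would come out of Harald's calculation and are not reproduced here. As a rough illustration only, the following Python heuristic (hypothetical, and much cruder than any real rule) flags adjacent label pairs, in logical order, where a label containing RTL characters meets a digit across a single dot, which is the situation described above for a digit-initial TLD under an RTL second-level label. It treats bidi classes EN and AN as "digits".

```python
import unicodedata


def has_rtl(label: str) -> bool:
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label)


def is_digit(ch: str) -> bool:
    # EN = European Number (ASCII 0-9 etc.), AN = Arabic Number
    return unicodedata.bidirectional(ch) in ("EN", "AN")


def digit_meets_rtl(domain: str) -> list[tuple[str, str]]:
    """Hypothetical heuristic: return (left, right) label pairs where one label
    contains RTL characters and the other touches the shared dot with a digit."""
    labels = domain.split(".")
    flagged = []
    for left, right in zip(labels, labels[1:]):
        if not left or not right:
            continue  # skip empty labels, e.g. from a trailing dot
        if (has_rtl(left) and is_digit(right[0])) or (
            has_rtl(right) and is_digit(left[-1])
        ):
            flagged.append((left, right))
    return flagged


print(digit_meets_rtl("example.com"))                # []
print(digit_meets_rtl("ab.123.\u0645\u0635\u0631"))  # one pair: ('123', <Arabic label>)
```

The second call mirrors the shape of Example 10 ("ab.123.CDEFGH", with CDEFGH standing for RTL letters).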
>>>>
>>>>> Note that the I-D in question as it stands will not allow
>>>>> all-numeric labels.  But there is a thread of argument that
>>>>> all-numeric labels such as "666" ought to be allowed, on the grounds
>>>>> that such a label could never be part of an IPv4 address anyway.
>>>>
>>>> As far as my experience with Bidi goes, all-numeric labels
>>>> won't be significantly worse than labels with digits at
>>>> either or both ends. What happens in detail may be slightly
>>>> different, but bad things will happen either way.
>>>> There may even be cases where all-digit labels 'perform'
>>>> better, because the digits will stay together and so there
>>>> will be no "jumping the dot" phenomenon for parts of a label.
>>>> But there may still be visual havoc for the overall order
>>>> of labels, and, very importantly, different domain names may
>>>> still lead to the same visual representation (because of
>>>> the bidi reordering). For that, please see Example 10
>>>> in the above page; a logical
>>>> http://ab.123.CDEFGH/kl/mn/op.html
>>>> will be displayed also as
>>>> http://ab.123.HGFEDC/kl/mn/op.html,
>>>> same as the logical
>>>> http://ab.CDEFGH.123/kl/mn/op.html
>>>> in the example.
>>>>
>>>>> If there are bidi issues that are important, then the "no leading
>>>>> digit" rule in the I-D is strengthened.
>>>>
>>>> It's not only leading digits. It's also trailing digits.
>>>> Trailing digits don't affect standalone domain names
>>>> (or so I think), but domain names often appear in context,
>>>> the most frequent of which is an IRI/URI. The issues here
>>>> are then very much the same as for domain names only; you
>>>> can read about them in the IRI spec (RFC 3987), Section 4.
>>>> The approach taken there is the same as for IDNA2003, but
>>>> instead of 'label', the term 'component' is used in order
>>>> to be more generic. Also, there are no MUSTs, only SHOULDs,
>>>> because it's impossible for IRIs to dictate how their
>>>> components are formed.
>>>>
>>>> We plan to adapt the bidi section of RFC 3987 once IDNA2008
>>>> is more stable.
>>>>
>>>>
>>>> As a summary, from a bidi viewpoint, digits at both ends
>>>> of a TLD label should be prohibited while that is still possible.
>>>>
>>>> Regards,    Martin.
>>>>
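Martin's summary recommendation can be sketched as a simple check on a candidate TLD label, assuming that "digit" means any character the bidi algorithm treats as a number (classes EN and AN). The function names are hypothetical. Forbidding digits at both ends also rules out all-numeric labels such as "666".

```python
import unicodedata


def is_bidi_number(ch: str) -> bool:
    # EN = European Number (e.g. ASCII 0-9), AN = Arabic Number (e.g. U+0661)
    return unicodedata.bidirectional(ch) in ("EN", "AN")


def tld_ends_ok(label: str) -> bool:
    """Hypothetical check: a TLD label must not start or end with a digit."""
    if not label:
        return False
    return not is_bidi_number(label[0]) and not is_bidi_number(label[-1])


for candidate in ["com", "123abc", "abc123", "666", "a1b"]:
    print(candidate, tld_ends_ok(candidate))
```

Only the ends are checked, so an interior digit as in "a1b" is still accepted; that matches the recommendation as stated rather than a stricter letters-only policy.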
>>>>> Since this isn't strictly on-topic, please send replies off-list.
>>>>> Thanks, and sorry for the diversion.
>>>>>
>>>>> A
>>>>>
>>>>> -- 
>>>>> Andrew Sullivan
>>>>> ajs at shinkuro.com
>>>>> Shinkuro, Inc.
>>>>> _______________________________________________
>>>>> Idna-update mailing list
>>>>> Idna-update at alvestrand.no
>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>
>>>>
>>>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>>>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>>>
>>>
>>>
>>
>
>
