draft-liman-tld-names-00.txt and bidi

Martin Duerst duerst at it.aoyama.ac.jp
Sat Mar 7 07:43:02 CET 2009


Hello Andrew,

I'm replying to the list because I think there might be enough
people interested, at least tangentially.

At 07:11 09/03/07, Andrew Sullivan wrote:
>Dear colleagues,
>
>This is slightly off-topic (although related), but I know some experts
>who have thought about this issue are here so I thought I'd better
>ask.
>
>Over on the DNSOP list, we're discussing draft-liman-tld-names-00.txt.
>One of the interesting arguments that has cropped up has to do with
>leading or ending digits on a label.
>
>Now, we have some recommendations (and restrictions) on labels in the
>bidi document, but of course that is something that restricts IDNs,
>and not A-labels.
>
>The question that I have is whether there is a similar bidi issue for
>A-labels (or, more importantly, non-IDN LDH labels: think of a label
>"123abc", for instance) in a bidi display or entry context.  I've been
>assuming "no" because we already have these sorts of labels today and
>I imagined whatever is happening now would apply.  But it strikes me
>that we wouldn't be introducing bidi restrictions if there weren't
>already a problem.  So is there an issue here that might be relevant
>to the I-D in question? 

You are right that there is a bidi issue. For a very specific
example, please see Example 11 at
http://www.w3.org/International/iri-edit/BidiExamples
(please read the legends or tooltips carefully).

The reasons why there are bidi issues are:
- Non-IDN labels turn up in the same domain names as IDN labels
- Digits get close to RTL characters, maybe only separated by dots
- In the bidi algorithm, numbers and dots get associated with nearby
  text and thrown around
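
As an illustration only (the sample characters are chosen here and
are not taken from the examples page), the following Python sketch
prints the Unicode bidi class of a few characters that can appear
in or between labels, using the standard unicodedata module. The
helper name bidi_class is hypothetical.

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = {
    "a": "Latin letter",
    "1": "ASCII digit",
    ".": "label separator",
    "-": "hyphen",
    "\u05d0": "Hebrew letter alef",
    "\u0627": "Arabic letter alef",
}

def bidi_class(ch: str) -> str:
    """Return the Unicode bidi class of a single character."""
    return unicodedata.bidirectional(ch)

for ch, desc in samples.items():
    print(f"U+{ord(ch):04X} {desc}: {bidi_class(ch)}")
```

Digits (EN), hyphens (ES) and dots (CS) are weak types in the bidi
algorithm, so where they end up on screen depends on the strong
characters (L, R, AL) around them.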

Digits between letters of the same directionality get insulated
from their surroundings. That is why IDNA2003 required RTL letters
at both ends of a label containing RTL characters.
IDNA2003 did not require labels with LTR characters to have LTR
characters at both ends, simply because such labels were already
out there, and because IDNA wasn't in charge of ASCII-only labels.
However, that doesn't mean that we wouldn't have wanted to prohibit
them if we had been able to.
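
A minimal sketch of the IDNA2003-style end rule described above,
assuming that "RTL character" can be approximated by the bidi
classes R and AL in Python's current Unicode database (IDNA2003
itself is tied to the stringprep tables, so results may differ for
some characters). The function names are hypothetical, and the
other stringprep bidi requirements are not checked.

```python
import unicodedata

def is_rtl_char(ch: str) -> bool:
    # Assumption: R and AL stand in for the stringprep RTL category.
    return unicodedata.bidirectional(ch) in ("R", "AL")

def satisfies_rtl_end_rule(label: str) -> bool:
    """True if the label has no RTL characters, or if it both
    starts and ends with an RTL character."""
    if not any(is_rtl_char(ch) for ch in label):
        return True  # the rule does not apply to labels without RTL
    return is_rtl_char(label[0]) and is_rtl_char(label[-1])

# A digit between two Hebrew letters is insulated: rule satisfied.
print(satisfies_rtl_end_rule("\u05d01\u05d1"))
# A trailing digit after Hebrew letters violates the rule.
print(satisfies_rtl_end_rule("\u05d0\u05d11"))
# An ASCII-only label such as "123abc" is not restricted by it.
print(satisfies_rtl_end_rule("123abc"))
```

Note that the last call returns True: the rule says nothing about
ASCII-only labels, which is exactly the gap discussed here.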

For IDNA2008, the situation is slightly different. As far as I
understand, it doesn't prohibit specific labels, just combinations
of labels that can cause visual havoc. That means that in some
situations, a digit at the end of an RTL label may be allowed.
Turned the other way, it would mean that non-IDN TLDs could be
created quite freely, but some of them (e.g. those starting with
a digit) may not allow a second-level RTL label. The details of
what the restrictions are would have to be calculated using
Harald's approach. The question of how this is enforced would
have to be sorted out by people who engage in this kind of thing.


>Note that the I-D in question as it stands will not allow all-numeric
>labels.  But there is a thread of argument that all-numeric labels
>such as "666" ought to be allowed, on the grounds that such a label
>could never be part of an IPv4 address anyway.

As far as my experience with Bidi goes, all-numeric labels
won't be significantly worse than labels with digits at
either or both ends. What happens in detail may be slightly
different, but bad things will happen either way.
There may even be cases where all-digit labels 'perform'
better, because the digits will stay together and so there
will be no "jumping the dot" phenomenon for parts of a label.
But there may still be visual havoc for the overall order
of labels and, very importantly, different domain names may
still lead to the same visual representation (because of
the bidi reordering). For that, please see Example 10
in the above page (uppercase letters stand for RTL
characters); a logical
   http://ab.123.CDEFGH/kl/mn/op.html
will also be displayed as
   http://ab.123.HGFEDC/kl/mn/op.html,
the same as the logical
   http://ab.CDEFGH.123/kl/mn/op.html
in the example.

>If there are bidi
>issues that are important, then the "no leading digit" rule in the I-D
>is strengthened.

It's not only leading digits. It's also trailing digits.
Trailing digits don't affect standalone domain names
(or so I think), but domain names often appear in context,
the most frequent of which is an IRI/URI. The issues there
are very much the same as for standalone domain names; you
can read about them in the IRI spec (RFC 3987), Section 4.
The approach taken there is the same as for IDNA2003, but
instead of 'label', the term 'component' is used in order
to be more generic. Also, there are no MUSTs, only SHOULDs,
because it's impossible for IRIs to dictate how their
components are formed.

We plan to adapt the bidi section of RFC 3987 once IDNA2008
is more stable.


As a summary, from a bidi viewpoint, digits at either end
of a TLD label should be prohibited while they still can be.
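
The summary above can be expressed as a simple check on a candidate
ASCII label: does it start or end with a digit? The sketch below is
only an illustration; the function name is hypothetical, and it is
not the rule text of any draft.

```python
ASCII_DIGITS = set("0123456789")

def has_digit_at_either_end(label: str) -> bool:
    """True if an ASCII label starts or ends with a digit."""
    if not label:
        return False
    return label[0] in ASCII_DIGITS or label[-1] in ASCII_DIGITS

for candidate in ("com", "123abc", "abc123", "a1b", "666"):
    print(candidate, has_digit_at_either_end(candidate))
```

Only "com" and "a1b" pass this check; all-numeric labels such as
"666" are flagged as well, since they have digits at both ends.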

Regards,    Martin.

>Since this isn't strictly on-topic, please send replies off-list.  
>Thanks, and sorry for the diversion.  
>
>A
>
>-- 
>Andrew Sullivan
>ajs at shinkuro.com
>Shinkuro, Inc.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


