Comments on IDNA Bidi
harald at alvestrand.no
Thu Jan 17 07:20:59 CET 2008
Michel Suignard wrote:
>> On Behalf Of Harald Alvestrand
>> Michel, you're backing down on the criterion you yourself argued
>> strongly for a few messages back. Please decide one way or another.
>> - EITHER labels breaking apart on display under certain conditions
>> is unacceptable
>> - OR labels breaking apart on display under certain conditions is
>> acceptable
> Harald, I think I am with you on reaching for the first way. Breaking labels on display is always a bad idea. However avoiding it completely was not a stated goal of either IDN2003 or IRI (afaik). So on that aspect we don't have a clean precedent. And obviously if you don't have to display (some cases in IRN), we don't really care.
Thanks for coming back to this position.
Two points to note:
- Once a name is allowed for use as a DNS "hostname", it is impossible
for any DNS rule to constrain whether it's used in an IRI, an IRN, a
bare domain name or an email address. There's precedent for saying that
certain names are allowed in the DNS but not as hostnames (those that
start with _ are a prime example), but there's no precedent for
distinguishing the classes of hostname.
- We don't have a clean precedent - it's impossible at this time to
determine whether the breakage in IDN2003 was deliberately accepted
damage or an oversight. That's a situation I don't want to be in again,
which is why I've been hammering and hammering on "what is the rule".
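The first point can be illustrated with a simplified hostname-label check. This is a sketch assuming the letter-digit-hyphen (LDH) rule of RFC 952 as relaxed by RFC 1123 (leading digits allowed), checking one ASCII label at a time and ignoring overall name length. The DNS protocol itself would carry an SRV-style label such as `_sip`; the hostname check rejects it.

```python
import re

# Letters, digits and hyphen; 1-63 characters; no leading or trailing hyphen.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_hostname_label(label: str) -> bool:
    """Simplified RFC 952/1123 hostname-label check (one label, ASCII only)."""
    return LDH_LABEL.fullmatch(label) is not None

for label in ["www", "_sip", "-bad", "3com"]:
    print(label, is_hostname_label(label))
```

Nothing in such a check knows whether the name will end up in an IRI, a bare domain name or an email address, which is the point being made.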
> Coming back strictly to IDN, we should achieve it by a mix of repertoire restrictions and additional rules. Your example using CS EN AN CS R in LTR context convinced me that we should probably look into new rules for EN and AN types. The issues with ET and ON types should probably be dealt with by eliminating most of them by repertoire restrictions.
> Where I may differ from you is that I am not completely convinced we can achieve 100% success in avoiding labels breaking apart in the most extreme pathological cases. But you have convinced me that it is worth a look at improving the rules further for some bidi types such as EN and AN.
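To make the bidi classes in this exchange concrete, here is a minimal Python sketch using the standard `unicodedata` module. The sample string is hypothetical, chosen only so that its code points have the classes CS EN AN CS R; it is not necessarily the exact example referred to above. The `flag_classes` helper and its default `banned` set are likewise assumptions, illustrating the kind of repertoire restriction on ET and ON being discussed rather than any agreed rule.

```python
import unicodedata

def bidi_classes(text: str) -> list[str]:
    """Return the Unicode Bidi_Class of each code point in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]

def flag_classes(text: str, banned: frozenset[str] = frozenset({"ET", "ON"})) -> list[str]:
    """Hypothetical repertoire check: return code points whose bidi class is in `banned`."""
    return [ch for ch in text if unicodedata.bidirectional(ch) in banned]

# FULL STOP (CS), DIGIT ONE (EN), ARABIC-INDIC DIGIT ONE (AN),
# FULL STOP (CS), HEBREW LETTER ALEF (R)
sample = ".1\u0661.\u05d0"
print(bidi_classes(sample))    # ['CS', 'EN', 'AN', 'CS', 'R']
print(flag_classes("a%b!c"))   # ['%', '!']  (PERCENT SIGN is ET, EXCLAMATION MARK is ON)
```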
For the pathological cases, I hope that we can list them and say "we do
not give any guarantees in those". This may (as Mark has implicitly
said) involve saying that you can't expect to display a percent-escaped
hostname correctly (percent is an ET, and percent-escaping allows you to
put it *anywhere*). We've already agreed that no guarantees apply when
RLE/PDF is near the domain name.
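The percent-escaping point can be seen directly. This Python sketch uses a hypothetical two-letter Hebrew label, escapes it with `urllib.parse.quote` (which encodes it as UTF-8 by default), and lists the bidi classes of the result: the R characters disappear, and ET characters now sit between runs of L and EN. The last line shows that RLE and PDF carry bidi classes of their own.

```python
import unicodedata
from urllib.parse import quote

def bidi_classes(text: str) -> list[str]:
    """Return the Unicode Bidi_Class of each code point in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]

# HEBREW LETTER ALEF, HEBREW LETTER BET -> UTF-8 bytes D7 90 D7 91
escaped = quote("\u05d0\u05d1")
print(escaped)                       # %D7%90%D7%91
print(bidi_classes(escaped))
# ['ET', 'L', 'EN', 'ET', 'EN', 'EN', 'ET', 'L', 'EN', 'ET', 'EN', 'EN']

# RIGHT-TO-LEFT EMBEDDING and POP DIRECTIONAL FORMATTING
print(bidi_classes("\u202b\u202c"))  # ['RLE', 'PDF']
```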
More information about the Idna-update