Comments on bidi-04
John C Klensin
klensin at jck.com
Tue Mar 18 17:49:05 CET 2008

--On Tuesday, 11 March, 2008 08:57 -0400 Harald Tveit Alvestrand
<harald at alvestrand.no> wrote:
> For instance, the string AN EN, when embedded between 2 dots,
> will behave as follows in a LTR context:
> 
> <RLE><PDF>.<AN><EN>.<RLE><PDF> reorders into <AN><EN>.. (that
> is, both dots jump to the right end of the string). (I hope
> you agree with me that "sor" and "eor" will both be "R" for
> the run that goes .<AN><EN>., while the embedding direction is
> L).
> 
> This affects 3 of the 53 length-2 combinations allowed by
> IDNA2003, and 43 of the 375 length-3 combinations allowed by
> IDNA2003.
> 
> But - ALL of the strings affected turn out to be eliminated by
> our currently proposed set of selection criteria. If a string
> passes the test for "safe BIDI label" in bidi-04, it will not
> be affected by bidi formatting codes in the text around it.
> 
> So we can eliminate the restriction.
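To make the quoted notation concrete: AN, EN, and CS are Unicode Bidi_Class values (Arabic Number, European Number, Common Number Separator). The minimal sketch below uses Python's standard `unicodedata.bidirectional` to show how a dotted string maps onto that class sequence. The helper name `bidi_classes` and the sample string are hypothetical, chosen only for illustration. The sketch classifies characters; it does not perform the reordering described above.

```python
import unicodedata

def bidi_classes(text: str) -> list[str]:
    """Return the Unicode Bidi_Class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# ".", ARABIC-INDIC DIGIT THREE (U+0663), DIGIT THREE, "."
sample = ".\u06633."
print(bidi_classes(sample))  # ['CS', 'AN', 'EN', 'CS']
```
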

Well, some of us still believe that the "safe BIDI label" test
is too safe.  Certainly it is sufficient, but I'm not sure
that the restrictions it implies, especially the trailing digit
restriction, are going to be acceptable in actual use.  I am
fairly certain that I understand the problem.  I am also
concerned that the "string of digits is treated as a single
numeral" assumption may not be as natural in label contexts as
we might think.

However, I don't think "outside the protocol, therefore someone
else's problem" is workable in this case.  I don't believe that
"the user has to figure out typing order if a domain name (or
IRI) appears on the side of a bus and we don't guarantee that
the string will be unambiguous" is workable either.  If I
correctly understand that proposal, it is equivalent to
permitting ambiguous names on the wire, which is far worse than
the fuzziness about mapping that has concerned some others.  It
seems to me that, if we give up on the idea that each printed
domain name has a single, unambiguous interpretation, we have
given away the store, and that preserving that particular bit
of DNS integrity is the paramount criterion for a successful
IDN approach.

If I've understood where we are, we have three possibilities on
the table:
	* The "safe BIDI domain name" (not just label) rules of
	draft-alvestrand...bidi-04.
	
	* The "no guarantee of unambiguous ordering" proposal
	that was discussed quite a bit last week but that has
	not been clearly written down.
	
	* Staying with the IDNA2003 bidi model, which most of us,
	if not all, have agreed is broken.

Perhaps it is time to see if we can get out of the box.
For example, are the issues with using labels that contain
right-to-left characters (without the normal constraints that
come from knowing that the label strings must be somewhat
homogeneous and contextually plausible in the relevant language)
ultimately messy enough that we need to insist that any label
that is to be interpreted RtoL must begin with an explicit
switch into that mode?  While I dislike the ideas of introducing
that sort of constraint and of doing state-switching on labels,
perhaps we have learned enough about stateful strings from
ISO-2022-JP and similar character set designs to make it work.
And the other alternatives, whether draconian restrictions on
what can occur in such labels or ambiguities of interpretation,
are beginning to look much worse.

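The explicit-switch idea can be sketched as a simple validity check. This is a minimal, hypothetical illustration, not part of any draft. The choice of U+200F RIGHT-TO-LEFT MARK as the switch, and the names `RTL_SWITCH`, `needs_rtl_switch`, and `check_label`, are assumptions made only for the example. "Interpreted RtoL" is approximated here as "contains a character whose Bidi_Class is R or AL".

```python
import unicodedata

# Hypothetical switch marker: the proposal does not name a code point.
# U+200F RIGHT-TO-LEFT MARK is used purely for illustration.
RTL_SWITCH = "\u200f"

# Bidi classes treated as "right-to-left" in this sketch.
RTL_CLASSES = {"R", "AL"}

def needs_rtl_switch(label: str) -> bool:
    """True if the label contains any R or AL character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)

def check_label(label: str) -> bool:
    """Accept a label only if RtoL content is preceded by the switch.

    The switch itself has Bidi_Class R, so it is stripped before the
    rest of the label is classified.
    """
    has_switch = label.startswith(RTL_SWITCH)
    body = label[1:] if has_switch else label
    return has_switch or not needs_rtl_switch(body)
```

In this sketch a label made only of Arabic-Indic digits (class AN) needs no switch, and a switch in front of an all-LTR label is tolerated. Both are arbitrary choices that the proposal leaves open.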
   john
    
    