To be published: draft-alvestrand-idna-bidi-00.txt

Mon Oct 16 05:43:08 CEST 2006

--On Monday, 16 October, 2006 01:44 +0200 Harald Alvestrand
<harald at alvestrand.no> wrote:

> Paul Hoffman wrote:
>> At 11:35 PM +0200 10/15/06, Harald Alvestrand wrote:
>>> 
>> 
>> Making every receiver correctly add all of the steps of
>> section 3.3 of UAX 9 is onerous and error-prone. A much
>> simpler change would be to simply say that a character of
>> type NSM is considered to have the directionality of the
>> base character which it follows.
>> 
>> This will fix both the problems listed in this draft, as well
>> as any related problem where a combining character is
>> following a RandALCat character.
>> 
> My biggest worry is that we'll be right back here in a year if
> we discover that someone really needs EN, ES, ET, AN, CS, BN,
> B, S, WS or ON characters in conjunction with RTL strings....
> I have not yet figured out how I can be sure what the result
> is for all of these classes in a way that gives me reassurance
> that we will never, ever, ever need to allow them.
> 
> My second-biggest worry is that one application will use its
> UAX9-compliant library to display the string, another will use
> a "tuned" algorithm that depends on the Stringprep rules, and
> they will display different results.
> 
> I'm all for simplifying/subsetting UAX9 if we can prove to
> ourselves that the simplification/subsetting is equivalent
> under the restricted set of cases we consider - but I'd like
> to have a fairly rigorous argument that this is the case before
> jumping.

Harald, independent of whether or not that is provable, it seems
to me that one could buy considerable protection with some
guidance about what is tested.   As we have discussed offline,
the display issue Paul raises occurs in its most obvious form if
someone performs ToUnicode on a Punycode string and then tests
the last (in network order) character.  If that character is not
strictly Right-to-Left, Stringprep2003 justifies the assumption
that the whole string is Left-to-Right.  If, by contrast, we had
advised (or required) that directionality tests for display
purposes be performed on the first character, with testing for
adequate homogeneity being performed only in ToASCII -- and
perhaps in ToASCII at registration time, rather than lookup
time -- we'd be looking at something of a different problem.  And
it seems to me that suggests a change we should be making now.
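
To make the difference concrete, here is a rough sketch in
Python.  It is illustrative only and not taken from the draft or
from Stringprep: the function names are made up, and Python's
unicodedata.bidirectional() stands in for whatever bidi-class
lookup a real implementation would use.  The sample label is a
Hebrew alef followed by the combining point qamats, i.e., a
right-to-left letter followed by an NSM.

```python
import unicodedata

# Bidi classes that RFC 3454 groups together as "RandALCat".
RTL_CLASSES = {"R", "AL"}


def is_rtl_by_last(label: str) -> bool:
    """Hypothetical display test: look only at the last character
    (in network order) of the already-decoded label."""
    return bool(label) and unicodedata.bidirectional(label[-1]) in RTL_CLASSES


def is_rtl_by_first(label: str) -> bool:
    """Hypothetical display test: look only at the first character
    (in network order) of the already-decoded label."""
    return bool(label) and unicodedata.bidirectional(label[0]) in RTL_CLASSES


# Sample label (an assumption for illustration): HEBREW LETTER ALEF
# (bidi class R) followed by HEBREW POINT QAMATS (bidi class NSM).
label = "\u05d0\u05b8"

print(is_rtl_by_last(label))   # False: the trailing NSM is not R or AL
print(is_rtl_by_first(label))  # True: the leading alef is R
```

With this label, the last-character test sees the NSM and falls
back to left-to-right, while the first-character test sees the
alef and reports right-to-left.  Neither function checks the rest
of the string; that is the homogeneity test I am suggesting
belongs in ToASCII.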

   john