Comments on IDNA Bidi

Sat Jan 12 03:45:54 CET 2008

> > However, this should really not be proposed as something that users of
> > IDNA should do. Instead, it should be used to test that Michel's
> > formulation is correct.
> Exactly - I want to test the algorithm before proposing one. However, I
> don't understand what you wrote above:
>
> - if taken as written, it would test the string "A1" by embedding it
> between the strings "ALEPH BET" and "GIMEL DAV", which certainly would
> cause the test to fail (the "1" would pick up its directionality from
> the surrounding RTL characters, and the whole thing would likely display
> in the order of "1 DAV GIMEL A BET ALEPH" - I don't have my direction
> calculator with me). So I'm assuming you're thinking of some separators
> - which ones?

Ken offered some comments. I was probably not very clear. The purpose is to
ensure that if we have

abc.def.ghi

that we don't get an ordering like

abcd.fe.ghi

that is, one where characters hop across label boundaries. (The lowercase
letters above don't mean just English.) It is OK to switch the order of labels,
or the order within a label, but each label needs to stay intact.
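
As a rough sketch of what "stay intact" could mean in a test, the Python below
checks a display order for hopping. The function name, the representation of
the display order as a list of logical indices, and the use of "." as the only
delimiter are assumptions made for illustration; it does not run the BIDI
algorithm itself.

```python
def labels_stay_intact(logical, visual_order, delimiters="."):
    """Return True if no label is split up in the displayed order.

    logical      -- the string in network (logical) order
    visual_order -- hypothetical input: logical indices listed in
                    left-to-right display order
    """
    # Number the labels; delimiter positions get None.
    label_of = []
    current = 0
    for ch in logical:
        if ch in delimiters:
            label_of.append(None)
            current += 1
        else:
            label_of.append(current)

    # Walk the display order. Once we leave a label, seeing it again
    # means one of its characters hopped across a boundary.
    closed = set()
    previous = None
    for idx in visual_order:
        label = label_of[idx]
        if label != previous:
            if label is not None and label in closed:
                return False
            if previous is not None:
                closed.add(previous)
            previous = label
    return True
```

For the example above, the display abcd.fe.ghi corresponds to the order
[0, 1, 2, 4, 3, 6, 5, 7, 8, 9, 10], which this check rejects, while a display
that only swaps whole labels or reverses a label internally is accepted.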

Now, because the BIDI algorithm has limited scope, we don't need to test all
characters, just certain combinations. So the idea is that if we test

XY.abc.ZW

for all combinations of X, Y, Z, W (where it makes a difference, so only
from a few BIDI categories), we can see whether there is any "hopping". Here
"." is a stand-in, because other characters can delimit labels, like /, ?, #,
...
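
To make the XY.abc.ZW idea concrete, here is a minimal sketch that enumerates
the test strings. The choice of BIDI categories and of one representative
character per category is my assumption, not a settled list, and the names
REPRESENTATIVES and bidi_test_strings are made up for this example.

```python
import itertools
import unicodedata

# One representative character per BIDI category (an assumed selection).
REPRESENTATIVES = {
    "L": "a",
    "R": "\u05d0",    # HEBREW LETTER ALEF
    "AL": "\u0627",   # ARABIC LETTER ALEF
    "EN": "1",
    "AN": "\u0661",   # ARABIC-INDIC DIGIT ONE
    "NSM": "\u05b8",  # HEBREW POINT QAMATS
}

# Sanity check against the Unicode data shipped with Python.
for category, ch in REPRESENTATIVES.items():
    assert unicodedata.bidirectional(ch) == category


def bidi_test_strings(middle="abc", delimiter="."):
    """Yield XY<delim>middle<delim>ZW for every combination of X, Y, Z, W."""
    chars = REPRESENTATIVES.values()
    for x, y, z, w in itertools.product(chars, repeat=4):
        yield f"{x}{y}{delimiter}{middle}{delimiter}{z}{w}"
```

Passing delimiter="/" (or "?", "#") covers the other delimiters. Each string
would then be run through a BIDI implementation and the result checked for
hopping.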

Is that a bit clearer?

> It's certainly a bidi issue too; as you know, one of the driving forces
> for the clarification here is the problem of Yiddish written in the
> Hebrew script. But now that this text is safely embedded in "issues",
> and the decision is made to link this document to "issues", the need for
> this text here is much lessened.

It's not a BIDI issue, meaning an issue caused by text directionality. It is
an issue that happens in a script that is bidi, but is unconnected with
directionality. That's what I meant.

> I personally think that recommending a non-standard display is a
> non-starter. We probably need to reformulate this paragraph as "the
> result of the Unicode BIDI algorithm is LTRtsriF.LTRdnoceS.LTR, people
> may be surprised by that, but we can't fix it". I'll have to test that
> this is true in all cases before saying it in the document, though...

Good.

>
> >
> > Bidi-5.
> >    One particular example of the last case is if a program chooses to
> >    examine the last character (in network order) of a string in order to
> >    determine its directionality, rather than its first; if it finds an
> >
> >    NSM character and tries to display the string as if it was a left-to-
> >    right string, the resulting display may be interesting, but not
> >    useful.
> >
> > I don't understand this paragraph. When and why would this happen with
> > IDNA-conformant programs?
> >
> I think the text is clear enough - if you get a label "ALEF BET <some
> NSM character>", an IDNA2003 program can look at the last character in
> the string and say "this is not a RTL string", and treat it as if it was
> LTR. In IDNA2003, that will be a safe assumption. In IDNAx, it will not
> be a safe assumption.

I find that a bit odd. The case you are taking is

A program is looking at an IDNAbis URL, and thinks that it is a valid
IDNA2003 URL, and makes some assumptions about it, and things break.

The case that you mention is just the tip of an iceberg. There are a *very*
large number of assumptions that a program can make about IDNA2003 that will
completely break under IDNAbis (as currently drafted). Many, many things
would break, not just this, and not just this in BIDI. So I don't see why
you are just calling out this one.
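
For reference, the kind of last-character check described in the quoted text
can be sketched as below. The function name is hypothetical and the sample
label (ALEF, BET, then the NSM character HEBREW POINT QAMATS) is just one
instance of "ALEF BET <some NSM character>"; I am not claiming any real
program is written this way.

```python
import unicodedata


def naive_direction_from_last_char(label):
    """Guess direction from the last character in network order only."""
    last_class = unicodedata.bidirectional(label[-1])
    return "RTL" if last_class in ("R", "AL") else "LTR"


alef_bet = "\u05d0\u05d1"            # ALEF BET
alef_bet_nsm = "\u05d0\u05d1\u05b8"  # ALEF BET + HEBREW POINT QAMATS (NSM)

# The trailing NSM hides the right-to-left letters from this check.
assert naive_direction_from_last_char(alef_bet) == "RTL"
assert naive_direction_from_last_char(alef_bet_nsm) == "LTR"
```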

>
> Suggestions for a clearer way to state it?
>
>                         Harald
>
>

-- 
Mark