I-D Action:draft-alvestrand-idna-bidi-04.txt (fwd)

Mon Feb 18 18:56:01 CET 2008

As far as confusing or dangerous display is concerned, we have a
spectrum of issues. At one extreme, we have some very simple rules to
prohibit the most egregious cases, such as U+2044 FRACTION SLASH and
U+2215 DIVISION SLASH (which look like the slash '/'). At the other
extreme, we may have some potentially very complicated rules in
resolvers for allowing the display of a mixture of certain scripts
under some circumstances (e.g., Latin and Cyrillic, as long as there
is a hyphen between the two).
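A minimal Python sketch of the "very simple rule" end of that spectrum, assuming a hypothetical deny-list that holds only the two characters named above. SLASH_LOOKALIKES and contains_slash_lookalike are illustrative names, not part of any IDNA specification.

```python
import unicodedata

# Hypothetical deny-list holding only the two slash look-alikes named above.
SLASH_LOOKALIKES = frozenset({"\u2044", "\u2215"})


def contains_slash_lookalike(name: str) -> bool:
    """Return True if any character in the name is on the deny-list."""
    return any(ch in SLASH_LOOKALIKES for ch in name)


# Show which code points the list covers, then try two sample names.
for ch in sorted(SLASH_LOOKALIKES):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(contains_slash_lookalike("example\u2215com"))  # True
print(contains_slash_lookalike("example.com"))       # False
```

The appeal of this end of the spectrum is that the check is a per-code-point set lookup, with no dependence on neighbouring characters or labels.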

IDNA200X leaves it up to the resolver to warn about such confusing
names, but the resolver MUST look up the name if it is asked to do so.

So, IDNA200X has to draw the line somewhere between "very simple
rules" and "potentially very complicated rules". My gut feeling is
that the current IDNA200X bidi rules are already complicated enough.
We probably don't want to make them more complicated.

A resolver implementation is free to have more complicated rules about
when to "warn" about confusing bidi displays, but it MUST look up the
name if it is asked to do so.
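The split between "warn if you like" and "always look up" can be sketched as follows. This is an illustration under assumptions, not a real resolver API: the check function, fake_lookup (which performs no DNS query and only records the name it was given) and the Resolution type are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A check inspects a name and returns a warning string, or None if it has no concern.
Check = Callable[[str], Optional[str]]

# Hypothetical record of every name passed to the stand-in lookup.
LOOKED_UP: List[str] = []


@dataclass
class Resolution:
    name: str
    answer: object
    warnings: List[str] = field(default_factory=list)


def resolve(name: str, checks: List[Check], lookup: Callable[[str], object]) -> Resolution:
    """Collect warnings from every check, then look the name up unconditionally."""
    warnings = [w for w in (check(name) for check in checks) if w is not None]
    # Warnings never gate the lookup: the resolver MUST still look the name up.
    answer = lookup(name)
    return Resolution(name, answer, warnings)


def slash_lookalike_check(name: str) -> Optional[str]:
    # Hypothetical check reusing the two look-alike slashes from the message.
    if any(ch in "\u2044\u2215" for ch in name):
        return "name contains a character that looks like '/'"
    return None


def fake_lookup(name: str) -> object:
    # Stand-in for a real DNS query: records the name and returns a placeholder.
    LOOKED_UP.append(name)
    return f"<answer for {name}>"


result = resolve("example\u2215com", [slash_lookalike_check], fake_lookup)
print(result.warnings)
print(result.answer)
```

Keeping the checks advisory means a resolver can grow arbitrarily complicated warning rules without ever changing which names actually get resolved.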

As we gain more experience with registries and resolvers, we may have
to tighten up or loosen the rules as we proceed along the Standards
Track from Proposed to Draft and then to Standard.

Erik

On Feb 16, 2008 9:33 PM, Harald Tveit Alvestrand <harald at alvestrand.no> wrote:
>
>
> --On 15. februar 2008 15:35 -0500 John C Klensin <klensin at jck.com> wrote:
>
> >> Yes, it does. I think the defensive test (and the one that is
> >> simplest to code) for a resolver would be to flag anything
> >> where the whole domain name contains an R/AL/AN character and
> >> where any label violates the bidi rule as "possibly confusing".
> >>
> >> Is it OK to say that one should refuse to look up any such
> >> name?
> >
> > Given especially the DNAME-related cases, I think not.  There
> > are too many legitimate names that can be suspicious ("possibly
> > confusing") under this sort of rule.
>
> Let's gnaw on this bone a bit... the resolver can't know whether the
> name being looked up refers to a DNAME or not; the resolver has to make a
> decision based purely on the string it's presented with. The nice thing is
> that it's actually presented with the whole domain name at once, so it
> doesn't have to worry about "what can possibly be added to this string" the
> way a registry does.
>
> Let's use RTLBAD as a stand-in for an RTL label that fails the test, and
> RTLGOOD as one that passes.
> For ease of discussion, let's use "abc" for an LTR label that passes the
> test, and "9bc" for an LTR label that starts with a number (there are others
> that will fail, such as "-foo-", but the numeric one is the most
> frequently encountered, I think).
>
> "Reject" means "Refuse to look up the name"; "Accept" means "Try to look up
> the name".
>
> - RTLBAD.RTLGOOD -> Reject
> - RTLBAD.abc -> Reject
>
> - RTLGOOD.RTLGOOD -> Accept
> - RTLGOOD.abc -> Accept
> - abc.abc -> Accept (no RTL, so no bidi)
> - 9bc.abc -> Accept
> - abc.9bc -> Accept
>
> I think these are uncontroversial. (Check: All agree?)
>
> - RTLGOOD.9bc -> this can make RTLGOOD display inconsistently, if it ends
> with a number. Accept or refuse?
> - RTLGOOD.abc.9bc -> this will display correctly, because "abc" contains
> strong LTR characters. Accept or refuse?
> - 9bc.RTLGOOD.abc -> This will probably (?) display correctly, but is
> outside the range of our current tests. Accept or refuse?
>
> Getting the exact list of label pairs/triplets that don't cause trouble is
> complex, and the resulting rules for them are likely to be complex too. So
> far, we've emphasized relatively simple rules.
>
> We could write tests for this, and see what LDH labels can be allowed next
> to RTL labels. Or we could say "a plague on all their houses" and refuse.
>
> Thoughts?
>
>                      Harald
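The defensive test from the quoted exchange, applied to Harald's cases, can be sketched in Python. This is a sketch under stated assumptions: "the bidi rule" is taken to be the Bidi Rule as eventually published in RFC 5893, section 2, which may differ in detail from the rule in this draft; bidi classes come from Python's unicodedata.bidirectional(); the function names and the Hebrew sample labels standing in for RTLGOOD and RTLBAD are illustrative choices.

```python
import unicodedata
from typing import Dict, List

# Assumption: Bidi Rule conditions as published in RFC 5893, section 2.
RTL_CLASSES = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> List[str]:
    """Unicode Bidi_Class of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label: str) -> bool:
    """A label is RTL if it contains at least one R, AL or AN character."""
    return any(c in RTL_CLASSES for c in bidi_classes(label))


def passes_bidi_rule(label: str) -> bool:
    classes = bidi_classes(label)
    # Condition 1: the first character must be L, R or AL.
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False
    # Skip trailing NSM characters when checking how the label ends.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if classes[0] in {"R", "AL"}:
        # Conditions 2-4 (RTL label): allowed classes, ending, no EN with AN.
        return (all(c in RTL_ALLOWED for c in classes)
                and last in {"R", "AL", "EN", "AN"}
                and not ("EN" in classes and "AN" in classes))
    # Conditions 5-6 (LTR label): allowed classes and ending.
    return all(c in LTR_ALLOWED for c in classes) and last in {"L", "EN"}


def possibly_confusing(domain: str) -> bool:
    """Flag a name that contains RTL text and has any label failing the rule."""
    labels = domain.split(".")
    if not any(is_rtl_label(label) for label in labels):
        return False
    return any(not passes_bidi_rule(label) for label in labels)


# Illustrative stand-ins for the labels in Harald's message.
RTLGOOD = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters only (all class R)
RTLBAD = RTLGOOD + "abc"              # RTL label that also contains L characters
CASES = [
    f"{RTLBAD}.{RTLGOOD}", f"{RTLBAD}.abc",
    f"{RTLGOOD}.{RTLGOOD}", f"{RTLGOOD}.abc", "abc.abc", "9bc.abc", "abc.9bc",
    f"{RTLGOOD}.9bc", f"{RTLGOOD}.abc.9bc", f"9bc.{RTLGOOD}.abc",
]

results: Dict[str, bool] = {}
for name in CASES:
    # Replace RTLBAD first, since it contains RTLGOOD as a prefix.
    readable = name.replace(RTLBAD, "RTLBAD").replace(RTLGOOD, "RTLGOOD")
    results[readable] = possibly_confusing(name)
    print(f"{readable:20} {'flag' if results[readable] else 'ok'}")
```

Under these assumptions the flag agrees with every case listed as uncontroversial, and it flags all three open cases (RTLGOOD.9bc, RTLGOOD.abc.9bc, 9bc.RTLGOOD.abc). The sketch only computes the flag; whether a flagged name is refused or merely warned about is the question the thread is debating.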