I-D Action:draft-alvestrand-idna-bidi-04.txt (fwd)
Harald Tveit Alvestrand
harald at alvestrand.no
Sun Feb 17 06:33:40 CET 2008
--On 15 February 2008 15:35 -0500 John C Klensin <klensin at jck.com> wrote:
>> Yes, it does. I think the defensive test (and the one that is
>> simplest to code) for a resolver would be to flag anything
>> where the whole domain name contains a R/AL/AN and where any
>> label violates the bidi rule as "possibly confusing".
>>
>> Is it OK to say that one should refuse to look up any such
>> name?
>
> Given especially the DNAME-related cases, I think not. There
> are too many legitimate names that can be suspicious ("possibly
> confusing") under this sort of rule.
Let's gnaw on this bone a bit. The resolver can't know whether the name
being looked up refers to a DNAME or not; it has to make a decision based
purely on the string it's presented with. The nice thing is that it's
presented with the whole domain name at once, so it doesn't have to worry
about "what can possibly be added to this string" the way a registry has
to.
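As a minimal sketch of the trigger condition in the quoted test (the whole domain name contains an R, AL or AN character), something like the following would do. It assumes the resolver sees the name in Unicode form rather than as Punycode-encoded labels, and the function name `name_contains_rtl` is made up for illustration; it is not taken from the draft.

```python
import unicodedata

# Bidi classes that make a name "bidi-relevant" in the quoted test:
# R (right-to-left), AL (Arabic letter), AN (Arabic number).
RTL_CLASSES = {"R", "AL", "AN"}


def name_contains_rtl(domain_name: str) -> bool:
    """Return True if any character in the whole name has class R, AL or AN.

    Assumption: domain_name is the Unicode form of the name, dots included.
    The dots have a neutral bidi class, so they never trigger the check.
    """
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in domain_name)
```

The check looks only at the string handed to the resolver, which is the point made above: no knowledge of what sits behind the name is needed.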
Let's use RTLBAD as a stand-in for an RTL label that fails the test, and
RTLGOOD as one that passes.
For ease of discussion, let's use "abc" for an LTR label that passes the
test, and "9bc" for an LTR label that starts with a number (there are others
that will fail, such as "-foo-", but the numeric one is the most
frequently encountered, I think).
"Reject" means "Refuse to look up the name"; "Accept" means "Try to look up
the name".
- RTLBAD.RTLGOOD -> Reject
- RTLBAD.abc -> Reject
- RTLGOOD.RTLGOOD -> Accept
- RTLGOOD.abc -> Accept
- abc.abc -> Accept (no RTL, so no bidi)
- 9bc.abc -> Accept
- abc.9bc -> Accept
I think these are uncontroversial. (Check: All agree?)
- RTLGOOD.9bc -> This can make RTLGOOD display inconsistently, if it ends
with a number. Accept or reject?
- RTLGOOD.abc.9bc -> This will display correctly, because "abc" contains
strong LTR characters. Accept or reject?
- 9bc.RTLGOOD.abc -> This will probably (?) display correctly, but is
outside the range of our current tests. Accept or reject?
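The cases above can be written down as a small decision table. This is only a sketch over the stand-in labels used in this message, not over real Unicode labels: `RTLGOOD` and `RTLBAD` are matched literally, and `ltr_label_fails` is a hypothetical placeholder that covers just the two failing LTR shapes mentioned here (a leading digit, or a leading or trailing hyphen). The third verdict, `OPEN`, marks the combinations for which no answer has been agreed.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"  # try to look up the name
    REJECT = "reject"  # refuse to look up the name
    OPEN = "open"      # no agreed answer yet


# Stand-in labels from the discussion, matched literally.
RTL_LABELS = {"RTLGOOD", "RTLBAD"}


def ltr_label_fails(label: str) -> bool:
    """Placeholder for "an LTR label that fails the test".

    Assumption: only the shapes named in the message are covered,
    i.e. "9bc" (leading digit) and "-foo-" (leading/trailing hyphen).
    """
    return label[:1].isdigit() or label.startswith("-") or label.endswith("-")


def classify(name: str) -> Verdict:
    labels = name.split(".")
    if not any(label in RTL_LABELS for label in labels):
        return Verdict.ACCEPT  # no RTL label, so no bidi issue
    if "RTLBAD" in labels:
        return Verdict.REJECT  # an RTL label fails the test
    ltr_labels = [label for label in labels if label not in RTL_LABELS]
    if any(ltr_label_fails(label) for label in ltr_labels):
        return Verdict.OPEN    # good RTL label next to a failing LTR label
    return Verdict.ACCEPT


for example in ["RTLBAD.abc", "RTLGOOD.abc", "abc.9bc", "RTLGOOD.9bc"]:
    print(example, "->", classify(example).value)
```

Choosing "refuse" for the open cases would amount to returning `REJECT` wherever this sketch returns `OPEN`; allowing some of them would mean replacing that branch with finer-grained tests on neighbouring labels.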
Getting the exact list of label pairs/triplets that don't cause trouble is
complex, and the resulting rules for them are likely to be complex too. So
far, we've emphasized relatively simple rules.
We could write tests for this, and see what LDH labels can be allowed next
to RTL labels. Or we could say "a plague on all their houses" and refuse.
Thoughts?
Harald