Reminder: BIDI inter-label tests in -02

Tue Sep 9 02:57:41 CEST 2008

Hello Andrew,

First of all, I apologize for being unfamiliar with DNS. However, I do
believe that you have taken this discussion in the right direction, by
coining the term pre-resolution and by suggesting possible
interpretations of the current draft with respect to DNAME and other
features.

On Mon, Sep 8, 2008 at 2:18 PM, Andrew Sullivan <ajs at commandprompt.com> wrote:
> On Fri, Sep 05, 2008 at 07:01:32PM -0700, Erik van der Poel wrote:
>> I don't think that the operator needs to test all possible paths for
>> the label. The operator should just check against the parent (and the
>> parent's parent, and so on). Is this reasonable? Or are there
>> situations where the operator cannot know the normal parent?
>
> I've been thinking about your latter question, and if I understand it
> correctly I think it suggests a serious misunderstanding of the nature
> of the DNS.  That might be the source of some of the confusion in this
> discussion, so I'm going to try to outline exactly what will happen in
> the case I'm thinking about.  If this is something already obvious to
> you (or others interested in this thread), my apologies.
>
> The fundamental problem lies in these terms "the parent" and "the
> normal parent".  The _actual_ parent at a given moment could be
> something other than the typed-in label.  I'm not sure what you mean
> by "normal parent", but you likely mean "the next label up in the DNS,
> as typed in by the user or otherwise delivered by the application".
> Given that the identification of this label is in fact what's at
> issue, such a term is faintly circular anyway.  But supposing that we
> could identify it, it doesn't help, because that label might not
> form part of the FQDN that's actually the parent.
>
> In a DNAME substitution, the substitution happens at any point in the
> processing of RFC 1034 section 4.3.2, step 3.  As you match down,
> label by label, if you get to a node that is found to own a DNAME RR,
> the DNAME substitution occurs.  (If this is confusing, there's a nice
> clarifying chart at Table 1 in draft-ietf-dnsext-rfc2672bis-dname-14:
> http://tools.ietf.org/html/draft-ietf-dnsext-rfc2672bis-dname-14).
> Note that this can happen more than once, which means you can have
> chains of CNAMEs and DNAMEs.
>
> Now, to return to the example case we were discussing,
> <ALEF>.example.com is perfectly safe.  The problem arises, however,
> when example.com is a DNAME to 3.com.  <ALEF>.3.com is problematic.
> This has a few consequences:
>
> 1.  If we're serious in suggesting that the example.com operator needs
> to perform the BIDI test on any IDNA registration, then the operator
> presumably needs to perform the test when converting example.com to
> 3.com via a DNAME.  Of course, in this case the check will be on
> already existing delegations underneath example.com, and if something
> fails the test either the DNAME is not allowed or the delegations have
> to be removed before the DNAME is inserted.  Now, since delegations
> have to appear on both sides of a zone cut, this is possible to check,
> but it may entail a pretty significant effect.

My guess is that you meant "effort" here.

> Moreover, in such a
> case we are putting a limitation on what a zone operator may do with
> their zone _every time_ they make a change of this sort, even if
> they're not worrying about IDNA (they might well not know that there's
> an IDNA below them.  Remember, the zone operator adding a DNAME will
> quite probably have no easy way of telling that there's a BIDI U-label
> below the target label: it's in the zone as an A-label).  This gives
> the lie to the suggestion that this is a step that can be done
> "pre-DNS", as I suggested in my note the other day.

I still wonder whether it is absolutely necessary for the operator to
test all possible resolution paths whenever a label is registered or a
CNAME or DNAME is defined. You're right that if the operator
introduces a "bad" label, CNAME or DNAME, one ends up with a system in
which the DNS resolves FQDNs that, when converted to U-labels, do not
obey the current bidi draft.

What is the actual objection here? Is it that IDNAbis-aware apps would
refuse to pre-resolve some U-label FQDNs whose corresponding A-label
FQDNs otherwise function perfectly within the DNS itself?
If so, would you be satisfied if the IDNAbis rule says instead that
apps are allowed to resolve such FQDNs, but are not allowed to
*display* them in Unicode (since they are ambiguous and confusing
according to the bidi draft principles)?

Either way, I think we need to specify the rules for registration. For
example, we might say that a registrant must specify the entire FQDN
for the label being registered. Then we might specify that the
operator must attempt to resolve that FQDN, and that if any CNAMEs or
DNAMEs are encountered along the way, then the registrant's original
FQDN is "rewritten". When all of the CNAMEs and DNAMEs in a chain have
been processed, we end up with the final FQDN. At this point, the
operator must convert to U-labels (if any A-labels appear), and
perform the bidi test across the entire FQDN. If there is a failure,
then the operator must report the entire final FQDN to the registrant,
so that s/he may have some idea of what happened.

How does that sound?
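
To make the procedure above concrete, here is a rough Python sketch.
Everything in it is an assumption for illustration: the DNAMES table
stands in for real resolution, the function names are hypothetical,
and simplified_bidi_check is only a stand-in for the draft's
inter-label test (it rejects an RTL label immediately followed by a
label that starts with a digit, which is the <ALEF>.3.com case).

```python
import unicodedata

# Hypothetical DNAME table (owner -> target); a real operator would
# discover these by actually resolving the name.
DNAMES = {"example.com": "3.com"}


def rewrite(fqdn, dnames=DNAMES, max_hops=8):
    """Apply DNAME substitutions until none applies; return the final FQDN."""
    labels = fqdn.lower().rstrip(".").split(".")
    for _ in range(max_hops):
        # Match down from the top: shortest proper suffix first.
        for i in range(len(labels) - 1, 0, -1):
            owner = ".".join(labels[i:])
            if owner in dnames:
                labels = labels[:i] + dnames[owner].split(".")
                break
        else:
            return ".".join(labels)  # no DNAME applied on this pass
    raise ValueError("DNAME chain too long")


def to_u_label(label):
    """Decode an A-label (xn--...) with the stdlib punycode codec."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def is_rtl(label):
    """True if the label contains a character of bidi class R or AL."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in label)


def simplified_bidi_check(fqdn):
    """Simplified stand-in for the inter-label test (NOT the draft's rule)."""
    labels = [to_u_label(l) for l in fqdn.split(".")]
    for left, right in zip(labels, labels[1:]):
        if is_rtl(left) and right[:1].isdigit():
            return False
    return True


def check_registration(fqdn, dnames=DNAMES):
    """Return (passes, final_fqdn) so a failure can be reported in full."""
    final = rewrite(fqdn, dnames)
    return simplified_bidi_check(final), final


# A-label for <ALEF> (U+05D0), built with the codec rather than typed in.
ALEF = "xn--" + "\u05d0".encode("punycode").decode("ascii")
print(check_registration(ALEF + ".example.com"))
```

With the table above, the registrant's <ALEF>.example.com passes the
check as typed, but is rewritten to <ALEF>.3.com and fails, and the
final FQDN comes back with the failure so that it can be reported.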

An alternative would be to allow such registrations, as long as the
original FQDN specified by the registrant obeys the bidi rules. In
that case, the operator MAY warn the registrant about the final FQDN,
but is not obligated to do so. This does not feel very clean since we
already know at that point that "bad" FQDNs are possible in that
setup.

> 2.  If we're serious in suggesting that you have to perform these
> checks pre-resolution, then what we are saying is that applications
> first need to perform all the resolution above the target QNAME,
> perform the BIDI test on the resulting "real" FQDN, then finally hand
> the tested A-label through to gethostbyname().  This sounds like a lot
> of DNS work for something that's supposed to be happening
> pre-resolution.

My assumption was that the IDNAbis bidi draft was only specifying the
rules to follow *before* calling gethostbyname(). I believe that the
old IDNA2003 design was also intended to have apps process the string
*before* putting it in an "IDNA-unaware domain name slot", which is
basically what gethostbyname() is.
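
As a small illustration of what I mean by processing the string
first, here is a Python sketch. The helper names are hypothetical, and
note that Python's built-in "idna" codec implements IDNA2003, not
IDNAbis, and knows nothing about the inter-label test.

```python
import socket


def to_ascii(name):
    """Convert a Unicode domain name to its ASCII (A-label) form.

    Uses the stdlib "idna" codec (IDNA2003), as a stand-in for whatever
    the application's IDNA layer does.
    """
    return name.encode("idna").decode("ascii")


def lookup(name):
    # All IDNA processing (and any bidi test) happens before this call;
    # gethostbyname() only ever sees the ASCII form.
    return socket.gethostbyname(to_ascii(name))


print(to_ascii("b\u00fccher.example"))
```

Any bidi check would sit between to_ascii() and the resolver call, and
at that point it can only see the name as typed, not the result of any
DNAME rewriting.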

> And anyway, as I've already suggested, making the BIDI test mandatory
> automatically makes wildcards illegal for any domain that starts with
> a digit.  Since a wildcard *.3.com would work for <ALEF>.3.com, the
> wildcard ought to be prohibited by the registration rule.

I suppose wildcards could be prohibited at such sites, but even if
they were allowed, problems only arise when someone generates an FQDN
that disobeys the rules, and then tries to resolve and/or display it.
So a wildcard at such a site is not entirely useless. It just has
certain restrictions. No?

>> If the FQDN happens to run into bidi failures due to DNAMEs that were
>> not anticipated (nor condoned) by the original registrant, then the
>> lookup fails or the name will not be displayed in Unicode. So what?
>> Eventually, people will learn not to generate FQDNs with such
>> problems.
>
> I think your faith that people will learn about how to operate the DNS
> such that all the foot-guns remain unloaded is not entirely supported
> by the historical evidence.  But more importantly, there's no reason
> that a delegated zone _should_ be able to rely on some parent zone not
> using a DNAME.  The child has no authority over the parent.  It might
> not be too surprising to find one day that bigcorpexample.com has been
> DNAMEd to 3gimundomergedcorps.com without anyone checking that none of
> the regional offices will go off the air because of it.  But they'll
> sure be annoyed when it happens.

Well, I believe we're stuck between a rock and a hard place. On one
side, we have DNAME, which, if used carelessly, can result in FQDNs
that are displayed ambiguously by the Unicode bidi algorithm. On the
other side, we have RTL characters that we would like to use in domain
names, in such a way that their display is unambiguous even in running
text. It's pretty clear that we cannot stop people from using DNAMEs.
But it's also quite clear that we must allow RTL characters in domain
names if we're going to allow other non-ASCII characters too. Finally,
it's clear that bidi strings are most often displayed using the
Unicode bidi algorithm.

We cannot change that algorithm, but we might be able to work around
it using bidi overrides (LRO and RLO), which get rid of the ambiguity.
I don't know whether the WG members would like that idea, though. We might
want to list the pros and cons of such a proposal.
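
For the sake of discussion, here is one possible reading of the
override idea as a Python sketch. It is not a worked-out proposal: the
function name is hypothetical, and the choice to wrap the whole name
in LRO and each RTL label in RLO is just my assumption about how it
might be done.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def is_rtl(label):
    """True if the label contains a character of bidi class R or AL."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in label)


def with_overrides(fqdn):
    """Wrap a U-label FQDN in explicit overrides for display only.

    The outer LRO fixes the order of labels and dots left to right;
    each RTL label gets its own RLO...PDF so it still reads naturally.
    """
    wrapped = [RLO + l + PDF if is_rtl(l) else l for l in fqdn.split(".")]
    return LRO + ".".join(wrapped) + PDF


shown = with_overrides("\u05d0.3.com")
print([hex(ord(c)) for c in shown])
```

One obvious con is that the control characters are part of the
displayed string, so they would have to be stripped again if the user
copies the name back into a domain name slot.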

Erik