Distributed configuration of "private" IDNA (Re: IDNA and getnameinfo() and getaddrinfo())

Thu Jun 17 03:28:34 CEST 2010

--On Wednesday, June 16, 2010 16:54 -0500 Nicolas Williams
<Nicolas.Williams at oracle.com> wrote:

>...
> So, to resolve tést.{foó, foóbar, óther}.example. the
> _resolver_ would first have to split the input string into
> labels using whatever fullstops are legal in the current
> locale, then lookup each of those domains' IDNA rules in the
> example. TLD zone, do whatever codeset conversions and
> pre-processing may be required to meet the rules found, then
> do the next query.  And so on.

Well, remember that, if fullstops are not global, one needs to
be very careful to keep local ones from leaking.  If they do
leak, a parser that tries to separate an FQDN into labels will
end up with a high error rate.  That would make the bad guys,
who have lots of fun with URLs that trick users into believing
that third- or fourth-level names are really second-level ones,
very happy.  I trust their happiness is not our goal.
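
To make the leakage concern concrete, here is a minimal sketch
of label splitting.  The function name and sample names are
hypothetical.  The separator set is the four dot characters that
IDNA2003 (RFC 3490, section 3.1) treats as label separators; a
locale-specific set would be an assumption layered on top.

```python
# The four characters IDNA2003 (RFC 3490, section 3.1) treats as dots:
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"


def split_labels(name, separators=IDNA2003_DOTS):
    """Split a domain name into labels on any character in `separators`.

    Hypothetical helper for illustration; a single trailing separator
    (the root) is tolerated and does not produce an empty label.
    """
    labels = []
    current = []
    for ch in name:
        if ch in separators:
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    if labels and labels[-1] == "":
        labels.pop()  # drop the empty label left by a trailing dot
    return labels


# The same string seen by two parsers that disagree about fullstops.
leaked = "t\u00e9st\u3002fo\u00f3.example"
as_sender_sees_it = split_labels(leaked)         # three labels
as_ascii_only_parser = split_labels(leaked, ".")  # two labels
```

A parser that only knows "." sees two labels where the sender
intended three, which is exactly the confusion about label depth
described above.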

> Sounds good, BUT there's issues w.r.t. stub resolvers and
> caching: stub resolvers suddenly have to get pretty fancy,
> even if they are using caching servers, because suddenly
> recursive caching servers are not useful for looking up IDNs!

Right.  And, if you start thinking about DNAME and other things
that prevent you from knowing definitively which tree someone
thinks that a name/label is in, the difficulties with caching
servers start looking easy.  Remember that there is not even an
inherent DNS restriction that would prevent having a label in a
private namespace for a DNAME RR whose Data points into the
public DNS.

> Makes you think that private DNS clouds with IDN rules other
> than IETF Standards-Track IDNA rules are not desirable.  And
> I'd agree.
> 
> What's the point of this post?  First: to note that private
> DNS clouds with non-standard IDN rules are a big PITA since
> right now they can only be supported by nodes that either
> happen to implement those rules (and not IDNA) or which have
> local configuration partitioning the DNS namespace by IDN
> rulesets, and distributed configuration, though it could be
> possible, would also be a PITA since stub resolvers would have
> to get pretty smart.  Second: to outline a meta-IDN system
> that could work if IDNA2008 should founder (but let's hope
> not).  Third: I had to write this down :)

I think there may be a fundamental misunderstanding here.  If
your point is that we have a mess on our hands, we already know
that... and that is the starting point for this document.

Could the mess have been avoided if the implications of using
native UTF-8 (and other native encodings, such as direct use of
8859-1) had been known and analyzed when the IDNA work was being
done?  Well, perhaps, but actually I have serious doubts.  The
public-DNS TLDs that were selling 8859-1 names prior to IDNA2003
really didn't care -- they were in the name-selling business
and, if some of those names weren't able to be used in
applications... well, buyer beware.  The decision to wrap IDNA
around an ACE was made fairly consciously and with a moderately
good understanding of what we were getting into.  If we had
understood that better, or made different tradeoffs, the answers
might have come out a little different but I don't think very
much.  And, while the Punycode algorithm and encoding takes the
heat in the current draft, it is difficult to understand how any
other ACE encoding would have been much better.

Now, this particular mess could have been avoided almost
entirely had the IDN WG decided to use UTF-8 in the DNS instead
of going through Nameprep and an ACE.  The WG decided to not do
that, partially because it, perhaps unlike some of the private
implementations that are now using UTF-8 directly, understood
that user expectations and matching issues required
normalization and careful attention to matching procedures and
that getting the DNS to do that and applications to accept it
would result in a _very_ long implementation and deployment
curve.  And the WG decided that deployment time was important
and that a long time before general availability was
intolerable.  Real tradeoff there.

Note that one of the advantages private namespaces have over
public ones is that they are typically fairly homogeneous wrt
software, management, or both.   If I know that all names will
be canonicalized in the same way and that they will be used
within a single, homogeneous community, I may not need fancy
normalization and matching rules built into a protocol.  Indeed,
I may never notice the absence of that machinery.

But, if we had a situation in which the public namespaces were
using IDNA2003 UTF-8 strings, and the private ones were using
unmodified/unmapped UTF-8 strings, we would still have a
problem because we could get false matches in both environments
depending on the assumptions made.  That problem gets a lot
better if everyone is using U-labels.  Of course, that was a key
reason why IDNA2008 doesn't have mapping in the protocol but, as
I trust everyone reading this knows, that decision is not
without problems in practice... and it is precisely where these
traditionally-different approaches interact that the collision
between theory and practice gets most severe.
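
A small sketch of the mismatch, assuming Python's built-in
"idna" codec (which implements IDNA2003, i.e. Nameprep plus
Punycode) stands in for the mapped side and plain UTF-8 bytes
stand in for an unmapped private namespace.  The helper names
and sample strings are hypothetical.

```python
def idna2003_key(name):
    """Comparison key under IDNA2003: Nameprep mapping, then Punycode.

    Uses Python's standard "idna" codec, which implements IDNA2003.
    """
    return name.encode("idna")


def raw_utf8_key(name):
    """Comparison key for a namespace that stores unmapped UTF-8 as-is."""
    return name.encode("utf-8")


# Two spellings a user would consider the same name:
# precomposed e-acute with an upper-case T, versus a lower-case
# form using "e" followed by a combining acute accent.
composed = "T\u00e9st.example"
decomposed = "te\u0301st.example"

same_under_idna2003 = idna2003_key(composed) == idna2003_key(decomposed)
same_as_raw_utf8 = raw_utf8_key(composed) == raw_utf8_key(decomposed)
```

Under these assumptions the two spellings collapse to one name
on the mapped side and stay distinct on the unmapped side, so
whether they "match" depends entirely on which environment does
the comparison.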

One more recent set of decisions is reminiscent of the IDNA
ACE/Punycode one.  If there were no IDN TLDs and, preferably, a
very small and infrequently-changing number of TLDs total, then
it would be fairly easy to devise ways to distinguish between
UTF-8-using private namespaces and A-label-using public ones.
ICANN has not seemed to be very interested in that issue and the
tradeoffs it implies.

In this context, Shawn wrote:

> The good thing about Punycode/IDN is that it enabled DNS.  The
> bad thing is that suddenly any network app needs to become a
> DNS expert.

Borrowing a theme from another discussion that has been going on
in parallel, the good thing about getnameinfo and getaddrinfo
is that they enable IPv6.  The bad thing is that suddenly any
network app needs to become a routing preferences expert.   As
Ned Freed pointed out in that context, if you really want this
to be transparent to the application, the relevant interface is
some flavor of "SetupConnectionByName" with which the
application starts with an opaque name and then, subject to some
parameters or function-name variations, ends up with a
connection.  Sadly, taking away the need for expert knowledge of
the DNS alone really doesn't help a lot.
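
As a rough illustration of that kind of interface, here is a
minimal sketch of a connect-by-name call.  The function name
connect_by_name and its parameters are hypothetical, not an
existing API; it simply tries each getaddrinfo() result in the
order returned.  Python's standard socket.create_connection()
takes a similar approach.

```python
import socket


def connect_by_name(host, port, timeout=5.0):
    """Return a connected TCP socket for (host, port).

    Hypothetical sketch: the caller passes an opaque name and never
    sees address families or individual addresses.  Candidates are
    tried in whatever order getaddrinfo() returns them.
    """
    last_error = None
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in candidates:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and fall through to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no usable address for %r" % (host,))
```

Even this hides only the address selection; name preparation,
proxy handling, and security policy would all still need to move
behind the same call for the application to be truly naive.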

   john

I suggest that, ultimately, the main purpose of the encoding
document is to identify the problem(s), warn people to exercise
caution, and to make a few suggestions that may help a bit.