Some confusion about policy application and its effects

Tue Aug 5 00:15:42 CEST 2008

--On Monday, 04 August, 2008 15:28 -0400 Andrew Sullivan
<ajs at commandprompt.com> wrote:

> I'm sorry I'm not being terribly coherent about this; I think
> the problem is partly that I really am confused.  Perhaps
> "layering violation" is the wrong way to think about it.  It's
> more like a leak in between the application and presentation
> layers: whether a "unicode label" is "allowed" seems almost to
> be an emergent property of the interaction among whatever the
> local policy is, IDNA itself, and the policies that determine
> registration rules at the registration end of the DNS.  (Maybe
> this is exactly what Mark Davis has been getting at in his
> expressed worries about stable mappings, and I'm just
> expressing it half as well because of a poorer understanding?)

Andrew,

At least part of the source of your confusion is that I wrote
the relevant text badly, in a way that makes it sound much more
permissive than I intended.  I'm working on that now; you should
expect updated drafts of both Protocol and Rationale later this
week.

However, I also think that your earlier note reflects a
misconception about the actual difference between IDNA2003 and
IDNA2008 in practice (as possibly distinct from "in theory, if
everyone behaves themselves" or, even more important, "in theory,
if everyone behaves with IDNA2003 but goes hog-wild with
IDNA2008").  The latter is where there is a real potential for
problems, except that believing that people who were careful and
thoughtful with IDNA2003 will suddenly go crazy with IDNA2008
stretches the limits of my credulity, at least.

In particular, you wrote,

> In IDNA2003 use, we have applications that call for resolution
> using a specially-formatted name that otherwise does not
> perturb the traditional use of DNS at all. [...]
>  All of these restrictions, however, are rules that are
> imposed at the registration side.  Applications still just get
> a result back, and use that. 

Well, an application is still required to take whatever string
it finds or is handed, pass it through Nameprep and a series
of other transformations and checks, and map the result through
Punycode to what we now call an A-label before trying to look it
up.  Some of those transformations try to simulate (more
successfully for some scripts and uses than for others) the
case-insensitive matching that is a server-side requirement for
ASCII labels in the DNS.  With IDNA2003, all of that is done on
the client.  Case-insensitive matching is simulated by case
folding of characters.  And the applications either get that
processing right or they don't.  Mostly, when they intend to get
it right, they do, but sometimes their intentions lie elsewhere
for reasons they consider reasonable.   And, incidentally and at
the risk of administering a few more kicks to a horse that was
adequately kicked last week, there are some strings that are
permitted by IDNA2003 that do "perturb the traditional use[s] of
DNS".
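To make that client-side work concrete, here is a minimal sketch that uses Python's standard-library IDNA2003 support (the `encodings.idna` module, which implements RFC 3490 ToASCII with Nameprep) as a stand-in for whatever library a real application uses.  The helper name `to_a_label` is hypothetical.

```python
import encodings.idna  # stdlib IDNA2003 (RFC 3490) support

def to_a_label(label: str) -> str:
    """Hypothetical helper: IDNA2003 ToASCII for a single label."""
    # Non-ASCII input goes through Nameprep (which includes case folding),
    # then Punycode, and gets the "xn--" ACE prefix.
    return encodings.idna.ToASCII(label).decode("ascii")

# Case-insensitive matching is simulated on the client: differently
# cased inputs collapse to one A-label before any query is sent.
for s in ("Bücher", "BÜCHER", "bücher"):
    print(s, "->", to_a_label(s))  # each line ends "-> xn--bcher-kva"
```

Note that an all-ASCII label such as "EXAMPLE" skips Nameprep entirely under RFC 3490 and comes back unchanged; for ASCII, case-insensitivity is still left to the server side.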

> For convenience and consistency, the
> application may just use Unicode and expect the
> transformations to be supplied by some underlying library
> that's "bolted onto the front" of the traditional resolver. 

Indeed, one of the suppositions about IDNA2003 is that various
application-ish things would gradually slide into something like
gethostbyIDNname (or its getaddrinfo equivalent).  That is, by
the way, where the slightly-sloppy terminology about "resolvers"
came from, IIRC.
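A gethostbyIDNname-style interface amounts to that transformation bolted onto the front of the ordinary resolver call.  A minimal sketch, assuming Python's standard-library `idna` codec as the IDNA2003 implementation; `getaddrinfo_idn` and `to_ace_name` are hypothetical names, not an existing API:

```python
import socket

def to_ace_name(name: str) -> str:
    # IDNA2003 ToASCII applied label by label via the stdlib "idna" codec.
    return name.encode("idna").decode("ascii")

def getaddrinfo_idn(name: str, port=None):
    """Hypothetical IDN-aware front end to the ordinary resolver call."""
    ace = to_ace_name(name)
    # The underlying resolver only ever sees the ASCII (A-label) form.
    return socket.getaddrinfo(ace, port)
```

CPython's own `socket.getaddrinfo` already applies the `idna` codec to `str` host names, which is roughly this supposition realized inside one library.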

>  You get a local error
> if you don't have an input label that can be successfully
> transformed according to IDNA2003 rules; but that's not really
> different from a typo where you include a "/" in your
> ASCII-only domain name.

Well, if you include a "/" in your ASCII-only domain name, you
will get a local error out of some applications but not out of
others -- the LDH rule is not universally supported in the
application space.
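For comparison, the LDH check that some applications perform locally (and others skip) is small.  A sketch, assuming the RFC 1123 form of the rule (a label may begin with a digit); `is_ldh_name` is a hypothetical helper:

```python
import re

# One LDH label: letters, digits, and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_name(name: str) -> bool:
    """Hypothetical check: True if every label of name obeys the LDH rule."""
    labels = name.rstrip(".").split(".")
    return all(_LDH_LABEL.fullmatch(label) for label in labels)

print(is_ldh_name("www.example.com"))  # True
print(is_ldh_name("www/example.com"))  # False
```

An application that runs something like this reports the "/" as a local error; one that does not simply hands the string to the resolver.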

> But I think it probably breaks the "lookup and use" model that
> underlies the way we talk about using the DNS.  

If you want "lookup and use" for IDNs, you need UTF-8 labels and
server-side matching.   I'm happy to discuss that with you, but
it is explicitly not IDNA and it is even more explicitly out of
scope for this WG.   Anything else is going to be a
slightly-fuzzy compromise.  The only thing that IDNA2008 changes
in this area is the point at which things get fuzzy.

> While the scope of N might be different, an
> IDNA2003-compliant client will perform the same transformation
> to the "Unicode labels" each time.  The display of the result
> will be the same, too (as long as both C1 and C2 are both
> IDNA-aware).

Maybe.  There is no requirement that I can find in IDNA2003 that
insists that the name displayed be the native character form
derived from the A-label result of the query.   Actually, I
don't think there is even such a requirement for the
"traditional" DNS, although the practice is common.   With
IDNA2003 itself, applications have a choice between preserving
the pre-protocol input string and displaying it after getting an
answer, and displaying the string derived from the
answer/result.  And they do: different applications make
different choices.
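The two display choices can differ even under IDNA2003, because the string derived from the A-label has been through Nameprep.  A sketch using Python's standard-library `idna` codec as the IDNA2003 implementation; `display_forms` is a hypothetical helper:

```python
def display_forms(user_input: str):
    """Hypothetical helper: return (preserved input, form derived from the A-label)."""
    a_label_name = user_input.encode("idna")  # IDNA2003 ToASCII, label by label
    derived = a_label_name.decode("idna")     # IDNA2003 ToUnicode, label by label
    return user_input, derived

preserved, derived = display_forms("Bücher.example")
print(preserved)  # Bücher.example
print(derived)    # bücher.example
```

An application that shows `preserved` and one that shows `derived` present different strings for what was, on the wire, the same query.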

> As near as
> I can tell, however, the same is _not_ true for IDNA2008,
> because local mappings may change both the input to the
> transformation function and the displayed output after the
> answer is returned.  If C1 and C2 have different policies
> (different locales, for instance?), then at least the meaning
> of "same query" is not clear to me.
> 
> If I'm right about the above, I wonder whether it is a (new)
> layering violation; and if so, whether it's an acceptable one
> in the face of the alternatives.

If there is a layering violation, it is there in both IDNA2003
and IDNA2008, although it is certainly more likely with the
latter.
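A toy illustration of the C1/C2 case follows.  Everything in it is hypothetical: `lookup_form` is a crude stand-in for an IDNA2008-style protocol step that does no mapping (it merely refuses uppercase, approximating the treatment of characters that are not PVALID, and is not the real derived-property table), and the two local policies are invented for the example.  IDNA2003's Nameprep maps "ß" to "ss", so policy C1 behaves like IDNA2003 on that character.

```python
import unicodedata

def lookup_form(label: str) -> str:
    """Hypothetical protocol step: no mapping, NFC, then Punycode."""
    # Crude approximation: uppercase is refused rather than mapped.
    if label != label.lower():
        raise ValueError(f"unmapped uppercase in {label!r}")
    label = unicodedata.normalize("NFC", label)
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

# Two hypothetical local mapping policies, applied before the protocol step.
def policy_c1(s: str) -> str:
    return s.casefold()  # folds case and maps "ß" to "ss"

def policy_c2(s: str) -> str:
    return s.lower()     # folds case but keeps "ß"

for user_input in ("Bücher", "Straße"):
    print(user_input,
          "C1:", lookup_form(policy_c1(user_input)),
          "C2:", lookup_form(policy_c2(user_input)))
```

Under both policies "Bücher" yields the same query, but "Straße" yields two different ones (an ASCII label under C1, an A-label under C2).  Passing "Bücher" to `lookup_form` with no local mapping at all raises ValueError.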

More when I've got a draft of the revised text for the documents
ready.

   john