Basic IDN assumptions (was: Re: The Future of IDNA)

Fri Mar 20 22:44:25 CET 2009

--On Friday, March 20, 2009 09:40 -0700 Lisa Dusseault
<lisa.dusseault at gmail.com> wrote:

> Hi John,
> 
> On Thu, Mar 19, 2009 at 10:37 PM, John C Klensin
> <klensin at jck.com> wrote:
> 
>> 
>> (3)  Whatever one does with IDNA, it can't be protocol
>> specific, optimized to work with the web and less
>> satisfactory, or relying on different mechanisms, for other
>> applications.

> You identify this as a basic assumption.  Can you elaborate?

From your comments below, I obviously was not specific and/or
clear enough.  I hope that what follows will help.  First, let
me try to restate the above...

IDNA is designed as a DNS add-on to permit internationalization.
As a DNS add-on, IDNA (with a given prefix, whatever that is)
cannot have different interpretation rules for labels that
depend on the applications that they are called from, nor can
the DNS detect the application protocol that is asking a
question.  

None of that prevents some piece of application-type-specific
protocol between an application and whatever reaches IDNA from
handling _its_ input in ways that are different from the way a
different application might do it as long as what goes into or
comes out of IDNA is constant.  
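A minimal sketch of that separation, in Python.  The helper names
(idna_to_ascii, host_from_mail_address, host_from_typed_web_input)
are hypothetical and invented for illustration, and Python's
built-in "idna" codec, which implements the older IDNA2003 rules,
is used only as a stand-in for "whatever IDNA ends up being".  The
point is that the IDNA step takes no protocol argument, while the
steps in front of it may differ per application.

```python
def idna_to_ascii(domain: str) -> str:
    """Protocol-independent step: no application or protocol parameter.

    Uses Python's built-in 'idna' codec (IDNA2003) purely as a stand-in.
    """
    return domain.encode("idna").decode("ascii")


def host_from_mail_address(address: str) -> str:
    """Hypothetical mail-side handling: keep the part after the last '@'."""
    return address.rpartition("@")[2]


def host_from_typed_web_input(text: str) -> str:
    """Hypothetical browser-side handling: trim whitespace and a trailing dot."""
    return text.strip().rstrip(".")


# Different application-specific handling in front, identical IDNA step behind.
mail_result = idna_to_ascii(host_from_mail_address("user@bücher.example"))
web_result = idna_to_ascii(host_from_typed_web_input("  bücher.example. "))
```

Both paths hand the same string to the same function, so both get
the same A-labels back, whatever the caller did beforehand.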

It was exactly that option that caused me to start thinking
about the IRI -> URI interface.  Even if the URI -> IDNA
interface were protocol-independent, and either closely tied to
IDNA's specifications or using A-labels exclusively and
therefore not invoking IDNA directly at all (not actually
necessary, but probably desirable), one could, in principle,
redefine the IRI -> URI interface to be protocol-dependent and,
in particular, do different types of mappings for, e.g., "http"
and "mail" protocol types.

Whether that would be wise or not is a separate question, but it
is clear to me that, architecturally, it would be possible to do
things with specific protocols at that layer that are not
possible with IDNA itself (partially because an IRI -> URI
interface is able to know what the protocol is in a way that
IDNA -> DNS interfaces are not).

Now...

> HTTP is special due to its scale, dependence on URLs in HTML,
> its server control over display.  Email and, to a lesser
> extent, any popular federated communication tool, is special
> too -- getting so many new domains in address headers from one
> home server that accepts message delivery.

Right.

> Why couldn't we do addons to the Web or email infrastructure
> that helped users deal with IDNAs?  Isn't that kind of work
> very likely at some point -- e.g. how to do searches and
> filters on an IMAP server where IDNA hostnames might appear in
> addresses?

Indeed.  I've been thinking about doing those operations a
fractional layer or two below what the paragraph above suggests
to me, but my point was "not in IDNA", not "not at all".  My
conceptual model at the moment is that we have, approximately...

 Actual application (web browsers, email clients,...)
 *  -> Application protocols and data formats 
 *           (HTTP, SMTP, HTML, XML, Mail Format stuff)
 *     -> Application-facing (and internationalized)
 *           identifiers (presumably IRIs)
 *        -> IRI-URI decoder
 *        -> Network-facing identifiers (URIs)
 *           -> URI decoder
 *           -> IDNA
 *           -> DNS

What I was thinking about was trying to redraw that picture so
that the bottom four categories would, in terms of information
refinement, be... 

 *     -> Application-facing (and internationalized)
 *           identifiers (presumably IRIs)
 *        -> IRI-IDNA interface
 *           -> IRI-URI decoder
 *           -> IDNA
 *           -> Network-facing identifiers (URIs)
 *               -> URI decoder
 *               -> DNS

with the IRI-IDNA interface doing protocol-specific operations
before calling on IDNA and then passing A-labels into the URIs.
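
As a rough illustration only, and not a proposal, here is how such
an interface might look in Python.  The function names are
hypothetical, the per-scheme handling is invented for the example,
and Python's built-in "idna" codec (IDNA2003) again stands in for
IDNA.  The sketch ignores userinfo and the percent-encoding of
non-ASCII path or query characters that a real IRI-to-URI
conversion would also need.

```python
from urllib.parse import urlsplit, urlunsplit


def to_a_labels(domain: str) -> str:
    """The IDNA step: identical for every scheme (IDNA2003 codec as stand-in)."""
    return domain.encode("idna").decode("ascii")


def iri_to_uri(iri: str) -> str:
    """Hypothetical IRI-IDNA interface: knows the scheme, IDNA does not."""
    parts = urlsplit(iri)

    if parts.scheme in ("http", "https"):
        # Web-style handling: the domain name is in the authority component.
        host = to_a_labels(parts.hostname or "")
        netloc = host if parts.port is None else f"{host}:{parts.port}"
        return urlunsplit(
            (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
        )

    if parts.scheme == "mailto":
        # Mail-style handling: the domain name follows the last '@' in the path.
        local, at, domain = parts.path.rpartition("@")
        return f"mailto:{local}{at}{to_a_labels(domain)}"

    raise ValueError(f"no mapping defined for scheme {parts.scheme!r}")


web_uri = iri_to_uri("http://bücher.example/katalog")
mail_uri = iri_to_uri("mailto:user@bücher.example")
```

Only iri_to_uri looks at the scheme; by the time anything reaches
to_a_labels, or the resulting URI reaches the DNS, the protocol
is no longer visible.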

You can draw the pictures in many other ways, probably more
clearly, but perhaps that makes the general idea clear (and it
is strictly a general idea, not a proposal).

More important, what I was trying to say is that the part of
this that is identified in both pictures above as "IDNA" cannot
be protocol-dependent, any more than the part that is labeled
"DNS" can.

> Further, the kinds of things we might imagine doing in Web
> server info files or in IMAP searches might have an effect on
> what we do in IDNA now.

While I agree in practice, that is a topic I'm trying to steer
away from just because, if we block IDNA while we are imagining,
the degree to which we have been bogged down for the last year
will look really speedy.  I'm pretty sure you didn't intend
that, but some of the postings from others have seemed to lead
in that direction.

> I wonder if the confusion is in "whatever one does with IDNA",
> if you mean something specific with that or really "whatever".

Something very specific to what is done inside the IDNA protocol
itself.

Does that help?

     john