exact match vs mapping

John C Klensin klensin at jck.com
Mon Mar 30 23:00:22 CEST 2009

--On Tuesday, March 24, 2009 17:17 -0700 Erik van der Poel
<erikv at google.com> wrote:

> Hi all,
> Thanks for the meetings. After the 2nd meeting, Pete Resnick
> and I discussed a "layer" model that is probably already
> familiar to most of us, but it raises an interesting question
> that I thought I'd pose to the mailing list.
> We have often talked about protocol "stacks" where e.g. HTTP
> sits on top of TCP, which sits on top of IP, and so on. In our
> IDNA discussions, we have often talked about the HTML stack
> and the email stack. If we take these stacks to their logical
> extreme, they would include the human user at the top:
> human user
> email app
> message body
> 822 header
> SMTP envelope
> IP
> This is a very rough description of the stack, and I realize
> that SMTP goes back and forth between client and server, but,
> I hope you get the general idea. So far, my assumption has
> been that SMTP extensions would probably want to use U-labels.
> I have no idea what people are thinking for the 822 header.
> (John?)

The base SMTP and 822 protocols don't allow non-ASCII
characters, so the answer is "A-labels".  The internationalized
extension work is still experimental and hence very subject to
change as far as the domain-part is concerned (the local-part
follows precedent by being exact-match).
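
To make the A-label/U-label distinction concrete, here is a
minimal Python sketch.  It uses the standard library's "punycode"
codec; the helper names to_a_label and to_u_label are invented
for illustration, and the sketch assumes the label is already in
its final form (it does no mapping and no validity checking).

```python
ACE_PREFIX = "xn--"  # prefix that marks an A-label

def to_a_label(label: str) -> str:
    """Encode one U-label as an A-label; pure-ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def to_u_label(label: str) -> str:
    """Decode one A-label back to its U-label; other labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

print(to_a_label("b\u00fccher"))     # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))   # the U-label, b + u-umlaut + cher
```

The A-label form is the only one that base SMTP and 822 headers
can carry, since it is plain ASCII.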

> The Web stack might look like this:
> human user
> Web app

In terms of standards (no matter how much they are ignored in
some quarters), HTML prior to the still-under-development HTML5
requires URIs (and hence A-labels).

> IP
> Now, one of the issues with IDNA2008 is whether or not to
> include mapping as a MUST. Of course, one way to do this is to
> have a separate RFC for mapping, and have the main IDNA
> protocol refer to the mapping spec, saying that the mapping
> must occur "somewhere" in the stack above. It sounds like some
> of the WG members would like to "push" the mapping all the way
> up the stack to the app (in the UI, immediately after keyboard
> or other entry).

A variation, which I'm finding increasingly interesting for
other reasons, is to consider mapping part of the IRI/URI
boundary, thereby permitting it to be different for different
protocols if that is useful (and it may be).
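
As a rough illustration of what "mapping at the IRI/URI boundary,
different per protocol" could look like, here is a hypothetical
Python sketch.  Everything in it is an assumption made for the
example: the function names, the per-scheme table, and especially
the choice of which scheme gets which mapping are placeholders,
not proposals.

```python
ACE_PREFIX = "xn--"

def lowercase_mapping(label: str) -> str:
    """Placeholder mapping: simple lower-casing."""
    return label.lower()

def no_mapping(label: str) -> str:
    """Placeholder for 'exact match': the label is used as entered."""
    return label

# Hypothetical policy table; the assignments are for illustration only.
MAPPING_BY_SCHEME = {
    "http": lowercase_mapping,
    "mailto": no_mapping,
}

def encode_label(label: str) -> str:
    """Convert a label to ASCII form without applying any mapping."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def iri_host_to_uri_host(scheme: str, host: str) -> str:
    """Apply the scheme's mapping, then encode each label."""
    mapping = MAPPING_BY_SCHEME.get(scheme.lower(), no_mapping)
    return ".".join(encode_label(mapping(label)) for label in host.split("."))

print(iri_host_to_uri_host("http", "B\u00dcCHER.example"))
print(iri_host_to_uri_host("mailto", "B\u00dcCHER.example"))
```

The two calls yield different ASCII forms for the same input,
which is exactly the flexibility (and the risk) that per-protocol
mapping would introduce.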

> But we have also talked about "getting the user used to
> lower-case in the DNS" (by displaying in lower-case, etc). So
> my question is: What is the goal of IDNA? Is it a goal to have
> software map non-ASCII characters to lower-case to simulate
> traditional DNS behavior with ASCII strings? Or is it a goal
> to teach the user to enter lower-case in the first place
> (effectively pushing the lower-case mapping all the way up to
> the human brain)?

I don't think either of those is the goal.  I think the goal is
to permit useful mnemonics for network resources in a wide range
of scripts.  To me, "useful mnemonics" means as much flexibility
as possible without compromising the utility or integrity of
identifiers or the contexts in which they are embedded.  And I
consider things that create confusion among users --including
having things floating around that seem to match but don't and
vice versa-- to compromise the utility and integrity of those
identifiers.

The questions you raise above are, IMO, about tradeoffs for
realizing that goal, not goals in themselves.  Remember too
that, if the best thing for the user is that anything she
expects to match should match, then we really need mapping rules
that cause decorated versions of some characters to match the
undecorated versions, at least sometimes (and a way to evaluate
and determine "sometimes"), not just case mapping to the
most-nearly-related character.
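
The difference between case mapping and "decorated matches
undecorated" can be sketched in Python with the standard
unicodedata module.  The function names are invented for the
example, and stripping every combining mark is a deliberately
crude assumption: it ignores the "at least sometimes" problem,
since whether a decorated character should match its base form
depends on language and script.

```python
import unicodedata

def case_map(s: str) -> str:
    """Case-insensitive key only; diacritics are preserved."""
    return s.casefold()

def strip_decoration(s: str) -> str:
    """Crude key: decompose, drop all combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def matches(a: str, b: str, key) -> bool:
    """Compare two strings under the given mapping."""
    return key(a) == key(b)

decorated = "r\u00e9sum\u00e9"    # "resume" with acute accents on both e's
capitalized = "R\u00e9sum\u00e9"

print(matches(capitalized, decorated, case_map))       # True
print(matches(decorated, "resume", case_map))          # False
print(matches(decorated, "resume", strip_decoration))  # True
```

Case mapping alone leaves the accented and unaccented strings
distinct; only the second, much more aggressive mapping makes
them match.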

