Mappings (was: Re: High-level changes from IDNA2003 in the "current work")

John C Klensin klensin at jck.com
Fri Mar 7 15:52:51 CET 2008



--On Friday, 07 March, 2008 07:30 +0800 James Seng
<james at seng.sg> wrote:

> I am curious about (c)....wouldn't that give rise to
> inconsistency of results as implementations vary?

James,

Partially because the prohibition in RFC 3490 on display of punycode
didn't work (some browser vendors now view display of punycode
in varying circumstances as a feature) and because various
implementations are already doing some local mapping in URIs and
IRIs (i.e., mapping that is not specified in IDNA2003), that
inconsistency already exists.  It is aggravated by a user-level
inconsistency for both end users and registrants: other than
computers and a few patient experts, no one actually understands
what characters are, and are not, accepted by IDNA, and there is
continuing confusion about what "registration" means and whether
one can register a string that cannot come out of
ToUnicode(ToASCII(string)).  Different registries have different
policies on the latter.  A few registrars have concluded that
IDNs are so confusing and problem-prone that they don't want to
touch them.  Those observations are anecdotal rather than
definitive and certainly do not reflect checks all the way
down the DNS tree, but they are, I believe, symptomatic of a
broader problem.
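
The ToUnicode(ToASCII(string)) question can be checked mechanically.
Below is a minimal Python sketch, assuming that the IDNA2003 ToASCII and
ToUnicode functions in Python's standard encodings.idna module (RFC 3490
with UseSTD3ASCIIRules false) behave like whatever a given registry
uses; the helper name survives_round_trip is hypothetical. A label for
which it returns False is one a registrant might ask for but would
never see come back out of the DNS.

```python
import encodings.idna


def survives_round_trip(label: str) -> bool:
    """Return True if ToUnicode(ToASCII(label)) gives back the same label.

    Hypothetical helper built on the IDNA2003 operations in the standard
    library's encodings.idna module.
    """
    try:
        ace = encodings.idna.ToASCII(label)  # bytes, e.g. b"xn--bcher-kva"
    except UnicodeError:
        # Prohibited character or bad length: not a valid IDNA2003 label.
        return False
    return encodings.idna.ToUnicode(ace) == label


print(survives_round_trip("bücher"))  # True
print(survives_round_trip("Bücher"))  # False: nameprep lowercases it
print(survives_round_trip("straße"))  # False: IDNA2003 maps ß to "ss"
print(survives_round_trip("ｆｕｌｌ"))   # False: fullwidth letters fold to "full"
```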

In addition, glyphs in fonts seem to be available more often for
base characters than for compatibility ones.  While
applications can sometimes tell which scripts are supported
locally, telling whether particular characters are present in the
installed fonts is much harder.  In general, the application can
only assume that the characters are there and try to display them
(typically resulting in question marks or little boxes if the
characters are not available, but sometimes in mapping to other,
similar, characters).  Or the application must assume that display
is not possible and display the punycode-encoded string instead
(see above).
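
That display choice can be sketched in a few lines of Python. This is a
hypothetical illustration, not a description of any browser: has_glyph
stands in for the per-character font check that, as noted above, is hard
to do in practice, and the function names are made up. It uses the
IDNA2003 ToUnicode from Python's encodings.idna module and falls back to
the ACE (punycode) form rather than risk boxes or question marks.

```python
import encodings.idna
from typing import Callable


def label_for_display(ace_label: bytes, has_glyph: Callable[[str], bool]) -> str:
    """Choose what to show the user for one DNS label (hypothetical sketch).

    has_glyph is a caller-supplied predicate that says whether the local
    fonts can draw a given character.
    """
    unicode_label = encodings.idna.ToUnicode(ace_label)
    if all(has_glyph(ch) for ch in unicode_label):
        return unicode_label  # every character can be drawn
    # Otherwise show the punycode (ACE) form instead of boxes or "?".
    return ace_label.decode("ascii")


def latin1_font(ch: str) -> bool:
    """Hypothetical font coverage: Basic Latin and Latin-1 Supplement only."""
    return ord(ch) < 0x100


print(label_for_display(b"xn--bcher-kva", latin1_font))  # bücher
japanese = encodings.idna.ToASCII("例え")
print(label_for_display(japanese, latin1_font))  # prints the xn-- form
```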

The lack of full support for compatibility and other mapped-out
characters should not be surprising, but display as little boxes
or question marks is the worst possible case, since information
is actually lost and copy-and-paste is even less likely to work
than usual.
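
Which characters in a given string are mapped out can be listed by
running each one through IDNA2003 nameprep. A rough Python sketch,
assuming the standard library's encodings.idna.nameprep (RFC 3491,
Unicode 3.2 data) as the reference; the helper name is hypothetical.
Because it applies nameprep one character at a time, sequences that only
change when composed in context (e.g. a base letter followed by a
combining mark) are not reported.

```python
import encodings.idna


def mapped_out_characters(label: str) -> list[tuple[str, str]]:
    """Return (character, replacement) pairs for characters IDNA2003 maps out.

    Hypothetical helper. An empty replacement means "mapped to nothing".
    Raises UnicodeError for characters nameprep prohibits outright.
    """
    changes = []
    for ch in label:
        replacement = encodings.idna.nameprep(ch)
        if replacement != ch:
            changes.append((ch, replacement))
    return changes


# The "ﬁ" ligature maps to "fi"; the soft hyphen (U+00AD) maps to nothing.
print(mapped_out_characters("ﬁle\u00adname"))
```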

We either need to do something to clean that up (or, of course,
we could decide we like the warts).  The approach taken in the
proposed documents is to minimize variation in domain names in
interchange and (whether you look at it as part of the goal or as
an effect) make the equivalents of the ToASCII and ToUnicode
operations reversible.  We believe that will significantly reduce
confusion and improve interoperability.
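
One way to picture "reversible" is a ToASCII variant that refuses to
map instead of mapping silently. This Python sketch is hypothetical:
strict_to_ascii and NotInCanonicalForm are made-up names, it reuses
IDNA2003 nameprep from Python's encodings.idna only to detect labels
that would be changed, and it is not the algorithm in the proposed
documents. Whenever it succeeds, ToUnicode returns exactly the label
that went in.

```python
import encodings.idna


class NotInCanonicalForm(ValueError):
    """Raised when IDNA2003 mapping would alter the label (hypothetical)."""


def strict_to_ascii(label: str) -> bytes:
    """Hypothetical ToASCII that rejects instead of mapping.

    Accepts only the Unicode form of a label. Anything nameprep would
    case-fold, normalize, or delete is refused, so
    encodings.idna.ToUnicode(strict_to_ascii(label)) == label
    whenever this returns.
    """
    if label[:4].lower() == "xn--":
        # ACE input would decode to a different string, breaking reversibility.
        raise NotInCanonicalForm(f"{label!r} is already in ACE form")
    prepared = encodings.idna.nameprep(label)
    if prepared != label:
        raise NotInCanonicalForm(f"{label!r} would be mapped to {prepared!r}")
    return encodings.idna.ToASCII(label)


print(strict_to_ascii("bücher"))  # b'xn--bcher-kva'
try:
    strict_to_ascii("Bücher")
except NotInCanonicalForm as err:
    print("rejected:", err)
```

The design choice is to push the burden onto whatever produces the
label: the protocol step never alters its input, so the registered form
and the displayed form cannot drift apart.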

Once the mappings are removed from IDNA, there are several
possible approaches:

	(i) Do mapping externally to the IDNA protocol set,
	possibly using a standardized model that preserves
	complete compatibility with IDNA2003.  This at least
	gets us clarity about what can be registered and what
	goes onto the wire.  It also gives us the potential for
	different rules for different protocol contexts, which
	might be either an advantage or a disadvantage.
	
	(ii) Do as little mapping as possible except in contexts
	where backward-compatibility is more important than
	cleaning things up.  Given the comments above and
	earlier discussions on this list and in
	draft-klensin-idnabis-issues, this might be the best
	approach going forward, and probably would have been the
	best approach if we were starting from a clean slate.
	Whether it is wise today depends on how we weigh the
	importance of IDNs now in active use that contain
	characters that map out against the much larger
	number of IDNs that may exist in the future.
	
	(iii) View mapping among Unicode characters as a
	completely local matter, just as we have always viewed
	mapping into Unicode from local character sets and
	codings.  This requires strong "it better not leak if
	you expect it to resolve" constraints (which we have
	today in different form), but is consistent with the
	knowledge that some local mappings are inevitable as
	application implementations attempt to compensate for
	perceived inadequacies, vis-a-vis their script or
	writing systems, in either IDNA or Unicode.
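
Approaches (i) and (iii) both put mapping in front of a protocol step
that does none, which is also where the inconsistency James asks about
comes from. A minimal Python sketch under loudly stated assumptions: the
two policies are hypothetical (case folding plus NFKC only roughly
approximates IDNA2003 and uses current rather than Unicode 3.2 data),
and to_wire omits every character-validity check a real protocol would
need, doing nothing but Punycode encoding behind the "xn--" prefix.

```python
import unicodedata
from typing import Callable

# A local mapping policy: chosen by the application, not by the protocol.
MappingPolicy = Callable[[str], str]


def to_wire(label: str) -> bytes:
    """Protocol step that performs no mapping at all (simplified sketch).

    ASCII labels pass through unchanged; any other label is Punycode-encoded
    behind the "xn--" prefix. Validity checks are deliberately omitted.
    """
    if label.isascii():
        return label.encode("ascii")
    return b"xn--" + label.encode("punycode")


def fold_like_idna2003(text: str) -> str:
    """Hypothetical policy: Unicode case folding, then NFKC (maps ß to "ss")."""
    return unicodedata.normalize("NFKC", text.casefold())


def lowercase_only(text: str) -> str:
    """Hypothetical policy: NFC, then lowercase (keeps ß as it is)."""
    return unicodedata.normalize("NFC", text).lower()


def prepare_for_lookup(user_input: str, policy: MappingPolicy) -> bytes:
    """Map locally with the given policy, then hand off to the protocol step."""
    return to_wire(policy(user_input))


typed = "Straße"
print(prepare_for_lookup(typed, fold_like_idna2003))  # b'strasse'
print(prepare_for_lookup(typed, lowercase_only))      # an xn-- label for "straße"
```

The same typed string yields two different labels on the wire, so a
name that works in one application may fail, or find a different name,
in another; that is what the "it better not leak" constraint in (iii)
is meant to contain.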

There are probably some hybrid possibilities as well, but the
proposed documents are, deliberately, completely agnostic on
this subject (draft-klensin-idnabis-issues may not be quite
clear about that in the current version; I've learned a bit
more about how to explain it in the few weeks since the most
recent version was posted).

best,
   john

p.s. While I think Paul's summary is a useful way to get people
started on coming up to speed, I hope no one will consider it a
substitute for reading the documents, even between now and the
BOF.



More information about the Idna-update mailing list