prohibiting previously mapped and unmapped characters

John C Klensin klensin at
Wed Nov 29 23:02:27 CET 2006

--On Wednesday, 29 November, 2006 11:21 -0800 Erik van der Poel
<erikv at> wrote:

> Some members of the design team may have made such
> assumptions, but I only have the Internet Draft to look at:
> es-00.txt

I will probably try to rev that last next week -- specific
suggestions about how things could be made more clear would be

But, to explain Mark's comment in a little more detail (and from
the perspective of one of the guilty parties), the experience
with "take almost any Unicode character, and if it doesn't
belong in a domain name, try to map it to one that does" has
been fairly terrible, confusing users and registrars alike and
adding to the number of opportunities available to various bad
people.  We have seen most of the relevant parties develop
workarounds, many of which ultimately involve a variation of
displaying ToUnicode(ToASCII(string)) rather than (string).  The
theory is that the mapped strings aren't actually in the DNS
coding, so they should not be missed.  
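
As a rough illustration of that workaround (a sketch only, not anything taken from the drafts): Python's built-in "idna" codec implements the IDNA2003 ToASCII and ToUnicode operations, so the round trip can be written in a few lines.  The helper name displayed_form is hypothetical, and the sample names are made up for the example.

```python
# Sketch only: relies on Python's built-in "idna" codec, which implements
# IDNA2003 (RFC 3490 with Nameprep). displayed_form is a hypothetical name.

def displayed_form(domain: str) -> str:
    """Return ToUnicode(ToASCII(domain)) for a dotted domain name."""
    ascii_form = domain.encode("idna")  # ToASCII: Nameprep mapping, then Punycode
    return ascii_form.decode("idna")    # ToUnicode: back to a displayable string


if __name__ == "__main__":
    samples = [
        "\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25.com",  # fullwidth "EXAMPLE"
        "B\u00fccher.de",                                  # capital B, u-umlaut
        "b\u00fccher.de",                                  # already in mapped form
    ]
    for name in samples:
        shown = displayed_form(name)
        print(f"{name!r} -> {shown!r} (changed by mapping: {shown != name})")
```

The codec raises UnicodeError for labels it cannot convert (empty, too long, or containing prohibited code points), so a real UI would need to decide what to show in that case.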

A variation on the same logic extends to alternate presentation
forms that are both assigned Unicode code points and treated as
compatibility equivalents.  We don't want to prohibit them in
user interfaces --that would be stupid and no one would pay any
attention to us-- but to take a stronger position about what has
to be provided as input to the IDNA.   If Unicode and IDNA
supported a Slobbovian Left Squiggle, and a localized Lower
Slobbovian UI wanted to treat a Slobbovian Inverted Left
Squiggle, which was not valid for IDNA, as equivalent, then they
should do so... and should do so whether or not the Slobbovian
Inverted Left Squiggle is a Unicode character that is treated as
a compatibility variant of Slobbovian Left Squiggle.  And they
likely will do so regardless of what we have to say.  

On the other hand, a canonical URL that is going to be passed
around the Internet really needs to use target characters (the
ones that translate into and out of IDNA without change), not
their compatibility equivalents.   The only probably-appropriate
exceptions are case-mappings.  You might think of them as
IDNA-compatibility characters, rather than Unicode compatibility
characters.  And that principle about canonical URLs is true
today, with no changes in IDNA: it has to do with maximizing
interoperability at the UI level and maybe with the robustness
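
One rough way to test whether a name already uses only such target characters, again assuming IDNA2003 as implemented by Python's built-in "idna" codec: round-trip the name and compare, ignoring case.  The function name is hypothetical, and str.lower() is only an approximation of the case-mapping exception described above.

```python
# Sketch only: approximates "translates into and out of IDNA without change,
# apart from case" using Python's IDNA2003 codec. The helper name is hypothetical.

def is_round_trip_stable(domain: str) -> bool:
    """True if ToUnicode(ToASCII(domain)) differs from domain only by case."""
    round_tripped = domain.encode("idna").decode("idna")
    # Lowercase both sides so that plain case differences are tolerated.
    return round_tripped.lower() == domain.lower()


if __name__ == "__main__":
    samples = [
        "b\u00fccher.de",                                  # stable
        "B\u00fccher.de",                                  # differs only by case
        "\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25.com",  # fullwidth, gets mapped
        "stra\u00dfe.de",                                  # sharp s maps to "ss"
    ]
    for name in samples:
        print(f"{name!r}: stable = {is_round_trip_stable(name)}")
```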

Note that neither of the changes selected above makes any change
to the number of names that can be registered or what those
names are.  They only impact the number of different ways in
which those names can be written.

The other large set of characters likely to be
prohibited by this process is a collection of symbols that are
simply not used in writing languages.  They were discouraged by
the "IESG Comments", have been prohibited by ICANN Guidelines
since the beginning, and Unicode has generally excluded them from
its concept of "identifiers".  So, if some registry has decided
to make an extra few <insert currency> by selling them, few of
us should have much sympathy.

> Note the "clicking on a URI" in section 2.2.1 and the "label
> rejection" in 2.2.3. Also note the "No" next to FF00..EF on
> page 18 of:
> bles-01.txt

That table is changing rapidly, much more rapidly than there is
any hope of updating the I-D.  I believe there were days last
week on which it changed more than once.  I believe the link to
the most recent version is

And, as has been pointed out on the list, even that table is an
exercise and work in progress, not, in any sense, a final

> Am I misinterpreting the Internet Drafts? Also, I am not only
> concerned about what can actually occur on the wire in a DNS
> packet. I am also concerned about the html that goes on the
> wire. Am I alone in this concern?

You are certainly not, but see the comments about user
interfaces above.


More information about the Idna-update mailing list