
Andrew Sullivan ajs at shinkuro.com
Fri Jun 18 20:06:25 CEST 2010


On Fri, Jun 18, 2010 at 12:37:23PM -0500, Nicolas Williams wrote:
> > If you were building a nameserver that way, you'd be doing it wrong.
> > DNS is _already_ 8-bit clean, and always was.  It's right there in the
> > definition in RFC 1034 and 1035.  _Any_ octet is allowed in DNS
> > labels.
> 
> I don't think you're being creative enough :)
> 
> A DNS zone could easily have N copies of the same RRset, all with
> different but equivalent IDNs:
> 
> foÓ.example.        IN A 1.1.1.1
> foó.example.        IN A 1.1.1.1
> xn--fo-6ja.example. IN A 1.1.1.1
> ...
> It would be perfectly fine to produce a tool that generates all the
> possible aliases of a label given some set of matching rules.  
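A tool of the kind described above could be sketched as follows. This is a minimal illustration under stated assumptions, not anything proposed in the thread. The "matching rules" are hypothetical choices made for the example: every upper/lower case combination, the NFC and NFD forms of each, and the IDNA2003 A-label produced by Python's built-in "idna" codec. The helper names case_variants, aliases and zone_lines are likewise made up.

```python
import unicodedata
from itertools import product


def case_variants(label: str) -> set[str]:
    """Every upper/lower case combination of the label's characters.
    Exponential in label length; fine for a short illustrative label."""
    choices = [{ch.lower(), ch.upper()} for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def aliases(label: str) -> set[str]:
    """Apply the HYPOTHETICAL matching rules: case variants, the NFC and
    NFD form of each, plus the IDNA2003 A-label from the "idna" codec."""
    out: set[str] = set()
    for variant in case_variants(label):
        out.add(unicodedata.normalize("NFC", variant))
        out.add(unicodedata.normalize("NFD", variant))
    out.add(label.encode("idna").decode("ascii"))
    return out


def zone_lines(label: str, origin: str, address: str) -> list[str]:
    """One A record per alias; to the DNS each is a distinct owner name."""
    return sorted(f"{alias}.{origin}. IN A {address}" for alias in aliases(label))


if __name__ == "__main__":
    for line in zone_lines("fo\u00f3", "example", "1.1.1.1"):
        print(line)
```

For the three-letter label "fo\u00f3" these rules already produce 17 owner names (8 case variants, each in NFC and NFD form, plus xn--fo-6ja), which shows how quickly zone size grows with label length and with the number of rules.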

Yes.  Over in DNSEXT we appear to be sliding down the slippery slope
of attempting to solve this problem in a more generic way -- to solve
not just these kinds of examples but other sorts of aliasing too.  If
you want to help, we can use it (though I warn you that the cost is
getting a good handle on all the twisty passages that are part of the
DNS both as deployed and as specified).

Some have argued very strongly that the only right thing to do here is
solve it entirely with provisioning tools, and to stop trying to make
the DNS provide the information to allow inferences.

(While I'm at it, I also want to point out that I've requested a
special hour for DNSEXT that is aimed squarely at non-DNS weenies, who
want the DNS to do things and are unhappy that "aliases" don't work as
they want.  This is a plain requirements-gathering exercise: you talk,
and we write down.  Then we'll at least be able to say, "Yes, can do
that," to some things and, "Nope, can't, and here's why," to others.)

> exploding zone file size).  But a DNS server that implemented case- and
> normalization-insensitive UTF-8 matching would be indistinguishable from
> a dumb server serving such zones.

It'd also be violating the matching rules in STD13, as far as I can tell. 
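To make the STD 13 point concrete: labels are compared octet by octet, and only the ASCII letters are compared case-insensitively (RFC 4343 restates this). The sketch below contrasts that rule with a hypothetical "case- and normalization-insensitive UTF-8" option of the kind proposed above. normalizing_equal is an assumption for illustration, using Unicode canonical caseless matching; it is not something STD 13 permits.

```python
import unicodedata


def std13_equal(a: bytes, b: bytes) -> bool:
    """STD 13 label comparison: octet by octet, with only the ASCII
    letters compared case-insensitively. bytes.lower() changes ASCII
    letters only, so it implements exactly that folding."""
    return a.lower() == b.lower()


def _caseless_key(s: str) -> str:
    # Unicode canonical caseless matching: NFD(casefold(NFD(s))).
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def normalizing_equal(a: bytes, b: bytes) -> bool:
    """HYPOTHETICAL server option: treat labels as UTF-8 and compare them
    case- and normalization-insensitively. This is NOT the STD 13 rule."""
    try:
        sa, sb = a.decode("utf-8"), b.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the plain STD 13 comparison.
        return std13_equal(a, b)
    return _caseless_key(sa) == _caseless_key(sb)


if __name__ == "__main__":
    upper = "fo\u00d3".encode("utf-8")        # precomposed capital O-acute
    lower = "fo\u00f3".encode("utf-8")        # precomposed small o-acute
    decomposed = "foo\u0301".encode("utf-8")  # o + COMBINING ACUTE ACCENT
    print(std13_equal(b"FOO", b"foo"))           # True
    print(std13_equal(upper, lower))             # False
    print(normalizing_equal(upper, lower))       # True
    print(normalizing_equal(lower, decomposed))  # True
```

Under STD 13 the two o-acute labels differ because their second UTF-8 octets (0x93 and 0xB3) are not ASCII letters; the hypothetical option would have to change that.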

> Indeed, if _I_ were developing a DNS server I'd provide an option to
> treat A-labels, U-labels and raw UTF-8 equivalent names as equivalent.

I don't know what this means.  How would you know that something was a
raw UTF-8 label?  All you get is a bitstream.  You can't tell what
encoding it was in.  So what would it mean for these to be equivalent?
You might get this to work much of the time for much of the Internet,
but we're still not quite the Internet Bodge-Up Task Force, and I'd
hate for us to become one.
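The bitstream point can be shown in a few lines. The same octets decode without error under more than one character encoding, and the same intended label arrives as different octets depending on which encoding the sender used. The candidate list and the function name plausible_decodings are illustrative assumptions.

```python
# Octets alone do not say which encoding produced them.
CANDIDATES = ["utf-8", "latin-1", "cp1252"]  # illustrative list only


def plausible_decodings(label: bytes) -> dict[str, str]:
    """Map every candidate encoding under which `label` decodes
    without error to the text it decodes to."""
    result: dict[str, str] = {}
    for encoding in CANDIDATES:
        try:
            result[encoding] = label.decode(encoding)
        except UnicodeDecodeError:
            pass
    return result


if __name__ == "__main__":
    # "fo\u00f3" as a UTF-8 sender would put it on the wire...
    print(plausible_decodings(b"fo\xc3\xb3"))
    # ...and as a Latin-1 sender would.
    print(plausible_decodings(b"fo\xf3"))
```

The first call succeeds under all three encodings but yields two different texts ("fo\u00f3" as UTF-8, "fo\u00c3\u00b3" as Latin-1 or cp1252). The second, the same intended label encoded as Latin-1, is not valid UTF-8 at all.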

> > to a long history, the distinction was not always understood by
> > implementers and so we ended up with a lot of rules that were in fact
> > policy matters getting enshrined in "protocol" broadly
> > (mis)understood.
> 
> I assume you mean middle-boxes (caching servers) that aren't 8-bit
> clean.

And the fact that even longtime IETF participants don't always make
the careful distinction between hostname and domain name, never mind
people who weren't around when the distinction was one you could
actually see.
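The hostname/domain name distinction can be stated as two checks. A DNS label under STD 13 is any sequence of 1 to 63 octets. A hostname label follows the letter-digit-hyphen (LDH) rule of RFC 952, as relaxed by RFC 1123 to allow a leading digit. The function names are hypothetical, and this is a sketch rather than a complete validator (it ignores, for example, the 255-octet limit on whole names).

```python
import re

# Hostname label syntax: RFC 952 LDH as relaxed by RFC 1123 (a leading
# digit is allowed). No leading or trailing hyphen.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_dns_label(label: bytes) -> bool:
    """Domain name label per STD 13: any octets at all, 1 to 63 of them
    (the zero-length label is reserved for the root)."""
    return 1 <= len(label) <= 63


def is_hostname_label(label: bytes) -> bool:
    """Hostname label: a DNS label that is also ASCII LDH."""
    if not is_dns_label(label):
        return False
    try:
        text = label.decode("ascii")
    except UnicodeDecodeError:
        return False
    return _LDH.fullmatch(text) is not None


if __name__ == "__main__":
    for label in (b"example", b"xn--fo-6ja", b"fo\xc3\xb3", b"_tcp", b"-bad"):
        print(label, is_dns_label(label), is_hostname_label(label))
```

The A-label xn--fo-6ja passes the LDH check, which is why IDNA encodes U-labels that way, while the raw UTF-8 octets for the same name, like the underscore label _tcp, are valid domain name labels but not hostname labels.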

> But again, for a private namespace that's probably not a problem.  And
> it's probably not a problem at all, whether in private or public
> namespaces.

Ah, yes.  Because we all know that them gardens stay behind their walls.

A

-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.


More information about the Idna-update mailing list