UTF-8

Fri Jun 18 22:38:19 CEST 2010

On Fri, Jun 18, 2010 at 02:06:25PM -0400, Andrew Sullivan wrote:
> On Fri, Jun 18, 2010 at 12:37:23PM -0500, Nicolas Williams wrote:
> > I don't think you're being creative enough :)
> > 
> > [...]
> > exploding zone file size).  But a DNS server that implemented case- and
> > normalization-insensitive UTF-8 matching would be indistinguishable from
> > a dumb server serving such zones.
> 
> It'd also be violating the matching rules in STD13, as far as I can tell. 

Not at all: if you can't distinguish between a server with a zone whose
RRset names are "exploded" in this way and a DNS server that logically
does the same thing by implementing equivalent matching rules, then you
can't say that the server violates the standard.
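
To make the indistinguishability argument concrete, here is a rough
sketch in Python.  Everything in it is an assumption for illustration:
the label "café", the record value, the helper names, and the choice of
"case-fold, then NFC" as the matching rule.  It is not how any real DNS
server works; it only shows that, over the set of exploded variants, a
literal-match server and a normalizing server give the same answers.

```python
import itertools
import unicodedata

RECORD = "192.0.2.1"  # hypothetical RDATA, from the documentation address range


def canonical_key(label: str) -> str:
    """Assumed matching rule: case-fold, then normalize to NFC."""
    return unicodedata.normalize("NFC", label.casefold())


def exploded_variants(label: str) -> set:
    """Every per-character case combination of the label, in NFC and NFD."""
    base = unicodedata.normalize("NFC", label)
    forms = set()
    for combo in itertools.product(*[(c.lower(), c.upper()) for c in base]):
        cased = "".join(combo)
        forms.add(unicodedata.normalize("NFC", cased))
        forms.add(unicodedata.normalize("NFD", cased))
    return forms


# "Dumb" server: one literal entry per variant (the exploded zone).
exploded = {name: RECORD for name in exploded_variants("café")}

# "Smart" server: one entry, plus matching rules applied to the query.
zone = {canonical_key("café"): RECORD}


def dumb_lookup(qname: str) -> str:
    return exploded.get(qname, "NXDOMAIN")


def smart_lookup(qname: str) -> str:
    return zone.get(canonical_key(qname), "NXDOMAIN")


# Both servers answer every variant, and both refuse unrelated names.
queries = list(exploded) + ["cafe", "caf"]
same = all(dumb_lookup(q) == smart_lookup(q) for q in queries)
print("indistinguishable over this query set:", same)
```

The exploded zone grows exponentially with label length, which is the
zone file size problem mentioned above; the matching-rule server avoids
that while giving the same answers for these queries.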

> > Indeed, if _I_ were developing a DNS server I'd provide an option to
> > treat A-labels, U-labels and raw UTF-8 equivalent names as equivalent.
> 
> I don't know what this means.  How would you know that something was a
> raw UTF-8 label?  All you get is a bitstream.  You can't tell what
> [...]

That's not relevant to whether such a DNS server violates any standard.
I believe I've disposed of _that_ question, whereas you merely assert
the opposite answer.

Of course it's possible that some nodes will send something that's
neither UTF-8 nor ASCII, in which case you might get NXDOMAIN or codeset
aliasing.  And I'm sure we'll all agree that codeset aliasing is a bad
thing.  However, given that no one is supposed to send non-ASCII (and in
some cases non-LDH), it's always possible that some implementors could
"take" the unused bit and just-send-UTF-8 in the hopes that that
approach would "win", resulting in ultimate acceptance that non-ASCII
must be UTF-8.  In fact, at least one implementor has clearly done that,
and if they'll go that far, my suggestion to them is that they might as
well implement matching rules right on the server, EVEN THOUGH I might
not condone what they've done (I don't condemn it either at this time).

The one technical issue I've glossed over in this sub-sub-thread is what
to do about canonical names in such a DNS server (as in the RDATA for
CNAME, PTR, SRV and other RR types).  IMO, regardless of the matching
rules, the canonical names had better use A-labels, as that's the only
way to guarantee interop.  That, along with the
try-UTF-8-first-then-IDNA "negotiation" algorithm described by Shawn
and Dave, leads me to conclude that IDNA is unavoidable, so there's
little point in bothering to use non-ASCII on the wire in DNS.
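
As an illustration of keeping canonical names in A-label form, here is
a minimal Python sketch.  The function name and the example names are
hypothetical.  Note also that Python's built-in "idna" codec implements
IDNA 2003 (RFC 3490), so treat this as a sketch of the idea rather than
an IDNA2008-conformant conversion.

```python
def to_a_labels(name: str) -> str:
    """Convert a domain name to its all-ASCII (A-label) form.

    Uses Python's built-in "idna" codec, which follows IDNA 2003.
    """
    return name.encode("idna").decode("ascii")


# A hypothetical CNAME target as an operator might type it.
target = "café.example"
print(to_a_labels(target))  # prints: xn--caf-dma.example

# Names that are already ASCII pass through unchanged.
print(to_a_labels("www.example"))  # prints: www.example
```

Whatever matching the server does on query names, storing the target in
this form means a resolver that only understands A-labels can still
follow the CNAME.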

> > I assume you mean middle-boxes (caching servers) that aren't 8-bit
> > clean.
> 
> And the fact that even longtime IETF participants don't always make
> the careful distinction between hostname and domain name, never mind
> people who weren't around when the distinction was one you could
> actually see.

In this context I don't care what "longtime IETF participants" think.
I care what the middleboxes do.

> > But again, for a private namespace that's probably not a problem.  And
> > it's probably not a problem at all, whether in private or public
> > namespaces.
> 
> Ah, yes.  Because we all know that them gardens stay behind their walls.

If you read what I've written, you'll see that I'm rather concerned
about how the system described by Shawn Steele and Dave Thaler
interops, if at all.  Since we have IDNA, I'd rather stick to that.

Nico
--