UTF-8

Nicolas Williams Nicolas.Williams at oracle.com
Fri Jun 18 19:37:23 CEST 2010


On Thu, Jun 17, 2010 at 04:14:37PM -0400, Andrew Sullivan wrote:
> On Thu, Jun 17, 2010 at 08:02:18PM +0000, Shawn Steele wrote:
> > And, FWIW, if I were building a name server, I'd let it accept UTF-8
> > requests (They'd have to be U-labels, so the server'd have to use
> > the UTS#46 mappings like any client would, however it wouldn't
> > matter as long as the rules were consistent).
> 
> If you were building a nameserver that way, you'd be doing it wrong.
> DNS is _already_ 8-bit clean, and always was.  It's right there in the
> definition in RFC 1034 and 1035.  _Any_ octet is allowed in DNS
> labels.

I don't think you're being creative enough :)

A DNS zone could easily have N copies of the same RRset, all with
different but equivalent IDNs:

foÓ.example.        IN A 1.1.1.1
foó.example.        IN A 1.1.1.1
xn--fo-6ja.example. IN A 1.1.1.1
...

A server serving such a zone could look as though it knew how to do
pretty smart matching, AS LONG AS the names of the RRs in the reply
match those in the query.

It would be perfectly fine to produce a tool that generates all the
possible aliases of a label given some set of matching rules.  For
example, if we want normalization- and case-insensitive matching then
there'd be four variants of 'foó'.  Apply such a tool to your zone files
and you can make any dumb DNS server suddenly seem pretty smart!
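
A minimal sketch of such a tool, in Python, assuming the matching rules
are nothing more than NFC/NFD normalization plus per-character upper and
lower case.  The names label_variants, a_label and zone_lines are made up
for illustration, and a real tool would want the UTS#46 or IDNA2008
mappings rather than this simplification.  Counting only the 'ó' gives
the four variants mentioned above; varying the case of every letter in
'foó' gives sixteen.

```python
import itertools
import unicodedata


def label_variants(label):
    """Return every case and NFC/NFD variant of a single label.

    Assumption: "equivalent" means equal after NFC normalization and
    simple per-character case mapping. This is not UTS#46.
    """
    base = unicodedata.normalize("NFC", label)
    # For each character, collect its lower- and upper-case forms.
    per_char = [sorted({ch.lower(), ch.upper()}) for ch in base]
    variants = set()
    for combo in itertools.product(*per_char):
        candidate = "".join(combo)
        # Keep both the precomposed and the decomposed spelling.
        variants.add(unicodedata.normalize("NFC", candidate))
        variants.add(unicodedata.normalize("NFD", candidate))
    return variants


def a_label(label):
    """Return the Punycode (xn--) form of the lower-cased NFC label."""
    canonical = unicodedata.normalize("NFC", label).lower()
    if canonical.isascii():
        return canonical
    return "xn--" + canonical.encode("punycode").decode("ascii")


def zone_lines(label, origin, rdata):
    """Yield one zone-file line per equivalent owner name."""
    names = sorted(label_variants(label) | {a_label(label)})
    for name in names:
        yield f"{name}.{origin}. IN A {rdata}"
```

Calling zone_lines("fo\u00f3", "example", "1.1.1.1") produces one A
record per variant plus the A-label, and each extra cased letter in the
label doubles the count.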

Of course no one would do _that_ (each character can have as many as a
dozen equivalent forms, and the combinations multiply, so the zone file
size explodes).  But a DNS server that implemented case- and
normalization-insensitive UTF-8 matching would be indistinguishable from
a dumb server serving such zones.

I.e., DNS servers can, in fact, do what Shawn proposes and still be
compliant as long as all servers for a given zone behave the same way.

Indeed, if _I_ were developing a DNS server I'd provide an option to
treat A-labels, U-labels and raw UTF-8 equivalent names as equivalent.
I think that's probably the ideal thing to do in a private namespace
using UTF-8 labels because IDNA-compliant nodes will be able to find
equivalent A-labels, with happy interoperating users as a result.
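
One way such an option could work is to reduce every query label to a
single lookup key and to echo the owner name exactly as it was queried.
A rough sketch in Python, assuming NFC plus case folding as the matching
rule (again, not UTS#46); canonical_label, ZONE and lookup are
hypothetical names, not taken from any existing server.

```python
import unicodedata


def canonical_label(label):
    """Map an A-label, U-label, or raw UTF-8 label to one lookup key."""
    if isinstance(label, bytes):
        # Raw UTF-8 off the wire.
        label = label.decode("utf-8")
    if label.lower().startswith("xn--"):
        # Decode the Punycode part of an A-label back to Unicode.
        label = label[4:].lower().encode("ascii").decode("punycode")
    # Simplified matching rule (an assumption): NFC, case fold, NFC again.
    folded = unicodedata.normalize("NFC", label).casefold()
    return unicodedata.normalize("NFC", folded)


# Toy zone keyed by canonical labels; the data is illustrative only.
ZONE = {("fo\u00f3", "example"): ["1.1.1.1"]}


def lookup(qname):
    """Answer an A query, echoing the owner name as it was asked."""
    labels = qname.rstrip(".").split(".")
    key = tuple(canonical_label(part) for part in labels)
    return [(qname, "A", addr) for addr in ZONE.get(key, [])]
```

With this, lookup("foÓ.example.") and lookup("xn--fo-6ja.example.") both
return the same address, each under the name the client asked for, which
is what makes the server look like a dumb one serving an expanded zone.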

> The problem is that those aren't allowed in registerable domain names,
> which are subject to hostname restrictions defined outside the DNS.

That's another, separate issue, one that doesn't really apply to private
namespaces.  Shawn's still on solid ground.

> These are really policy matters, and not protocol matters, but owing

Right.

> to a long history, the distinction was not always understood by
> implementers and so we ended up with a lot of rules that were in fact
> policy matters getting enshrined in "protocol" broadly
> (mis)understood.

I assume you mean middle-boxes (caching servers) that aren't 8-bit
clean.

But again, for a private namespace that's probably not a problem.  And
it's probably not a problem at all, whether in private or public
namespaces.

Nico