UTF-8

Dave Thaler dthaler at microsoft.com
Sat Jun 19 03:38:20 CEST 2010


> -----Original Message-----
> From: idna-update-bounces at alvestrand.no [mailto:idna-update-
> bounces at alvestrand.no] On Behalf Of Andrew Sullivan
> Sent: Friday, June 18, 2010 2:33 PM
> To: Nicolas Williams
> Cc: idna-update at alvestrand.no
> Subject: Re: UTF-8
> 
> On Fri, Jun 18, 2010 at 03:38:19PM -0500, Nicolas Williams wrote:
> > thing.  However, given that no one is supposed to send non-ASCII (and
> > in some cases non-LDH), it's always possible that some implementors
> > could
> 
> No, this is what I'm trying to explain you're mistaken about.  A valid DNS query
> can include just about any octets -- octets, note, not multibyte characters -- you
> like.  Given that a particular octet might be one "character" in UTF-8, a different
> "character" in ISO-8859-1, a still different "character" in ISO-8859-5, or might be
> none of the above, it is simply wrong to make assumptions about what queries
> include in this way.

Right (unless you have some special situation where you somehow have certain
knowledge of the character set, e.g. because it's a completely closed environment
and you have knowledge of all the queriers).
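
To make the ambiguity concrete, here is a minimal Python sketch (not taken
from any DNS implementation) that decodes the same label octets under the
three character sets Andrew names. The helper name decode_candidates and the
sample octets are assumptions made up for illustration.

```python
# Sketch: the same octets from a DNS label decode to different text
# depending on which character set the receiver guesses.
# decode_candidates is a hypothetical helper, not part of any DNS library.

CANDIDATE_CHARSETS = ("utf-8", "iso-8859-1", "iso-8859-5")


def decode_candidates(label_octets: bytes) -> dict:
    """Return {charset: decoded text, or None if the octets are invalid}."""
    results = {}
    for charset in CANDIDATE_CHARSETS:
        try:
            results[charset] = label_octets.decode(charset)
        except UnicodeDecodeError:
            # The octets are not valid in this charset at all.
            results[charset] = None
    return results


if __name__ == "__main__":
    # 0xC3 0xA9 is one character in UTF-8 but two in each ISO-8859 set.
    for charset, text in decode_candidates(b"\xc3\xa9").items():
        print(charset, ascii(text))

    # A lone 0xE9 is valid in both ISO-8859 sets but is not valid UTF-8.
    for charset, text in decode_candidates(b"\xe9").items():
        print(charset, ascii(text))
```

Nothing in the query itself says which of these readings was intended, which
is why out-of-band knowledge of the queriers is the only way out.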

> 
> > IDNA is unavoidable, so there's little point in bothering to use
> > non-ASCII on the wire in DNS.
> 
> Yes.  On this we agree in the public DNS.  But the point Dave especially was
> trying to make, I think (and I don't want to put words into his mouth), is that if
> you think you can make that assumption generally today, you are simply wrong.
> Too bad, so sad, but that's the way it is.

Yes, anyone who assumes that non-ASCII doesn't appear in private DNS 
namespaces is simply wrong.

> 
> > In this context I don't care what "longtime IETF participants" think.
> > I care what the middleboxes do.
> 
> Never mind middleboxes (because yes, they make life complicated).
> Think just about applications that think they're doing reasonable DNS sanity-
> checking.  There are _still_ places I can't put my .info email addresses, because
> of some heuristic that no TLD is longer than 3 characters.  We're not even into
> the interesting problems, and we're already broken.
> 
> If I've read you right, you want to say what people MUST do.  I think that's a
> mistake.  I think the best advice is to say that certain practices maximize interop,
> that using either of A-label or U-label (with adequate type checking) will work,
> and that it would probably be best to settle on [pick one.  I prefer A-label, but I
> don't care that much because they're freely convertible].  If the latter is what
> you've been saying, I am sorry that I so badly misunderstood.

Yep.  Except unfortunately in today's world the choice of A-label or U-label
that maximizes interop depends on the namespace (A-label in public DNS, 
U-label in many private namespaces).
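
As a rough illustration of "freely convertible" and of choosing the form per
namespace, here is a Python sketch. It assumes Python's built-in "idna"
codec, which implements the older IDNA2003 rules (RFC 3490) rather than
IDNA2008, so treat it as an approximation. The function names, the
"public"/"private" flag, and the choice of UTF-8 for private namespaces are
assumptions for the example, not anything a spec defines.

```python
# Sketch: convert between A-label and U-label forms and pick one per
# namespace. Uses Python's stdlib "idna" codec (IDNA2003, RFC 3490).
# to_a_label, to_u_label, and wire_form are hypothetical helper names.


def to_a_label(name: str) -> str:
    """Return the ASCII (xn--) form of a name; ASCII names pass through."""
    return name.encode("idna").decode("ascii")


def to_u_label(name: str) -> str:
    """Return the Unicode form of a name given in either form."""
    return to_a_label(name).encode("ascii").decode("idna")


def wire_form(name: str, namespace: str) -> bytes:
    """Pick the octets to send, following the rule of thumb above.

    namespace is "public" or "private" (an assumption for this sketch).
    """
    if namespace == "public":
        return to_a_label(name).encode("ascii")
    # Assumption: the private namespace expects UTF-8 U-labels.
    return to_u_label(name).encode("utf-8")


if __name__ == "__main__":
    print(wire_form("b\u00fccher.example", "public"))
    print(wire_form("xn--bcher-kva.example", "private"))
```

The catch, of course, is that the caller has to know which namespace it is
in before it can pick.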

-Dave

> 
> A
> 
> --
> Andrew Sullivan
> ajs at shinkuro.com
> Shinkuro, Inc.
