UTF-8

Nicolas Williams Nicolas.Williams at oracle.com
Sat Jun 19 01:49:29 CEST 2010


On Fri, Jun 18, 2010 at 05:33:00PM -0400, Andrew Sullivan wrote:
> On Fri, Jun 18, 2010 at 03:38:19PM -0500, Nicolas Williams wrote:
> > thing.  However, given that no one is supposed to send non-ASCII (and in
> > some cases non-LDH), it's always possible that some implementors could
> 
> No, this is what I'm trying to explain you're mistaken about.  A valid
> DNS query can include just about any octets -- octets, note, not
> multibyte characters -- you like.  Given that a particular octet might

You say that as if I'd said or implied otherwise.  I haven't.

> > In this context I don't care what "longtime IETF participants" think.
> > I care what the middleboxes do.
> 
> Never mind middleboxes (because yes, they make life complicated).
> Think just about applications that think they're doing reasonable DNS
> sanity-checking.  There are _still_ places I can't put my .info email
> addresses, because of some heuristic that no TLD is longer than 3
> characters.  We're not even into the interesting problems, and we're
> already broken.

I don't care to broaden the scope of what we should concern ourselves
with to include Joe Sixpack's website's user registration form that
"validates" domainnames.

> If I've read you right, you want to say what people MUST do.  I think
> that's a mistake.  I think the best advice is to say that certain
> practices maximize interop, that using either of A-label or U-label

Of course we must say what implementors MUST do if we want interop!  In
fact, we couldn't have interop if we didn't.  (Well, we can always
interop with non-IDNs, but interop with IDNs is exactly the point here.)

If there's any place where we can't decide what an implementor MUST do
to achieve IDN interop, then either we don't have enough information or
we can't reach consensus, because every common approach would be costly
to some significant part of the deployed base.  And even then nothing
stops us from reaching consensus on some requirement eventually.

> (with adequate type checking) will work, and that it would probably be
> best to settle on [pick one.  I prefer A-label, but I don't care that
> much because they're freely convertible].  If the latter is what
> you've been saying, I am sorry that I so badly misunderstood.

I think every domainname slot has to be taken on a case-by-case basis.
We already have IDNA rules for a number of domainname slots.  Some slots
we'll be able to declare as IDN-aware and we'll be able to allow
U-labels and even raw Unicode in them (because the recipient will be
able to apply ToASCII() and/or ToASCII(ToUnicode()) as necessary).  Some
slots we'll have to declare as IDN-unaware and restrict them to
A-labels [if we want IDN interop].
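
To make the aware/unaware distinction concrete, here is a minimal
sketch in Python.  It assumes the standard library's "idna" codec,
which implements the IDNA2003 ToASCII/ToUnicode operations (not
IDNA2008), and the helper names (to_a_labels, to_u_labels, fill_slot,
normalize_received) are hypothetical, made up for illustration only.

```python
# Sketch only: relies on Python's built-in "idna" codec (IDNA2003 rules).
# All function names below are hypothetical illustrations.

def to_a_labels(name: str) -> str:
    """Convert a dotted name to A-label form; ASCII labels pass through."""
    return name.encode("idna").decode("ascii")


def to_u_labels(name: str) -> str:
    """Convert an all-ASCII dotted name (possibly A-labels) to Unicode form."""
    return name.encode("ascii").decode("idna")


def fill_slot(name: str, slot_is_idn_aware: bool) -> str:
    """Sender side: decide what to put into a domainname slot.

    An IDN-aware slot may carry U-labels, because the recipient is
    expected to convert them.  An IDN-unaware slot gets A-labels only.
    """
    if slot_is_idn_aware:
        return name
    return to_a_labels(name)


def normalize_received(name: str) -> str:
    """Recipient side of an IDN-aware slot: accept either form and
    reduce it to A-labels so that names can be compared."""
    return to_a_labels(name)


if __name__ == "__main__":
    u_name = "b\u00fccher.example"
    print(fill_slot(u_name, slot_is_idn_aware=False))
    print(to_u_labels("xn--bcher-kva.example"))
```

Because A-labels and U-labels are freely convertible, the recipient of
an IDN-aware slot ends up with the same normalized name whichever form
the sender chose; the restriction to A-labels only matters where the
recipient cannot be assumed to do that conversion.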

Nico
-- 


More information about the Idna-update mailing list