IDNA and getnameinfo() and getaddrinfo()

Dave Thaler dthaler at microsoft.com
Mon Jun 14 22:20:55 CEST 2010


> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams at oracle.com]
> Sent: Monday, June 14, 2010 12:32 PM
> To: Dave Thaler
> Cc: idna-update at alvestrand.no; john+ietf at jck.com; cheshire at apple.com
> Subject: Re: IDNA and getnameinfo() and getaddrinfo()
> 
> On Mon, Jun 14, 2010 at 07:14:12PM +0000, Dave Thaler wrote:
> > > Over in the NFSv4 WG we're discussing how to fix NFSv4.1 to properly
> > > handle IDNA.  In the process of doing so I ran into
> > > draft-iab-idn-encoding, which has a cogent discussion of name service
> > > switches (pictured in figure 2).
> > >
> > > draft-iab-idn-encoding aims for Informational status.  I'm wondering
> > > if we could publish a Standards-Track document describing how
> > > getnameinfo() and getaddrinfo() should handle IDNA.
> > >
> > > For example, one could say that when using DNS getnameinfo() should:
> >
> > Be careful not to confuse getnameinfo() with DNS.
> 
> I explicitly pointed out the name service switch architecture usually
> implemented.  I thought that'd suffice to clarify that I really meant "the DNS
> plug-in to the getnameinfo() entry point in the name service switch" -- I just
> didn't want to be too redundant.
> 
> > As noted in draft-iab-idn-encoding and RFC 3493, DNS is just one of a
> > number of mechanisms used under getnameinfo().
> 
> Right, and I believe the failure to acknowledge this in the original IDNA
> architecture was a significant mistake.  I'm disappointed that, though this is
> being acknowledged now, it's not in a standards-track document.
> 
> > >  - perform the DNS lookup
> > >  - apply ToUnicode() to the resulting domainname
> > >  - attempt to convert the address' name to the caller's locale's codeset
> > >    if that codeset is not UTF-8
> > >     - if failure, then return the A-label as the canonical hostname
> > >     - if success return the U-label (in the caller's locale's codeset)
> > >       as the canonical hostname and the A-label as an alias
> > >
> > > And that when using DNS getaddrinfo() should:
> > >
> > >  - convert the given host/domainname from the caller's locale's codeset
> > >    to UTF-8 if necessary
> > >  - apply ToASCII(), perform DNS lookups
> >
> > As discussed in draft-iab-idn-encoding section 3, it's not that simple.
> > The ACE form applies in the public DNS but does not apply in many
> > private DNS clouds.
> 
> I'm not sure I care about those, but one could always implement lists of domains
> below which to apply alternative algorithms.

You may not care about them, but unfortunately people who provide
getaddrinfo/getnameinfo libraries for applications in general do need to
care about them.
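
To illustrate why a general-purpose library cannot simply apply ToASCII()
everywhere, here is a minimal Python sketch of the "list of domains" idea.
The suffix list, the function name wire_form, and the example names are all
hypothetical. Python's built-in "idna" codec (which implements the IDNA2003
ToASCII operation) stands in for whatever conversion a real resolver library
would use.

```python
# Hypothetical policy table: namespaces where names go on the wire as UTF-8
# instead of ACE. "local" stands in for mDNS; "corp.example" is a made-up
# private DNS namespace.
UTF8_SUFFIXES = ("local", "corp.example")


def wire_form(name: str, utf8_suffixes=UTF8_SUFFIXES) -> bytes:
    """Return the bytes a resolver would send for `name` under this policy."""
    normalized = name.rstrip(".").lower()
    for suffix in utf8_suffixes:
        if normalized == suffix or normalized.endswith("." + suffix):
            # Private or link-local namespace (assumed): send UTF-8 as-is.
            return name.encode("utf-8")
    # Everything else is treated as public DNS: convert to A-labels.
    return name.encode("idna")


print(wire_form("bücher.example"))       # b'xn--bcher-kva.example'
print(wire_form("bücher.corp.example"))  # UTF-8 bytes, no "xn--" prefix
```

The hard part in practice is not this lookup but knowing what belongs in
the table, which is exactly why library providers cannot ignore the
private-namespace case.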

> 
> I was specifically interested in what name should be returned as canonical and
> what name should be returned as an alias, if any.
> 
> > >     - if success, return the IP address(es) found, the given name as the
> > >       canonical hostname, the A-label form of the hostname as an alias,
> > >       and the U-label form (converted to the caller's locale's codeset)
> > >       as an alias if different from the given hostname.
> >
> > The addrinfo structure returned by getaddrinfo() does not return
> > "aliases" per se.  It can return a single string which is:
> > 	char   *ai_canonname; /* canonical name for nodename */
> 
> Oh, right.  How depressing.  I'd for some reason thought them similar enough to
> gethostbyname/gethostbyaddr().

Unfortunately they're not.
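
For a concrete view of the difference, the sketch below uses Python's
socket.getaddrinfo(), a thin wrapper over the same API. The helper name
canon_name and its extra_flags parameter are illustrative only. Each result
carries a single canonname string, with no alias list of the kind
gethostbyname() returns.

```python
import socket


def canon_name(host: str, extra_flags: int = 0) -> str:
    """Return the single canonical-name string getaddrinfo() reports for host."""
    results = socket.getaddrinfo(
        host, None, flags=socket.AI_CANONNAME | extra_flags
    )
    # Each entry is (family, type, proto, canonname, sockaddr). There is one
    # canonname slot per entry and no alias list anywhere in the result.
    # As with ai_canonname in C, the name is read from the first entry here.
    family, socktype, proto, canonname, sockaddr = results[0]
    return canonname
```

Calling canon_name("localhost") gives back one string at most, so any rule
about "canonical name plus aliases" has nowhere to put the aliases.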

> 
> So remove all references to aliases from my previous post; instead these
> functions should return the A-label as the canon name only when the U-label
> cannot be converted to the caller's locale's codeset losslessly, else they should
> return the U-label (in the caller's locale's codeset) as the canon name.

I'd argue that the "canon name" should be the form in which it was
resolved over the wire: the A-label form if it was resolved in the public DNS,
and another form (typically the U-label form) if it was resolved via something
else (e.g., mDNS or DNS in a private namespace using UTF-8 or whatever else).

Also note that Windows treats "char *" as ANSI (which has no guarantee of
interoperability) and hence deprecates getaddrinfo/getnameinfo, and defines
UTF-16 versions (GetAddrInfoW/GetNameInfoW).
Mac OS, on the other hand, treats "char *" as UTF-8.
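
The two candidate rules for the canon name can be contrasted in a short
Python sketch. The function names are hypothetical, and Python's "idna"
codec (IDNA2003 ToUnicode) is only a stand-in for a real implementation.
The first function follows the "U-label if it converts losslessly to the
caller's codeset, else A-label" rule from the quoted text; the second
follows the "whatever form was used on the wire" rule.

```python
def canon_by_locale(a_label: str, codeset: str) -> bytes:
    """Locale rule: prefer the U-label, fall back to the A-label."""
    u_label = a_label.encode("ascii").decode("idna")  # ToUnicode
    try:
        # Succeeds only if every character exists in the caller's codeset.
        return u_label.encode(codeset)
    except UnicodeEncodeError:
        return a_label.encode("ascii")


def canon_by_wire(wire_bytes: bytes) -> bytes:
    """Wire rule: report the name exactly as it was resolved."""
    return wire_bytes


print(canon_by_locale("xn--bcher-kva.example", "latin-1"))
print(canon_by_locale("xn--bcher-kva.example", "ascii"))
print(canon_by_wire(b"xn--bcher-kva.example"))
```

Under the locale rule the same lookup can yield different canon names for
callers in different locales. Under the wire rule the result depends only
on how the name was resolved.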

RFC 3493 doesn't say either way whether "char *" is ANSI or UTF-8 or whatever
else, and as far as I know, neither does POSIX 
(http://www.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html).

Hence this is an issue for anyone proposing to make a standards-track RFC for 
getaddrinfo/getnameinfo.
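
To show why an unspecified "char *" encoding matters, the Python sketch
below encodes one name three ways. The choice of cp1252 as the "ANSI" code
page is an assumption made for illustration; the actual code page depends on
the system's configuration.

```python
name = "bücher.example"

as_utf8 = name.encode("utf-8")        # what a UTF-8 "char *" platform passes
as_ansi = name.encode("cp1252")       # one possible "ANSI" code page (assumed)
as_utf16 = name.encode("utf-16-le")   # wide-character form, as in the W APIs


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether raw can be interpreted as UTF-8 at all."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same hostname is three different byte strings.
print(as_utf8, as_ansi, as_utf16, sep="\n")

# ANSI bytes handed to a UTF-8 API are rejected outright...
print(is_valid_utf8(as_ansi))
# ...while UTF-8 bytes handed to an ANSI API silently become another name.
print(as_utf8.decode("cp1252"))
```

The second failure mode is the more dangerous one, since the library
receives a well-formed but wrong name and has no way to detect it.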

> 
> > In my view, yes you're on the right track in having NFSv4 not want to
> > do encoding conversion itself for name resolution but in expecting it
> > to be done under getaddrinfo/getnameinfo.
> 
> Would more advice to protocol designers be appropriate then?  When should
> application protocols (ignoring domainname registration related
> protocols) care to specify A-labels-only, U-labels-only, both, or
> un-pre-processed Unicode?
> 
> If we could assume IDNA-aware getnameinfo()/getaddrinfo() then is there any
> reason for application protocols [that don't involve domainname registration] to
> do anything other than allow all three forms (A-label, U-label and
> un-pre-processed Unicode) on the wire?

I'd argue any new application protocol ought to specify a single encoding rather
than allowing multiple encodings.  Specifying UTF-8 would be good :-)

-Dave

> 
> > > Unfortunately we probably cannot rely on getnameinfo()/getaddrinfo()
> > > doing the Right Thing.  A Standards-Track RFC on this would probably help.
> >
> > Well, API RFCs (like RFC 3493 for getnameinfo/getaddrinfo) are
> > Informational, not Standards-Track.  But yes, an RFC would probably help.
> 
> We have plenty of Standards-Track API RFCs.  (Yes, we really do.)  I think it'd be
> entirely appropriate to have a Standards-Track RFC specifying how these two
> functions (or new variants thereof) should handle IDNA.  Indeed, I think it's
> necessary, and a major, perhaps the only serious shortcoming of IDNAbis.
> 
> Nico
> --


