IDNA and getnameinfo() and getaddrinfo()
dthaler at microsoft.com
Mon Jun 14 22:20:55 CEST 2010
> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams at oracle.com]
> Sent: Monday, June 14, 2010 12:32 PM
> To: Dave Thaler
> Cc: idna-update at alvestrand.no; john+ietf at jck.com; cheshire at apple.com
> Subject: Re: IDNA and getnameinfo() and getaddrinfo()
> On Mon, Jun 14, 2010 at 07:14:12PM +0000, Dave Thaler wrote:
> > > Over in the NFSv4 WG we're discussing how to fix NFSv4.1 to properly
> > > handle IDNA. In the process of doing so I ran into
> > > draft-iab-idn-encoding, which has a cogent discussion of name service
> > > switches (pictured in figure 2).
> > >
> > > draft-iab-idn-encoding aims for Informational status. I'm wondering
> > > if we could publish a Standards-Track document describing how
> > > getnameinfo() and getaddrinfo() should handle IDNA.
> > >
> > > For example, one could say that when using DNS getnameinfo() should:
> > Be careful not to confuse getnameinfo() with DNS.
> I explicitly pointed out the name service switch architecture usually
> implemented. I thought that'd suffice to clarify that I really meant "the DNS
> plug-in to the getnameinfo() entry point in the name service switch" -- I just
> didn't want to be too redundant.
> > As noted in draft-iab-idn-encoding and RFC 3493, DNS is just one of a
> > number of mechanisms used under getnameinfo().
> Right, and I believe the failure to acknowledge this in the original IDNA
> architecture was a significant failure. I'm disappointed that though this is being
> acknowledged now, it's not in a standards-track document.
> > > - perform the DNS lookup
> > > - apply ToUnicode() to the resulting domainname
> > > - attempt to convert the address' name to the caller's locale's codeset
> > > if that codeset is not UTF-8
> > > - if failure, then return the A-label as the canonical hostname
> > > - if success return the U-label (in the caller's locale's codeset)
> > > as the canonical hostname and the A-label as an alias
> > >
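A rough sketch of the getnameinfo() steps listed above, written in Python
only because its standard "idna" codec implements the IDNA2003 ToUnicode()
operation. The helper name and its (canonical, aliases) return shape are
hypothetical, and the DNS lookup itself is assumed to have already produced
wire_name:

```python
def canonical_and_aliases(wire_name, locale_codeset):
    """Hypothetical helper mirroring the steps listed above."""
    # ToUnicode(): the stdlib "idna" codec implements the IDNA2003 operation.
    try:
        u_name = wire_name.encode("ascii").decode("idna")
    except UnicodeError:
        return wire_name, []          # not convertible; keep what DNS returned
    # Can the caller's codeset represent the U-label losslessly?
    try:
        u_name.encode(locale_codeset)
    except UnicodeEncodeError:
        return wire_name, []          # failure: A-label is the canonical name
    if u_name == wire_name:
        return wire_name, []          # plain ASCII name, nothing to alias
    return u_name, [wire_name]        # success: U-label canonical, A-label alias


print(canonical_and_aliases("xn--bcher-kva.example", "latin-1"))
# ('bücher.example', ['xn--bcher-kva.example'])
print(canonical_and_aliases("xn--bcher-kva.example", "ascii"))
# ('xn--bcher-kva.example', [])
```

A real plug-in would hand back bytes in the caller's codeset rather than an
abstract string; the sketch only shows which form gets chosen.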
> > > And that when using DNS getaddrinfo() should:
> > >
> > > - convert the given host/domainname from the caller's locale's codeset
> > > to UTF-8 if necessary
> > > - apply ToASCII(), perform DNS lookups
> > As discussed in draft-iab-idn-encoding section 3, it's not that simple.
> > The ACE form applies in the public DNS but does not apply in many
> > private DNS clouds.
> I'm not sure I care about those, but one could always implement lists of domains
> below which to apply alternative algorithms.
You may not care about them but unfortunately people who provide
getaddrinfo/getnameinfo libraries for applications in general need to
care about them.
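To make the point concrete, here is a minimal sketch of the "list of
domains" idea, assuming a library keeps a configured set of suffixes whose
servers expect raw UTF-8 rather than ACE. The suffix list and the function
name are invented for illustration, and Python's "idna" codec stands in for
ToASCII() (IDNA2003):

```python
# Hypothetical configuration: suffixes whose DNS servers expect UTF-8 labels.
UTF8_SUFFIXES = (".corp.example",)


def wire_form(raw_name, locale_codeset):
    """Pick the bytes to put on the wire for a name given in the locale codeset."""
    # Step 1: get from the caller's codeset to Unicode.
    name = raw_name.decode(locale_codeset)
    # Step 2: private namespaces on the list get UTF-8, not ACE.
    if name.lower().endswith(UTF8_SUFFIXES):
        return name.encode("utf-8")
    # Step 3: everything else is assumed to be public DNS, so apply ToASCII().
    return name.encode("idna")


print(wire_form("bücher.example".encode("latin-1"), "latin-1"))
# b'xn--bcher-kva.example'
```

The hard part is not this branch but knowing what belongs on the list, which
is exactly what a general-purpose library cannot know in advance.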
> I was specifically interested in what name should be returned as canonical and
> what name should be returned as an alias, if any.
> > > - if success, return the IP address(es) found, the given name as the
> > > canonical hostname, the A-label form of the hostname as an alias,
> > > and the U-label form (converted to the caller's locale's codeset)
> > > as an alias if different from the given hostname.
> > The addrinfo structure returned by getaddrinfo() does not return
> > "aliases" per se. It can return a single string which is:
> > char *ai_canonname; /* canonical name for nodename */
> Oh, right. How depressing. I'd for some reason thought them similar enough to
Unfortunately they're not.
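The difference is visible even through a thin wrapper. The sketch below uses
Python's socket module, which calls the C getaddrinfo() underneath; the
helper name is made up, and a numeric host is used so that nothing is
actually looked up:

```python
import socket


def canon_names(host):
    """Return the canonname field of each getaddrinfo() result for host."""
    infos = socket.getaddrinfo(
        host, None,
        type=socket.SOCK_STREAM,
        flags=socket.AI_CANONNAME | socket.AI_NUMERICHOST,  # numeric: no lookup
    )
    # Each result is (family, type, proto, canonname, sockaddr): one string,
    # with no slot for aliases.
    return [canonname for _family, _type, _proto, canonname, _addr in infos]


print(canon_names("127.0.0.1"))
```

By contrast, socket.gethostbyname_ex() returns (hostname, aliaslist,
ipaddrlist), following the older hostent structure, which is where an alias
list does exist.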
> So remove all references to aliases from my previous post; instead these
> functions should return the A-label as the canon name only when the U-label
> cannot be converted to the caller's locale's codeset losslessly, else they should
> return the U-label (in the caller's locale's codeset) as the canon name.
I'd argue that the "canon name" should be the form in which it was
resolved over the wire. So the A-label form if it was resolved in the public DNS,
and another form (typically the U-label form) if it was resolved via something
else (e.g., mDNS or DNS in a private namespace using UTF-8 or whatever else).
Also note that Windows treats "char *" as ANSI (which has no guarantee of
interoperability) and hence deprecates getaddrinfo/getnameinfo, and defines
UTF-16 versions (GetAddrInfoW/GetNameInfoW).
Mac OS, on the other hand, treats "char *" as UTF-8.
RFC 3493 doesn't say either way whether "char *" is ANSI or UTF-8 or whatever
else, and as far as I know, neither does POSIX.
Hence this is an issue for anyone proposing to make a standards-track RFC for
> > In my view, yes you're on the right track in having NFSv4 not want to
> > do encoding conversion itself for name resolution but in expecting it
> > to be done under getaddrinfo/getnameinfo.
> Would more advice to protocol designers be appropriate then? When should
> application protocols (ignoring domainname registration related
> protocols) care to specify A-labels-only, U-labels-only, both, or un-pre-
> processed Unicode?
> If we could assume IDNA-aware getnameinfo()/getaddrinfo() then is there any
> reason for application protocols [that don't involve domainname registration] to
> do anything other than allow all three forms (A-label, U-label and un-pre-
> processed Unicode) on the wire?
I'd argue any new application protocol ought to specify a single encoding rather
than allowing multiple. Specifying UTF-8 would be good :-)
> > > Unfortunately we probably cannot rely on getnameinfo()/getaddrinfo()
> > > doing the Right Thing. A Standards-Track RFC on this would probably help.
> > Well API RFCs (like RFC 3493 for getnameinfo/getaddrinfo) are
> > Informational, not Standards-Track. But yes an RFC would probably help.
> We have plenty of Standards-Track API RFCs. (Yes, we really do.) I think it'd be
> entirely appropriate to have a Standards-Track RFC specifying how these two
> functions (or new variants thereof) should handle IDNA. Indeed, I think it's
> necessary, and a major, perhaps the only serious shortcoming of IDNAbis.