IDNA and getnameinfo() and getaddrinfo()
Nicolas.Williams at oracle.com
Tue Jun 15 01:42:19 CEST 2010
On Mon, Jun 14, 2010 at 08:20:55PM +0000, Dave Thaler wrote:
> > > As discussed in draft-iab-idn-encoding section 3, it's not that simple.
> > > The ACE form applies in the public DNS but does not apply in many
> > > private DNS clouds.
> > I'm not sure I care about those, but one could always implement lists of domains
> > below which to apply alternative algorithms.
> You may not care about them but unfortunately people who provide
> getaddrinfo/getnameinfo libraries for applications in general need to
> care about them.
For the purposes of this discussion, I don't care. If I were implementing
I'd consider providing a local administrative configuration interface by
which to provide lists of private cloud domains that use alternative IDN
schemes. (Actually, I'd probably want a distributed configuration
method for that, preferably using DNS itself, but really, that's a
tangent I don't want to go on because it's a distraction from the
purpose of this thread.)
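To make the idea concrete, here is a minimal C sketch of the lookup such a configuration interface might feed. The list contents, `private_idn_domains`, and `uses_private_idn_scheme()` are hypothetical names for illustration. A real implementation would load the list from local configuration rather than compile it in. Matching is done on whole-label boundaries, so "evilcorp.example" does not match "corp.example".

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical administrator-supplied list of private-namespace suffixes
   whose names are NOT carried as A-labels on the wire. */
static const char *const private_idn_domains[] = {
    "corp.example",
    "local",
    NULL
};

/* 1 if name equals suffix or ends in "." + suffix (ASCII case-insensitive). */
static int under_domain(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    if (n > 0 && name[n - 1] == '.')
        n--;                              /* ignore a trailing root dot */
    if (s > n)
        return 0;
    if (strncasecmp(name + n - s, suffix, s) != 0)
        return 0;
    return n == s || name[n - s - 1] == '.';  /* must start on a label boundary */
}

/* 1 if the name falls under a configured private domain, else 0. */
int uses_private_idn_scheme(const char *name)
{
    for (size_t i = 0; private_idn_domains[i] != NULL; i++)
        if (under_domain(name, private_idn_domains[i]))
            return 1;
    return 0;
}
```

A resolver library could consult this before deciding whether to convert U-labels to A-labels or to send the name in some other form.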
> > So remove all references to aliases from my previous post; instead these
> > functions should return the A-label as the canon name only when the U-label
> > cannot be converted to the caller's locale's codeset losslessly, else they should
> > return the U-label (in the caller's locale's codeset) as the canon name.
> I'd argue that the "canon name" should be the form in which it was
> resolved over the wire. So the A-label form if it was resolved in the public DNS,
> and another form (typically the U-label form) if it was resolved via something
> else (e.g., mDNS or DNS in a private namespace using UTF-8 or whatever else).
> Also note that Windows treats "char *" as ANSI (which has no guarantee of
> interoperability) and hence deprecates getaddrinfo/getnameinfo, and defines
> UTF-16 versions (GetAddrInfoW/GetNameInfoW).
> MacOS on the other hand treats "char *" as UTF-8.
Better yet, Simon's proposal allows the caller to decide which name
should be returned as canonical. That works for me.
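The canon-name rule quoted above (return the A-label only when the U-label cannot be converted losslessly to the caller's locale codeset) can be sketched with POSIX iconv(3). This is an illustration, not a proposed API. `choose_canon_name()` is a hypothetical helper, the codeset is passed in explicitly (a caller would normally get it from nl_langinfo(CODESET) after setlocale()), and the A-label is assumed to be computed already.

```c
#include <iconv.h>
#include <string.h>

/* Try to convert a UTF-8 name to `codeset` with no lossy or irreversible
   steps. On success write the NUL-terminated result to out and return 1. */
static int convert_losslessly(const char *utf8, const char *codeset,
                              char *out, size_t outlen)
{
    if (outlen == 0)
        return 0;
    /* No //TRANSLIT: we want failure, not an approximation. */
    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return 0;
    char *in = (char *)utf8;
    size_t inleft = strlen(utf8);
    char *o = out;
    size_t oleft = outlen - 1;            /* keep room for the NUL */
    /* iconv() returns (size_t)-1 on error (EILSEQ, or E2BIG if out is too
       small) and otherwise the count of irreversible conversions; anything
       but 0 is treated as "not lossless". */
    int ok = iconv(cd, &in, &inleft, &o, &oleft) == 0;
    /* Flush any shift state (matters only for stateful codesets). */
    if (ok && iconv(cd, NULL, NULL, &o, &oleft) == (size_t)-1)
        ok = 0;
    iconv_close(cd);
    if (ok)
        *o = '\0';
    return ok;
}

/* Returns 0 if the U-label form was written to out, 1 if the A-label was,
   and -1 if out is too small even for the A-label. */
int choose_canon_name(const char *ulabel_utf8, const char *alabel,
                      const char *codeset, char *out, size_t outlen)
{
    if (convert_losslessly(ulabel_utf8, codeset, out, outlen))
        return 0;
    if (strlen(alabel) >= outlen) {
        if (outlen > 0)
            out[0] = '\0';
        return -1;
    }
    strcpy(out, alabel);
    return 1;
}
```

In a UTF-8 locale "bücher.example" comes back as-is; in an ASCII-only locale the same name comes back as "xn--bcher-kva.example".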
> RFC 3493 doesn't say either way whether "char *" is ANSI or UTF-8 or whatever
> else, and as far as I know, neither does POSIX
See Simon's reply.
> Hence this is an issue for anyone proposing to make a standards-track RFC for
I'd be willing to specify new functions with different names if need be.
But it seems to me that between getaddrinfo()'s hints and
getnameinfo()'s flags arguments we have enough room for extensibility
without resorting to new function names.
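As a data point for the extensibility argument: glibc already exposes IDN behavior through these arguments as GNU extensions, namely AI_IDN and AI_CANONIDN for getaddrinfo() and NI_IDN for getnameinfo(). Below is a minimal sketch that requests them where the C library defines them and falls back to plain AI_CANONNAME elsewhere. The helper names `idn_lookup_hints()` and `idn_nameinfo_flags()` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* glibc only exposes AI_IDN & co. with this */
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Fill hints for an IDN-aware lookup that also returns a canonical name. */
void idn_lookup_hints(struct addrinfo *hints)
{
    memset(hints, 0, sizeof *hints);
    hints->ai_family = AF_UNSPEC;
    hints->ai_socktype = SOCK_STREAM;
    hints->ai_flags = AI_CANONNAME;
#if defined(AI_IDN) && defined(AI_CANONIDN)
    /* GNU extension: convert a non-ASCII node name (in the locale's
       codeset) to IDN form before lookup, and return the canonical name
       decoded from ACE to the locale's codeset. */
    hints->ai_flags |= AI_IDN | AI_CANONIDN;
#endif
}

/* Flags for an IDN-aware reverse lookup with getnameinfo(). */
int idn_nameinfo_flags(void)
{
    int flags = NI_NAMEREQD;   /* fail rather than return a numeric host */
#ifdef NI_IDN
    flags |= NI_IDN;           /* GNU extension: decode ACE to the locale's codeset */
#endif
    return flags;
}
```

Callers pass the hints to getaddrinfo() and the flags to getnameinfo() unchanged, so no new entry points are needed. Whether these particular flags have exactly the semantics this thread wants is a separate question.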
> > If we could assume IDNA-aware getnameinfo()/getaddrinfo() then is there any
> > reason for application protocols [that don't involve domainname registration] to
> > do anything other than allow all three forms (A-label, U-label and un-pre-
> > processed Unicode) on the wire?
> I'd argue any new application protocol ought to specify the encoding rather than
> allowing multiple. Specifying UTF-8 would be good :-)
Just UTF-8, un-pre-processed, raw user input? Or did you mean U-labels?
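To illustrate why "just UTF-8" is ambiguous, here is a sketch of what a receiver can tell from the bytes alone. `classify_label()` is a hypothetical helper. It checks UTF-8 well-formedness (per RFC 3629) and the "xn--" prefix, but without the IDNA2008 validity tables and mapping it cannot tell a U-label from un-pre-processed input, so both land in the same bucket.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum label_form {
    LABEL_INVALID_UTF8,  /* not even well-formed UTF-8 */
    LABEL_ASCII,         /* plain ASCII label */
    LABEL_ACE,           /* ASCII with the "xn--" prefix: A-label candidate */
    LABEL_NON_ASCII      /* U-label OR un-pre-processed input: can't tell here */
};

/* Length of the well-formed UTF-8 sequence at s (RFC 3629), or 0. */
static size_t utf8_seq_len(const unsigned char *s, size_t left)
{
    unsigned char c = s[0], lo = 0x80, hi = 0xBF;
    size_t len;
    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF) {
        len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3;
        if (c == 0xE0) lo = 0xA0;   /* reject overlong forms */
        if (c == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4;
        if (c == 0xF0) lo = 0x90;   /* reject overlong forms */
        if (c == 0xF4) hi = 0x8F;   /* reject code points above U+10FFFF */
    } else {
        return 0;                   /* stray continuation byte or C0/C1/F5+ */
    }
    if (left < len || s[1] < lo || s[1] > hi)
        return 0;
    for (size_t i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}

enum label_form classify_label(const char *label)
{
    const unsigned char *p = (const unsigned char *)label;
    size_t left = strlen(label);
    int ascii = 1;
    while (left > 0) {
        size_t n = utf8_seq_len(p, left);
        if (n == 0)
            return LABEL_INVALID_UTF8;
        if (n > 1)
            ascii = 0;
        p += n;
        left -= n;
    }
    if (!ascii)
        return LABEL_NON_ASCII;
    return strncasecmp(label, "xn--", 4) == 0 ? LABEL_ACE : LABEL_ASCII;
}
```

For example, "bücher" (a valid U-label) and "BÜCHER" (input that still needs case mapping before it could be a U-label) both classify as LABEL_NON_ASCII.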
Also, with respect to deployed protocols that have protocol elements for
carrying domainnames, where those protocol elements are defined as
carrying UTF-8, but where in practice most implementors did not actually
code those slots as IDN-aware, wouldn't it be a strong presumption that
the slots are IDN-unaware?