Impact of Punycode

Adam M. Costello idna-update.amc+0+ at nicemice.net.RemoveThisWord
Fri Mar 26 19:57:08 CET 2010


Shawn Steele <Shawn.Steele at microsoft.com> wrote:

> For example, if you make an http request, the xn-- name can get into
> the http request.

Not just "can", but must.  The HTTP request is an IDN-unaware slot, and
therefore only ASCII domain names are allowed.
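As a minimal sketch of what that means for a client, the following converts a name to its ASCII form before it goes into the request. It assumes Python's built-in "idna" codec, which implements the IDNA2003 ToASCII operation; the function name host_for_request is hypothetical.

```python
def host_for_request(hostname: str) -> str:
    """Return the ASCII form of a hostname for use in an HTTP Host: field."""
    # The built-in "idna" codec applies ToASCII (IDNA2003) label by label.
    # Labels that are already ASCII pass through unchanged.
    return hostname.encode("idna").decode("ascii")


print(host_for_request("bücher.example"))   # xn--bcher-kva.example
print(host_for_request("www.example.com"))  # www.example.com
```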

> Certainly the IDNAxxxx docs say nothing about http requests.

Not about HTTP per se, but it's covered by the rules about IDN-aware and
IDN-unaware slots.

> What's a web server to do if it gets a UTF-8 request?

400 Bad Request would be a correct response.  The HTTP spec says that
HTTP accepts URIs, which are ASCII.  It makes no mention of IRIs (which
didn't exist when the spec was written).  Furthermore, the header
syntax uses only ASCII, except for non-machine-readable TEXT, which is
ISO-8859-1 (not UTF-8!).  Assuming that 8-bit data in the Host: field
is UTF-8 would be a departure from the spec.  But I suppose trying to
recover from bad input using heuristics might be reasonable, and the
server could attempt to perform the IDNA conversion that the client
should have done.
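The two server behaviours described above could be sketched roughly as follows. This is only an illustration: check_host_field and its recover flag are hypothetical names, the input is assumed to be the raw Host: value without a port, and the conversion again relies on Python's built-in "idna" codec (IDNA2003).

```python
def check_host_field(raw: bytes, recover: bool = False):
    """Return (status, ascii_host) for the raw bytes of a Host: field value."""
    try:
        # Normal case: the client sent an ASCII name (possibly xn-- labels).
        return 200, raw.decode("ascii")
    except UnicodeDecodeError:
        if not recover:
            # Strict reading of the spec: 8-bit data here is a bad request.
            return 400, None
    try:
        # Heuristic, outside the spec: guess that the 8-bit data is UTF-8
        # and perform the conversion the client should have done.
        return 200, raw.decode("utf-8").encode("idna").decode("ascii")
    except UnicodeError:
        # Not valid UTF-8, or not convertible to an ASCII domain name.
        return 400, None
```

With recover left at its default the function rejects any non-ASCII Host: value; with recover=True it accepts a UTF-8 name and hands back the xn-- form.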

> A Punycode request?

If the server is completely IDN-unaware, then it treats this as a
regular URI, and it just works, but the config files and log files are
also in ASCII form, which is inconvenient for the human administrator.
If the server is IDN-aware, it should probably choose a canonical
internal form for matching requests with resources, and translate
to/from that canonical form when reading/writing config files, log files,
and requests.
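One possible shape for the IDN-aware case is sketched below. It assumes the canonical internal form is the lowercase ASCII name (the Unicode form would serve equally well), uses Python's built-in "idna" codec (IDNA2003) for both directions, and the config table, paths, and function names are all hypothetical.

```python
def to_canonical(name: str) -> str:
    """Canonical internal form (assumed here: lowercase ASCII) for matching."""
    return name.encode("idna").decode("ascii").lower()


def to_display(canonical: str) -> str:
    """Unicode form, for log and config files read by a human."""
    return canonical.encode("ascii").decode("idna")


# Hypothetical virtual-host table, written the way an administrator
# would prefer to read it.
config = {"bücher.example": "/srv/books", "www.example.com": "/srv/www"}

# Translate once to the canonical form when the config is loaded.
vhosts = {to_canonical(name): root for name, root in config.items()}


def lookup(request_host: str):
    """Return (document root or None, name to write in the log)."""
    key = to_canonical(request_host)
    return vhosts.get(key), to_display(key)
```

A request carrying xn--bcher-kva.example then matches the entry the administrator wrote as bücher.example, and the log line can show the Unicode form again.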

AMC


More information about the Idna-update mailing list