HTTP and IDN, was RE: Nameprep input vs output

Erik van der Poel erikv at google.com
Thu Jan 11 13:37:04 CET 2007


Hi Michel,

Thanks for the reply!

I personally don't feel that this is seriously off-topic, since many
IDNA implementors will have questions related to this. The way I see
it, the current use (in IDNs in HTML) of Unicode characters with
compatibility decompositions (such as full-width w and the fl
ligature) is there for historical reasons, and it may be too late to
get rid of this usage.
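
To make the two examples concrete, here is a minimal sketch of what
NFKC does to them. It assumes Python's standard unicodedata module;
the helper name nfkc is made up for illustration and is not part of
any spec.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (hypothetical helper name)."""
    return unicodedata.normalize("NFKC", text)


fullwidth_w = "\uFF57"  # FULLWIDTH LATIN SMALL LETTER W
fl_ligature = "\uFB02"  # LATIN SMALL LIGATURE FL

# Both characters have compatibility decompositions, so NFKC replaces
# them with plain ASCII letters.
print(nfkc(fullwidth_w))  # w
print(nfkc(fl_ligature))  # fl
```

A user agent that applies this mapping before resolving a host name
treats the full-width and ligature forms as equivalent to their ASCII
counterparts.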

In the IRI RFC, you were forced to acknowledge the existence of legacy
HTTP servers that only accept paths or queries in legacy encodings
like Big5 or iso-8859-1. See the bottom of the 3rd paragraph in
section 6.4 of RFC 3987 (IRI):

http://ietf.org/rfc/rfc3987.txt

A new HTTP RFC or HTML spec may likewise be forced to acknowledge the
existence of user agents that perform some NFKC mappings for IDNs. I
could be wrong, of course, since there are so few IDNs on the Web at
the moment. If we all agree that we need to get rid of these NFKC
mappings and the implementors actually heed our recommendations, we
may be able to stamp these out.

This is why I changed the Subject header to "Nameprep input vs
output". I.e. the Web is currently using Nameprep input in HTTP links
in HTML.
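
As a rough sketch of the difference, assuming Python's built-in
"idna" codec (which implements IDNA 2003, including Nameprep): the
host below is written the way it might appear in an HTML link, and
the conversion yields the ASCII form a resolver would see. The helper
name host_to_ascii and the example host are made up for illustration.

```python
def host_to_ascii(host: str) -> str:
    """Convert a host name to ASCII with Python's IDNA 2003 codec.

    Hypothetical helper; the codec applies Nameprep to each label and
    Punycode-encodes any label that is still non-ASCII afterwards.
    """
    return host.encode("idna").decode("ascii")


# Nameprep input: full-width "www" plus a label containing u-umlaut.
as_written = "\uFF57\uFF57\uFF57.b\u00FCcher.example"

# The full-width label maps to plain "www"; the second label becomes
# an "xn--" (Punycode) label.
print(host_to_ascii(as_written))  # www.xn--bcher-kva.example
```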

Erik

On 1/11/07, Michel Suignard <michelsu at windows.microsoft.com> wrote:
> >
> >As you know, there are already HTML documents on the Web that include
> >HTTP URIs that use IDNs (and MSIE 7 supports them). I have heard
> >rumors that an HTTP RFC update activity may have started. Do you know
> >whether that is true and whether there is anyone there to discuss the
> >addition of IDNs to the spec for HTTP URIs (or should I say IRIs)?
>
> Hi Erik,
> In HTTP URIs, IDNs should only exist in Punycode notation (but they may also be %-encoded). If an 'HTTP URI' contains an IDN in native form, you are really dealing with IRIs, which can be handled as presentation forms of the underlying, equivalent URIs. The IRI RFC was drafted to make it easy for user agents to process URIs and IRIs that way. There are many more details in the IRI RFC (3987). I encourage you to read the text and raise any issues you may find. Martin is also on this list and will also be interested, I am sure.
>
> I have not been following the discussion about an HTTP RFC update activity. If native IDNs and, more generally, non-ASCII characters were added to HTTP, that would really relate to the discussion of using IRIs as protocol elements for a new scheme (not really HTTP anymore). Keeping the IRI at the presentation layer, as it is today, while still maintaining HTTP as we know it for the core protocol/scheme seems prudent to me, but we are getting seriously OT here.
>
> On the other hand, the discussion on this list may have some consequences for IRIs, especially concerning bidirectional issues, as the IRI RFC uses the stringprep bidi restrictions almost word for word.
>
> Michel
>

