HTTP and IDN, was RE: Nameprep input vs output

Erik van der Poel erikv at google.com
Mon Jan 15 15:35:08 CET 2007


On 1/13/07, Martin Duerst <duerst at it.aoyama.ac.jp> wrote:
> I made some leaps. Michel wrote:
>
> >>>>
> I have not been following discussion about an HTTP RFC update activity. If native IDN and in general non ASCII characters were added to HTTP, it really relates to the discussion of using IRI as protocol elements for a new scheme (not really HTTP anymore).
> >>>>
>
> So I was changing 'new scheme' to 'new protocol' (not intended),
> but I was also changing 'non ASCII characters' to 'raw octets',
> which was intended, because HTTP basically is working with
> octets, not characters, for things such as path and query part.
> Of course, an update could say that these have to use %-encoding
> when they are not UTF-8, but can use raw octets when they are
> UTF-8.

Yes, that was quite a leap, but very interesting.
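
To make the idea concrete, here is a minimal sketch of the rule Martin describes: raw octets are allowed when the path or query octets form valid UTF-8, and %-encoding is required otherwise. The function name encode_path_octets is hypothetical, and the whole-sequence validity check is an assumption made for illustration; no HTTP update is known to specify this.

```python
def encode_path_octets(octets: bytes) -> bytes:
    """Hypothetical helper: pass octets through raw when the whole
    sequence is valid UTF-8, otherwise %-encode every non-ASCII octet."""
    try:
        # Strict decoding raises if the octets are not well-formed UTF-8.
        octets.decode("utf-8")
        return octets
    except UnicodeDecodeError:
        # Not UTF-8 (e.g. ISO-8859-1 or Windows-1258): escape octets >= 0x80.
        return b"".join(
            bytes([b]) if b < 0x80 else ("%%%02X" % b).encode("ascii")
            for b in octets
        )


utf8_path = "/caf\u00e9".encode("utf-8")      # valid UTF-8, stays raw
latin1_path = "/caf\u00e9".encode("latin-1")  # not valid UTF-8, gets %E9
```

Note that a real implementation would also have to handle reserved ASCII characters and existing %-escapes, which this sketch ignores.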

> >Yes, these are accidents/garbage. The problem is that MSIE 7, Firefox
> >and the Verisign i-Nav plugin for MSIE 6 all accept this garbage. As
> >you know from the history of HTML, when user agents are too liberal in
> >what they accept, garbage can become entrenched and difficult to
> >remove.
>
> Yes. So let's try and move forward with our work.

Yes, we should continue to work on the Internet Drafts, but do you
think we should reach out to the browser developers as well, to see if
they might consider changing their implementations now (just to reject
the non-NFKC characters)?
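
As a rough illustration of what "rejecting the non-NFKC characters" could look like, the sketch below tests whether a label is unchanged by NFKC normalization. It uses Python's unicodedata.ucd_3_2_0, on the assumption that the Unicode 3.2 data Nameprep is based on is the right reference; the function name is hypothetical, and this is only the normalization step, not full Nameprep (no case mapping or prohibited-character checks).

```python
import unicodedata

# Nameprep (RFC 3491) is defined against Unicode 3.2, so use that database.
ucd32 = unicodedata.ucd_3_2_0


def is_nfkc_stable(label: str) -> bool:
    """Hypothetical check: True if NFKC normalization leaves the label as-is.

    A strict user agent could reject any label for which this returns False,
    instead of silently mapping it.
    """
    return ucd32.normalize("NFKC", label) == label


ascii_label = "example"
fullwidth_label = "\uff45xample"  # starts with U+FF45 FULLWIDTH LATIN SMALL LETTER E
```

Here is_nfkc_stable(fullwidth_label) is False because NFKC maps the fullwidth letter to its ASCII counterpart, while the plain ASCII label passes.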

> >Windows-1258 appears not to be so common on the Web,
>
> Interesting, and glad to hear that. Do you have statistics?

Around mid-2006, Windows-1258 appeared as the HTML META charset label
in 0.00078% of the documents that carry such a label. In 2001, the
figure was 0.0012%. Windows-1258 was not among the top 100 charsets in
either 2001 or 2006.

Erik

