Unknown text/* subtypes

Martin Duerst duerst at it.aoyama.ac.jp
Wed Dec 26 02:22:47 CET 2007

At 03:17 07/12/19, Frank Ellermann wrote:
>Julian Reschke wrote:
>> It would be nice if somebody could provide some insight why this ever 
>> made it into HTTP. Was that just an attempt to allow text/html encoded 
>> in latin1 to be served without charset parameter?

Yes, in some ways. The Web was started at CERN in Geneva, and at
that time, iso-8859-1 seemed like a forward-looking choice that
covered not only the US, but also (most of) Western Europe.
The first versions of HTTP (HTTP 0.9 or before) didn't have any
version indication, didn't allow a charset parameter, and also
didn't have any request or response headers. Responses were just
HTML, nothing else. For a short summary, please see
http://www.w3.org/Protocols/HTTP/AsImplemented.html.

HTTP 0.9 was later generalized into HTTP 1.0 and HTTP 1.1 as we
know it. For quite some time, there were a lot of clients out
there that badly choked on charset parameters.

So the default wasn't an attempt to save some bytes for Latin-1;
it was in some way necessary for backwards compatibility with
very early versions that were never documented as RFCs.

Such backwards compatibility is no longer necessary, fortunately.
The situation on the Web today is different. The actual
'default' used by browsers isn't simply iso-8859-1; it's whatever
the user has set as his/her preferred encoding, or whatever the
localized (language-specific) version of the browser ships with.
This means that in essence, there is NO default. The HTTP spec
clearly should be fixed to say so.
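
To make the point concrete, here is a rough sketch (in Python) of the
fallback order described above. It is only an illustration, not what
any particular browser does: the function name, the parameter names
and the example values are all made up for this sketch, and it ignores
other things real browsers look at (BOMs, META tags, sniffing).

```python
from email.message import Message


def effective_charset(content_type, user_preference=None,
                      locale_default="iso-8859-1"):
    """Return (charset, source) for a Content-Type header value.

    Hypothetical helper: 'user_preference' stands for an encoding the
    user picked in the browser, 'locale_default' for whatever the
    localized browser build ships with. Both are assumptions.
    """
    # Reuse the stdlib MIME header parser to read the charset parameter.
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # lower-cased, or None if absent
    if declared:
        return declared, "declared"
    # No charset parameter: there is no single fixed default to apply.
    if user_preference:
        return user_preference.lower(), "user preference"
    return locale_default.lower(), "locale default"


print(effective_charset("text/html; charset=UTF-8"))
print(effective_charset("text/html", user_preference="Shift_JIS"))
print(effective_charset("text/html", locale_default="windows-1252"))
```

The same unlabeled text/html response ends up decoded differently
depending on who is reading it, which is why a fixed default in the
spec doesn't describe reality.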

>Some parts of this puzzle:  RFC 2070 introduced an "ideally anything
>is Unicode" concept, later adopted by HTML 4+, XHTML 1+, and XML 1+.
>AFAIK HTML 3.2 and maybe also HTML 3 still didn't have this feature.
>As far as RFC numbers mean something 2070 was published "after" 2068,
>both say January 1997, and "the law" 2277 was clearly a year later.

At least 2070 and 2277 were in the works for a really long time.
That may also apply to 2068.

>RFC 2068 (HTTP/1.1) was the successor of 1945 (HTTP/1.0, May 1996),
>2070 (HTML i18n) was the successor of 1866 (HTML 2, November 1995).
>Tim Berners-Lee, one co-author of RFC 1866 and 1945, wrote in 1866:
>| NOTE - To support non-western writing systems, a larger character
>| repertoire will be specified in a future version of HTML. The
>| document character set will be [ISO-10646], or some subset that
>| agrees with [ISO-10646]; in particular, all numeric character
>| references must use code positions assigned by [ISO-10646].

That was put in because the HTML WG at that time already more
or less understood (after a lot of discussions) that the direction
to go was Unicode/ISO 10646. A lot of the work on HTML 2.0 and
HTML i18n (RFC 2070) and some other pieces was going on somewhat
in parallel.

>Speculation, in May 1996 it made sense that HTTP/1.0 can transport
>HTML 2 "as is", default Latin-1, and it took Harald and Martin some
>months to fix this in RFC 2070 and 2277, too late for RFC 2068, and
>RFC 2616 simply inherited "default Latin-1" wholesale. 

It wasn't just Harald and me. It was a lot more people, in particular
all the coauthors of RFC 2070 and 2277.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
