Solving the UTF-8 problem
Doug Ewell
dewell at roadrunner.com
Mon Jul 16 16:36:13 CEST 2007
Stephane Bortzmeyer <bortzmeyer at nic dot fr> wrote:
>> I suggest changing the naming convention for UTF-8 files on
>> langtag.net from "something.utf8" or "something.txt.utf8" to
>> "something.utf8.txt" to take advantage of this.
>
> I hesitate here because the www.langtag.net Web site sends the proper
> file type:
>
> Content-Type: text/plain; charset=utf-8
The HTTP header wasn't at issue here; the filename extension was.
> If IE does not understand that the file is plain text encoded in
> UTF-8, it is broken.
>
> Using the file extension to find its type is both non-standard (why
> ".txt" instead of ".text"? Where is the registry of file extensions?)
> and quite old-fashioned.
Broken or not, archaic and quaint or not, this is one of the ways
operating systems have to tell what a file is. Some systems store this
information in a separate meta-database, which wouldn't be of much use
for a hitherto-unknown file pulled off the Internet. Classic Unix
systems might read the first few bytes and try to figure it out from
there. Windows systems happen to use the extension.
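To make the contrast concrete, here is a rough sketch in Python of the two approaches: deciding by extension versus peeking at the first bytes. The extension table and the function names are made up for illustration; no real operating system is implemented this way, and the content check is deliberately naive (it only asks whether the bytes are well-formed UTF-8).

```python
import codecs
import os

# Hypothetical stand-in for whatever table the OS consults for extensions.
KNOWN_TEXT_EXTENSIONS = {".txt"}
UTF8_BOM = b"\xef\xbb\xbf"

def type_by_extension(filename):
    """Guess the type from the last extension only, as an extension-based system would."""
    ext = os.path.splitext(filename)[1].lower()
    return "text/plain" if ext in KNOWN_TEXT_EXTENSIONS else None

def type_by_content(head):
    """Guess the type from the first bytes of the file, ignoring its name."""
    if head.startswith(UTF8_BOM):
        return "text/plain; charset=utf-8"
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off at the end of 'head'.
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return None
    return "text/plain; charset=utf-8"

if __name__ == "__main__":
    for name in ("something.utf8", "something.txt.utf8", "something.utf8.txt"):
        print(name, "->", type_by_extension(name))
```

Under these assumptions only "something.utf8.txt" is recognized by the extension check, while the content check does not care what the file is called.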
It's easy enough for a knowledgeable user to tell Windows that "utf8" is
an extension for plain-text files by editing the Windows registry
(answering your question about where the registry of extensions is). As
for "txt", this is indeed the de facto standard for text files under
Windows (and not "text"), going back to the MS-DOS days of 3-letter
extensions. We can try to change this convention, but I submit that is
not our job here on ietf-languages.
Jeremy Carroll <jjc at hpl dot hp dot com> wrote:
> My copy of IE seems happy enough
>
> Version 7.0.5730.11CO
>
> on
>
> OS Name Microsoft Windows XP Professional
> Version 5.1.2600 Service Pack 2 Build 2600
Yes, my platform at home is basically the same, and it works fine there.
> I would suggest listing a few known-to-work configurations, somewhere,
> ... and as long as the behaviour is standards conformant, and does
> work on enough platforms, then that's fit for purpose.
"txt" would work on both old and new systems.
> This is not a site intended for the general public requiring some
> solution for old, old browsers. If it were, they would be a server
> side solution by looking at the user agent information passed with the
> http request, and returning say the ascii only version or redirecting
> to a URL with a .txt extension or whatever. But that is not what this
> page is for!
I invite Stephane and Jeremy to hash this out with CE Whitehead, who
says the Registry needs to be compatible with any system that might be
used "anywhere on earth." I've suggested a minor file naming change
that would allow him to read UTF-8 on a Windows 95 machine, and would
not break on newer equipment. Come on, guys, this really does not need
to become an OS war.
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages