Solving the UTF-8 problem

Mon Jul 2 22:15:55 CEST 2007

On Sun, Jul 01, 2007 at 03:58:48PM -0700,
 Doug Ewell <dewell at roadrunner.com> wrote 
 a message of 161 lines which said:

> 3.  UTF-8 can't be read on some, especially older, computer systems
> (Frank Ellermann, months ago, and CE Whitehead).
> 
> With the continuing adoption of Unicode by OS and software vendors,
> I really can't get behind this argument.

Sorry, but UTF-8 adoption is far from ubiquitous. Many tools still have
problems with UTF-8. I discovered today that ht://Dig, one of the two
most common free search engines, has no UTF-8 support at all (see
http://www.htdig.org/FAQ.html#q4.27 and
http://www.htdig.org/FAQ.html#q4.10), which is quite sad for a Web
search engine (and, yes, the explanations they give are wrong, too).

Another common example is the PostScript tool a2ps.

> It simply isn't appropriate to "dumb down" all computerized text to
> match the least capable systems that might be running somewhere.

I understand the reasoning and, yes, switching the registry to UTF-8
might be one more signal sent to software developers, telling them
they really should upgrade. But do not claim that everything is done
yet.

So I basically agree that UTF-8 for the registry is better, but I do
not want to see bold statements like "Anyone but Frank Ellermann can
run a full UTF-8 environment by now". This is not true.

> This is especially true considering the language names listed above.
> We don't restrict text to uppercase to maintain compatibility with
> BCDIC and Sinclair ZX81 systems.

I'm not talking about dead systems but about programs that are alive,
in use, and maintained.

Note from the trenches: as an implementor, I promise to follow
whatever LTRU decides and to improve my UTF-8 parsing abilities in
Haskell, should we decide to use it.