Solving the UTF-8 problem
bortzmeyer at nic.fr
Mon Jul 2 22:15:55 CEST 2007
On Sun, Jul 01, 2007 at 03:58:48PM -0700,
Doug Ewell <dewell at roadrunner.com> wrote
a message of 161 lines which said:
> 3. UTF-8 can't be read on some, especially older, computer systems
> (Frank Ellermann, months ago, and CE Whitehead).
> With the continuing adoption of Unicode by OS and software vendors,
> I really can't get behind this argument.
Sorry, but UTF-8 adoption is far from ubiquitous. Many tools still have
problems with UTF-8. I discovered today that ht://Dig, one of the two
most common free search engines, has no UTF-8 support at all (see
http://www.htdig.org/FAQ.html#q4.10), which is quite sad for a Web
search engine (and, yes, the explanations they give are wrong, too).
Another common example is the PostScript tool a2ps.
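To make this kind of breakage concrete, here is a small Python sketch. It does not reproduce ht://Dig or a2ps; it simply assumes a byte-oriented tool that treats every byte as one Latin-1 character. A language name such as Volapük, stored as UTF-8, then comes out garbled, and byte counts no longer match character counts:

```python
# Sketch: what a byte-oriented tool that assumes Latin-1 (an assumption,
# not a claim about any specific program) makes of UTF-8 input.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def byte_and_char_lengths(text: str) -> tuple[int, int]:
    """Return (number of UTF-8 bytes, number of Unicode code points)."""
    return len(text.encode("utf-8")), len(text)


if __name__ == "__main__":
    name = "Volapük"  # "ü" (U+00FC) takes two bytes in UTF-8: C3 BC
    print(misread_as_latin1(name))      # "ü" turns into two Latin-1 characters
    print(byte_and_char_lengths(name))  # (8, 7)
```

Pure ASCII text passes through unchanged, which is why such tools often appear to work until the first non-ASCII name shows up.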
> It simply isn't appropriate to "dumb down" all computerized text to
> match the least capable systems that might be running somewhere.
I understand the reasoning and, yes, switching the registry to UTF-8
might be one more signal to software developers that they really
should upgrade. But do not claim that everything is done.
So, I basically agree that UTF-8 for the registry is better, but I do
not want to see bold sentences like "Anyone but Frank Ellermann can
run a full UTF-8 environment by now". This is not true.
> This is especially true considering the language names listed above.
> We don't restrict text to uppercase to maintain compatibility with
> BCDIC and Sinclair ZX81 systems.
I'm not talking about dead systems but about programs which are live,
used and maintained.
Note from the trenches: as an implementor, I promise to follow
whatever LTRU decides and to improve my UTF-8 parsing abilities in
Haskell, should we decide to use it.
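For implementors in the same position, here is a minimal sketch of reading the registry as strict UTF-8. It is written in Python rather than Haskell for brevity, and it assumes the registry's record-jar layout (fields written as `Name: value`, continuation lines indented, records separated by `%%`). The function name `parse_registry` and the sample data are hypothetical. Malformed UTF-8 is rejected rather than silently replaced:

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]


def parse_registry(data: bytes) -> List[Record]:
    """Hypothetical helper: decode registry bytes as strict UTF-8 and
    split them into records mapping field name -> list of values."""
    # Strict decoding (the default) raises UnicodeDecodeError on malformed input.
    text = data.decode("utf-8")
    records: List[Record] = []
    record: Record = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if record:
                records.append(record)
            record, last_field = {}, None
        elif line[:1] in (" ", "\t") and last_field is not None:
            # Continuation line: fold it into the previous field's last value.
            record[last_field][-1] += " " + line.strip()
        elif ":" in line:
            # "Name: value" line; a field such as Description may repeat.
            name, _, value = line.partition(":")
            last_field = name.strip()
            record.setdefault(last_field, []).append(value.strip())
    if record:
        records.append(record)
    return records


if __name__ == "__main__":
    # Made-up sample in the registry's layout, not copied from the real file.
    sample = (
        "File-Date: 2007-07-02\n"
        "%%\n"
        "Type: language\n"
        "Subtag: vo\n"
        "Description: Volapük\n"
        "%%\n"
    ).encode("utf-8")
    for rec in parse_registry(sample):
        print(rec)
```

Failing loudly on invalid input, instead of decoding with replacement characters, keeps a mis-encoded registry file from passing unnoticed.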