Solving the UTF-8 problem

Mon Jul 2 15:55:33 CEST 2007

On Sun, Jul 01, 2007 at 03:58:48PM -0700,
 Doug Ewell <dewell at roadrunner.com> wrote 
 a message of 161 lines which said:

> Another possibility is to have IANA post an official version of the
> Registry in one encoding, such as UTF-8, and additional, unofficial
> versions in other encodings, such as Latin-1 or hex NCRs.

Why not? Currently, we do exactly the opposite: IANA publishes the
official registry with non-ASCII characters written as hex NCRs
(http://www.iana.org/assignments/language-subtag-registry) and
langtag.net publishes an unofficial version in UTF-8
(http://www.langtag.net/registries/language-subtag-registry.utf8).
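Producing the UTF-8 copy is a purely mechanical job. Below is a minimal Python sketch of such a converter. It assumes the official file is plain ASCII with every other character written as a hex NCR of the form &#xHHHH;. The names ncr_to_text and convert_registry, and the sample Description line, are hypothetical illustrations. They are not taken from the script langtag.net actually uses.

```python
import re

# Matches a hexadecimal numeric character reference such as "&#xE7;".
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def ncr_to_text(line: str) -> str:
    """Replace every hex NCR in `line` with the character it designates."""
    return HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), line)


def convert_registry(src_path: str, dst_path: str) -> None:
    """Write a UTF-8 copy of the NCR-encoded registry (hypothetical helper)."""
    # The official file is assumed to be pure ASCII; reading it as UTF-8
    # is harmless because ASCII is a subset of UTF-8.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(ncr_to_text(line))


if __name__ == "__main__":
    # Illustrative sample line, not copied from the real registry.
    print(ncr_to_text("Description: Proven&#xE7;al"))  # Description: Provençal
```

Because only the &#x...; sequences are rewritten, every other byte of the registry passes through unchanged.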

> Potential problems with this approach are unintentional mismatches
> between the versions (I caught one of these problems for the ISO
> 639-3 people recently)

I do not get it. If the unofficial version is produced by a program,
how can a mismatch exist (unless there is a bug in the program)? 
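A program can also check that a generated copy has not drifted. Here is a minimal Python sketch of such a check, under the same &#xHHHH; assumption. It decodes the NCRs in the official text and compares the result, line by line, with the UTF-8 copy. The names decode_ncrs and mismatches are hypothetical and do not refer to an existing tool.

```python
import re
from itertools import zip_longest
from typing import List, Optional, Tuple

# Matches a hexadecimal numeric character reference such as "&#xE7;".
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_ncrs(text: str) -> str:
    """Replace every hex NCR with the character it designates."""
    return HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


def mismatches(official: str,
               unofficial: str) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Compare the NCR-encoded official registry with a UTF-8 copy.

    Returns (line number, expected line, found line) for each difference;
    None marks a line present in only one of the two versions.
    """
    expected = decode_ncrs(official).splitlines()
    found = unofficial.splitlines()
    return [(n, e, f)
            for n, (e, f) in enumerate(zip_longest(expected, found), start=1)
            if e != f]
```

An empty result means the two versions agree. Any entry in the list points at a bug in whatever produced the copy, not at a hand-editing slip.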

And if the unofficial version is done by hand, should we tell the ISO
639-3 people that computers are better than humans at boring and
repetitive tasks?