[Ltru] Re: Solving the UTF-8 problem

Wed Jul 4 08:43:59 CEST 2007

At 00:24 07/07/04, Peter Constable wrote:
>From: Doug Ewell [mailto:dewell at roadrunner.com]
>
>> I restated three objections to converting the Registry to
>> UTF-8, and tried to show why they don't outweigh the
>> advantages of converting.

Doug's argument that most newcomers are confused by the
numeric character references is a very strong one.
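
To illustrate what a newcomer has to do mentally today, here is a minimal
sketch that turns hexadecimal numeric character references back into the
characters they stand for. It assumes the escapes use the XML-style
`&#x...;` form; the function name `decode_hex_ncrs` and the sample line are
made up for illustration and are not taken from any existing tool.

```python
import re

# Matches XML-style hexadecimal numeric character references, e.g. "&#xE5;".
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_ncrs(text: str) -> str:
    """Replace each hexadecimal NCR in `text` with the character it denotes.

    References that do not name a valid code point are left untouched.
    """
    def _substitute(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Out of the Unicode range: keep the original escape as-is.
            return match.group(0)

    return _HEX_NCR.sub(_substitute, text)


# Hypothetical sample line in the escaped form.
escaped = "Description: Norwegian Bokm&#xE5;l"
print(decode_hex_ncrs(escaped))
```

With a UTF-8 registry, a reader would see the decoded form directly and this
step would not be needed.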

>>  All three are, in fact, true:
>>
>> 1.  UTF-8 doesn't play well with e-mail.
>> 2.  Converting will break processors that expect only ASCII.
>> 3.  Some computers can't display UTF-8.
>>
>> But we can work out the e-mail problem
>
>+1

I'm confident this can be done. I'm one of the people who
cannot view UTF-8 in email, but I consider that my problem,
not a problem of the WG or the subtag registration mailing
list.

One thing we should try to get solved (if it's not already
done) is to make sure that the mailing list archive serves
emails with the correct charset setting. This may or may not
already be the case.
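
As a rough way to check this, the sketch below extracts the charset declared
in a Content-Type header value, using only the Python standard library. The
function name `declared_charset` is hypothetical, and how the header is
obtained from the archive server is deliberately left out, since I don't
know how the archive is set up.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type_header: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value.

    Returns None when the header declares no charset at all.
    """
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_charset()


# A page that declares its charset, and one that does not.
print(declared_charset("text/plain; charset=UTF-8"))
print(declared_charset("text/html"))
```

An archived message that contains UTF-8 but is served without a charset
declaration (the second case) is the situation we would want to rule out.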

>> And the display problem is really not as much of a
>> showstopper as it is being portrayed.  People are saying
>> that the hex escapes are a display problem too...
>
>+1. I don't see the display issue as being a show-stopper at all. Anybody 
>that has a need to view this registry has access to means of viewing UTF-8.

I strongly agree with this.

>> the breakage to processors is no worse than adding new
>> fields (nor are there that many fully-conformant
>> processors to be fixed).
>
>I'm inclined to agree, but am waiting to see if anyone makes a strong 
>counterargument.

I agree here too. There are not too many implementations that
read in the registry, and of these, some are known and can be fixed,
some are known to be 8-bit tolerant, and some are run only in batch
mode in a central place and can be fixed when an update occurs.

For the implementations where this really matters, i.e. software that
is field-deployed with a software upgrade mechanism and polls the
registry: first, such implementations should be rather rare, and
second, they should have been implemented in a robust way, because
with the network, there are no guarantees at all. Put another
way, if an implementation chokes because it sees the eighth bit set
on a byte, and becomes completely useless (e.g. it clears its
internal language information cache or just blows up), then that's
a very bad implementation. Even if we keep all our stability guarantees,
there is no guarantee that the network will never flip any bits.
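
To make "robust" a bit more concrete, here is a minimal sketch of the kind
of behaviour I mean: a freshly fetched copy replaces the cached data only if
it decodes and parses; otherwise the old data stays in place. The names
`parse_records` and `RegistryCache` are hypothetical, and the parser handles
only a simplified record-jar layout ("Name: value" fields, folded
continuation lines, records separated by "%%"). It is not a conformant
registry processor.

```python
from typing import List, Tuple

Record = List[Tuple[str, str]]


def parse_records(text: str) -> List[Record]:
    """Parse a simplified record-jar text into lists of (name, value) pairs."""
    records: List[Record] = []
    current: Record = []
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if current:
                records.append(current)
                current = []
        elif line[:1] in (" ", "\t") and current:
            # Folded continuation line: append to the previous field's value.
            name, value = current[-1]
            current[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


class RegistryCache:
    """Keeps the last good copy of the registry across failed updates."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def update(self, raw: bytes) -> bool:
        """Replace the cached records only if `raw` decodes and parses.

        Returns True on success; on failure the old records are kept.
        """
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return False  # damaged or unexpected bytes: keep the old data
        parsed = parse_records(text)
        if not parsed:
            return False  # nothing usable: keep the old data
        self.records = parsed
        return True
```

The point of the design is only that a bad download is a non-event: `update`
reports failure and the previously cached language information stays usable.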

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp