[Ltru] Re: Solving the UTF-8 problem

Tue Jul 10 08:38:10 CEST 2007

Doug Ewell wrote on 7/2/07 23:00 -0700:

> Stephane Bortzmeyer <bortzmeyer at nic dot fr> wrote:
>
>>> 3.  UTF-8 can't be read on some, especially older, computer systems (Frank
>>> Ellermann, months ago, and CE Whitehead).
>>
>> So, I basically agree that UTF-8 for the registry is better but I do not
>> want to see bold sentences like "Anyone but Frank Ellermann can run a full
>> UTF-8 environment by now". This is not true.
>
> You're correct.  I restated three objections to converting the Registry to
> UTF-8, and tried to show why they don't outweigh the advantages of
> converting.  All three are, in fact, true:
>
> 1.  UTF-8 doesn't play well with e-mail.
> 2.  Converting will break processors that expect only ASCII.
> 3.  Some computers can't display UTF-8.
>
> But we can work out the e-mail problem, and the breakage to processors is no
> worse than adding new fields (nor are there that many fully-conformant
> processors to be fixed).  And the display problem is really not as much of a
> showstopper as it is being portrayed.  People are saying that the hex escapes
> are a display problem too, and adding "Arua" and "Aru&#xE1; (Arua)" to the
> Registry is going to confuse a LOT of people, no matter how many comments we
> add.

UTF-8 has been the recommended charset for Internet interchange since RFC 2277. 
Our past experience with ASCII encodings of non-ASCII text in the IETF has been 
questionable.  RFC 2047, 2231, IMAP modified-UTF-7, and quoted-printable have 
all had mixed results.  Meanwhile, UTF-8-based IETF protocols have been less 
problematic from an interoperability viewpoint.  The EAI WG is putting together 
an experiment to try UTF-8 in email headers and addresses, and that will 
increase the pressure to update email infrastructure.

Rough edges are inevitable during the adoption of new technology, but where do 
we want to be 5-10 years from now?  What's the least painful path to get there?

                - Chris