Converting non-ASCII to ASCII

Mon Jun 25 09:24:00 CEST 2007

At 09:12 +0200 2007-06-25, Stephane Bortzmeyer wrote:

>What may be on-topic for the ietf-languages list is to remind people
>that there is *no* way to do *automatic* transliteration to ASCII in
>most cases.

Your Reviewer knows something about transliteration and can propose 
transliterations by hand. It does not need to be "automatic". None of 
the rest of the text is generated automatically. Humans type it in.

>Proven&#xE7;al => Provencal is an easy case, but there is no general
>rule to do such a conversion from Unicode to ASCII. (And specially no
>standard rule, for instance, there is no standard way to transliterate
>Arabic characters to Latin characters: english-speaking people write
>"Iraq", french-speaking write "Irak" and so on.)

The reason we need transliteration is to HELP users. To help them 
type the right thing into the Library of Congress catalogue, for 
instance, if they want to find out more about a bibliographical 
reference.

It's pretty ridiculous that UTF-8 isn't permitted, but since it is 
not, all we need to have is "Proven&#xE7;al (Provencal)". Ugly? It's 
better than the hex escape.
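
For anyone who wants to check hand-typed fallbacks against a rough
mechanical guess, here is a minimal sketch in Python. It is an
illustration only, not part of any registry procedure, and the
function names ascii_fallback and with_fallback are made up for this
example. It assumes that stripping combining marks after NFD
normalization is an acceptable fallback, which is true for
"Provencal" but not for every script or letter; where it cannot
produce pure ASCII it gives up and leaves the job to a human.

```python
import html
import unicodedata


def ascii_fallback(text):
    """Return an ASCII-only approximation of text, or None if this
    simple method cannot produce one."""
    # Split accented letters into base letter + combining mark (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Anything still outside ASCII needs a human reviewer.
    return stripped if stripped.isascii() else None


def with_fallback(escaped):
    """Append an ASCII fallback in parentheses to a hex-escaped name.

    The input is returned unchanged when it is already plain ASCII
    or when no fallback could be derived.
    """
    name = html.unescape(escaped)  # "&#xE7;" becomes the real character
    fallback = ascii_fallback(name)
    if fallback is None or fallback == name:
        return escaped
    return f"{escaped} ({fallback})"


print(with_fallback("Proven&#xE7;al"))
```

A letter such as the o with stroke (U+00F8) has no decomposition, so
ascii_fallback returns None for it and the entry is left as it was
for someone to transliterate by hand.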

I don't see why this should be controversial. Your Iraq/Irak 
suggestion is a red herring. In the first place, there are 
International Standards for transliteration. And in the second place, 
this isn't really complete transliteration, which often makes use of 
diacritics; it is ASCII fallback.
-- 
Michael Everson * http://www.evertype.com