Converting non-ASCII to ASCII
Michael Everson
everson at evertype.com
Mon Jun 25 09:24:00 CEST 2007
At 09:12 +0200 2007-06-25, Stephane Bortzmeyer wrote:
>What may be on-topic for the ietf-languages list is to remind people
>that there is *no* way to do *automatic* transliteration to ASCII in
>most cases.
Your Reviewer knows something about transliteration and can propose
transliterations by hand. It does not need to be "automatic". None of
the rest of the text is generated automatically. Humans type it in.
>Provençal => Provencal is an easy case, but there is no general
>rule to do such a conversion from Unicode to ASCII. (And specially no
>standard rule, for instance, there is no standard way to transliterate
>Arabic characters to Latin characters: english-speaking people write
>"Iraq", french-speaking write "Irak" and so on.)
The reason we need transliteration is to HELP users. To help them
type the right thing into the Library of Congress catalogue, for
instance, if they want to find more about a bibliographical reference.
It's pretty ridiculous that UTF-8 isn't permitted, but since it is
not, all we need to have is "Provençal (Provencal)". Ugly? It's
better than the hex escape.
I don't see why this should be controversial. Your Iraq/Irak
suggestion is a red herring. In the first place, there are
International Standards for transliteration. And in the second place,
this isn't really complete transliteration, which often makes use of
diacritics; it is ASCII fallback.
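
For the simplest cases, such as the Provençal one, a fallback of this
kind can be approximated mechanically. The following is only a rough
sketch in Python, not a proposal for how the registry text should be
produced: the function names ascii_fallback and with_fallback are
made up for illustration, and the approach (decompose, then drop
combining marks) is an assumption that only works for letters with a
canonical decomposition. Anything else comes out as "?" and still
needs a human to propose the fallback by hand.

```python
import unicodedata


def ascii_fallback(text: str) -> str:
    """Return a crude ASCII fallback (hypothetical helper, not a standard)."""
    # NFD splits a precomposed letter such as U+00E7 into a base letter
    # plus a combining mark (here: "c" + COMBINING CEDILLA).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Characters that are still outside ASCII have no mechanical fallback
    # under this assumption; they are replaced with "?" to flag them for
    # hand review.
    return stripped.encode("ascii", "replace").decode("ascii")


def with_fallback(name: str) -> str:
    """Format a name as 'Original (Fallback)' when the two differ."""
    fallback = ascii_fallback(name)
    return name if fallback == name else f"{name} ({fallback})"


if __name__ == "__main__":
    print(with_fallback("Provençal"))
    print(with_fallback("Occitan"))
```

A name that is already ASCII is returned unchanged, so only the
entries that actually need a fallback get the parenthesised form.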
--
Michael Everson * http://www.evertype.com