Converting non-ASCII to ASCII

CE Whitehead cewcathar at
Mon Jun 25 17:22:04 CEST 2007


I feel that an ideal registry would include the escape sequences (for people 
whose browsers do not display Unicode; some of the browsers I use cannot be 
set to display Unicode, sorry to say), the transliterations, and the UTF-8 
characters (some of which would appear as rectangles and question marks in 
my browser), but ideally could be formatted in UTF-8.

So should I or should I not submit a change to my comments field (which is 
already mixed: eme is an ASCII transliteration, 
while l'académie françoise makes use of escape sequences)?

Also, be-tarask was all done as ASCII transliterations; we argued about which 
transliteration to use (in this case there was some discussion). 
Do we now need to include the UTF-8 characters/escape sequences for it?

Thanks.  (A few more comments below)

--C. E. Whitehead
cewcathar at
>Your Reviewer knows something about transliteration and can propose 
>transliterations by hand. It does not need to be "automatic". None of the 
>rest of the text is generated automatically. Humans type it in.
>>Provençal => Provencal is an easy case, but there is no general
>>rule to do such a conversion from Unicode to ASCII. (And especially no
>>standard rule; for instance, there is no standard way to transliterate
>>Arabic characters to Latin characters: English-speaking people write
>>"Iraq", French-speaking people write "Irak" and so on.)
>The reason we need transliteration is to HELP users. To help them type the 
>right thing into the Library of Congress catalogue, for instance, if they 
>want to find more about a bibliographical reference.

>It's pretty ridiculous that UTF-8 isn't permitted, but since it is not, all 
>we need to have is "Provençal (Provencal)". Ugly? It's better than the hex 

I sort of like having the hex escapes too, because you can look those up to 
see the characters in PDF in the Unicode character charts even when your 
browser settings cannot be changed so as to display all the characters.  I 
don't know how much use a person could make of these, though, in terms of 
using them to search or whatever.
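
To make the difference between the two ASCII-only forms concrete, here is a 
rough Python sketch. It assumes the escapes take the "&#xNNNN;" form, and the 
function names hex_escape and ascii_fallback are made up for illustration; 
neither is part of any registry tooling. The fallback only strips diacritics, 
which handles easy cases like Provençal but is not a real transliteration 
(it simply drops characters with no ASCII base letter, so it would not help 
with Arabic or Cyrillic).

```python
import unicodedata


def hex_escape(text):
    """Replace each non-ASCII character with an assumed &#xNNNN; escape."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch)) for ch in text
    )


def ascii_fallback(text):
    """Strip diacritics: decompose, then drop combining marks.

    Any character that still is not ASCII afterwards is dropped, so this
    is a crude fallback and not a transliteration scheme.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in decomposed
        if not unicodedata.combining(ch) and ord(ch) < 128
    )


if __name__ == "__main__":
    for name in ("Provençal", "l'académie françoise"):
        print(name, "|", hex_escape(name), "|", ascii_fallback(name))
```

The number inside the escape is the Unicode code point, so it can be looked 
up directly in the character charts; the fallback form is the one a person 
could actually type into a search box.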

>I don't see why this should be controversial. Your Iraq/Irak suggestion is 
>a red herring. In the first place, there are International Standards for 
>transliteration. And in the second place, this isn't really complete 
>transliteration, which often makes use of diacritics; it is ASCII fallback,
>Michael Everson *

More information about the Ietf-languages mailing list