Converting non-ASCII to ASCII

Doug Ewell dewell at
Mon Jun 25 02:27:38 CEST 2007

CE Whitehead <cewcathar at hotmail dot com> wrote:

> Here are the remaining subtags (from
> where there are escape sequences in
> the description

It's really not necessary to go through this exercise on the list.  Most of 
us are capable of searching a text file for the character '&' and able to 
determine that the ASCII equivalent for "o-with-circumflex" is "plain o".
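
For anyone who would rather do that search mechanically, here is a minimal sketch in Python. It assumes the escape sequences are hex NCRs of the form &#xHHHH; and that the Registry text is already in a string. The function name lines_with_ncrs and the sample records are made up for illustration and are not taken from the Registry file.

```python
import re

# Hex numeric character reference, e.g. "&#xE7;" (assumed form of the escapes).
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")

def lines_with_ncrs(text):
    """Return (line_number, raw_line, decoded_line) for each line containing a hex NCR."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if HEX_NCR.search(line):
            # Replace each NCR with the character it names.
            decoded = HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), line)
            hits.append((number, line, decoded))
    return hits

# Illustrative sample only, not copied from the Registry.
sample = "Description: Proven&#xE7;al\nDescription: French\n"
for number, raw, decoded in lines_with_ncrs(sample):
    print(number, raw, "->", decoded)
```

Searching for the full &#x...; pattern rather than a bare '&' avoids flagging an ampersand that is not part of an escape.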

A less trivial question is how we are going to resolve the ongoing issue of 
"preserve non-ASCII characters" versus "hex NCRs are not human-friendly" 
versus "UTF-8 doesn't make it through some systems without being corrupted." 
There is a discussion being held in the LTRU Working Group, for at least the 
third time, over changing the format of the Registry to UTF-8.  That is the 
proper place to hold that discussion, not this list.

The question of providing pure-ASCII transliterations for every string in 
the Registry that includes a hex NCR, even something like Proven&#xE7;al, is 
an operational detail, and does belong on this list IMHO.  My opinion is 
that we need to be able to represent non-ASCII in the Registry by some 
means, whether hex NCRs, UTF-8, or some other mechanism, and that it is neither necessary nor 
feasible to come up with a pure-ASCII transliteration for everything.  But 
again, this is the place to discuss additions and changes to the Registry 
contents, and LTRU is the place to discuss changing the rules.
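
To illustrate why a pure-ASCII form is easy for some strings and not for others, here is a rough sketch in Python. It is one naive approach (decompose, then drop combining marks), not a method anyone has proposed for the Registry, and the function name ascii_fallback is hypothetical.

```python
import unicodedata

def ascii_fallback(text):
    """Strip combining marks after NFD decomposition.

    Returns the stripped string if it is pure ASCII, otherwise None.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped if stripped.isascii() else None

print(ascii_fallback("Proven\u00e7al"))  # c-with-cedilla decomposes to c + combining cedilla
print(ascii_fallback("\u00f8"))          # o-with-stroke has no canonical decomposition
```

Letters such as c-with-cedilla or o-with-circumflex reduce to a plain base letter this way, but a letter with no canonical decomposition, such as o-with-stroke, is left untouched, so the function gives up and returns None. Anything beyond that needs a hand-picked transliteration.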

Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
