Converting non-ASCII to ASCII

Mon Jun 25 02:27:38 CEST 2007

CE Whitehead <cewcathar at hotmail dot com> wrote:

> Here are the remaining subtags (from
> http://www.iana.org/assignments/language-subtag-registry)--
> where there are escape sequences in
> the description

It's really not necessary to go through this exercise on the list.  Most of 
us are capable of searching a text file for the character '&' and of 
determining that the ASCII equivalent of "o-with-circumflex" is "plain o".

A less trivial question is how we are going to resolve the ongoing issue of 
"preserve non-ASCII characters" versus "hex NCRs are not human-friendly" 
versus "UTF-8 doesn't make it through some systems without being corrupted." 
The LTRU Working Group is discussing, for at least the third time, whether 
to change the format of the Registry to UTF-8.  That is the proper place to 
hold that discussion, not this list.

The question of providing pure-ASCII transliterations for every string in 
the Registry that includes a hex NCR, even something like Proven&#xE7;al, is 
an operational detail, and does belong on this list IMHO.  My opinion is 
that we need to be able to represent non-ASCII characters in the Registry by 
some means, whether hex NCRs, UTF-8, or something else, and that it's 
neither necessary nor feasible to come up with a pure-ASCII transliteration 
for everything.  But 
again, this is the place to discuss additions and changes to the Registry 
contents, and LTRU is the place to discuss changing the rules.
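
To make the "not feasible for everything" point concrete, here is a minimal 
sketch in Python (standard library only) of what a mechanical conversion 
might look like.  The function names decode_ncrs and ascii_fallback are my 
own, and the approach (decompose, then drop combining marks) is just one 
assumption about how such a transliteration could be attempted, not a 
proposal for the Registry.

```python
import html
import unicodedata


def decode_ncrs(text):
    """Turn hex NCRs such as '&#xE7;' into the characters they stand for.

    html.unescape also decodes decimal NCRs and named entities.
    """
    return html.unescape(text)


def ascii_fallback(text):
    """Best-effort ASCII form: decode NCRs, decompose, drop combining marks.

    Characters that have no decomposition into an ASCII base letter
    (for example U+00F8, o with stroke) come out as '?'.
    """
    decomposed = unicodedata.normalize("NFKD", decode_ncrs(text))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    print(decode_ncrs("Proven&#xE7;al"))     # the string with a real c-cedilla
    print(ascii_fallback("Proven&#xE7;al"))  # the cedilla is dropped
    print(ascii_fallback("&#xF8;"))          # no ASCII base letter to fall back on
```

The cedilla case works only because that letter happens to decompose into a 
plain "c" plus a combining mark.  Letters like o-with-stroke have no such 
decomposition, so any pure-ASCII form for them has to be chosen by hand.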

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages