Solving the UTF-8 problem (was: Language Tag Modification 1694acad)

Mon Jul 2 17:03:27 CEST 2007

Hi, I'm unsure whether people who have only Thai Windows (do they exist? I 
thought so) can see all Latin-1 characters properly.

Also, I think adding a comment to ASCII transliterations that overlap (that 
is, names that become identical once their accents are removed) would not be 
too much trouble.

(But this is needed only if there are people anywhere on earth whose 
operating systems cannot display these characters.)

Finally, I do not think it is too much work to provide a few ASCII 
transliterations of non-ASCII characters, given all the other information 
that goes into language subtag entries.

And I do not mind whether the official registry is in ASCII and the 
unofficial one in UTF-8, or the other way around.

Thanks.

(I've put all this again below for people who like the context.)

--C. E. Whitehead

Doug Ewell dewell at roadrunner.com
Mon Jul 2 00:58:48 CEST 2007

>What are we going to do when the ISO 639-3 code list is finalized and we 
>have to deal with adding the following pairs of languages, whose names 
>differ only by diacritical marks?

>aru  Arua
>arx  Aruá

>bfa  Bari
>mot  Barí

>kgm  Karipúna
>kuq  Karipuná

>sbe  Saliba
>slc  Sáliba

>wbf  Wara
>tci  Wára

>Are we going to include an ASCII version of every name that contains an 
>accented letter?  There are several hundred in ISO 639-3.

I have not seen ISO 639-3, but since it is already so much work just to put 
in the comments, descriptions, etc., adding the ASCII name as well should be 
trivial!
As for the languages whose names differ only by a diacritical mark, you might 
need a comment about this somewhere; I think it can be handled (for non-Latin 
characters, and for Latin ones if needed, e.g., if there are people in the 
world who cannot see these).
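To show how small the extra work would be, here is a minimal sketch in 
Python. It assumes (my assumption, not a registry rule) that the "ASCII name" 
is formed by decomposing each name and dropping the accents, and it flags the 
entries whose ASCII names would collide and so need a comment. The helper 
names `ascii_fold` and `find_collisions` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def ascii_fold(name: str) -> str:
    """Hypothetical ASCII name: decompose to NFD, drop combining accents.

    Anything with no ASCII base letter (e.g. non-Latin script) is dropped
    too, so this is only a rough fallback for Latin-script names.
    """
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")


def find_collisions(names: dict[str, str]) -> dict[str, list[str]]:
    """Group codes whose names become identical after ASCII folding."""
    groups: dict[str, list[str]] = defaultdict(list)
    for code, name in names.items():
        groups[ascii_fold(name)].append(code)
    return {folded: codes for folded, codes in groups.items() if len(codes) > 1}


# The ISO 639-3 pairs Doug quoted.
PAIRS = {
    "aru": "Arua", "arx": "Aru\u00e1",
    "bfa": "Bari", "mot": "Bar\u00ed",
    "kgm": "Karip\u00fana", "kuq": "Karipun\u00e1",
    "sbe": "Saliba", "slc": "S\u00e1liba",
    "wbf": "Wara", "tci": "W\u00e1ra",
}

if __name__ == "__main__":
    for folded, codes in find_collisions(PAIRS).items():
        # Each collision would need a Comments line in its registry entry.
        print(f"{folded}: {', '.join(codes)}")
```

All five pairs collapse to the same ASCII name (e.g. "Arua: aru, arx"), 
which is exactly where a comment would be needed.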

>Section 3.1 mentions transcription of non-Latin Description fields into the 
>Latin script.  It does not talk about providing a pure-ASCII equivalent for 
>every non-ASCII French- or Spanish-language string, and I don't believe 
>that was the WG's intention.  Transcriptions are useful when the content is 
>in Arabic or Cyrillic or Han, to make the material available to 
>Latin-script-only readers.  Providing "transcriptions" like "4&#xE8;me 
>(4eme)" merely announces to the world that we can't solve our own technical 
>character-encoding problems without resorting to unwieldy kludges.

Are there people who have only, say, Thai Windows, who would appreciate the 
transcriptions?

That's what I was thinking; sorry if I'm wrong.
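One way to check part of my question is to ask whether a name can even be 
stored in the legacy Windows Thai code page (Python calls it "cp874"). This 
is only a sketch: it says nothing about Unicode-aware programs, and whether a 
given machine actually displays the letters also depends on the application 
and fonts, which this check cannot see. The helper `representable` is 
hypothetical.

```python
def representable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for name in ["Arua", "Aru\u00e1", "S\u00e1liba"]:
        # Compare the Thai code page with Latin-1 for each name.
        print(name, representable(name, "cp874"), representable(name, "latin-1"))
```

Accented names such as "Aruá" fit in Latin-1 but not in cp874, so a 
non-Unicode program limited to the Thai code page could not show them.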

Stephane Bortzmeyer bortzmeyer at nic.fr
Mon Jul 2 15:55:33 CEST 2007 wrote:

>On Sun, Jul 01, 2007 at 03:58:48PM -0700,
>Doug Ewell <dewell at roadrunner.com> wrote a message of 161 lines which 
>said:

> > Another possibility is to have IANA post an official version of the
> > Registry in one encoding, such as UTF-8, and additional, unofficial
> > versions in other encodings, such as Latin-1 or hex NCRs.
>Why not? Currently, we do exactly the opposite: IANA publishes the
>official registry in hex NCR
>(http://www.iana.org/assignments/language-subtag-registry) and
>langtag.net publishes an unofficial version in UTF-8
>(http://www.langtag.net/registries/language-subtag-registry.utf8).

Fine with me!
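Going from one version to the other is mechanical. A minimal sketch, assuming 
the only escapes in the official file are hex NCRs of the form "&#xE8;" and 
choosing uppercase hex when escaping (both are assumptions; the function 
names are hypothetical):

```python
import re

# Matches hexadecimal numeric character references such as "&#xE8;".
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def ncr_to_text(line: str) -> str:
    """Replace each hex NCR with the character it names (NCR -> UTF-8 text)."""
    return HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), line)


def text_to_ncr(line: str) -> str:
    """Escape every non-ASCII character as an uppercase hex NCR."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in line)


def convert_file(src: str, dst: str) -> None:
    """Write a UTF-8 copy of an NCR-escaped registry file (paths hypothetical)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(ncr_to_text(line))
```

Only hex NCRs are touched, rather than using a general HTML unescaper, so 
that any other "&" sequences in the file are left alone.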

--C. E. Whitehead
cewcathar at hotmail.com
