Solving the UTF-8 problem; was Language Tag Modification 1694acad;

Tue Jul 3 07:43:39 CEST 2007

CE Whitehead <cewcathar at hotmail dot com> wrote:

> Hi, I'm confused as to whether or not persons with Thai Windows only (do 
> they exist?? I thought so) can see all Latin-1 characters properly.

Not if they are unable to run either IE 4.0 or SC UniPad, either of which is 
supposed to work on any Windows 95 Rev B or later system.  If you tell me 
what OS and browser you have at your disposal, I will tell you whether you 
should be able to view 
http://langtag.net/registries/language-subtag-registry.utf8 correctly, 
either online or offline.  I could also provide you with a Windows 
95-compatible program that would provide a best-fit display of non-Latin-1 
characters, but I suppose the library won't let you run it.

Remember also that the Registry exists to provide names and descriptions 
that are sufficient for tag producers and consumers to make sensible tagging 
decisions.  Does anyone disagree with that?  It is not just for casual 
browsing.

> Also, I think adding a comment to transliterations into ascii that overlap 
> would not be too much trouble.

Nobody has ever said it would be difficult to add layers of comments.  It 
would be troublesome, at least for me, to introduce unnecessary confusion by 
folding names like Arua and Aruá together and then adding comments to try to 
explain what we meant, instead of just spelling the names right.
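
The folding problem can be illustrated with a minimal sketch. It assumes a naive fold (NFD decomposition, then dropping combining marks); the helper name fold_to_ascii is hypothetical and is not part of any Registry tooling.

```python
import unicodedata

def fold_to_ascii(name: str) -> str:
    """Naive fold: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Two distinct names from the discussion above.
names = ["Arua", "Aruá"]
folded = [fold_to_ascii(n) for n in names]

# Distinct before folding, identical after: the distinction is lost.
collision = len(set(names)) == 2 and len(set(folded)) == 1
print(folded, collision)
```

Once both names fold to the same ASCII string, only an added comment could tell them apart, which is the extra layer objected to above.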

> (But this is needed only if there are persons who have operating systems 
> anywhere on earth who cannot see these characters).

Anywhere on earth?  This is your operational criterion, not mine.

> Finally, I do not think it is too much work to provide a few 
> transliterations of non-ascii characters, given all the other stuff that 
> goes into language subtag entries.

Nobody has ever said it would be "too much work" to add transliterations. 
See above.

> I have not seen ISO 639-3;

Then trust the examples I gave.

> but as it is so much work just to put in the comments, description, etc., 
> that it should be trivial to add in the ascii name!

See above.  Quick question: How many people feel my efforts on RFC 4646 have 
been guided by what is easiest, rather than what is right?

Quoting Stephane Bortzmeyer <bortzmeyer at nic dot fr>:

>> Why not? Currently, we do exactly the opposite: IANA publishes the 
>> official registry in hex NCR 
>> (http://www.iana.org/assignments/language-subtag-registry) and 
>> langtag.net publishes an unofficial version in UTF-8 
>> (http://www.langtag.net/registries/language-subtag-registry.utf8).
>
> Fine with me!

Can we pursue this official version/unofficial version strategy, as an 
alternative to loading up the Registry with excise?
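
As a rough sketch of how an unofficial UTF-8 copy could be derived from the official hex NCR file, the following decodes references of the form &#xHHHH;. The function name decode_hex_ncrs and the sample line are illustrative assumptions, not the actual process used by IANA or langtag.net.

```python
import re

# Matches hexadecimal numeric character references such as &#xE5;
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")

def decode_hex_ncrs(text: str) -> str:
    """Replace each hex NCR with the character it denotes."""
    return HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), text)

# Illustrative sample line in the style of a Registry record (an assumption).
sample = "Description: Norwegian Bokm&#xE5;l"
decoded = decode_hex_ncrs(sample)

# Encode the decoded text as UTF-8 bytes, as it would be written to a file.
utf8_bytes = decoded.encode("utf-8")
print(decoded)
```

The conversion is mechanical, so the two versions can carry the same content without adding anything to the Registry records themselves.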

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages