Unilingua

Sat Sep 17 04:04:16 CEST 2005

Doug, yes, please rethink those plans.

In another sphere we have a small number of character encodings, and we
can't get software to properly identify the encoding in play. Why should we
believe that with thousands of language codes available they will be used
properly?

Even with the small number of codes we have today, I have difficulty
determining which code properly describes a document. There are no
guidelines or rules for deciding whether a document is in one branch of a
language or another, beyond the crudest of guesses. Various experts make
pronouncements about Japanese being ja and not ja-JP, or Latn not being
required for en, since English is not generally written in another script.
But only an expert knows all of the possibilities: which circumstances never
(or nearly never) occur, and which ones require additional descriptors.
Given that, I really don't need a more refined set of language choices.

If and when someone gives me a way to review a document and determine the
proper language tag, and we all agree on the right tag, and it doesn't
require three linguists to make the determination, I'll believe we have a
system worth all these refinements. Oh, and I also need to believe the
distinctions are something that my application can use.

I understand that for a very few purposes the ability to distinguish
between thousands of languages is useful. I just don't see that most users
or most applications need it, and most content providers are incapable of
tagging their content correctly. So I don't see why we should burden general
applications with it.

So what good has it done that we have registered Boontling? For all the web
pages and applications that do something with Boontling, is the world
really much better off than if we had left them on their own with
x-boontling? Is the world so much better off because we registered
Boontling while denying or delaying es-americas?

The ISO 639 standards serve their purposes for linguists. The majority of
software on the internet does not require this level of distinction and does
not need to be burdened with it, and I don't see 3066bis being deployed the
way it has been envisioned.

tex

Doug Ewell wrote:
> 
> We have registered tags for Boontling, Enochian, Mingo, and Scouse.
> 
> In LTRU we are discussing, very seriously and purposefully, adding
> support for ISO 639-3 in the future, which would add the next 6,700
> languages and 350 "extended languages" that WEREN'T used widely enough
> to justify an ISO 639-2 code element.
> 
> We're also talking, at least peripherally, about supporting ISO 639-6 in
> a still-later version.  That could add as many as 13,000 more codes for
> almost every imaginable dialect and spoken or written variation.
> 
> If registering one more constructed-language tag is going to cause
> problems of scale, we'd better rethink some of those other plans.
> 
> (BTW, it would have to be "x-uniling" or some such, due to length
> constraints.)
> 
> --
> Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/
> 
> ----- Original Message -----
> From: "Tex Texin" <tex at xencraft.com>
> To: "Doug Ewell" <dewell at adelphia.net>
> Cc: <ietf-languages at iana.org>
> Sent: Friday, September 16, 2005 0:20
> Subject: Re: Unilingua
> 
> > Doug,
> >
> > (I am replying to your mail, but it is not directed at you
> > personally.)
> >
> > Why do we want to register things that have no practical use or
> > significance, for which there are almost no documents to give the tag
> > to, and yet make our software tables larger and require more time to
> > explain what it represents than the value of recognizing the code?
> >
> > Isn't it ok to have some number of documents for which we say, yes the
> > contents are in a language which isn't covered by tags, so if you want
> > a description it needs to be annotated in some other way.
> >
> > If somebody has a unilingua text they can label it with x-unilingua
> > and note somewhere what it represents.
> >
> > We should reel the registry back into being something that internet
> > engineers need for practical internet applications and have some form
> > of 80/20 rule related to language categorization. I recognize the
> > needs of linguists to distinguish languages with subtle but important
> > differences, but I don't see that general software or internet
> > applications should be burdened with the overhead. This has all got to
> > fit in my watch someday. The registry should not be a museum for every
> > possible variant that ever existed or was postulated. Maybe in
> > addition to 50 documents to register a tag we should require there be
> > 50 engineers that testify they care to recognize the distinction.
> > (kidding, but only slightly...)
> >
> > tex

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex at XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------