Unilingua

Tex Texin tex at xencraft.com
Tue Sep 20 03:15:35 CEST 2005


Hi Debbie,

That's good to know. Can you say more about this?
How will it recognize the correct language for a document? What
criteria does it use?
tex

Debbie Garside wrote:
> 
> Tex wrote:
> 
> > If and when someone gives me a way to review a document and determine the
> > proper language tag, and we all agree on the right tag, and it doesn't
> > require three linguists to do the determination, I'll believe we have a
> > system worth all these refinements.
> 
> The software is currently under development; a tool that can determine the
> language used within a document.
> 
> Best regards
> 
> Debbie Garside
> CEO
> Linguasphere ICT
> 
> > -----Original Message-----
> > From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Tex Texin
> > Sent: 17 September 2005 03:04
> > To: Doug Ewell
> > Cc: ietf-languages at iana.org
> > Subject: Re: Unilingua
> >
> > Doug, yes please rethink those plans.
> >
> > In another sphere we have a small number of character encodings, and we
> > can't get software to properly identify the encoding in play. Why should we
> > believe that with thousands of language codes available they will be used
> > properly?
> >
> > Even with the small number of codes we have today, I have difficulty
> > determining which code properly describes a document. There are no
> > guidelines or rules or ways to determine whether a document is one branch
> > of a language versus another, except with the crudest of guesses. Various
> > experts make pronouncements about Japanese being ja and not ja-jp, or latn
> > not being required for en, since en is not generally represented in
> > another script, but only an expert knows all of the possibilities and which
> > circumstances never (or nearly never) occur, and which ones require
> > additional descriptors or not. Given that is the case, I really don't need
> > a more refined set of language choices.
> >
> > If and when someone gives me a way to review a document and determine the
> > proper language tag, and we all agree on the right tag, and it doesn't
> > require three linguists to do the determination, I'll believe we have a
> > system worth all these refinements. Oh, and I also need to believe the
> > distinctions are something that my application may utilize.
> >
> > I understand that for some very few purposes the ability to distinguish
> > between thousands of languages is useful. I just don't see that most
> > users, or most applications need it, and most content providers are
> > incapable of correctly tagging their content. So I don't see why we should
> > burden general applications with it.
> >
> > So what good has it done that we have registered Boontling? For all the
> > web pages and applications that do something with boontling, was the world
> > really much better than if we had left them on their own with x-boontling?
> > Is the world so much better that we registered boontling and denied or
> > delayed es-americas?
> >
> > The ISO 639 standards serve their purposes for linguists. The majority of
> > software on the internet does not require this level of distinction and
> > does not need to be burdened with it and I don't see that 3066bis will be
> > deployed the way it has been envisioned.
> >
> > tex
> >
> > Doug Ewell wrote:
> > >
> > > We have registered tags for Boontling, Enochian, Mingo, and Scouse.
> > >
> > > In LTRU we are discussing, very seriously and purposefully, adding
> > > support for ISO 639-3 in the future, which would add the next 6,700
> > > languages and 350 "extended languages" that WEREN'T used widely enough
> > > to justify an ISO 639-2 code element.
> > >
> > > We're also talking, at least peripherally, about supporting ISO 639-6 in
> > > a still-later version.  That could add as many as 13,000 more codes for
> > > almost every imaginable dialect and spoken or written variation.
> > >
> > > If registering one more constructed-language tag is going to cause
> > > problems of scale, we'd better rethink some of those other plans.
> > >
> > > (BTW, it would have to be "x-uniling" or some such, due to length
> > > constraints.)
> > >
> > > --
> > > Doug Ewell
> > > Fullerton, California
> > > http://users.adelphia.net/~dewell/
> > >
> > > ----- Original Message -----
> > > From: "Tex Texin" <tex at xencraft.com>
> > > To: "Doug Ewell" <dewell at adelphia.net>
> > > Cc: <ietf-languages at iana.org>
> > > Sent: Friday, September 16, 2005 0:20
> > > Subject: Re: Unilingua
> > >
> > > > Doug,
> > > >
> > > > (I am replying to your mail, but it is not directed at you
> > > > personally.)
> > > >
> > > > Why do we want to register things that have no practical use or
> > > > significance, for which there are almost no documents to give the tag
> > > > to, and yet make our software tables larger and require more time to
> > > > explain what it represents than the value of recognizing the code?
> > > >
> > > > Isn't it ok to have some number of documents for which we say, yes the
> > > > contents are in a language which isn't covered by tags, so if you want
> > > > a description it needs to be annotated in some other way?
> > > >
> > > > If somebody has a unilingua text they can label it with x-unilingua
> > > > and note somewhere what it represents.
> > > >
> > > > We should reel the registry back into being something that internet
> > > > engineers need for practical internet applications and have some form
> > > > of 80/20 rule related to language categorization. I recognize the
> > > > needs of linguists to distinguish languages with subtle but important
> > > > differences, but I don't see that general software or internet
> > > > applications should be burdened with the overhead. This has all got to
> > > > fit in my watch someday. The registry should not be a museum for every
> > > > possible variant that ever existed or was postulated. Maybe in
> > > > addition to 50 documents to register a tag we should require there be
> > > > 50 engineers that testify they care to recognize the distinction.
> > > > (kidding, but only slightly...)
> > > >
> > > > tex
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin   cell: +1 781 789 1898   mailto:Tex at XenCraft.com
> > Xen Master                          http://www.i18nGuy.com
> >
> > XenCraft                          http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
> > _______________________________________________
> > Ietf-languages mailing list
> > Ietf-languages at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/ietf-languages


