Unilingua

John D. Burger john at mitre.org
Tue Sep 20 16:04:56 CEST 2005


[Of automatic language tagging]

Michael Everson wrote:

> Harald Tveit Alvestrand wrote:
>
>> Google's been using such a tool for years; it does not expose its 
>> tags, but allows you to search for them (for a limited list of 
>> languages).
>
> A badly limited list.

Currently about 35, which is more than I expected.  More would be 
better, of course, but all of the successful approaches to automatic 
language ID of which I am aware rely on statistical models trained on 
example texts, and often substantial amounts of text are necessary.
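
To make "statistical models trained from example texts" concrete, here 
is a minimal sketch of one common family of approaches: character 
trigram counts per language, scored with add-one smoothing.  This is an 
illustration only, not a description of what Google or any particular 
tool does.  The function names and the two tiny training samples are 
hypothetical; a usable model would need far more text per language.

```python
from collections import Counter
import math

# Hypothetical, deliberately tiny training samples (assumption: real
# systems train on much larger amounts of text per language).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
}

def trigrams(text):
    """Return the overlapping character trigrams of text, padded with spaces."""
    padded = "  " + text.lower() + "  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(samples):
    """Build one trigram-count model per language from example texts."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}

def score(counts, text, vocab_size):
    """Log-probability of text under one model, with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in trigrams(text)
    )

def identify(models, text):
    """Return the language whose model gives text the highest score."""
    # Shared vocabulary size: all trigrams seen in training, plus one
    # slot reserved for unseen trigrams.
    vocab_size = len(set().union(*models.values())) + 1
    return max(models, key=lambda lang: score(models[lang], text, vocab_size))

models = train(SAMPLES)
guess = identify(models, "the cat and the dog")
```

Note that identify() can only ever answer with a language it was 
trained on, which is why the amount and variety of example text 
limits the list of languages.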

So I suspect that the set of languages that these tools will 
successfully ID will always be a (small?) fraction of what's actually 
out there.  And Google will never tag Unilingua documents 
automatically. :)

- John D. Burger
   MITRE



