Unilingua
John D. Burger
john at mitre.org
Tue Sep 20 16:04:56 CEST 2005
[Of automatic language tagging]
Michael Everson wrote:
> Harald Tveit Alvestrand wrote:
>
>> Google's been using such a tool for years; it does not expose its
>> tags, but allows you to search for them (for a limited list of
>> languages).
>
> A badly limited list.
Currently about 35 - more than I expected. More would be better, of
course, but all of the successful approaches to automatic language ID
of which I am aware make use of statistical models trained from example
texts - often substantial amounts are necessary.
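
To make "statistical models trained from example texts" concrete, here is a minimal sketch of one common family of approaches: character trigram counts per language, scored with add-one smoothing. This is not a description of what Google does. The function names (trigrams, train, identify) and the two toy training sentences are invented for illustration, and a real system would need far more text per language.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the overlapping character trigrams of a lowercased, padded text."""
    padded = "  " + text.lower() + "  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build one trigram count table per language from {language: example text}."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def identify(models, text):
    """Pick the language whose model gives the text the highest log-likelihood."""
    # Shared vocabulary size, used for add-one smoothing of unseen trigrams.
    vocab = len(set().union(*models.values()))

    def score(counts):
        total = sum(counts.values())
        return sum(
            math.log((counts[g] + 1) / (total + vocab + 1))
            for g in trigrams(text)
        )

    return max(models, key=lambda lang: score(models[lang]))


# Toy training data, invented for illustration only.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat "
          "on the mat with the other animals",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "die katze sitzt auf der matte",
}
MODELS = train(SAMPLES)
```

A call such as identify(MODELS, "the dog and the cat") picks whichever trained language best explains the input. Note that a language with no training text has no model at all, so it can never be returned.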
So I suspect that the set of languages that these tools will
successfully ID will always be a (small?) fraction of what's actually
out there. And Google will never tag Unilingua documents
automatically. :)
- John D. Burger
MITRE