Alemannic & Swiss German

Gerard Meijssen gerardm at wiktionaryz.org
Wed Dec 6 11:13:32 CET 2006


Hoi,
The OmegaT software (a CAT tool) currently relies on Java to identify 
languages. As a consequence it is next to useless when languages that 
are in the long tail are to be translated. According to Mark's 
presentation, Google recognises only some 100 languages. ISO 639-3 
recognises some 7,603. The Wikimedia Foundation supports 250.
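For illustration, here is a minimal sketch in Java (the language OmegaT is written in) that treats a three-letter code such as `gsw` (Swiss German / Alemannic, which has no two-letter ISO 639-1 code) as a BCP 47 tag. It assumes Java 7 or later, whose `Locale.forLanguageTag` and `Locale.Builder` APIs postdate this message. It says nothing about how OmegaT itself identifies languages, and the class and method names are hypothetical.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

/**
 * Hypothetical helper: checks and normalizes BCP 47 language tags using
 * java.util.Locale (Java 7+). Syntax only; it does not consult the IANA registry.
 */
public class LanguageTagCheck {

    /** True if the tag is syntactically well-formed BCP 47 (not whether its subtags are registered). */
    static boolean isWellFormed(String tag) {
        try {
            new Locale.Builder().setLanguageTag(tag);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    /** Normalizes letter case, e.g. "GSW-ch" becomes "gsw-CH". */
    static String normalize(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        // "gsw" is a three-letter ISO 639 code; Locale keeps it as-is.
        Locale swissGerman = Locale.forLanguageTag("gsw-CH");
        System.out.println(swissGerman.getLanguage()); // gsw
        System.out.println(swissGerman.getCountry());  // CH
        System.out.println(normalize("GSW-ch"));       // gsw-CH
        System.out.println(isWellFormed("gsw-CH"));    // true
        System.out.println(isWellFormed("en_US"));     // false: BCP 47 separates subtags with hyphens
    }
}
```

Note that a well-formed tag is not necessarily a registered one; checking validity against the registry is a separate step.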

When you only work inside what the Standard supports, there is no 
apparent problem. The problems start when a Standard does not support a 
language.

Thanks,
     Gerard

Mark Davis schreef:
> >  In a presentation by Google it
> > was suggested that the coding of content with language codes is so
> > unreliable that it is practically useless.
>
> I suspect that this was an impression left by my presentation at the 
> Unicode conference. It is true that for web pages, the language 
> tagging is pretty minimal (about 15%) and too often incorrect to be 
> relied upon. However, that is far from saying that BCP 47 (RFC 4646) 
> is useless. It provides a stable, unambiguous identification system 
> for communicating language information between software components. 
> Even with web pages, once the language of a web page is heuristically 
> determined (and any existing tag can help to break ties), the language 
> tag is used internally to communicate with any process that needs to 
> deal with that page. And there are many other uses of language tags -- 
> communicating the user's choice of UI language is an obvious one.
>
> The key issue for web pages in particular is that their producers 
> don't immediately see much value in accurate tagging, because the 
> consequences of omission are not immediately apparent, and at this 
> point at least, not that bad.
>
> Mark
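As one concrete case of the "user's choice of UI language" use Mark mentions, the sketch below matches an Accept-Language style preference list against the UI translations an application ships, using the RFC 4647 "lookup" scheme as implemented by `Locale.lookup` (Java 8+, so again later than this thread). The set of available translations and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class UiLanguagePick {

    // Hypothetical set of UI translations an application might ship.
    static final List<Locale> AVAILABLE = Arrays.asList(
            Locale.forLanguageTag("en"),
            Locale.forLanguageTag("de"),
            Locale.forLanguageTag("gsw"));

    /**
     * Matches a preference string such as "gsw-CH,de;q=0.8" against the available
     * locales with RFC 4647 lookup. Returns null when nothing matches, leaving
     * the choice of a default to the caller.
     */
    static Locale pick(String userPreferences, List<Locale> available) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(userPreferences);
        return Locale.lookup(ranges, available);
    }

    public static void main(String[] args) {
        // No "gsw-CH" translation exists, so lookup truncates the range to "gsw".
        System.out.println(pick("gsw-CH,de;q=0.8,en;q=0.5", AVAILABLE).toLanguageTag()); // gsw
        // French is unavailable, so the next preference, German, is chosen.
        System.out.println(pick("fr-CH,de;q=0.8", AVAILABLE).toLanguageTag()); // de
        // Nothing matches at all.
        System.out.println(pick("fr", AVAILABLE)); // null
    }
}
```

Lookup returns at most one locale, which suits picking a single UI language; RFC 4647 "filtering" (`Locale.filter`) would instead return every matching locale.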
