The limit of language codes
GerardM at wiktionaryz.org
Fri Feb 16 15:55:51 CET 2007
The notion that only Google should care about the quality of tagging on
the Internet is awful. Many things do not happen because of this lack of
quality tagging.
Why is there no support for any specific functionality in so many
languages? Because it is extremely hard to recognise content as being in
those languages. As a result, much analysis of Internet content simply does
not happen. When people can reliably ask, "Give me an article on AIDS in my
mother tongue", you will find that much more content becomes available in
other languages. It will become available because it allows for
Really, I am horrified that you think it is only of interest to Google.
Even if you consider only books for Project Gutenberg, you will agree that,
from a language point of view, there is little consistency among Western
books dated before 1700. Each has its own unique language; they are all one
of a kind. You want to be practical for your own purposes, but I am not
convinced that what you propose helps that much. It seems to me a "one size
fits all" approach. The notion that you can use single tags without some
hierarchy seems foreign to me.
As to your point that we are not a missionary group: to be brutally honest,
I think the lack of marketing is one of the failings of the work done so
far. The work may be of good quality, but its relevance is not what it
should be. With only 15% of Internet content tagged, and much of that tagged
incorrectly, we may convince ourselves that what we do is relevant, but we
have not convinced the world at large.
On 2/16/07, David Starner <prosfilaes at gmail.com> wrote:
> On 2/15/07, Gerard Meijssen <gerardm at wiktionaryz.org> wrote:
> > When you want to divide a
> > language in time slots, it is really arbitrary where you create the
> > lines.
> Just like dialects.
> > The notion of having tags for historical languages makes sense when
> > these languages are dead. Tagging any other way is at best imprecise.
> I don't see why drawing a line between Old English and Middle English
> would be any more or any less complex if English were dead. Or even
> Middle English and English, since all the complexity is in a
> relatively small set of documents around 1500.
> > I am also
> > afraid that they detract from what we have to achieve first: the correct
> > tagging of contemporary content. With only 15% of the material on the
> > Internet tagged, we have plenty of convincing to do: convincing people
> > that using our tags /is/ relevant.
> What we have to do first? This is not a missionary group. My primary
> goal is to create a set of language tags usable for Project Gutenberg,
> for which having them also supported by XML and by other people is a
> great help. A large percentage of the books I personally do for
> Project Gutenberg date back before 1700, whether in modern editions or
> original facsimiles. That's what I'm most concerned about tagging. The
> use of these tags by other organizations and standards using language
> tagging is very convenient, because having one standard makes things
> easier for me. As to whether webpages are tagged, that's Google's
> problem; I could care less. I'm not here to achieve that, and I
> suspect many others aren't either.
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no