gerardm at wiktionaryz.org
Mon Nov 27 17:49:02 CET 2006
Don Osborn wrote:
> John Cowan wrote:
>> Gerard Meijssen scripsit:
>>> Correct tagging implies that it is precise.
>> This may be true in your particular application of language tagging,
>> but not in general. Often precision is unobtainable or even
>> undesirable. RFC 4646 says "tag wisely", *not* "tag exhaustively".
> Thanks for this remark re tagging. The way I read it, the appropriate or
> "wise" precision of the tagging depends on the context and need. In some
> cases, imprecision might accommodate reality more appropriately, and in
> others more precision would be indicated.
> In returning to the Wikipedia & WiktionaryZ discussion earlier, I would
> think that Wikipedia would not always want to be too precise (ISO-639-3 has
> 20-some codes for Arabic, but you'd only want to use ISO-639-1's code ar for
> ar.wikipedia.org [with only a rare possible exception]).
> On the other hand, WiktionaryZ might want to rely more systematically on
> ISO-639-3 (and eventually perhaps -6) which can specify the origins and use
> of words that may be particular in form, pronunciation or meaning according
> to dialect.
I agree with the sentiment for Wikipedia. It is however problematic that
the arguments for creating new editions of Wikipedia are often based on
political rather than linguistic grounds. The latest of these is a
request for a Montenegrin-language edition. The Wikipedia article on
Montenegrin is dismissive and explains that it is part of a language
continuum, where some want to accentuate the differences by adding
extra characters, while the language used to be considered part of
Serbian or even Serbo-Croatian.
WiktionaryZ does use ISO-639-3 to indicate languages. Portals have now
been set up for the vast majority of these languages, each already
containing a bare minimum of information. For languages like Mandarin,
Hausa and Serbian we allow for alternate scripts. Having a connection to
what is going to be ISO-639-6 would indeed be beneficial; WiktionaryZ
allows for the modelling of hierarchical relations, which are part of
the ISO-639-6 set-up. It being a wiki would allow people to come up with
their arguments for why "their" language is not how it is perceived by
those scientist types .. :) I am happy to say that we already ask people
to create Swadesh lists for their language/dialect. We already /have/
people interested in languages like Ripuarian, or is it Koelsch in
Wikipedia (this being a big continuum of hard-to-define dialects). We
have already done some thinking on how to handle these types of non
The problem with "tag wisely" is that what counts as wise is not
necessarily clear to all. When content is available on the Internet, one
group's need for very specific tagging would be deemed unwise and
unwanted by others. ISO-639-3 does provide much more, clearly needed
granularity, but it is not easily translated into the IANA language
subtags. Given that ISO-639-3 will soon be official, I hope this will
coincide with an authorised list that maps these ISO codes to what the
IETF wants us to use.
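
To illustrate the kind of mapping meant here, a minimal sketch in Python
follows. It assumes only the RFC 4646 convention that a two-letter
ISO-639-1 code is used where one exists. The table and the function name
to_language_subtag are hypothetical and hand-picked for illustration; an
authorised list would have to replace them.

```python
# Hypothetical, hand-picked sample of ISO-639-3 codes that also have an
# ISO-639-1 two-letter code. A real table would come from an authorised
# mapping, not from this sketch.
ISO639_3_TO_1 = {
    "srp": "sr",  # Serbian
    "hau": "ha",  # Hausa
    "ara": "ar",  # Arabic (macrolanguage)
}


def to_language_subtag(iso639_3_code):
    """Return the two-letter code when one is known, else the
    three-letter code unchanged (lower-cased)."""
    code = iso639_3_code.lower()
    return ISO639_3_TO_1.get(code, code)


if __name__ == "__main__":
    for sample in ("srp", "hau", "ksh"):
        print(sample, "->", to_language_subtag(sample))
```

A code such as ksh simply falls through unchanged. Whether that
three-letter code is actually acceptable as a subtag is exactly the
question an authorised list would have to answer.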
More information about the Ietf-languages