WiktionaryZ language codes

Mon Nov 13 15:36:06 CET 2006

On 11/13/06, Gerard Meijssen <gerardm at wiktionaryz.org> wrote:
> A record indicating a
> language, dialect or orthography is added to a table and it includes the
> Wikimedia Foundation code and the ISO-639-3 code. We do distinguish
> American and British English from English and use codes like eng-US. It
> is obvious that we can add a column in this language table and have your
> code in there as well.

eng-US is not an ISO-639-3 code. It's not a valid code in any system I'm
familiar with. You're far better off using RFC 4646 codes and
including ISO-639-3 codes as private use codes; x-639-3-whatever (or
however you want to do it, as long as the x- is at the start), or the
macrolanguage or collective language from 639-1 or 639-2 followed by
-x-639-3-whatever.
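
To make the suggestion concrete, here is a minimal sketch of building such tags. The function name private_use_tag and the "639-3" marker subtags are only illustrations of the convention proposed above, not anything registered; the one hard rule taken from RFC 4646 is that each private-use subtag after the x- is 1 to 8 ASCII letters or digits.

```python
import re

# RFC 4646: each subtag after the "x" singleton is 1 to 8 ASCII alphanumerics.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")
# A 639-1 or 639-2 primary language subtag is 2 or 3 ASCII letters.
_PRIMARY = re.compile(r"[A-Za-z]{2,3}")


def private_use_tag(iso639_3, prefix=None):
    """Return 'x-639-3-CODE', or 'PREFIX-x-639-3-CODE' when a prefix is given.

    The '639-3' marker is a hypothetical convention; pick whatever you like
    as long as the private part starts with 'x-'.
    """
    code = iso639_3.lower()
    if not _PRIVATE_SUBTAG.fullmatch(code):
        raise ValueError("not a valid private-use subtag: %r" % iso639_3)
    tag = "-".join(["x", "639", "3", code])
    if prefix is not None:
        if not _PRIMARY.fullmatch(prefix):
            raise ValueError("not a 639-1/639-2 style code: %r" % prefix)
        tag = prefix.lower() + "-" + tag
    return tag


print(private_use_tag("cmn"))        # purely private tag
print(private_use_tag("cmn", "zh"))  # macrolanguage followed by private part
```

The second form has the advantage that software which knows nothing about your private convention still sees a tag that begins with zh.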

> The ISO-639-3 codes is the biggest list of languages that have some form
> of recognition. It is therefore our best option to adopt this for our
> initial list of languages. There are issues with several of these codes
> but it allows for an operational start. It also has in Ethnologue a
> maintainer that earned its reputation as an organisation with a strong
> interest in linguistics.

Two issues: first, SIL is involved in modern languages. I don't
_believe_ there's a single language extinct for more than a couple
centuries in ISO-639-3 that's not in ISO-639-2. Secondly, SIL is
notorious for considering as multiple languages what other
organizations consider one. That's not always a good thing.

> A "simple" single code allows people to find
> their language. It also helps us to force people to understand that
> WiktionaryZ does not have zh or zho and that we consequently do not
> accept words to be registered in that way.

There's no such thing as a simple single code. You've taken a simple
single code away from Chinese and Arabic. I also feel that pointing to
ISO-639-3 as the reason you don't have a code for Chinese or Arabic
is a cop-out; if you don't want to consider Chinese and Arabic one
language for the purpose of your dictionary, that's your responsibility
to justify. I also think that any attempt to go head-first against the
thousand years of culture that have pushed Chinese and Arabic as
unified languages, no matter what the reality, is going to fail. You
have to give the users what they want, and frankly any attempt to
force linguistic correctness down the throats of the people who
actually use the language strikes me as a bit rude and imperialistic.

>  For the others (including dialects) we would like to know if
> they have been recognised. And if not, what it takes to get what you
> would consider appropriate codes. I am not sure what ISO ? standard
> would deal with dialects.
>
> Many languages like the Dutch language, have regular changes to the
> official orthography. For the Dutch language this happens every ten
> years. It is imho important to be able to tag text and words correctly
> to the orthography used. Only indicating something as Dutch is
> minimalistic and prevents the use of content for other purposes. It is
> not clear to me how you deal with this. It would not be surprised when
> this needs to be part of a standard too.

There is no standard that will solve these problems completely, and
there never will be. Dialects are even more unstandardized than languages,
and very few languages have only one orthography, and many of the
changes are unstandardized and unnamed. They also tend to exist on
continuums in both time and space, making any definition arbitrary. If
a dialect or orthography is important to you, it can be registered on
this list or used with a private use code.
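
As a rough sketch of how a consumer might cope with private-use dialect or orthography codes: split the tag at the x singleton and fall back to the standard part when the private part means nothing to you. The helper name split_private_use and the sample subtag spel1995 are made up for illustration; only the rule that a lone x subtag starts the private-use section comes from RFC 4646.

```python
def split_private_use(tag):
    """Split a language tag into (standard part, private-use part).

    Either part may be the empty string. The private-use part is returned
    without its leading 'x-'.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")  # the first lone 'x' starts the private section
        return "-".join(subtags[:i]), "-".join(subtags[i + 1:])
    return tag, ""


# 'spel1995' is a hypothetical private code for one Dutch orthography.
print(split_private_use("nl-x-spel1995"))
print(split_private_use("en-US"))
```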