Wikimedia language codes

Sun Nov 12 21:05:27 CET 2006

On Sun, Nov 12, 2006 at 07:08:02PM +0100,
 Gerard Meijssen <gerardm at wiktionaryz.org> wrote 
 a message of 80 lines which said:

> one of the tasks in front of us is to come up with the appropriate
> codes for the existing projects.

I already wrote a more detailed plan on the wikipedia-l at Wikimedia.org
mailing list, but my message was never distributed, for reasons I
do not know.

Basically, what I said was "Wikimedia should only use RFC 4646
language tags". They cover all the needed cases.

> One of the disputes is about the Belarusian wikipedia that has been
> squatted by people who insist on using an orthography that is not
> the official one.

be-x-SPECIAL.wikipedia.org (replace SPECIAL with a suitable
identifier)

Or a request on ietf-languages, asking for a variant for this
orthography, which would avoid the 'x' and allow for
be-SPECIAL.wikipedia.org.
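
To make the difference between the two forms concrete, here is a minimal
sketch in Python of how such tags decompose. It assumes a simplified
subset of the RFC 4646 grammar (no extlang, no extensions, no
grandfathered tags), and the names LANGTAG and parse_tag are mine, not
from any library. It only checks that a tag is well-formed; a variant
such as "SPECIAL" would still have to be registered before the tag
counts as valid.

```python
import re

# Simplified subset of the RFC 4646 "langtag" grammar (an assumption of
# this sketch): language, optional script, optional region, any number
# of variants, optional private-use part introduced by "x".
LANGTAG = re.compile(
    r"""^
    (?P<language>[A-Za-z]{2,3})
    (?:-(?P<script>[A-Za-z]{4}))?
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    (?:-x(?P<private>(?:-[A-Za-z0-9]{1,8})+))?
    $""",
    re.VERBOSE,
)


def parse_tag(tag):
    """Split a tag into its subtags, or return None if it is not well-formed."""
    match = LANGTAG.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Turn "-a-b" style captures into plain lists of subtags.
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    parts["private"] = [p for p in (parts["private"] or "").split("-") if p]
    return parts


if __name__ == "__main__":
    for tag in ("be-x-SPECIAL", "be-SPECIAL", "zh-Latn-minnan"):
        print(tag, parse_tag(tag))
```

With this sketch, "SPECIAL" lands in the private-use list for
be-x-SPECIAL and in the variant list for be-SPECIAL, which is exactly
the difference a registration on ietf-languages would make.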

Of course, I do not expect this to make the political problems go
away, but, at least, Wikimedia would have a solid basis for the naming
of projects.

> An often recurring theme in our request for new projects is that people 
> claim that something is a language. 

Wikimedia should not be involved in deciding whether a language exists
or not. Organisations like LOC (the Library of Congress, which
maintains ISO 639-2) exist for that.

> There was some earlier discussion of the Min-Nan language on this
> mailing list. For your information both the Min-Nan Wiktionary and
> Wikipedia are not in either the Hant or the Hans script, it uses
> Latn. When you start off from zh as the basis you insist on and
> equally the people who write Min-Nan without exception use Latn, the
> code zh-nan-Latn is not logical at all.

Indeed, it should be zh-Latn-minnan (which seems perfectly logical to
me) or zh-minnan if all the Min Nan content is in the Latin script.

>    * We use our WMF language codes  internally and externally. This is
>      imho from a standards point of view a worst case scenario

Yes, it means that Wikimedia will spend all its time duplicating the
work of LOC, Unicode, SIL... or ietf-languages.

>    * We move away from our current codes and only use "official" codes
>      both internally and externally.

This is IMHO the best solution.

>    * A list with the all the ISO-639 codes (1, 2 and 3) and the codes
>      that these languages have under RFC 4646.

They have the same code, of course. The official list is at
http://www.iana.org/assignments/language-subtag-registry.
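
As an illustration of how such a list could be derived from the registry
rather than maintained by hand, here is a rough Python sketch that reads
the registry's record format: records separated by lines containing
"%%", one "Field: value" pair per line, continuation lines indented.
The SAMPLE text is a hand-written excerpt in that format, reduced to
three fields per record, and not a copy of the real file; parse_registry
and subtags_of_type are names I made up for the example.

```python
# Hand-written excerpt in the registry's record format (illustrative
# only; the real file has more fields and many more records).
SAMPLE = """\
Type: language
Subtag: be
Description: Belarusian
%%
Type: language
Subtag: zh
Description: Chinese
%%
Type: script
Subtag: Latn
Description: Latin
"""


def parse_registry(text):
    """Parse registry text into a list of {field: [values]} records."""
    records, record, last_key = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store what we have and start afresh.
            if record:
                records.append(record)
            record, last_key = {}, None
        elif line[:1] in (" ", "\t") and last_key:
            # Indented line: continuation of the previous field value.
            record[last_key][-1] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            # A field such as Description may occur several times.
            record.setdefault(last_key, []).append(value.strip())
    if record:
        records.append(record)
    return records


def subtags_of_type(records, kind):
    """Map each subtag of the given Type to its list of descriptions."""
    return {
        r["Subtag"][0]: r.get("Description", [])
        for r in records
        if r.get("Type") == [kind] and "Subtag" in r
    }


if __name__ == "__main__":
    records = parse_registry(SAMPLE)
    print(subtags_of_type(records, "language"))
    print(subtags_of_type(records, "script"))
```

Run against the real file instead of SAMPLE, the same two functions
would give the language subtags, which are the ISO 639 codes themselves.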