Wikimedia language codes

Sun Nov 12 19:08:02 CET 2006

Hoi,
I have said a few things in another mail thread and I think it is 
helpful when I explain what I am looking for and what my current issues 
are. In this mail I will only address needs that we have in the 
Wikimedia Foundation.

*==Wikimedia Foundation==*
The Wikimedia Foundation has at this moment in time exactly 250
different Wikipedia projects. Some of them have a code that is
incompatible with any ISO-639 code of any version. There are projects
that have codes that are squatting on existing ISO-639 codes. There are
also codes that have been made up and that currently do not trespass on
the codes of other languages; however, I would not be surprised if this
infringes on the terms of use of the ISO-639 codes. My understanding is
that it is not permitted to use codes that can be mistaken for valid codes.
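
To make the kinds of problem codes described above concrete, here is a
rough sketch in Python. It is only an illustration: the table holds a
tiny hand-picked sample rather than the real ISO-639 tables, the
function name classify is made up for this example, and the rule that
"two or three lowercase letters" is what can be mistaken for a valid
code is my own assumption, not something the standard states in these
words.

```python
import re

# Illustrative sample only; a real check would load the full ISO-639 tables.
ISO_639_SAMPLE = {
    "nl": "Dutch",
    "als": "Tosk Albanian",
}

# Assumption: a code shaped like two or three lowercase letters
# could be mistaken for a valid ISO-639 code.
ISO_SHAPE = re.compile(r"[a-z]{2,3}")


def classify(code, language):
    """Classify a project code against the sample ISO-639 table.

    Returns one of: "valid", "squatting", "lookalike", "incompatible".
    """
    if code in ISO_639_SAMPLE:
        # The code exists; check that it is used for the language it denotes.
        if ISO_639_SAMPLE[code] == language:
            return "valid"
        return "squatting"
    if ISO_SHAPE.fullmatch(code):
        # Not assigned in the sample table, but shaped like an ISO-639 code.
        return "lookalike"
    # Cannot be read as an ISO-639 code at all.
    return "incompatible"
```

With this sample, classify("als", "Alemannic") reports "squatting",
because the als code is used for a project in a different language than
the one the code denotes, while classify("simple", "Simple English")
and classify("roa-rup", "Aromanian") both report "incompatible".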

As there is now a "language sub-committee" in the Wikimedia Foundation,
as it is our brief to come up with recommendations for the creation
of new projects, and as the CTO of the Wikimedia Foundation is not
pleased with this situation, one of the tasks in front of us is to come
up with the appropriate codes for the existing projects. This is not
simple and it is certainly not straightforward. One of the disputes is
about the Belarusian Wikipedia, which has been squatted by people who
insist on using an orthography that is not the official one. There is a
vibrant group of Belarusians using the official orthography that wants
to claim the same domain. This is one dispute among many; most are
largely political.

One of our problems is not solved because you do not consider
ISO-639-3 "official". This is the existence of a Wikipedia in Moldovan.
What we do understand is that none of the ISO-639-3 codes will ever be
used other than for its defined purpose.

An often recurring theme in our requests for new projects is that people
claim that something is a language. It happens regularly that the
proponents point to what should be impressive amounts of content in
archives, in libraries or on the Internet, all stuff that is to most of
us gobbledygook. Often it is claimed that they have applied for
recognition of their language. It does not make sense to request it from
anyone but Ethnologue as ISO-639-2 is at its end of life.

There was some earlier discussion of the Min-Nan language on this
mailing list. For your information, neither the Min-Nan Wiktionary nor
the Min-Nan Wikipedia is in the Hant or the Hans script; they use Latn.
When you start off from zh as the basis, as you insist, and the people
who write Min-Nan without exception use Latn, the code zh-nan-Latn is
not logical at all. NB these are really active projects.

For the Wikimedia Foundation there are a number of options:

    * We use our WMF language codes internally and externally. This is
      imho from a standards point of view a worst case scenario.
    * We use our codes internally, and externally we advertise the
      "official" codes.
    * We sanitise our codes so that there is at least no conflict with
      the ISO-639 codes. We use them internally and we advertise the
      "official" codes.
    * We move away from our current codes and only use "official" codes
      both internally and externally.

It is as difficult to make the Wikimedia Foundation move as it is to get
movement on standards, I suspect. I think we need a plan for how this
can be solved. There are at least two lists I would like to have that
would help:

    * A list with all the ISO-639 codes (1, 2 and 3) and the codes
      that these languages have under RFC 4646.
    * A list with the WMF language codes and the language codes under
      RFC 4646.

I am sure that the first list exists. With this list it is possible to 
compile the second list. For some WMF language codes we may need to ask 
for tags to identify them properly by their dialect, orthography or 
whatever makes them special.
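
As a rough sketch of how the second list could be compiled from the
first, consider the Python below. The rows are a small illustrative
sample, not the official tables, and the names IsoRow, FIRST_LIST,
WMF_LANGUAGES and build_second_list are made up for this example. The
one rule it relies on is that RFC 4646 uses the two-letter ISO-639-1
code when a language has one and the three-letter ISO-639-2 code
otherwise; a language with only an ISO-639-3 code gets no tag here.

```python
from typing import Dict, List, NamedTuple, Optional


class IsoRow(NamedTuple):
    """One row of the first list: a language and its ISO-639 codes."""
    name: str
    iso639_1: Optional[str]
    iso639_2: Optional[str]
    iso639_3: Optional[str]


def rfc4646_language_subtag(row: IsoRow) -> Optional[str]:
    # Prefer the ISO-639-1 code, fall back to ISO-639-2.
    # A language that only has an ISO-639-3 code yields None.
    return row.iso639_1 or row.iso639_2


# Illustrative sample of the first list; replace with the official tables.
FIRST_LIST: List[IsoRow] = [
    IsoRow("Dutch", "nl", "nld", "nld"),
    IsoRow("Aromanian", None, "rup", "rup"),
]

# Hypothetical extract of the WMF codes and the language of each project.
WMF_LANGUAGES: Dict[str, str] = {
    "nl": "Dutch",
    "roa-rup": "Aromanian",
    "simple": "Simple English",
}


def build_second_list(
    first_list: List[IsoRow], wmf_languages: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map each WMF code to an RFC 4646 language subtag.

    None marks a project that needs manual attention, for example
    because an extra tag has to be requested to identify it properly.
    """
    by_name = {row.name: row for row in first_list}
    result: Dict[str, Optional[str]] = {}
    for wmf_code, language in wmf_languages.items():
        row = by_name.get(language)
        result[wmf_code] = rfc4646_language_subtag(row) if row else None
    return result
```

The entries that come out as None are the ones for which we would need
to ask for tags; in this sample that is the "simple" project, which has
no row of its own in the first list.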

Thanks,
   Gerard Meijssen aka GerardM