Wikimedia language codes
gerardm at wiktionaryz.org
Sun Nov 12 19:08:02 CET 2006
I have said a few things in another mail thread, and I think it would be
helpful to explain what I am looking for and what my current issues
are. In this mail I will only address needs that we have in the
The Wikimedia Foundation has at this moment in time exactly 250
different Wikipedia projects. Some of them have a code that is
incompatible with any ISO-639 code of any version. There are projects
that have codes that are squatting on existing ISO-639 codes. There are
codes that have been made up and that currently do not trespass on the
codes of other languages; however, I would not be surprised when this
that it is not permitted to use codes that can be mistaken for valid codes.
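
To make the three kinds of problem codes above concrete, here is a minimal
sketch in Python. The tables are a tiny illustrative sample, not the real
ISO-639 tables or the full list of projects, and the names classify,
ISO_639_SAMPLE and WMF_SAMPLE are made up for this example.

```python
# Illustrative sample only: NOT the real ISO 639 tables or the full WMF project list.
ISO_639_SAMPLE = {
    "nl": "Dutch",
    "be": "Belarusian",
    "als": "Tosk Albanian",  # ISO 639-3 assignment
}

# Project code -> language the project is actually written in (sample entries).
WMF_SAMPLE = {
    "nl": "Dutch",
    "als": "Alemannic",
    "zh-min-nan": "Min Nan",
}


def classify(code, language, iso_table):
    """Return 'match', 'squatting' or 'not-in-iso' for one project code."""
    if code not in iso_table:
        # The code does not exist in ISO 639 at all (incompatible or made up).
        return "not-in-iso"
    if iso_table[code] == language:
        return "match"
    # The code exists, but ISO 639 assigns it to a different language.
    return "squatting"


report = {
    code: classify(code, lang, ISO_639_SAMPLE)
    for code, lang in WMF_SAMPLE.items()
}

for code, status in sorted(report.items()):
    print(f"{code:12} {status}")
```

A made-up code reported as "not-in-iso" today can turn into "squatting" as
soon as a later version of the standard assigns that code to another language.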
As there is now a "language sub-committee" in the Wikimedia Foundation,
and as it is our brief to come up with recommendations for the creation
of new projects and as the CTO of the Wikimedia Foundation is not
pleased with this situation, one of the tasks in front of us is to come
up with the appropriate codes for the existing projects. This is not
simple and it is certainly not straightforward. One of the disputes is
about the Belarusian Wikipedia, which has been squatted by people who
insist on using an orthography that is not the official one. There is a
vibrant group of Belarusians using the official orthography that wants to
claim the same domain. This is one dispute among many; most are largely
political.
One of our problems is not solved because you do not consider
ISO-639-3 "official". This is the existence of a Wikipedia in Moldovan.
What we do understand is that none of the ISO-639-3 codes will ever be
used other than for its defined purpose.
An often recurring theme in our request for new projects is that people
claim that something is a language. It happens regularly that the
proponents point to what should be impressive amounts of content in
archives, in libraries or on the Internet, all stuff that is to most of us
gobbledygook. Often it is claimed that they have applied for recognition
of their language. It does not make sense to request it from anyone but
Ethnologue, as ISO-639-2 is at its end of life.
There was some earlier discussion of the Min-Nan language on this
mailing list. For your information, neither the Min-Nan Wiktionary nor the
Min-Nan Wikipedia is written in the Hant or the Hans script; both use
Latn. When you start off from zh as the basis, as you insist on, and when
equally the people who write Min-Nan without exception use Latn, the code
zh-nan-Latn is not logical at all. NB these are really active projects.
For the Wikimedia Foundation there are a number of options:
* We use our WMF language codes internally and externally. This is
imho, from a standards point of view, a worst-case scenario.
* We use our codes internally and externally we advertise the
* We sanitise our codes so that there is at least no conflict with
the ISO-639 codes. We use them internally and we advertise the
* We move away from our current codes and only use "official" codes
both internally and externally.
I suspect it is as difficult to make the Wikimedia Foundation move as it
is to get movement on standards. I think we need a plan for how this
can be solved. There are at least two lists I would like to have that
* A list with all the ISO-639 codes (1, 2 and 3) and the codes
that these languages have under RFC 4646.
* A list with the WMF language codes and the language codes under
I am sure that the first list exists. With this list it is possible to
compile the second list. For some WMF language codes we may need to ask
for tags to identify them properly by their dialect, orthography or
whatever makes them special.
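
As a rough sketch of how the second list could be compiled from the first,
consider the Python below. The entries are sample data only, the code
"x-madeup" is a hypothetical placeholder for a project code without an
ISO-639 equivalent, and build_wmf_list is a name invented for this example.

```python
# First list (sample): ISO 639 code -> language tag under RFC 4646.
# RFC 4646 uses the two-letter code whenever one exists.
ISO_TO_RFC4646 = {
    "nl": "nl", "nld": "nl", "dut": "nl",
    "be": "be", "bel": "be",
}

# WMF project code -> the ISO 639 code it corresponds to, or None when
# there is no such code. "x-madeup" is a hypothetical placeholder.
WMF_TO_ISO = {
    "nl": "nld",
    "be": "bel",
    "x-madeup": None,
}


def build_wmf_list(wmf_to_iso, iso_to_tag):
    """Split WMF codes into those with a known tag and those still needing one."""
    resolved = {}
    unresolved = []
    for wmf_code, iso_code in wmf_to_iso.items():
        tag = iso_to_tag.get(iso_code) if iso_code is not None else None
        if tag is None:
            # No tag can be derived; a tag would have to be requested.
            unresolved.append(wmf_code)
        else:
            resolved[wmf_code] = tag
    return resolved, sorted(unresolved)


resolved, unresolved = build_wmf_list(WMF_TO_ISO, ISO_TO_RFC4646)
print("resolved:  ", resolved)
print("unresolved:", unresolved)
```

The unresolved codes are the ones for which we would have to ask for tags.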
Gerard Meijssen aka GerardM