Wikimedia language codes

John Cowan cowan at
Mon Nov 13 07:31:36 CET 2006

Gerard Meijssen scripsit:

> One of the disputes is about the Belarusian Wikipedia, which has been
> squatted by people who insist on using an orthography that is not the
> official one. There is a vibrant group of Belarusians using the official
> orthography who want to lay claim to the same domain.

Private-use subtags can resolve this problem, as can registered variant subtags.

> One of our problems is not solved because you do not consider the
> ISO-639-3 "official".

Neither does ISO.  We simply *cannot* add ISO code elements to our
standard until ISO issues a final document.  Until then, they are still
subject to change.

> What we do understand is that none of the ISO 639-3 codes will ever
> be used other than for its defined purpose.

That is true.  Consequently, you can (at your own risk) use them, as
at least one other project is already doing.  If you want RFC 4646bis
compatibility, then you have to treat the languages that are encompassed
by an ISO 639-3 macrolanguage specially:  ar-arz, not merely arz, for
Egyptian Arabic.
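A minimal Python sketch of the rule just stated, assuming a hand-picked excerpt of the macrolanguage table. The dictionary `ENCOMPASSING_MACROLANGUAGE` and the function `to_extlang_form` are hypothetical names; a real tool would load the full ISO 639-3 macrolanguage mappings rather than hard-coding three entries.

```python
# Hypothetical, hand-picked excerpt: ISO 639-3 individual language code ->
# the macrolanguage that encompasses it. A real tool would load the full
# ISO 639-3 macrolanguage mappings instead of hard-coding them.
ENCOMPASSING_MACROLANGUAGE = {
    "arz": "ar",  # Egyptian Arabic, within Arabic
    "nan": "zh",  # Min Nan Chinese, within Chinese
    "cmn": "zh",  # Mandarin Chinese, within Chinese
}


def to_extlang_form(iso639_3: str) -> str:
    """Return the language part of a tag in 'macrolanguage-language' form.

    Languages encompassed by a macrolanguage get the macrolanguage code as
    a prefix (arz -> ar-arz); all other codes are returned unchanged.
    """
    code = iso639_3.lower()
    macro = ENCOMPASSING_MACROLANGUAGE.get(code)
    return f"{macro}-{code}" if macro else code
```

With this table, `arz` becomes `ar-arz`, while a language such as `haw` (Hawaiian), which no macrolanguage encompasses, is left as is.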

> An often recurring theme in our requests for new projects is that
> people claim that something is a language. It happens regularly that
> the proponents point to what are supposed to be impressive amounts of
> content in archives, in libraries, or on the Internet, all of it
> gobbledygook to most of us. Often it is claimed that they have applied
> for recognition of their language. It does not make sense to request
> it from anyone but Ethnologue, as ISO 639-2 is at the end of its life.

As I've posted before, there is no universally satisfactory definition
of "language" as distinct from "language variant".  We use ISO codes
because they are available, not because we think they are perfect.
You can use private-use subtags or registered variant subtags to
refer to distinctions that are finer grained.
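A syntax-only Python sketch of building such tags. The regular expressions follow the RFC 4646 grammar for variant subtags (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics) and for private-use subtags (`x` followed by subtags of 1 to 8 alphanumerics). `build_tag` and the subtag `offcl` used below are hypothetical, and whether a variant is actually registered still has to be checked against the IANA registry.

```python
import re
from typing import Sequence

# RFC 4646 syntax for the two kinds of subtag discussed above:
#   variant     = 5*8alphanum / (DIGIT 3alphanum)
#   private use = "x" followed by one or more subtags of 1*8alphanum
VARIANT_RE = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")
PRIVATE_RE = re.compile(r"[A-Za-z0-9]{1,8}")


def build_tag(language: str,
              variants: Sequence[str] = (),
              private_use: Sequence[str] = ()) -> str:
    """Assemble a tag such as 'be-x-offcl' (hypothetical subtag).

    Only well-formedness is checked; registration is not.
    """
    parts = [language.lower()]
    for v in variants:
        if not VARIANT_RE.fullmatch(v):
            raise ValueError(f"not a well-formed variant subtag: {v!r}")
        parts.append(v.lower())
    if private_use:
        parts.append("x")  # singleton introducing the private-use section
        for p in private_use:
            if not PRIVATE_RE.fullmatch(p):
                raise ValueError(f"not a well-formed private-use subtag: {p!r}")
            parts.append(p.lower())
    return "-".join(parts)
```

For instance, `build_tag("be", private_use=["offcl"])` gives `be-x-offcl`, a private-use tag a project could assign internally to one of the two Belarusian orthographies.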

> There was some earlier discussion of the Min-Nan language on this
> mailing list. For your information, neither the Min-Nan Wiktionary nor
> the Min-Nan Wikipedia is written in the Hant or the Hans script; they
> use Latn. When you start from zh as the basis, as you insist, while the
> people who write Min-Nan without exception use Latn, the code
> zh-nan-Latn is not logical at all.

It's not the case that because zh has a standard script, zh-nan
necessarily shares that script.  If you know that zh-nan always uses
Latin script, then by all means don't bother with the 'Latn' subtag.
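A small Python sketch of that advice, in the spirit of the registry's Suppress-Script field: a script subtag is emitted only when it differs from the script the language is always written in. The `DEFAULT_SCRIPT` table and `drop_redundant_script` are hypothetical, and the `zh-nan` entry encodes the quoted message's assumption that Min-Nan is always written in Latin script, not registry data.

```python
from typing import Optional

# Hypothetical, project-local table: the script each language is always
# written in. The "zh-nan" entry is the assumption discussed in the thread,
# not registry data.
DEFAULT_SCRIPT = {
    "en": "Latn",
    "ru": "Cyrl",
    "zh-nan": "Latn",
}


def drop_redundant_script(language: str, script: Optional[str]) -> str:
    """Return 'language' or 'language-Script', omitting a redundant script."""
    if script is None:
        return language
    default = DEFAULT_SCRIPT.get(language.lower())
    if default is not None and default.lower() == script.lower():
        # The script adds no information for this language, so leave it off.
        return language
    # Script subtags are conventionally written in title case (e.g. "Hant").
    return f"{language}-{script.title()}"
```

So Min-Nan text in Latin script stays `zh-nan`, while Min-Nan text in Han script would still be tagged `zh-nan-Hant`.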

>    * A list of all the ISO 639 codes (parts 1, 2 and 3) and the codes
>      that these languages have under RFC 4646.

Right now, there simply are no RFC 4646 codes for most of the languages
in ISO 639-3.  We are doing our very best to change that as fast
as possible, so that *every* ISO 639-3 language will have its own
RFC 4646bis language subtag (most one-part, some two-part).
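Compiling the list asked for above largely means reading the IANA Language Subtag Registry, which RFC 4646 defines as records separated by `%%` lines, each field written as `Name: value`, with continuation lines starting with whitespace. A minimal Python parser for that format, run on a short hand-written excerpt. The function names are hypothetical and the sample values are illustrative; the real file also begins with a `File-Date` record, which the filter ignores because it has no `Type` field.

```python
# Illustrative excerpt in the registry's record format (not the real file).
SAMPLE = """\
%%
Type: language
Subtag: be
Description: Belarusian
Suppress-Script: Cyrl
%%
Type: language
Subtag: zh
Description: Chinese
Comments: illustrative comment that
  continues on a second line
"""


def parse_registry(text):
    """Split registry text into records: dicts of field name -> list of values.

    Fields such as Description may repeat, so every value is kept in a list.
    """
    records, current, last_field = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field:
            # Folded line: join it onto the previous field's last value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def language_subtags(records):
    """Map each 'language' subtag to its list of descriptions."""
    return {
        rec["Subtag"][0]: rec.get("Description", [])
        for rec in records
        if rec.get("Type", [""])[0] == "language" and "Subtag" in rec
    }
```

Joining the resulting subtags with an ISO 639 code table would give the cross-reference requested above, with gaps wherever no subtag has been registered yet.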

> I am sure that the first list exists. With this list it is possible to 
> compile the second list. For some WMF language codes we may need to ask 
> for tags to identify them properly by their dialect, orthography or 
> whatever makes them special.


LEAR: Dost thou call me fool, boy?      John Cowan
FOOL: All thy other titles    
             thou hast given away:      cowan at
      That thou wast born with.

More information about the Ietf-languages mailing list