Wikimedia language codes

John Cowan cowan at
Mon Nov 13 20:25:38 CET 2006

Don Osborn scripsit:

> I'm trying to find out more about what has already been proposed for 639-5.
> Any references would be appreciated (Googling gets articles that mention it,
> but not much more)

Current drafts are still private to the ISO working group.  A draft
dated January 2005 was made available to me under nondisclosure terms.
I think it would do no harm to make the following statements:

1) The draft contains a superset of ISO 639-2 language collection code
elements.  The shared elements have the same 3-letter codes as in 639-2,
whereas the novel ones have 3-letter codes not shared with 639-2 or 639-3.

2) For the most part, the additional codes are along conventional
classificatory lines: thus there are codes for "IE languages", "Germanic
languages" (shared with 639-2), "North Germanic languages", "West Germanic
languages", and "East Germanic languages".

3) Hierarchy information is available as part of the standard: thus it
is noted that "Bamileke languages" is a subgroup of "Bantu languages",
which is a subgroup of "Atlantic-Congo languages", which is a subgroup of
"Niger-Khordofanian languages".  The hierarchy has no root.

4) It is noted that a number of language families used by Ethnologue are
not represented in the code, but no explicit principles for selection
are given.  It is stated that the code, like other parts of ISO 639,
will grow.
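
To make point 3 concrete, here is a rough sketch (mine, not anything taken
from the draft) of how such hierarchy information could be held as simple
parent links.  The group names are the ones quoted above; the table name
PARENT, the helper names, and the "Germanic languages" -> "IE languages"
link are my own assumptions for illustration.  Because there is no root,
the structure is a forest rather than a single tree.

```python
# Parent links between language collections, keyed by group name.
# Names are those quoted in the message; this is NOT the draft's code table.
PARENT = {
    "Bamileke languages": "Bantu languages",
    "Bantu languages": "Atlantic-Congo languages",
    "Atlantic-Congo languages": "Niger-Khordofanian languages",
    # Assumed link, added only so the sketch has a second top-level group.
    "Germanic languages": "IE languages",
}


def ancestors(group):
    """Return the chain of supergroups of `group`, nearest first."""
    chain = []
    while group in PARENT:
        group = PARENT[group]
        chain.append(group)
    return chain


def top_level_groups():
    """Return the groups that have no parent (the hierarchy has no root)."""
    return {parent for parent in PARENT.values() if parent not in PARENT}


print(ancestors("Bamileke languages"))
print(sorted(top_level_groups()))
```

Walking up from "Bamileke languages" yields the three supergroups named in
point 3, and top_level_groups() returns more than one entry, which is what
"no root" amounts to in practice.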

> This is interesting. As a relative newcomer to all this my impression was
> that there was a sort of hierarchical logic inherited in part from the
> pre-existence of 639-1&2 and reflected in the brief descriptions of the 639
> ensemble. Your clarification of the nesting of tags in the case of Fula and
> a similar mention with regard to Arabic seem to show that logic in action.

Macrolanguages were designed as a way of managing part of the mismatch
between 639-2 and 639-3 due to their different original purposes.
They reflect the situation where a 639-2 individual-language code is
mapped onto more than one 639-3 language code.  They do not always
represent genetic groupings.

However, historical origin does not necessarily dictate current or future
use.  It is expected, e.g., that if a language is split by the 639-3/RA
into more than one language code element, the existing code element
will be changed into a macrolanguage.

> What concerns me first of all is that bm/bam is so close
> to dyu that (1) for instance a localization effort in Burkina Faso is
> talking about Bambara (bm) rather than Jula (dyu), and that (2) there is no
> code to cover that. 

There is nothing that prevents you from proposing a macrolanguage tag
for either 639-2 (if the "at least fifty documents in at most five
institutions" criterion is met) or for a future version of 639-3.

> Another concern - on a higher level - is that there is no code to cover the
> macro-macrolanguage (if you will) that would include the man ("Mandingo")
> macrolanguage and the bm + dyu macrolanguage-without-a-tag. This is not pure
> theory - the Manding languages are close. Linguists will point out
> differences, speakers will recognize them, but in some ways the ensemble is
> like Fula ff but in a more concentrated geographic area of West Africa.

Nor is there any rule against nested macrolanguages.

> It may be that 639-5 is a non-issue (loose paraphrase of an offline message
> from another group member and noting your mention that as far as you know
> "no one is proposing to add it to RFC 4646bis or any successor") but it
> would have seemed to be a useful element in the ISO-639 schema given the
> kinds of situations I've outlined. In some ways there seems to be an overlap
> with 639-2 - which is not news I realize. But the extent to which new
> "macrolanguages" - a category that is already by accident of history (so to
> speak) under 639-2 - would be added to 639-3 but not the latter creates more
> confusion (at least for this particular human). So codes for groupings of
> languages ("languages" by the definition used for 639-3) might exist in
> 639-2 or be newly registered in 639-3 while on paper they are the subject of
> 639-5?

To clarify:  639-2 is a heterogeneous list from the viewpoint of -3 and
-5.  It contains individual languages (shared with -3), macrolanguages
(shared with -3 and possibly with -5 as well)), and language collections
(shared with -5).  The essence of a macrolanguage is that it may be
treated as a single language for some purposes and a language collection
for other purposes.  The cases you describe sound like calls for
macrolanguages to me; as such they would be registered in -3 (and as
a result in 4646bis as well) -- they might or might not be registered
in -5.  When there is no possible reason to treat a language collection
as a single language, it belongs in -5 and (as far as anyone can see)
has no applicability to 4646.  The existing language collections of
course remain in 4646 and its successors.
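
The rule of thumb above can be written down as a small sketch.  This is
only my reading of the preceding paragraph, not text from any of the
standards; the names Kind, classify, and registration_targets are
hypothetical, and the entry for individual languages is an inference
rather than something stated outright.

```python
from enum import Enum


class Kind(Enum):
    INDIVIDUAL = "individual language"
    MACROLANGUAGE = "macrolanguage"
    COLLECTION = "language collection"


def classify(as_single_language, as_collection):
    """Classify an entity by how it may be treated.

    A macrolanguage is one that may be treated as a single language for
    some purposes and as a language collection for others.
    """
    if as_single_language and as_collection:
        return Kind.MACROLANGUAGE
    if as_single_language:
        return Kind.INDIVIDUAL
    if as_collection:
        return Kind.COLLECTION
    raise ValueError("entity is neither a language nor a collection")


def registration_targets(kind):
    """Return (expected, possible) places where a NEW code would be registered."""
    if kind is Kind.MACROLANGUAGE:
        # Registered in -3 and, as a result, in 4646bis; -5 is optional.
        return {"639-3", "4646bis"}, {"639-5"}
    if kind is Kind.COLLECTION:
        # A pure collection belongs in -5 and has no applicability to 4646.
        return {"639-5"}, set()
    # Assumption: individual languages follow the same -3 -> 4646bis route.
    return {"639-3", "4646bis"}, set()


print(classify(True, True), registration_targets(Kind.MACROLANGUAGE))
```

Note that this covers new registrations only; the language collections
already present in 4646 stay there regardless.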

There are three kinds of people in the world:   John Cowan
those who can count,                            cowan at
and those who can't.

More information about the Ietf-languages mailing list