Wikimedia language codes

Mon Nov 13 07:47:06 CET 2006

Don Osborn scripsit:

> There are cases where I think the ISO-639-3 codes would definitely not
> be ideal for localization or for Wikipedia editions, for instance. Maybe
> ISO-639-1/-2, or -5 would be a more appropriate grouping.

You will always be free to use broader 639-1/2 codes (1 is always
preferred to 2 when both are available) rather than narrower 639-3
codes if you want.  The question of 639-5 has not yet arisen, and
as far as I know no one is proposing to add it to RFC 4646bis or
any successor.  If you'd care to make the case for it, I'd like to see it.

> ISO-639-6 lets us be even more specific than -3, and to group subunits
> in different ways (as I understand it)

As far as I know, the 639-6 codes form a single tree classification:
you get exactly the groupings that the system hands you, which are
essentially conventional genetic groupings down to the level of
language, and below that are the usual geographical, temporal, and
orthographical variants.

> In fact, it turns out that since 1990 a standardized version for all 4
> has been developed called Runyakitara. It is not yet coded in -2 or -3
> (and actually might be considered a "macrolanguage" and thus a logical
> candidate for ISO-639-2). This information is not apparent from any
> of the available codes.

Just to clarify, although all existing macrolanguages have 639-2 codes,
there's no requirement that future macrolanguages be encoded in 639-2 as
well; Runyakitara might be added as a macrolanguage in a future version
of 639-3.

> I am most familiar personally with Fula (I learned ffm for 2 years and
> then transitioned to fuf for 2 years [an interesting process] and in
> those days had occasion to speak with fuc speakers; later I interacted
> with fuh and fuq speakers and various others along the way). None of
> which accords me any special authority, but it definitely leads me to
> see that there is an ongoing validity and utility to the ff/ful tags
> from ISO-639-1&2.

This is *precisely* why RFC 4646bis will (unless things are drastically
changed) mandate ff-ffm, ff-fuf, ff-fuc, ff-fuh, and ff-fuq rather than
the simple language subtags: so that straightforward matching of
any of these against simple "ff" is possible.
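
A minimal sketch of what "straightforward matching" could look like,
assuming a simple prefix rule in the style of RFC 4647 basic filtering:
a range matches a tag if the two are equal or if the range is a prefix
of the tag ending at a subtag boundary. The function name `matches` is
hypothetical, chosen for illustration only.

```python
def matches(language_range: str, tag: str) -> bool:
    """Return True if tag falls under language_range by prefix matching."""
    rng = language_range.lower()
    t = tag.lower()
    # Match only on a whole-subtag boundary, so "ff" does not match "ffm".
    return t == rng or t.startswith(rng + "-")


tags = ["ff-ffm", "ff-fuf", "ff-fuc", "ff-fuh", "ff-fuq", "fuf", "bm"]

# With the ff- prefix present, every Fula variety is found by asking for "ff".
fula = [t for t in tags if matches("ff", t)]
```

Note that the bare tag "fuf" is not selected: without the ff- prefix, a
matcher of this kind cannot relate it to "ff" unless it also consults
the ISO 639-3 macrolanguage tables.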

> Since I also speak Bambara (dooni) let me suggest that the Manding
> tongues also present another somewhat particular and complicated
> picture not addressed for all uses by any of the ISO-639 codes. There
> is one ISO-639-1 code (bm for Bambara), 4 ISO-639-2 codes (in addition
> to bam for Bambara, there is [...]

To clarify again:  'bm' and 'bam' denote the same language, and because
RFC 4646 uses the two-letter 639-1 code whenever one exists, 'bam' is
not a valid RFC 4646 language subtag.
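
As a rough illustration of that rule, the sketch below maps a 639-2
code to its 639-1 equivalent where one exists. The table is a
two-entry excerpt and the function name `primary_subtag` is
hypothetical; a real implementation would load the full ISO 639 tables
and would also validate its input, which this one does not.

```python
# Illustrative excerpt only: two known 639-2 -> 639-1 pairs.
ISO639_2_TO_1 = {"bam": "bm", "ful": "ff"}


def primary_subtag(code: str) -> str:
    """Prefer the 639-1 code when the given 639-2 code has one."""
    code = code.lower()
    # Codes with no two-letter equivalent in the table pass through unchanged.
    return ISO639_2_TO_1.get(code, code)
```

Here `primary_subtag("bam")` yields "bm", while a code such as "fuf",
which has no entry in the table, is returned as-is.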

-- 
Where the wombat has walked,            John Cowan <cowan at ccil.org>
it will inevitably walk again.          http://www.ccil.org/~cowan