How to handle macrolanguage when no code?
doug at ewellic.org
Thu Apr 9 06:09:23 CEST 2009
Don Osborn <dzo at bisharat dot net> wrote:
> In looking at the BBC website's offerings in African languages, one
> notes that they have grouped Kinyarwanda and Kirundi together under
> http://www.bbc.co.uk/greatlakes/ . This makes sense from a linguistic
> point of view since as I understand it, the two languages are almost
> the same. When looking at the view (page) source, one notes that they
> use lang="rw" (for Kinyarwanda). It may be that the pages I checked
> are properly Kinyarwanda and an expert would know that they are not
> Kirundi (rn), but it is in any event true that there is no code
> element to cover both languages.
Ethnologue says the two are mutually intelligible, which isn't quite the
same as saying they are the same language. This is one of those many
gray areas in language identification.
The fact is that we rely on the distinctions that ISO 639 makes, and if
they decide that Kinyarwanda and Kirundi (Rundi) are different languages
then that's what we have to go with. We can narrow language usage down
to dialects or other variations, but we have no mechanism to create
broader categories in such a way that a more specific tag would still
> I'm curious if there is any other recommended way to handle such a
> situation where web content may be deliberately and easily designed to
> cover more than one language as defined by ISO 639 when there is not
> currently any macrolanguage code for them. Could one for example
> define a whole page as having two languages? E.g., something like
> lang="rw, rn"?
If you can do so in some markup language or other protocol, that would be a
feature of that markup language or protocol, and not of language tags or
subtags per se. This is similar to protocols that allow something like
<lang="">. It doesn't mean that the empty string is a valid language
tag; it means the "lang" syntax exceptionally allows the empty string as
I don't think we want to go down the path of offering aliases. If the
content truly is in "rw", it should be tagged as "rw" even if all
speakers of "rn" can understand it perfectly.
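
The point about list syntax and the empty string belonging to the protocol rather than to the tag can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the comma-separated list rule is an imagined protocol feature, and the pattern is a deliberately simplified well-formedness check, not the full language-tag grammar.

```python
import re

# Simplified check (an assumption, not the real grammar): a 2-3 letter
# primary language subtag, then optional 1-8 character alphanumeric subtags.
SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def is_simple_tag(value):
    """Return True if value looks like a single language tag."""
    return SIMPLE_TAG.fullmatch(value) is not None


def parse_lang_list(attr_value):
    """Hypothetical protocol rule: a comma-separated list of language tags.

    The list syntax, and the acceptance of an empty value, are features of
    this imagined protocol. Each item must still be a tag in its own right.
    """
    items = [item.strip() for item in attr_value.split(",")]
    items = [item for item in items if item]  # empty value gives no tags
    for item in items:
        if not is_simple_tag(item):
            raise ValueError(f"not a well-formed language tag: {item!r}")
    return items


if __name__ == "__main__":
    print(parse_lang_list("rw, rn"))  # the protocol splits the list
    print(is_simple_tag("rw, rn"))    # the whole string is not one tag
    print(parse_lang_list(""))        # the protocol allows an empty value
    print(is_simple_tag(""))          # the empty string is still not a tag
```

In this sketch "rw, rn" is accepted only because the surrounding protocol defines a list; neither the combined string nor the empty string ever becomes a language tag.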
We have 61 "collection code" subtags available in the Registry, with
another 55 on the way when 4646bis is approved; one of those might do if
you really need a different solution. Asking ISO 639-3/RA to create a
new macrolanguage to encompass these two languages might (or might not)
create more confusion than it resolves. Remember that a new
macrolanguage would not result in new extlangs for RFC 4646bis.
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14