How to handle macrolanguage when no code?

Doug Ewell doug at ewellic.org
Thu Apr 9 06:09:23 CEST 2009


Don Osborn <dzo at bisharat dot net> wrote:

> In looking at the BBC website's offerings in African languages, one 
> notes that they have grouped Kinyarwanda and Kirundi together under 
> http://www.bbc.co.uk/greatlakes/  . This makes sense from a linguistic 
> point of view since as I understand it, the two languages are almost 
> the same. When looking at the view (page) source, one notes that they 
> use lang="rw" (for Kinyarwanda). It may be that the pages I checked 
> are properly Kinyarwanda and an expert would know that they are not 
> Kirundi (rn), but it is in any event true that there is no code 
> element to cover both languages.

Ethnologue says the two are mutually intelligible, which isn't quite the 
same as saying they are the same language.  This is one of those many 
gray areas in language identification.

The fact is that we rely on the distinctions that ISO 639 makes, and if 
they decide that Kinyarwanda and Kirundi (Rundi) are different languages 
then that's what we have to go with.  We can narrow language usage down 
to dialects or other variations, but we have no mechanism to create 
broader categories in such a way that a more specific tag would still 
match.
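
To make the matching point concrete, here is a minimal sketch of prefix-style matching in the spirit of RFC 4647 basic filtering.  The function name basic_filter is my own, and the code ignores everything in the RFC beyond the prefix rule and the "*" wildcard, so treat it as an illustration rather than a conforming implementation.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Return True if `tag` matches `language_range` by simple prefix matching.

    Simplified sketch: a range matches a tag when they are equal
    (case-insensitively) or when the range is a prefix of the tag that
    ends on a subtag boundary (the next character is "-").
    """
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        # The wildcard range matches any tag.
        return True
    return candidate == rng or candidate.startswith(rng + "-")


print(basic_filter("rw", "rw"))      # True: exact match
print(basic_filter("rw", "rw-RW"))   # True: narrower tag still matches
print(basic_filter("rw", "rn"))      # False: no range covers both rw and rn
```

A request for "rw" picks up "rw-RW", but nothing short of "*" picks up both "rw" and "rn", because neither tag is a prefix of the other.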

> I'm curious if there is any other recommended way to handle such a 
> situation where web content may be deliberately and easily designed to 
> cover more than one language as defined by ISO 639 when there is not 
> currently any macrolanguage code for them. Could one for example 
> define a whole page as having two languages? E.g., something like 
> lang="rw, rn"?

If you can, in any markup language or other protocol, it would be a 
feature of that markup language or protocol, and not of language tags or 
subtags per se.  This is similar to protocols that allow something like 
<lang="">.  It doesn't mean that the empty string is a valid language 
tag; it means the "lang" syntax exceptionally allows the empty string as 
a value.

I don't think we want to go down the path of offering aliases.  If the 
content truly is in "rw", it should be tagged as "rw" even if all 
speakers of "rn" can understand it perfectly.

We have 61 "collection code" subtags available in the Registry, with 
another 55 on the way when 4646bis is approved; one of those might do if 
you really need a different solution.  Asking ISO 639-3/RA to create a 
new macrolanguage to encompass these two languages might (or might not) 
create more confusion than it resolves.  Remember that a new 
macrolanguage would not result in new extlangs for RFC 4646bis.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
