Collection tags considered problematic (was: "mis" update review request)

John Cowan cowan at ccil.org
Sat Apr 14 00:25:59 CEST 2007


Randy Presuhn scripsit:

> The thing that bothers me about the comment "A collection of languages
> which don't belong to any other collection" is that it isn't compatible
> with the possibility that one or more of those languages might
> eventually be included in another (possibly new) collection.  If that
> language were left in the mis collection as well, the comment would
> be incorrect.  If the language were removed from the mis collection,
> stability goes out the window.  Either way, changing the comment
> wouldn't help the situation.

That's true, but it's also true of all other collection subtags.  "Mis"
is a particularly bad case, but it's not hard to devise other problematic
situations; I've changed the Subject: line accordingly.

For example, today Slobbovian (hypothetical) might be considered a Slavic
language with a heavy Germanic substrate, but tomorrow Thingummy might
be able to show conclusively that it was really a Germanic language
with a heavy Slavic superstrate.  That would invalidate the old "sla"
tag for the Slobbovian Declaration of Independence in favor of a "gem"
one, but since collections are not and cannot in practice be defined by
enumeration, such things can't be avoided.

I think we must accept that there is some risk of instability when
language collection subtags are used; that's a tradeoff against their
convenience.

Bullet point 3 of section 4.1 of rfc-4646-04 currently says:

	Use specific language subtags or subtag sequences in preference
	to subtags for language collections. A "language collection"
	is a subtag derived from one of the ISO 639-2 codes that
	represents multiple related languages. For example, the code 'cmc'
	represents "Chamic languages". The registry contains values for
	each of the approximately ten individual languages represented
	by this collective code. For example 'jra' (Jarai) and 'cja'
	(Western Cham).

I suggest adding the following additional text:

	Using a collective language code may often be convenient or
	necessary when detailed information is not available.  However,
	collections are defined not by enumerating specific languages,
	but by genetic or other criteria, and so a specific language may
	be moved out of a given collection if further information about
	the language becomes available.  Thus collective language codes
	are inherently less stable than codes for individual languages.

For guidance in interpreting these suggestions, I think we now
need to include the 639-3 scope information in language subtags:
"individual language", "macrolanguage", "collective" or "private use".

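As a rough sketch of how a tag consumer might use such scope information
(the table below is a hand-written, hypothetical excerpt, and the names
Scope, SCOPES, scope_of and is_collection are illustrative assumptions,
not part of any registry format or existing library):

```python
from enum import Enum


class Scope(Enum):
    """The four scope values named in the message above."""
    INDIVIDUAL = "individual language"
    MACROLANGUAGE = "macrolanguage"
    COLLECTIVE = "collective"
    PRIVATE_USE = "private use"


# Hypothetical excerpt for illustration only; NOT the registry's record format.
SCOPES = {
    "jra": Scope.INDIVIDUAL,     # Jarai
    "cja": Scope.INDIVIDUAL,     # Western Cham
    "cmc": Scope.COLLECTIVE,     # Chamic languages
    "sla": Scope.COLLECTIVE,     # Slavic languages
    "zh": Scope.MACROLANGUAGE,   # Chinese
    "qaa": Scope.PRIVATE_USE,    # first code of the private-use range
}


def scope_of(tag):
    """Return the scope of a tag's primary language subtag, or None if unlisted."""
    primary = tag.split("-")[0].lower()
    return SCOPES.get(primary)


def is_collection(tag):
    """True when the primary subtag is listed as a collective code."""
    return scope_of(tag) is Scope.COLLECTIVE


for tag in ("cmc", "jra", "sla"):
    if is_collection(tag):
        print(tag + ": collective code; prefer a specific language subtag if one is known")
    else:
        print(tag + ": " + scope_of(tag).value)
```

A tool could use a check like this to warn an author that a collective
subtag may later stop covering the language in question.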

-- 
John Cowan                              <cowan at ccil.org>
            http://www.ccil.org/~cowan
                .e'osai ko sarji la lojban.
                Please support Lojban!          http://www.lojban.org

