Serbo-Croatian continuum: the top level

Peter Constable petercon at
Mon Mar 3 05:09:54 CET 2014


The report from the 639-3 RA mentions the need to first resolve the issue in relation to ISO 639-2. Did you make a request to the 639-2 RA?

Can you provide more info on the usage scenarios for which these would be needed, and how users and interop would be impacted by creating distinct primary language subtags rather than using existing primary language subtags with new variant subtags?


-----Original Message-----
From: ietf-languages-bounces at [mailto:ietf-languages-bounces at] On Behalf Of John Cowan
Sent: March 2, 2014 11:39 AM
To: ietf-languages at
Subject: Serbo-Croatian continuum: the top level

Well, the ax has dropped, and the ISO 639-3/RA has rejected the proposal to encode Kajkavian: see <> for details.  This means that the varieties of the Serbo-Croatian continuum (henceforth SCC) that are outside the three standard languages are currently unrepresentable within the BCP 47 tagging system.  In this posting, I am going to address only the top-level varieties of the SCC.

In order to make such tagging possible, we must resolve three questions:
(1) How should we deploy the necessary subtags?  (2) What subtags should be used?  (3) What entities should be tagged?  (This order may seem irrational, but it runs from most to least complicated.)

(1) Assuming for the moment that we create one tag per variety, what kind of tags will they be?  I see three possibilities:

(1a) Create variant subtags and attach them to the appropriate primary-language subtags for the national standard languages.  Thus, Kajkavian would be tagged as a variant of "hr".  The difficulties here are twofold:  Kajkavian is much more different from Standard Croatian than the latter is from Standard Serbian or Standard Bosnian, and it is not clear what to do about neo-Shtokavian, which is spread across all the relevant countries.  Indeed, the three standard languages are all sub-subvarieties of neo-Shtokavian.

(1b) Create variant subtags and attach them directly to the macrolanguage subtag 'sh', which covers the whole SCC.  This was my earlier proposal, and is linguistically correct as far as it goes, but tends to undermine the notion of a macrolanguage as a group of _languages_, by effectively coordinating languages with varieties.

(1c) We can use our extraordinary powers under Section 2.2.1 subsection 5 of RFC 5646 and create our own primary language tags.  The RFC says "an attempt to register any new proposed primary language MUST be made to the ISO 639 registration authority".  Technically, this would only authorize the creation of a tag for Kajkavian, but I think we can take it as read that the RA would reject the others on the same grounds.

The disadvantages are that BCP 47 primary language tags would no longer automatically be ISO 639 code elements, and that the new language tags, though substantively encompassed by 'sh', would not formally be so (though there seems to be no explicit prohibition on adding Macrolanguage: fields to such entries).  Despite these points, I currently favor this solution.
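To make the interop side of this concrete, here is a rough sketch (not part of any proposal) of how the three options would look to a consumer doing RFC 4647 "Lookup"-style fallback by progressive truncation. The tags 'hr-shkaj', 'sh-shkaj' and 'shkaj' are hypothetical, borrowing the subtag proposed under (2) purely for illustration, and the set of available resources is an assumption.

```python
from typing import List, Optional, Set

# Hypothetical tags for Kajkavian under options (1a), (1b) and (1c),
# borrowing the subtag 'shkaj' from (2) purely for illustration.
OPTIONS = {
    "1a": "hr-shkaj",  # variant attached to Croatian
    "1b": "sh-shkaj",  # variant attached to the macrolanguage
    "1c": "shkaj",     # new primary language subtag
}


def fallback_chain(tag: str) -> List[str]:
    """Progressively truncate a tag, as in RFC 4647 Section 3.4 lookup."""
    subtags = tag.lower().split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Do not leave a single-character subtag (singleton) dangling at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(tag: str, available: Set[str]) -> Optional[str]:
    """Return the first entry of the fallback chain that is available, else None."""
    for candidate in fallback_chain(tag):
        if candidate in available:
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: content exists only for the macrolanguage and the standard languages.
    available = {"sh", "hr", "sr", "bs"}
    for option, tag in OPTIONS.items():
        print(option, tag, fallback_chain(tag), lookup(tag, available))
```

Under those assumptions, (1a) falls back to 'hr' and (1b) to 'sh', while (1c) finds nothing: plain truncation never consults the registry, so a Macrolanguage: field on a 'shkaj' entry would not by itself make lookup fall back to 'sh'.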

(2) The constraints on list-created primary-language subtags and on variant subtags are the same: 5 to 8 characters.  The worst case is that we need five tags, for Kajkavian, Chakavian, neo-Shtokavian, palaeo-Shtokavian, and Torlakian.  We already have the subtags "ekavsk" and "ijekavsk", but following this slavishly would give us "nshtokavsk", which is too long.

The first three are differentiated by the word used for "what?", respectively "kaj", "ča", "što".  This is of course not the only difference, just a convenient marker.  It might also be a good idea to use "sh" as the first two characters of the subtags, particularly if we decide to create primary-language subtags.  That would give us 'shkaj', 'shcha', 'shnshto', 'shpshto', and 'shtor'.  This is what I propose.
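A quick way to check the candidates against the length and character constraints is a sketch like the following. It encodes only the two RFC 5646 productions at issue (registered primary language subtags and variant subtags) and ignores the rest of the grammar; the helper name `check` is hypothetical. Since subtags are restricted to ASCII letters and digits, "ča" has to be spelled "cha".

```python
import re
from typing import Tuple

# Subtag shapes from the RFC 5646 ABNF, ASCII only (BCP 47 matching is
# case-insensitive, so both cases are allowed here):
#   registered primary language subtag: 5*8ALPHA
#   variant subtag:                     5*8alphanum / (DIGIT 3alphanum)
REGISTERED_LANGUAGE = re.compile(r"[A-Za-z]{5,8}")
VARIANT = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")

PROPOSED = ["shkaj", "shcha", "shnshto", "shpshto", "shtor"]
EXISTING = ["ekavsk", "ijekavsk"]
TOO_LONG = ["nshtokavsk"]  # following the "ekavsk"/"ijekavsk" pattern slavishly


def check(subtag: str) -> Tuple[int, bool, bool]:
    """Return (length, fits as registered language subtag, fits as variant subtag)."""
    return (
        len(subtag),
        REGISTERED_LANGUAGE.fullmatch(subtag) is not None,
        VARIANT.fullmatch(subtag) is not None,
    )


if __name__ == "__main__":
    for subtag in PROPOSED + EXISTING + TOO_LONG:
        print(subtag, check(subtag))
```

All five proposed subtags fit either role, so the choice between (1a)/(1b) and (1c) does not force a change of spelling; "nshtokavsk", at 10 characters, fits neither.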

(3) Finally, there remains the question of just which entities to tag.  The first three listed above are beyond doubt.  We could merge neo-Shtokavian and palaeo-Shtokavian into a single entity if we had to, though they are quite different.  More vexed is the question of whether Torlakian is just one subvariety of palaeo-Shtokavian or a separate coordinate variety.  The precedent set by Ethnologue and ISO 639-3 is "when in doubt, separate" and that's what I recommend here.

Unless I get pushback on this (and I expect and hope to), I'll propose these five subtags as primary-language subtags sometime next week.

John Cowan <cowan at>   
"Make a case, man; you're full of naked assertions, just like Nietzsche."
"Oh, i suffer from that, too.  But you know, naked assertions or GTFO."
                        --heard on #scheme, sorta