Serbo-Croatian continuum: the top level

John Cowan cowan at
Sun Mar 2 20:39:02 CET 2014

Well, the ax has dropped, and the ISO 639-3/RA
has rejected the proposal to encode Kajkavian: see
for details.  This means that the varieties of the Serbo-Croatian
continuum (henceforth SCC) that are outside the three standard languages
are currently unrepresentable within the BCP 47 tagging system.  In this
posting, I am going to address only the top-level varieties of the SCC.

In order to make such tagging possible, we must resolve three questions:
(1) How should we deploy the necessary subtags?  (2) What subtags should
be used?  (3) What entities should be tagged?  (This order may seem
irrational, but it runs from most to least complicated.)

(1) Assuming for the moment that we create one subtag per variety, what
kind of subtags will they be?  I see three possibilities:

(1a) Create variant subtags and attach them to the appropriate
primary-language subtags for the national standard languages.  Thus,
Kajkavian would be tagged as a variant of "hr".  The difficulties here
are twofold:  Kajkavian differs far more from Standard Croatian than
the latter does from Standard Serbian or Standard Bosnian, and it is
not clear what to do about neo-Shtokavian, which is spread across all
the relevant countries.  Indeed, the three standard languages are all
sub-subvarieties of neo-Shtokavian.

(1b) Create variant subtags and attach them directly to the macrolanguage
subtag 'sh', which covers the whole SCC.  This was my earlier proposal,
and is linguistically correct as far as it goes, but tends to undermine
the notion of a macrolanguage as a group of _languages_, by effectively
coordinating languages with varieties.

(1c) Use our extraordinary powers under Section 2.2.1 subsection 5 of
RFC 5646 and create our own primary language subtags.  The RFC says
"an attempt to register any new proposed primary language MUST be made
to the ISO 639 registration authority".  That attempt has now been made
for Kajkavian and has failed.  Technically, this would only authorize
the creation of a subtag for Kajkavian, but I think we can take it as
read that the RA would reject the others on the same grounds.

The disadvantages are that BCP 47 primary language subtags would no
longer automatically be ISO 639 code elements, and that the new language
subtags, although substantively encompassed by 'sh', would not formally
be so (there seems to be no explicit prohibition, however, on adding
Macrolanguage: fields to such entries).  Despite these points, I
currently favor this solution.

(2) The constraints on list-created primary-language subtags and
on variant subtags are the same for our purposes: 5 to 8 characters
(variant subtags may also be four characters long if they begin with a
digit, which does not help here).  The worst case is that we need five
subtags, for Kajkavian, Chakavian, neo-Shtokavian, palaeo-Shtokavian,
and Torlakian.  We already have the subtags "ekavsk" and "ijekavsk",
but following this pattern slavishly would give us "nshtokavsk", which
at ten characters is too long.
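
As a quick sanity check of the length constraint, here is a minimal
Python sketch.  The two regular expressions are transcribed from the
"language" (registered form, 5*8ALPHA) and "variant" productions of the
RFC 5646 ABNF; the names REGISTERED_LANGUAGE, VARIANT, fits_language and
fits_variant are hypothetical helpers, not part of any library.

```python
import re

# Registered (list-created) primary language subtag: 5 to 8 letters.
REGISTERED_LANGUAGE = re.compile(r"[A-Za-z]{5,8}")
# Variant subtag: 5 to 8 alphanumerics, or a digit plus 3 alphanumerics.
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def fits_language(subtag: str) -> bool:
    """True if the string is well-formed as a registered language subtag."""
    return REGISTERED_LANGUAGE.fullmatch(subtag) is not None


def fits_variant(subtag: str) -> bool:
    """True if the string is well-formed as a variant subtag."""
    return VARIANT.fullmatch(subtag) is not None


for candidate in ("ekavsk", "ijekavsk", "nshtokavsk"):
    print(candidate, len(candidate),
          fits_language(candidate), fits_variant(candidate))
```

This only tests well-formedness; it says nothing about whether a subtag
is actually in the registry.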

The first three are differentiated by the word used for "what?",
respectively "kaj", "ča", "što".  This is of course not the only
difference, just a convenient marker.  It might also be a good idea to
use "sh" as the first two characters of the subtags, particularly if we
decide to create primary-language subtags.  That would give us 'shkaj',
'shcha', 'shnshto', 'shpshto', and 'shtor'.  This is what I propose.
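
To make the three deployment options concrete, the sketch below shows
what a tag built from one of these proposed subtags would look like
under each.  It assumes that the same subtag string is reused whichever
option is chosen, and that under (1a) Kajkavian hangs off "hr" as in the
example above.  The PROPOSED table and the tag_for helper are purely
illustrative, and none of these subtags is registered.

```python
# Proposed subtags from this posting; none of them is registered.
PROPOSED = {
    "Kajkavian": "shkaj",
    "Chakavian": "shcha",
    "neo-Shtokavian": "shnshto",
    "palaeo-Shtokavian": "shpshto",
    "Torlakian": "shtor",
}


def tag_for(subtag: str, option: str, national: str = "hr") -> str:
    """Build the tag that a given deployment option would produce."""
    if option == "1a":  # variant under a national standard language
        return f"{national}-{subtag}"
    if option == "1b":  # variant directly under the macrolanguage 'sh'
        return f"sh-{subtag}"
    if option == "1c":  # list-created primary language subtag
        return subtag
    raise ValueError(f"unknown option: {option}")


# Every proposed subtag has to fit the 5-to-8-character limit.
for variety, subtag in PROPOSED.items():
    assert 5 <= len(subtag) <= 8, (variety, subtag)

for option in ("1a", "1b", "1c"):
    print(option, tag_for(PROPOSED["Kajkavian"], option))
```

For Kajkavian this prints "hr-shkaj", "sh-shkaj", and "shkaj"
respectively.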

(3) Finally, there remains the question of just which entities to
tag.  The first three listed above are beyond doubt.  We could merge
neo-Shtokavian and palaeo-Shtokavian into a single entity if we had to,
though they are quite different.  More vexed is the question of whether
Torlakian is just one subvariety of palaeo-Shtokavian or a separate
coordinate variety.  The precedent set by Ethnologue and ISO 639-3 is
"when in doubt, separate" and that's what I recommend here.

Unless I get pushback on this (and I expect and hope to do so), I'll
propose these five subtags as primary-language subtags sometime next week.

John Cowan <cowan at>   
"Make a case, man; you're full of naked assertions, just like Nietzsche."
"Oh, i suffer from that, too.  But you know, naked assertions or GTFO."
                        --heard on #scheme, sorta

More information about the Ietf-languages mailing list