New variant subtags for Serbian language
Doug Ewell
doug at ewellic.org
Sun Nov 17 19:52:23 CET 2013
Milos Rancic <millosh at gmail dot com> wrote:
> Besides different pronunciation (and spelling) of old vowel Jat,
> Serbian language has two standard scripts -- Cyrillic and Latin. As
> that's a kind of rarely used variants -- contrary, it's commonly used
> -- it would be good to have shorter tags for those combinations:
>
> * ec => Ekavian Cyrillic (thus, sr-ec instead of sr-ekavn-cyrl)
That would be "Serbian as used in Ecuador," a potentially absurd but
valid tag.
> * el => Ekavian Latin (...)
> * jc => Iyekavian Cyrillic (...)
> * jl => Iyekavian Latin (...)
Those would be Serbian as spoken in three hypothetical, currently
undefined regions.
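The reason "ec" reads as a country follows from the shape rules in the BCP 47 syntax (RFC 5646). After the language subtag, a four-letter subtag is a script, a two-letter or three-digit subtag is a region, and a five-to-eight character subtag (or one starting with a digit plus three more characters) is a variant. The following is a minimal Python sketch of that classification. It is simplified and only checks form, not whether a subtag is actually registered. The function name `classify_subtags` is hypothetical, and extlang, extension, private-use and grandfathered tags are deliberately ignored:

```python
import re

# Simplified subtag shapes, loosely following the RFC 5646 ABNF.
# Assumption: extlang, extensions, private use and grandfathered tags are ignored.
LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a simple tag by position and shape (sketch only)."""
    parts = tag.split("-")
    if not LANGUAGE.fullmatch(parts[0]):
        raise ValueError(f"bad language subtag: {parts[0]!r}")
    result = [("language", parts[0])]
    i = 1
    # Script and region are optional, appear at most once, and in this order.
    if i < len(parts) and SCRIPT.fullmatch(parts[i]):
        result.append(("script", parts[i]))
        i += 1
    if i < len(parts) and REGION.fullmatch(parts[i]):
        result.append(("region", parts[i]))
        i += 1
    # Anything left over must have the shape of a variant subtag.
    while i < len(parts):
        if not VARIANT.fullmatch(parts[i]):
            raise ValueError(f"subtag {parts[i]!r} is out of order or malformed")
        result.append(("variant", parts[i]))
        i += 1
    return result
```

Run on the proposed "sr-ec", this labels "ec" as a region, which is exactly the Ecuador reading. "sr-ekavn-Cyrl" is rejected, because a script subtag cannot follow a variant.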
> Wikipedia is already using those tags: cf.
> http://sr.wikipedia.org/sr-el/
That is not a defense for BCP 47 purposes, because Wikipedia does not
use BCP 47.
> In any case, note that the tags for Ekavian and Iyekavian should stay
> *before* the tags for Cyrillic and Latin. You are speaking Ekavian or
> Iyekavian without writing them.
Ekavian Serbian, spoken: sr-ekavn
Ekavian Serbian, written in Cyrillic: sr-Cyrl-ekavn
Ekavian Serbian, written in Latin: sr-Latn-ekavn
These are the rules of BCP 47: language, then script, then region, then
variant.
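To make the ordering concrete, here is a small Python sketch that assembles a tag in that order. It also applies the usual case conventions: lowercase language and variant, title-case script, uppercase region. Case carries no meaning in BCP 47, so this only normalizes presentation. The function `compose_tag` is a hypothetical illustration. "ekavn" is the placeholder variant used in this message, and the sketch does not check anything against the Registry:

```python
def compose_tag(language, script=None, region=None, variants=()):
    """Join subtags in BCP 47 order: language, script, region, variant(s).

    Case is normalized only for presentation; BCP 47 tags are case-insensitive.
    No Registry lookup is done, so unregistered subtags pass through unchanged.
    """
    parts = [language.lower()]
    if script:
        parts.append(script.title())   # e.g. "cyrl" -> "Cyrl"
    if region:
        parts.append(region.upper())   # e.g. "me" -> "ME"
    parts.extend(v.lower() for v in variants)
    return "-".join(parts)
```

For example, `compose_tag("sr", script="Cyrl", variants=["ekavn"])` yields "sr-Cyrl-ekavn". However the caller orders the arguments, the script always lands before the variant.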
As Mark said, variant subtags that indicate script are unnecessary
because script subtags exist. And no sort of subtag that indicates
script is appropriate for spoken content anyway.
Variant subtags should indicate a particular variation of the language
that *cannot be indicated* using any other type of non-private-use
subtag, whether script or region. ISO 639-3 defines language code
elements like "Omani Arabic" and "Cypriot Arabic," and the Registry
incorporates them, not because the additional code elements make for
shorter and more convenient tagging than "ar-OM" and "ar-CY", but
because ISO 639-3/RA has determined that those languages are truly
different from Standard Arabic and not just regional dialects.
So far, we have two variant subtags.
> * Croatian and Bosnian are Iyekavian and Latin. Bosnian standard
> allows Cyrillic, as well. (Bosnian and Serbian Iyekavian have
> differences in ~50 words, as well as the most of those Serbian words
> are correct in Bosnian, but not vice versa.). From the point of
> computational linguistics, it would be good if there is a place to put
> the information that those structures of those particular languages
> are the same.
The IANA Language Subtag Registry isn't the place, though. The Registry
identifies languages and other aspects that may influence languages so
that content can be tagged and searches can be constructed. It has
macrolanguages like 'sh' only because the underlying ISO standards have
them.
> * Montenegrin official language is still in the phase of development.
> If it's about the language used on official pages of Montenegrin
> government institutions, it is Serbian Iyekavian with two different
> words ("sjutra" instead of "sutra" ["tomorrow"] and "medjed" instead
> of "medved" ["bear"]). If it's about the standard proposed by Doclean
> Academy of Sciences and Arts, then it's about the language system the
> most distant of all other standard languages (it has more phonemes, it
> isn't neo-Shtokavian). Thus, I'd leave this issue until Montenegrins
> make their own decisions. In both variants, Montenegrin could be
> written in Cyrillic and Latin, though Latin is preferred.
"Montenegrin" won't be a language subtag in the Registry unless and
until ISO 639-3/RA assigns it a code element. The opinion of almost
everyone who does not have nationalistic skin in the game, including
Ethnologue, is that "Montenegrin" is either a dialect or simply a
"variety" of Serbian. It can be represented by "sr-ME", just as
Australian English is represented by "en-AU".
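One practical property of tagging with a region subtag shows up in retrieval. Under the "basic filtering" scheme of RFC 4647 (section 3.3.1), a language range matches a tag if, ignoring case, it equals the tag or is a prefix of it followed immediately by "-". A search for "sr" therefore also finds content tagged "sr-ME". Below is a minimal Python sketch of that matching rule. The function name is hypothetical, and extended filtering and lookup are not covered:

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering for a single range and tag.

    The wildcard range '*' matches every tag. Otherwise, after lowercasing,
    the range must equal the tag or be a prefix of it ending at a '-'.
    """
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")
```

Note that the prefix must end at a subtag boundary: "sr" matches "sr-ME" and "sr-Latn-ME", but not a tag such as "srp", and the more specific range "sr-ME" does not match plain "sr".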
> * Language systems spoken on the territories of Serbia, Croatia,
> Bosnia and Herzegovina and Montenegro (could be called "Serbo-Croatian
> in wider sense"):
> ** Chakavian (should get ISO 639-3 code, has ISO 639-6 code)
> ** Kaykavian (should get ISO 639-3 code, has ISO 639-6 code)
> ** Torlakian (should get ISO 639-3 code, has ISO 639-6 code)
> ** Shtokavian (should get ISO 639-3 code, has ISO 639-6 code)
> *** Old Shtokavian dialects
> **** Zeta-South Sanjak dialect: basis for Doclean Montenegrin.
> **** ...
> *** New Shtokavian dialects or neo-Shtokavian; could be called
> "Serbo-Croatian in narrower sense".
> **** Ikavian dialects of Western Herzegovina
> **** Iyekavian dialects of Eastern Herzegovina. This is the basic
> dialect for all of the standard languages (except Doclean variant of
> Montenegrin).
> **** Ekavian dialects of Northern [proper] Serbia and Vojvodina. Those
> dialects influenced Serbian Ekavian standard, though Serbian Ekavian
> standard is mostly Ekavian variant of Eastern Herzegovina dialect.
Breaking out the dialects in this way would be a question for ISO
639-3/RA, not this group. But it would basically involve scrapping all
their existing code elements for Serbian, Bosnian, Croatian, etc. and
replacing them with these genetic classifications, so I wouldn't expect
the RA to make that move any time soon.
For identifying the language of content, or specifying a language for
search or retrieval, or any of the functions of BCP 47 -- not the study
of the relationships between languages or their history -- it looks like
we still have two variant subtags.
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell