Tagging transliterations from a specific script

Avram Lyon ajlyon at ucla.edu
Mon Mar 14 21:23:56 CET 2011

2011/3/14 Doug Ewell <doug at ewellic.org>:
> I'd like to see if we can address Avram Lyon's request, either accepting
> one of the possible solutions or rejecting them all and recommending a
> private-use subtag, instead of just leaving it hanging.

Thank you for bringing this back up. I have a slowly progressing draft
response that I just hadn't sent yet.

> As I understand it, Avram's use case is that he has (potentially) two
> samples of text:
> * Tatar, transliterated from original Arabic into Latin
> * Tatar, transliterated from original Cyrillic into Latin
> Currently, without using private-use subtags, both of these would have
> to be tagged as "tt-alalc97".  (This could also be "tt-Latn-alalc97",
> but for simplicity, the script subtag will be left out of the examples
> which follow.)  Avram argues that the differences between these samples
> require distinct tagging.  I don't see this as an "obvious error" in
> registering 'alalc97', but simply a use case which was not envisioned at
> the time.

Now that I've thought about this more, I think that we can eliminate
the problem entirely by looking at the orthographic reforms that
accompanied the changes of script. It should be possible to define a
reasonable variant subtag for Tatar orthography in each of the Iske
imla and Yana imla periods. Both are discussed on Wikipedia, but I'm
having a hell of a time pinning down dates and authoritative works on
the two, despite working with the primary sources here in Kazan. I can
confirm that there is a real shift in orthography within the Arabic
script after 1920, and that the introduction of Janalif did indeed
bring further changes to the language (in addition to switching the
set of symbols in use from Arabic to Latin).

I think the way out of the current mess is to define subtags for these
stages in the development of Tatar (and perhaps Bashkir) orthography:
* Iske imla, used for written Tatar from about the time of Qayumi
  Nasiri to 1920 (with wide variation in individual practice)
* Yana imla, used for written Tatar from 1920 until the introduction
  of Janalif

I have a short bibliography of orthographic manuals on the early
Janalif period that confirm that (1) iske imla and yana imla are
distinct orthographies and (2) yana imla and janalif differ in more
than script.

Then my use cases could be tagged as follows:
> * Tatar, transliterated from original Arabic into Latin
-- OR --
> * Tatar, transliterated from original Cyrillic into Latin
tt-alalc97 [if we're assuming tt-Cyrl]
-- OR --
tt-?????-alalc97 [if a new variant tag is assigned for the present
Tatar Cyrillic orthography, possibly to distinguish it from the
Krashen Tatar Cyrillic orthography used until ~1939 among Christian
Tatars, or simply to help us in this specific case.]

So my thinking is that a careful look at the changes the languages
themselves underwent will help us find a way out of this mess: there
are authentic language variants that carry more meaning than the
script subtag conveys.
