Tagging transliterations from a specific script

Avram Lyon ajlyon at ucla.edu
Mon Feb 14 08:45:52 CET 2011

[Re-sending, since I mistakenly sent this only to CE Whitehead last time.]

2011/2/14 CE Whitehead <cewcathar at hotmail.com>:
> If it's for a transliteration into Latin script then how would you tag it
> tt-Arab . . . ?
> (I'm sorry to ask a dumb question.)

That's my point, really. It can't be tt-Arab-alalc97, because that
would make no sense. But it does matter that this is the ALA-LC
romanization from Tatar in the Arabic script, and not from Tatar in
the modern Cyrillic script.
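
To make the problem concrete, here is a minimal sketch in Python of why the obvious tag does not work. It is only an illustration under stated assumptions: split_tag, is_consistent and ROMANIZATION_VARIANTS are hypothetical names of mine, not part of any registry tooling, and the parser handles only language, script and variant subtags (no extlang, region, extension or private-use subtags).

```python
import re

# Assumption for illustration: variants known to denote a romanization,
# i.e. the tagged text itself is written in Latin script.
ROMANIZATION_VARIANTS = {"alalc97"}


def split_tag(tag):
    """Split a simple tag into language, script and variant subtags.

    Simplified on purpose: anything other than language[-script][-variant...]
    raises ValueError.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "variants": []}
    for sub in parts[1:]:
        is_script = re.fullmatch(r"[A-Za-z]{4}", sub) is not None
        is_variant = re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub) is not None
        if is_script and result["script"] is None and not result["variants"]:
            result["script"] = sub.title()
        elif is_variant:
            result["variants"].append(sub.lower())
        else:
            raise ValueError("unsupported subtag in this sketch: " + sub)
    return result


def is_consistent(tag):
    """Return False if a romanization variant is paired with a non-Latin script."""
    parsed = split_tag(tag)
    romanized = any(v in ROMANIZATION_VARIANTS for v in parsed["variants"])
    if romanized and parsed["script"] not in (None, "Latn"):
        return False
    return True


for candidate in ("tt-Arab-alalc97", "tt-Latn-alalc97", "tt-alalc97"):
    print(candidate, split_tag(candidate), is_consistent(candidate))
```

The script subtag describes the script the text is actually written in, so tt-Arab-alalc97 is flagged as contradictory. tt-Latn-alalc97 and tt-alalc97 pass the check, but neither has anywhere to record that the source was Arabic-script rather than Cyrillic-script Tatar.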

> I'm not sure that I completely understand the request (my apologies).
> Another option is to use metadata and certainly perhaps the text date would
> provide a clue as to the original script (if that's what you are asking
> for:  a way to distinguish the original script).
> However I personally have no objection to having two variants indicating two
> distinct ala-lc romanizations,
> but I hope we will hear from a few others regarding this matter (I am not
> the expert in ala-lc romanizations).

I know the original script, and indeed it's pretty obvious from
looking at the romanized text which source script was used for the
romanization. But I am looking for a way to tag it, since the tagged
text has to be used by bibliographic software that is supposed to
choose the form of the text that specific citation style guides

> In any case
> [alalc97] is not just for Tatar, is it? (let me know if it is)
> So would other Romanizations from Arabic script (from other languages) fit
> into your scheme?

As the person who requested alalc97, I understand of course that it is
not just for Tatar. This general issue of distinguishing ALA-LC
romanizations from various scripts of the same language does indeed
affect other languages. It certainly matters for Azerbaijani, Bashkir,
Uzbek, and other Turkic languages that had a similar history of using
an Arabic script before the introduction of Janalif (baku1926). It
also matters for Turkish and Ottoman Turkish, but in that case the
latter is represented by "ota", so ota-alalc97 and tr-alalc97 are
distinct already.

I'd suggest a tag for the Turkic languages affected by the
introduction of Janalif, before the introduction of the same, but I
don't want to cause the same justifiable concern that was raised about
my proposed "pre1917" tag on this list last fall. Also, such a tag
would really just represent a script, so in most cases it would be
equivalent to, e.g., tt-Arab or az-Arab. It is only really needed, then,
when the actual script is not Arabic, as in tt-Latn-ARABIC (not a real,
or legal, subtag); tt-Arab and tt-ARABIC would be completely identical.

If a subtag for the pre-Latin and pre-Cyrillic forms of these various
Turkic languages is deemed appropriate, I'll look into what diversity
there is in the Arabic scripts, so I can craft a defensible and strong
proposal for a new subtag. My understanding is that there were
multiple types of Arabic script used for these languages, so we may be
able to justify tags on the grounds of orthographic reforms and

Again, thanks for your advice as I try to work this out.


Avram Lyon
