Fwd: Tagging transliterations from a specific script

Sat Feb 12 23:24:44 CET 2011

Dear IETF-Languages,

I have a set of data available in several forms: Tatar written in the
Arabic script (tt-Arab); Tatar written in the Cyrillic script
(tt-Cyrl); and a transliteration of the same text into the Latin
script. The original text is in tt-Arab, so the transliteration (since
it follows ALA-LC 1997) should certainly be tagged tt-alalc97. That tag,
however, is precisely what we'd use for an ALA-LC transliteration of the
tt-Cyrl text as well. Thus, there's no way to distinguish between two
very different representations of the same text: ALA-LC romanization is
defined for both Arabic-script and Cyrillic-script text, but the two
schemes produce very different results.

The real-world case where this arises is in the multilingual version
of Zotero, the bibliographic data management software. There, we're
allowing the entry of alternate representations of key fields using
any valid language tag, which has been great so far. But now we can't
represent this distinction. The tag we'd want is something like
*tt-Arab-alalc97, but subtags are supposed to refine one another, not
override each other, and here "Arab" would name the script of the
source rather than that of the romanized text itself.
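To make the collision concrete, here is a minimal Python sketch. It
assumes a simplified reading of BCP 47 tag structure (language, then
optional script, region and variants; no extlang, extensions, private
use or registry validation). `Tag` and `parse_tag` are hypothetical
helpers for illustration, not part of Zotero or any library.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """Subtags of a simple BCP 47 language tag (hypothetical helper)."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


def parse_tag(tag: str) -> Tag:
    """Split a simple tag into language, script, region and variants.

    Simplified sketch: ignores extlang, extensions, private use and
    grandfathered tags, and does not check subtags against the registry.
    """
    subtags = tag.split("-")
    result = Tag(language=subtags[0].lower())
    for sub in subtags[1:]:
        no_region_or_variant_yet = result.region is None and not result.variants
        if (no_region_or_variant_yet and result.script is None
                and re.fullmatch(r"[A-Za-z]{4}", sub)):
            result.script = sub.title()          # e.g. Arab, Cyrl, Latn
        elif no_region_or_variant_yet and re.fullmatch(r"[A-Za-z]{2}|\d{3}", sub):
            result.region = sub.upper()          # e.g. RU
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}", sub):
            result.variants.append(sub.lower())  # e.g. alalc97, baku1926
        else:
            raise ValueError(f"unexpected subtag {sub!r} in {tag!r}")
    return result


if __name__ == "__main__":
    from_arabic = parse_tag("tt-alalc97")    # romanized from the tt-Arab original
    from_cyrillic = parse_tag("tt-alalc97")  # romanized from the tt-Cyrl text
    print(from_arabic == from_cyrillic)      # True: nothing records the source script
    # Arab: but the script subtag describes the text itself, which is Latin
    print(parse_tag("tt-Arab-alalc97").script)
```

The point of the sketch is only that the tag has no slot for "script of
the source": the script subtag is already spoken for by the script of
the text being tagged.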

I think it might be appropriate to introduce a variant subtag for
Tatar in the Arabic script, which was used until the introduction of
Janalif in 1927-1928 (tt-Latn, tt-baku1926), but I'd be glad to hear
other options for distinguishing these data.

Regards,

Avram