Tagging transliterations from a specific script
doug at ewellic.org
Thu Mar 17 15:07:06 CET 2011
To be 100% clear about this:
Any language tag that begins with "tt" means that the content thus
tagged is in Tatar. No subsequent subtag can override this.
Any language tag that begins with "tt-Arab" means that the content thus
tagged is in Tatar, written in the Arabic script. No subsequent subtag
can override this either.
This is not just for the benefit of dumb, rigid computers. It is the
way BCP 47 tags are defined and it is the way humans are also expected
to read them.
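As a rough illustration of the positional rule above, here is a minimal Python sketch. It is not a full BCP 47 parser. The function name language_and_script is hypothetical, and the code assumes a well-formed tag with no extlang subtags, ignoring grandfathered and private-use tags entirely. It only shows that the language and script are read from the front of the tag, and that nothing later in the tag is consulted.

```python
def language_and_script(tag):
    """Return (language, script) read from the leading subtags of a tag.

    Simplifying assumptions (not part of BCP 47 itself): the tag is
    well-formed, has no extlang subtags, and is neither grandfathered
    nor private-use.
    """
    subtags = tag.split("-")

    # The first subtag is always the primary language.
    language = subtags[0].lower()

    # A script subtag is exactly four ASCII letters and, under the
    # assumptions above, can only sit directly after the language subtag.
    script = None
    if len(subtags) > 1:
        candidate = subtags[1]
        if len(candidate) == 4 and candidate.isascii() and candidate.isalpha():
            script = candidate.title()

    # Subtags after this point (region, variants, extensions) are never
    # inspected here, so they cannot change the language or the script.
    return language, script


print(language_and_script("tt"))          # language only
print(language_and_script("tt-Arab"))     # language and script
print(language_and_script("tt-Arab-RU"))  # region does not alter either
```

Whatever follows the script subtag, the pair returned for a tag beginning with "tt-Arab" stays the same.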
The use case of identifying the script in which content IS written is
MUCH, MUCH more common than the use case of identifying the script in
which it was ORIGINALLY written before being transliterated to some
other script. This latter scenario is very much a special case, and on
no account does it justify overhauling BCP 47 to make the order of
subtags flexible and their meaning dependent on their relative order.
Avram has already noted that the real distinction in his Tatar case is
not even the original pre-transliteration script, but different
orthographic conventions for Tatar that were in place at different
historical times. His solution (March 14) to define subtags for these
orthographic conventions is a better approach than trying to tag what
turns out to be the wrong distinction, and MUCH better than trying to
redefine BCP 47 -- formally or otherwise -- to facilitate tagging the
I look forward to Avram's proposals for variant subtags 'iskeimla' and
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s