Tagging transliterations from a specific script
addison at lab126.com
Mon Feb 14 17:22:21 CET 2011
> I'd suggest a tag for the Turkic languages affected by the
> introduction of Janalif, before the introduction of the same, but I
> don't want to cause the same justifiable concern that was raised
> by my proposed "pre1917" tag on this list last fall. Also, such a tag
> would really just represent a script, so in most cases it would be
> equivalent to, e.g., tt-Arab, az-Arab. It only really is needed,
> when the actual script is not Arabic, so tt-Latn-ARABIC (not a real,
> or legal, subtag). So tt-Arab and tt-ARABIC are completely
If I understand the problem correctly, you want to distinguish between "tt-alalc97" when transliterated from the Arabic script vs. the Cyrillic script. This suggests to me that you want a subordinate subtag (following alalc97) rather than trying to repurpose some unrelated but already defined subtag value.
For example, you might consider registering a few subtags such as the following:
Subtag: sArab (this would actually be lowercase in the registry)
Description: transliteration from the Arabic script
Prefix: tt-alalc97 (etc.....)
Comments: transliterated document's source script was Arabic; a document tagged
  with this subtag will be in the Latin script. Differences in transliteration
  occur depending on the source script.
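
To make the Prefix idea concrete, here is a minimal Python sketch of how a consumer might check such a subtag. It assumes the hypothetical "sarab" variant above with the single prefix "tt-alalc97" (the other prefixes are not spelled out here), and the helper names are invented for illustration. The prefix test is a simplification: it only looks at the part of the tag that precedes the variant.

```python
import re

# Variant subtag syntax per RFC 5646: 5 to 8 alphanumerics,
# or exactly 4 characters when the first is a digit.
VARIANT_RE = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")

# Hypothetical registry data; "sarab" is NOT a registered subtag.
HYPOTHETICAL_PREFIXES = {"sarab": ["tt-alalc97"]}


def is_wellformed_variant(subtag: str) -> bool:
    """True if the subtag has the shape of a variant subtag."""
    return VARIANT_RE.fullmatch(subtag.lower()) is not None


def matches_prefix(tag: str, variant: str) -> bool:
    """True if the subtags before `variant` begin with one of its prefixes."""
    parts = tag.lower().split("-")
    variant = variant.lower()
    if variant not in parts:
        return False
    head = "-".join(parts[:parts.index(variant)])
    prefixes = HYPOTHETICAL_PREFIXES.get(variant, [])
    return any(head == p or head.startswith(p + "-") for p in prefixes)


print(is_wellformed_variant("sArab"))                 # shape check only
print(matches_prefix("tt-alalc97-sarab", "sarab"))    # variant follows its prefix
print(matches_prefix("az-sarab", "sarab"))            # no matching prefix
```

Note that a bare "arab" could not be registered as a variant: four-letter alphabetic subtags are reserved for scripts, which is why the example uses a five-character value.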
Alternatively, it might be time to consider a transliteration extension to forestall increasingly baroque subtag collections. An extension allows any subtag of 2 to 8 alphanumeric characters and can define its own rules for legal usage. For example, if 't' were assigned to an extension for transliteration, it might then define subtags to allow a tag like:
"tt-alalc97-t-arab" // Tatar transliterated from the Arabic script
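
As a rough illustration of how a processor might pull such an extension out of a tag, here is a Python sketch. It assumes the singleton 't' were assigned as suggested above; the function name and the idea that the extension carries a source-script subtag are hypothetical. Only the generic syntax (single-character singleton, extension subtags of 2 to 8 alphanumerics, 'x' starting private use) is taken from RFC 5646.

```python
import re

# Generic shape of an extension subtag per RFC 5646.
EXT_SUBTAG_RE = re.compile(r"[a-z0-9]{2,8}")


def split_extension(tag: str, singleton: str = "t"):
    """Return (tag before the extension, list of extension subtags).

    The singleton 't' is an assumption made for this sketch.
    """
    parts = tag.lower().split("-")
    for i, part in enumerate(parts[1:], start=1):
        if part == "x":
            break  # everything after 'x' is private use, not an extension
        if part == singleton:
            ext = []
            for sub in parts[i + 1:]:
                if len(sub) == 1:
                    break  # the next singleton starts a different sequence
                if EXT_SUBTAG_RE.fullmatch(sub) is None:
                    raise ValueError(f"malformed extension subtag: {sub!r}")
                ext.append(sub)
            return "-".join(parts[:i]), ext
    return tag.lower(), []


base, ext = split_extension("tt-alalc97-t-arab")
print(base, ext)
```

What the subtags after the singleton are allowed to mean would be up to the extension's own specification; this sketch only separates them from the rest of the tag.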
Writing an extension turns out not to be very hard. The main problem would be deciding what to put in it (which might be an intractable problem).
Globalization Architect (Lab126)
Chair (W3C I18N, IETF IRI WGs)
Internationalization is not a feature.
It is an architecture.