Tagging transliterations from a specific script

Peter Constable petercon at microsoft.com
Sun Mar 6 08:49:40 CET 2011

It seems to me that there was an obvious error here: registering a single variant subtag for multiple writing conventions. A convention for transliterating Cyrillic script into Latin and a convention for transliterating Arabic script into Latin may both use Latin script and may be specified in the same document, but that doesn't mean they are the same convention.

I'm not convinced we need an extension for transliterations.


-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Phillips, Addison
Sent: Monday, February 14, 2011 8:22 AM
To: Avram Lyon; ietf-languages
Subject: RE: Tagging transliterations from a specific script

> I'd suggest a tag for the Turkic languages affected by the 
> introduction of Janalif, before the introduction of the same, but I 
> don't want to cause the same justifiable concern that was raised about 
> my proposed "pre1917" tag on this list last fall. Also, such a tag 
> would really just represent a script, so in most cases it would be 
> equivalent to, e.g., tt-Arab, az-Arab. It only really is needed, then, 
> when the actual script is not Arabic, so tt-Latn-ARABIC (not a real, 
> or legal, subtag). So tt-Arab and tt-ARABIC are completely identical.

If I understand the problem correctly, you want to distinguish between "tt-alalc97" when transliterated from the Arabic script vs. the Cyrillic script. This suggests to me that you want a subordinate subtag (following alalc97) rather than trying to repurpose some unrelated but already defined subtag value. 

For example, you might consider registering a few subtags such as the following:

      Type:             variant
      Subtag:           sArab      (this would actually be lowercase in the registry)
      Description:      transliteration from the Arabic script
      Prefix:           tt-alalc97 (etc.....)
      Comments:         transliterated document's source script was Arabic; a document tagged
          with this subtag will be in the Latin script. Differences in transliteration
          occur depending on the source script.
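
As a rough illustration of the syntax constraint on a proposed variant such as "sarab", here is a minimal Python sketch. It assumes the variant production from RFC 5646 (five to eight alphanumerics, or four characters beginning with a digit). The function name is hypothetical, and the check covers well-formedness only; it says nothing about whether a subtag is actually registered.

```python
import re

# Assumed rule (RFC 5646 ABNF): variant = 5*8alphanum / (DIGIT 3alphanum)
_VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag: str) -> bool:
    """Return True if subtag has the shape of a variant subtag.

    This is a syntax check only; it does not consult the registry.
    """
    return _VARIANT_RE.fullmatch(subtag) is not None
```

Under this rule "sarab" and "alalc97" are well formed, while a bare "arab" is not: four letters is the shape of a script subtag, which is why the proposed variant needs the leading "s".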

Alternatively, it might be time to consider a transliteration extension to forestall increasingly baroque subtag collections. Extensions allow for any subtag between 2 and 8 characters and can define their own rules for legal usage. For example, if 't' were assigned to an extension for transliteration, it might then define subtags to allow a tag like:

  "tt-alalc97-t-arab" // Tatar transliterated from the Arabic script

Writing an extension turns out not to be very hard. The main problem would be deciding what to put in it (which might be an intractable problem).
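
To make the shape of such a tag concrete, the following Python sketch separates a tag into its base part and its extension sequences. It is only an illustration: the 't' singleton is the hypothetical assignment discussed above, the function name is made up, and the sketch assumes a tag whose first subtag is a normal language subtag (it does not handle grandfathered or wholly private-use tags).

```python
def split_extensions(tag: str):
    """Split a language tag into (base, {singleton: [subtags]}).

    Sketch only: assumes the tag begins with a language subtag.
    """
    subtags = tag.lower().split("-")
    base, extensions = [], {}
    current = None  # singleton whose subtags are being collected
    for i, subtag in enumerate(subtags):
        if current == "x":
            # Everything after 'x' belongs to the private-use sequence.
            extensions["x"].append(subtag)
        elif len(subtag) == 1 and i > 0:
            if subtag in extensions:
                raise ValueError(f"duplicate singleton: {subtag!r}")
            current = subtag
            extensions[current] = []
        elif current is None:
            base.append(subtag)
        else:
            # Extension subtags must be 2 to 8 ASCII alphanumerics.
            ok = 2 <= len(subtag) <= 8 and subtag.isascii() and subtag.isalnum()
            if not ok:
                raise ValueError(f"bad extension subtag: {subtag!r}")
            extensions[current].append(subtag)
    for singleton, parts in extensions.items():
        if not parts:
            raise ValueError(f"empty extension: {singleton!r}")
    return "-".join(base), extensions


base, ext = split_extensions("tt-alalc97-t-arab")
```

For the example tag, the base is "tt-alalc97" and the only extension is 't' with the single subtag "arab". What the subtags after 't' would mean is exactly the part an extension specification would have to define.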


Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N, IETF IRI WGs)

Internationalization is not a feature.
It is an architecture.
