Tagging transliterations from a specific script

Doug Ewell doug at ewellic.org
Mon Mar 14 18:19:16 CET 2011


I'd like to see if we can address Avram Lyon's request, either accepting
one of the possible solutions or rejecting them all and recommending a
private-use subtag, instead of just leaving it hanging.

As I understand it, Avram's use case is that he has (potentially) two
samples of text:

* Tatar, transliterated from original Arabic into Latin
* Tatar, transliterated from original Cyrillic into Latin

Currently, without using private-use subtags, both of these would have
to be tagged as "tt-alalc97".  (This could also be "tt-Latn-alalc97",
but for simplicity, the script subtag will be left out of the examples
which follow.)  Avram argues that the differences between these samples
require distinct tagging.  I don't see this as an "obvious error" in
registering 'alalc97', but simply a use case which was not envisioned at
the time.

Here are the options as I see them.  Comments are welcome; soliciting
them is the point of this message.

Option 1:  Private-use subtags.

Examples:
	tt-alalc97-x-sarab
	tt-alalc97-x-fromarab
	tt-alalc97-x-from-arab
	etc.

Advantages (the usual ones for private-use):
	* Can be used immediately, without waiting for registration.
	* Can be made as human-readable as desired (e.g.
	  tt-alalc97-x-convert-from-arabic).

Disadvantage (also the usual one for private-use):
	* Not standardized for use outside local setting.
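
To make Option 1 concrete, here is a minimal Python sketch that appends a
private-use sequence to a tag.  The function name add_private_use is
invented for illustration.  The only rule it checks is the RFC 5646
syntax for private-use subtags (1 to 8 letters or digits each, following
the singleton 'x'); it assumes the base tag is already well-formed and
does not yet contain a private-use sequence.

```python
import re

# RFC 5646 syntax: after the singleton 'x', each subtag is 1 to 8
# letters or digits.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def add_private_use(base_tag, *parts):
    """Append 'x' plus the given private-use subtags to base_tag.

    Hypothetical helper: base_tag is assumed to be well-formed and to
    have no private-use sequence of its own.
    """
    if not parts:
        raise ValueError("at least one private-use subtag is required")
    for part in parts:
        if not _PRIVATE_SUBTAG.fullmatch(part):
            raise ValueError(f"not a valid private-use subtag: {part!r}")
    return "-".join([base_tag, "x", *parts])


print(add_private_use("tt-alalc97", "fromarab"))
print(add_private_use("tt-alalc97", "convert", "from", "arabic"))
```

The length limit is why a long, readable label has to be split across
several subtags rather than written as one word.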

Option 2:  Variant under 'alalc97'.

Examples:
	tt-alalc97-sarab
	tt-alalc97-fromarab

Advantages:
	* Reasonably compact representation.
	* Restricts use to suitable languages and ALA-LC scheme.

Disadvantages:
	* Each combination of language and transliteration scheme must be
	  specified as prefix; could grow out of control if more general use
	  is desired.
	* Best to reserve 5-letter variants starting with 's' (already
	  impossible because of 'solba') or 8-letter variants starting with
	  'from', to avoid collisions.
	* Must register each "source script" as separate variant.
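
The prefix growth under Option 2 can be sketched in a few lines of
Python.  Everything here is hypothetical: the language list, the
description text, and the helper names are invented for illustration,
and the record omits fields such as Added that a real Registry record
would carry.

```python
from itertools import product


def required_prefixes(languages, schemes):
    """Prefix values one source-script variant would need under Option 2."""
    return [f"{lang}-{scheme}" for lang, scheme in product(languages, schemes)]


def registry_record(variant, description, prefixes):
    """Format a hypothetical variant record in the Registry's field style."""
    lines = [
        "%%",
        "Type: variant",
        f"Subtag: {variant}",
        f"Description: {description}",
    ]
    lines += [f"Prefix: {prefix}" for prefix in prefixes]
    return "\n".join(lines)


# Illustrative inputs only: three languages, one transliteration scheme.
prefixes = required_prefixes(["tt", "az", "kk"], ["alalc97"])
print(registry_record("fromarab", "Transliterated from Arabic script", prefixes))
```

The number of Prefix lines is the number of languages times the number
of schemes, and the whole record has to be repeated for every source
script.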

Option 3:  Generic variant, with 'alalc97' optional.

Examples:
	tt-(alalc97-)sarab
	tt-(alalc97-)fromarab
Also:
	ru-(alalc97-)scyrl
	zh-(pinyin-)shans

Advantages:
	* Compact representation.
	* Could apply to any transliteration or transcription if desired.
	* No unwieldy list of prefixes in Registry.

Disadvantages:
	* Greater potential for misuse.
	* Specifying source script without transliteration variant might
	  yield little useful information.
	* Best to reserve 5-letter variants starting with 's' (already
	  impossible because of 'solba') or 8-letter variants starting with
	  'from', to avoid collisions.
	* Must register each "source script" as separate variant.
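
The collision concern can be illustrated with a short Python sketch.  It
assumes the 's' + script-code naming used in the examples above; the two
sets are small illustrative subsets, not the real ISO 15924 or Registry
contents, and the function names are invented.

```python
# Illustrative subsets only; the full lists live in ISO 15924 and the
# Language Subtag Registry.
KNOWN_SCRIPTS = {"arab", "cyrl", "hans", "latn"}
REGISTERED_VARIANTS = {"alalc97", "pinyin", "solba"}


def source_variant(script):
    """Build the hypothetical 's' + script variant, refusing collisions."""
    variant = "s" + script.lower()
    if variant in REGISTERED_VARIANTS:
        raise ValueError(f"{variant!r} is already a registered variant")
    return variant


def source_script(variant):
    """Return the script code if variant is 's' + a known script, else None."""
    v = variant.lower()
    if len(v) == 5 and v.startswith("s") and v[1:] in KNOWN_SCRIPTS:
        return v[1:]
    return None


for candidate in ["sarab", "scyrl", "solba"]:
    print(candidate, source_script(candidate))
```

Because 'solba' already has the same shape, a consumer cannot rely on
"five letters starting with 's'" alone; it has to check the remainder
against the list of script codes, as source_script does.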

Option 4:  Extension (suggested by Addison).

Example:
	tt-alalc97-t-arab

Advantages:
	* Most flexible option.
	* No need to reserve a block of variants.
	* Offloads this (arguably special) use case from the main Registry
	  and this list.

Disadvantage:
	* Might be far too much overhead if only a few combinations of
	  (language, source script, target script) are envisioned for use.
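
To compare how the four options look to a tag consumer, here is a rough
Python sketch that splits a tag by subtag shape, following the RFC 5646
syntax.  It is deliberately incomplete: split_tag is an invented name,
extlang subtags and grandfathered tags are ignored, and nothing is
checked against the Registry.  The 't' singleton is treated like any
other extension singleton, since its content is only a proposal here.

```python
import re


def split_tag(tag):
    """Roughly split a language tag into parts by subtag shape."""
    subtags = tag.lower().split("-")
    parts = {"language": subtags[0], "script": None, "region": None,
             "variants": [], "extensions": {}, "private": []}
    i = 1
    # Script: exactly four letters.
    if i < len(subtags) and re.fullmatch(r"[a-z]{4}", subtags[i]):
        parts["script"] = subtags[i]
        i += 1
    # Region: two letters or three digits.
    if i < len(subtags) and re.fullmatch(r"[a-z]{2}|[0-9]{3}", subtags[i]):
        parts["region"] = subtags[i]
        i += 1
    # Variants: 5 to 8 alphanumerics, or 4 starting with a digit.
    variant = r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}"
    while i < len(subtags) and re.fullmatch(variant, subtags[i]):
        parts["variants"].append(subtags[i])
        i += 1
    # Extensions and private use: a singleton followed by its subtags.
    while i < len(subtags):
        singleton = subtags[i]
        if not re.fullmatch(r"[a-z0-9]", singleton):
            raise ValueError(f"unexpected subtag: {singleton!r}")
        i += 1
        if singleton == "x":
            parts["private"] = subtags[i:]
            i = len(subtags)
        else:
            start = i
            while i < len(subtags) and len(subtags[i]) > 1:
                i += 1
            parts["extensions"][singleton] = subtags[start:i]
    return parts


for example in ["tt-alalc97-x-from-arab", "tt-alalc97-sarab",
                "tt-alalc97-t-arab"]:
    print(example, split_tag(example))
```

Under Options 2 and 3 the source script ends up in the variant list next
to 'alalc97'; under Option 4 it sits in its own extension, and under
Option 1 it is opaque private-use data.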

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s



