Tagging transliterations from a specific script

Peter Constable petercon at microsoft.com
Mon Mar 14 19:34:34 CET 2011


There's another option you haven't mentioned: register variant subtags for the specific transliteration schemes currently encompassed by the non-specific subtag 'alalc97', and either deprecate 'alalc97' or at least recommend that the specific subtags generally be used instead.

If people really think an extension is needed, I'd like to see a clear business case for creating a new extension: it's not in any way obvious to me that the currently-available mechanism (variant subtags) isn't adequate. Also, if someone is going to pursue that, I think it would be good to do that soon: if there's going to be a different mechanism for handling transliterations, then it would be better not to continue registering variant subtags for transliterations.


Peter

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Monday, March 14, 2011 10:19 AM
To: ietf-languages at iana.org
Subject: Re: Tagging transliterations from a specific script

I'd like to see if we can address Avram Lyon's request, either accepting one of the possible solutions or rejecting them all and recommending a private-use subtag, instead of just leaving it hanging.

As I understand it, Avram's use case is that he has (potentially) two samples of text:

* Tatar, transliterated from original Arabic into Latin
* Tatar, transliterated from original Cyrillic into Latin

Currently, without using private-use subtags, both of these would have to be tagged as "tt-alalc97".  (This could also be "tt-Latn-alalc97", but for simplicity, the script subtag will be left out of the examples which follow.)  Avram argues that the differences between these samples require distinct tagging.  I don't see this as an "obvious error" in registering 'alalc97', but simply a use case which was not envisioned at the time.

Here are the options as I see them.  Comments are welcome; that's what I'm trying to accomplish.

Option 1:  Private-use subtags.

Examples:
	tt-alalc97-x-sarab
	tt-alalc97-x-fromarab
	tt-alalc97-x-from-arab
	etc.

Advantages (the usual ones for private-use):
	* Can be used immediately, without waiting for registration.
	* Can be made as human-readable as desired (e.g.
	  tt-alalc97-x-convert-from-arabic).

Disadvantage (also the usual one for private-use):
	* Not standardized for use outside local setting.
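
To make the trade-off in Option 1 concrete, here is a minimal Python sketch of how a processor might separate the standardized part of a tag from the private-use part. The function name `split_private_use` is hypothetical, and the sketch assumes only the general rule that everything after the singleton 'x' is private use; it does not validate the tag otherwise.

```python
def split_private_use(tag):
    """Split a language tag into (standard part, list of private-use subtags).

    Assumption: the first 'x' subtag starts the private-use sequence and
    everything after it belongs to that sequence.
    """
    subtags = tag.lower().split("-")
    if "x" in subtags:
        i = subtags.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return "-".join(subtags), []


if __name__ == "__main__":
    for example in ("tt-alalc97-x-sarab", "tt-alalc97-x-from-arab", "tt-alalc97"):
        print(example, "->", split_private_use(example))
```

A processor that is not party to the private agreement can rely only on the first element of the result, which is "tt-alalc97" for every one of these examples. That is the "not standardized" disadvantage in practice.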

Option 2:  Variant under 'alalc97'.

Examples:
	tt-alalc97-sarab
	tt-alalc97-fromarab

Advantages:
	* Reasonably compact representation.
	* Restricts use to suitable languages and ALA-LC scheme.

Disadvantages:
	* Each combination of language and transliteration scheme must be
	  specified as prefix; could grow out of control if more general use
	  is desired.
	* Best to reserve all 5-letter variants starting with 's' (already
	  impossible because of 'solba') or all 8-letter variants starting
	  with 'from', to avoid collisions.
	* Must register each "source script" as separate variant.
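
The two naming patterns in the examples above ('s' plus script code, 'from' plus script code) can be checked against the variant subtag syntax with a short Python sketch. The helper names are hypothetical. The pattern reflects the RFC 5646 rule that a variant is either 5 to 8 alphanumeric characters, or 4 characters beginning with a digit.

```python
import re

# Variant syntax per RFC 5646: 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})")


def is_wellformed_variant(subtag):
    """Return True if the subtag fits the variant syntax (not whether it is registered)."""
    return VARIANT_RE.fullmatch(subtag.lower()) is not None


def source_script_variants(script):
    """Build the two candidate variant spellings for a 4-letter script code."""
    code = script.lower()
    return ["s" + code, "from" + code]


if __name__ == "__main__":
    for script in ("Arab", "Cyrl", "Hans"):
        for candidate in source_script_variants(script):
            print(candidate, is_wellformed_variant(candidate))
```

Because script codes are four letters, the 's' pattern always yields 5-character variants and the 'from' pattern always yields 8-character variants, so both fit the syntax. The sketch checks well-formedness only; each variant would still have to be registered separately.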

Option 3:  Generic variant, with 'alalc97' optional.

Examples:
	tt-(alalc97-)sarab
	tt-(alalc97-)fromarab
Also:
	ru-(alalc97-)scyrl
	zh-(pinyin-)shans

Advantages:
	* Compact representation.
	* Could apply to any transliteration or transcription if desired.
	* No unwieldy list of prefixes in Registry.

Disadvantages:
	* Greater potential for misuse.
	* Specifying source script without transliteration variant might
	  yield little useful information.
	* Best to reserve all 5-letter variants starting with 's' (already
	  impossible because of 'solba') or all 8-letter variants starting
	  with 'from', to avoid collisions.
	* Must register each "source script" as separate variant.

Option 4:  Extension (suggested by Addison).

Example:
	tt-alalc97-t-arab

Advantages:
	* Most flexible option.
	* No need to reserve a block of variants.
	* Offloads this (arguably special) use case from the main Registry
	  and this list.

Disadvantage:
	* Might be far too much overhead if only a few combinations of
	  (language, source script, target script) are envisioned for use.
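
For comparison with the private-use sketch, here is how a processor might pull extension subtags out of a tag such as the Option 4 example. This is a minimal Python sketch with a hypothetical function name. It assumes only the general syntax (a single-character singleton other than 'x' introduces an extension, and 'x' ends the standard part), and the meaning of 't' is simply the one proposed in this message.

```python
def extensions(tag):
    """Return a dict mapping each extension singleton to its list of subtags.

    Assumption: the first subtag is the primary language and is skipped;
    scanning stops at the private-use singleton 'x'.
    """
    subtags = tag.lower().split("-")
    result = {}
    current = None
    for subtag in subtags[1:]:
        if subtag == "x":
            break  # the rest of the tag is private use
        if len(subtag) == 1:
            current = subtag  # a new extension starts here
            result[current] = []
        elif current is not None:
            result[current].append(subtag)
    return result


if __name__ == "__main__":
    print(extensions("tt-alalc97-t-arab"))
    print(extensions("tt-alalc97-t-cyrl"))
    print(extensions("tt-alalc97"))
```

Unlike the variant options, no per-script registration is involved in this sketch: the source script is carried as data after the singleton, which is where the flexibility (and the overhead of defining the extension) comes from.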

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s


_______________________________________________
Ietf-languages mailing list
Ietf-languages at alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages

