Tagging transliterations from a specific script
Peter Constable
petercon at microsoft.com
Mon Mar 14 19:34:34 CET 2011
There's another option you haven't mentioned: register variant subtags for the specific transliteration schemes currently covered by the non-specific subtag 'alalc97', and either deprecate 'alalc97' or at least recommend that the specific subtags be used in general.
If people really think an extension is needed, I'd like to see a clear business case for creating a new extension: it's not in any way obvious to me that the currently available mechanism (variant subtags) isn't adequate. Also, if someone is going to pursue that, I think it would be good to do that soon: if there's going to be a different mechanism for handling transliterations, then it would be better not to continue registering variant subtags for transliterations.
Peter
-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Monday, March 14, 2011 10:19 AM
To: ietf-languages at iana.org
Subject: Re: Tagging transliterations from a specific script
I'd like to see if we can address Avram Lyon's request, either accepting one of the possible solutions or rejecting them all and recommending a private-use subtag, instead of just leaving it hanging.
As I understand it, Avram's use case is that he has (potentially) two samples of text:
* Tatar, transliterated from original Arabic into Latin
* Tatar, transliterated from original Cyrillic into Latin
Currently, without using private-use subtags, both of these would have to be tagged as "tt-alalc97". (This could also be "tt-Latn-alalc97", but for simplicity, the script subtag will be left out of the examples which follow.) Avram argues that the differences between these samples require distinct tagging. I don't see this as an "obvious error" in registering 'alalc97', but simply a use case which was not envisioned at the time.
Here are the options as I see them. Comments are welcome; that's what I'm trying to accomplish.
Option 1: Private-use subtags.
Examples:
tt-alalc97-x-sarab
tt-alalc97-x-fromarab
tt-alalc97-x-from-arab
etc.
Advantages (the usual ones for private-use):
* Can be used immediately, without waiting for registration.
* Can be made as human-readable as desired (e.g.
tt-alalc97-x-convert-from-arabic).
Disadvantage (also the usual one for private-use):
* Not standardized for use outside local setting.
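As a rough check on private-use forms like the ones above, here is a minimal Python sketch. It only tests the private-use part of a tag against the RFC 5646 grammar (after the 'x' singleton, each subtag is 1 to 8 ASCII alphanumerics); it does not validate the rest of the tag, and the function name is made up for illustration.

```python
import re

# RFC 5646: privateuse = "x" 1*("-" (1*8alphanum))
_PRIVATE_SUBTAG = re.compile(r"^[A-Za-z0-9]{1,8}$")


def private_use_part_ok(tag: str) -> bool:
    """Return True if the tag contains an 'x' singleton followed by at
    least one well-formed private-use subtag (hypothetical helper)."""
    subtags = tag.lower().split("-")
    if "x" not in subtags:
        return False
    # Everything after the first 'x' singleton is private-use.
    tail = subtags[subtags.index("x") + 1:]
    return bool(tail) and all(_PRIVATE_SUBTAG.match(s) for s in tail)
```

Note that a single private-use subtag cannot exceed 8 characters, so a longer human-readable label has to be split across several subtags, as in 'x-from-arab'.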
Option 2: Variant under 'alalc97'.
Examples:
tt-alalc97-sarab
tt-alalc97-fromarab
Advantages:
* Reasonably compact representation.
* Restricts use to suitable languages and ALA-LC scheme.
Disadvantages:
* Each combination of language and transliteration scheme must be
specified as prefix; could grow out of control if more general use
is desired.
* Best to reserve 5-letter variants starting with 's' (already
impossible because of 'solba') or 'from', to avoid collisions.
* Must register each "source script" as separate variant.
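To illustrate the collision concern, a small Python sketch. It assumes a hypothetical convention in which a "source script" variant is 's' followed by a four-letter script code; neither that convention nor the function names are part of any standard. The variant length rule (5 to 8 alphanumerics, or 4 starting with a digit) is the one in RFC 5646.

```python
import re

# RFC 5646: variant = 5*8alphanum / (DIGIT 3alphanum)
_VARIANT = re.compile(r"^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$")
# Hypothetical convention: 's' + four-letter script code, e.g. 'sarab'
_SOURCE_SCRIPT = re.compile(r"^s[a-z]{4}$")


def is_wellformed_variant(subtag: str) -> bool:
    """Check only the syntax of a variant subtag, not its registration."""
    return bool(_VARIANT.match(subtag.lower()))


def looks_like_source_script(subtag: str) -> bool:
    """Check whether a subtag matches the assumed 's' + script pattern."""
    return bool(_SOURCE_SCRIPT.match(subtag.lower()))


for subtag in ("sarab", "fromarab", "solba", "alalc97"):
    print(subtag, is_wellformed_variant(subtag), looks_like_source_script(subtag))
```

Under this assumed pattern, 'solba' is indistinguishable from a source-script variant even though it has nothing to do with scripts, which is the collision noted in the list above.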
Option 3: Generic variant, with 'alalc97' optional.
Examples:
tt-(alalc97-)sarab
tt-(alalc97-)fromarab
Also:
ru-(alalc97-)scyrl
zh-(pinyin-)shans
Advantages:
* Compact representation.
* Could apply to any transliteration or transcription if desired.
* No unwieldy list of prefixes in Registry.
Disadvantages:
* Greater potential for misuse.
* Specifying source script without transliteration variant might
yield little useful information.
* Best to reserve 5-letter variants starting with 's' (already
impossible because of 'solba') or 'from', to avoid collisions.
* Must register each "source script" as separate variant.
Option 4: Extension (suggested by Addison).
Example:
tt-alalc97-t-arab
Advantages:
* Most flexible option.
* No need to reserve a block of variants.
* Offloads this (arguably special) use case from the main Registry
and this list.
Disadvantage:
* Might be far too much overhead if only a few combinations of
(language, source script, target script) are envisioned for use.
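A rough Python sketch of how a consumer might pull the source script out of the Option 4 form. The 't' singleton and its meaning are only the proposal discussed here, not a registered extension, and the function name is made up. The only grammar assumed is that an extension starts with a single-character singleton other than 'x', and that everything after 'x' is private-use.

```python
from typing import List


def extension_subtags(tag: str, singleton: str) -> List[str]:
    """Return the subtags that follow `singleton`, up to the next
    singleton or the end of the tag (hypothetical helper)."""
    subtags = tag.lower().split("-")
    result: List[str] = []
    collecting = False
    # Skip the first subtag: it is the language, never an extension singleton.
    for sub in subtags[1:]:
        if len(sub) == 1:
            if sub == "x":
                break  # the rest of the tag is private-use
            collecting = (sub == singleton)
            continue
        if collecting:
            result.append(sub)
    return result


print(extension_subtags("tt-alalc97-t-arab", "t"))
```

A tag without the extension simply yields an empty list, so existing tags such as "tt-alalc97" would be unaffected by this kind of lookup.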
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s