Tagging transliterations from a specific script
Doug Ewell
doug at ewellic.org
Mon Mar 14 18:19:16 CET 2011
I'd like to see if we can address Avram Lyon's request, either accepting
one of the possible solutions or rejecting them all and recommending a
private-use subtag, instead of just leaving it hanging.
As I understand it, Avram's use case is that he has (potentially) two
samples of text:
* Tatar, transliterated from original Arabic into Latin
* Tatar, transliterated from original Cyrillic into Latin
Currently, without using private-use subtags, both of these would have
to be tagged as "tt-alalc97". (This could also be "tt-Latn-alalc97",
but for simplicity, the script subtag will be left out of the examples
which follow.) Avram argues that the differences between these samples
require distinct tagging. I don't see this as an "obvious error" in
registering 'alalc97', but simply a use case which was not envisioned at
the time.
Here are the options as I see them. Comments are welcome; that's what
I'm trying to accomplish.
Option 1: Private-use subtags.
Examples:
tt-alalc97-x-sarab
tt-alalc97-x-fromarab
tt-alalc97-x-from-arab
etc.
Advantages (the usual ones for private-use):
* Can be used immediately, without waiting for registration.
* Can be made as human-readable as desired (e.g.
tt-alalc97-x-convert-from-arabic).
Disadvantage (also the usual one for private-use):
* Not standardized for use outside local setting.
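To make the private-use syntax concrete, here is a minimal sketch that
pulls out whatever follows the 'x' singleton and checks it against the
RFC 5646 rule that each private-use subtag is 1 to 8 alphanumeric
characters. The function name is made up for illustration; it checks
syntax only and assigns no meaning to the subtags.

```python
import re

# RFC 5646 syntax: privateuse = "x" 1*("-" (1*8alphanum))
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")

def private_use_subtags(tag):
    """Return the subtags after the 'x' singleton, or [] if there is none.

    Hypothetical helper: validates syntax only, not meaning.
    """
    parts = tag.lower().split("-")
    if "x" not in parts:
        return []
    tail = parts[parts.index("x") + 1:]
    if not tail or not all(PRIVATE_SUBTAG.fullmatch(p) for p in tail):
        raise ValueError(f"malformed private-use sequence in {tag!r}")
    return tail

print(private_use_subtags("tt-alalc97-x-from-arab"))
```

What 'from' and 'arab' mean is still a matter of private agreement
between the parties exchanging the tags.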
Option 2: Variant under 'alalc97'.
Examples:
tt-alalc97-sarab
tt-alalc97-fromarab
Advantages:
* Reasonably compact representation.
* Restricts use to suitable languages and ALA-LC scheme.
Disadvantages:
* Each combination of language and transliteration scheme must be
specified as a prefix; the list could grow out of control if more
general use is desired.
* Best to reserve either 5-letter variants starting with 's' (already
impossible because of 'solba') or 8-letter variants starting with
'from', to avoid collisions.
* Must register each "source script" as separate variant.
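A rough sketch of how the prefix restriction under Option 2 would play
out. The registration below is hypothetical ('sarab' is not in the
Registry), and the matching rule is a simplification: the prefix
subtags only need to appear, in order, somewhere before the variant.

```python
# Hypothetical Option 2 registration; 'sarab' does not exist in the Registry.
HYPOTHETICAL_PREFIXES = {
    "sarab": ["tt-alalc97"],
}

def is_subsequence(needle, haystack):
    """True if every item of needle occurs in haystack, in the same order."""
    it = iter(haystack)
    return all(item in it for item in needle)

def prefix_ok(tag):
    """Check a source-script variant against its (hypothetical) prefixes."""
    parts = tag.lower().split("-")
    for i, part in enumerate(parts):
        if part in HYPOTHETICAL_PREFIXES:
            before = parts[:i]
            return any(is_subsequence(p.split("-"), before)
                       for p in HYPOTHETICAL_PREFIXES[part])
    return True  # no source-script variant present, nothing to check

for tag in ("tt-alalc97-sarab", "tt-Latn-alalc97-sarab", "ru-alalc97-sarab"):
    print(tag, prefix_ok(tag))
```

Every additional language or transliteration scheme would need its own
entry in the prefix list, which is where the growth concern comes from.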
Option 3: Generic variant, with 'alalc97' optional.
Examples:
tt-(alalc97-)sarab
tt-(alalc97-)fromarab
Also:
ru-(alalc97-)scyrl
zh-(pinyin-)shans
Advantages:
* Compact representation.
* Could apply to any transliteration or transcription if desired.
* No unwieldy list of prefixes in Registry.
Disadvantages:
* Greater potential for misuse.
* Specifying source script without transliteration variant might
yield little useful information.
* Best to reserve either 5-letter variants starting with 's' (already
impossible because of 'solba') or 8-letter variants starting with
'from', to avoid collisions.
* Must register each "source script" as separate variant.
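The collision with 'solba' can be illustrated with a short sketch. It
assumes the convention used in the examples above, where 's' followed
by a four-letter ISO 15924 code names the source script. The set of
script codes is a small sample, not the full list, and the function
name is invented for illustration.

```python
# A few real ISO 15924 script codes; a full implementation would load them all.
KNOWN_SCRIPTS = {"arab", "cyrl", "latn", "hans", "hant"}

def source_script(variant):
    """Return the script code encoded in an 's'-prefixed variant, else None.

    Assumes the hypothetical 's' + script-code convention from the examples.
    """
    v = variant.lower()
    if len(v) == 5 and v.startswith("s") and v[1:] in KNOWN_SCRIPTS:
        return v[1:].capitalize()
    return None

for variant in ("sarab", "shans", "solba"):
    print(variant, source_script(variant))
```

The shape of the subtag alone cannot tell 'sarab' from 'solba'; only
the lookup against the script codes does, so a consumer could not
simply treat every 5-letter variant starting with 's' as a source
script.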
Option 4: Extension (suggested by Addison).
Example:
tt-alalc97-t-arab
Advantages:
* Most flexible option.
* No need to reserve a block of variants.
* Offloads this (arguably special) use case from the main Registry
and this list.
Disadvantage:
* Might be far too much overhead if only a few combinations of
(language, source script, target script) are envisioned for use.
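For comparison, a minimal sketch of how a consumer might separate an
extension from the rest of the tag. It relies only on the general rule
that a single-character subtag after the first position introduces an
extension (or, for 'x', private use). The 't' singleton and the meaning
of its subtags are taken from the example above and are assumptions,
not a registered extension.

```python
def split_extensions(tag):
    """Split a tag into its main subtags and a {singleton: [subtags]} dict."""
    parts = tag.lower().split("-")
    main, extensions, current = [], {}, None
    for i, part in enumerate(parts):
        # A one-character subtag opens a new section, except after 'x',
        # where everything that follows is private use.
        if len(part) == 1 and i > 0 and current != "x":
            current = part
            extensions[current] = []
        elif current is None:
            main.append(part)
        else:
            extensions[current].append(part)
    return main, extensions

print(split_extensions("tt-alalc97-t-arab"))
```

A processor that does not understand the extension can ignore it and
still match the tag as "tt-alalc97".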
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s