Language tags and (localization) processes (Re: [Ltru] draft-davis-t-langtag-ext)

Thu Jul 14 05:45:38 CEST 2011

With translation, transliteration, and other aspects of language state now merged under a single transformation extension, I am increasingly concerned that, because the source language tag of the transform may not contain singleton subtags, one cannot mark successive transforms. For example, within Pali canonical literature there are features called Magadhisms, which reflect an underlying Eastern dialect of Middle Indo-Aryan – and presumably survive as echoes of an earlier state in the transmission of the literature. When these Pali texts are rendered in Latin script (a common practice) there are two separate transforms: a historical language transform and a modern script transform. One could imagine rendering this as follows:

Language Tag
pi-Latn-t-pi-Sinh-t-pra

Description
The content is in Pali, transliterated into Latin script from Sinhala script, and transformed from an underlying Prakrit language.

Do I understand the intent of the proposal correctly such that this tag would be invalid because a singleton t follows the language tag already introduced with t? The alternative might be to merge both transforms into a single transform:
pi-Latn-t-pra-Sinh

But that would be incorrect as it implies the source was Prakrit language written in Sinhala script, but that source never existed in that form. Or perhaps this is deliberate to avoid endless recursion in tagging which might lead to a single language tag encapsulating the complete linguistic history of an item . . .
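
As a side check on the question above: independently of the draft, RFC 5646 (section 2.2.6) allows each extension singleton at most once per tag outside private use, so a second -t- would already be ill-formed at that level. A minimal Python sketch of that check follows. The function names split_extensions and repeated_singletons are hypothetical, and the sketch assumes a tag whose first subtag is an ordinary language code (no grandfathered or purely private-use tags). It does not validate subtags against the registry.

```python
def split_extensions(tag):
    """Split a tag into base subtags and (singleton, subtags) extension pairs.

    Subtags are lower-cased. Everything after an 'x' singleton is treated
    as private use and is not interpreted further.
    """
    base = []
    extensions = []  # list of (singleton, [subtags])
    in_private_use = False
    for sub in tag.lower().split("-"):
        if in_private_use:
            extensions[-1][1].append(sub)
        elif len(sub) == 1:
            # A one-character subtag opens a new extension sequence.
            extensions.append((sub, []))
            in_private_use = (sub == "x")
        elif extensions:
            extensions[-1][1].append(sub)
        else:
            base.append(sub)
    return base, extensions


def repeated_singletons(tag):
    """Return singletons that occur more than once outside private use."""
    _, extensions = split_extensions(tag)
    seen, repeated = set(), []
    for singleton, _ in extensions:
        if singleton in seen and singleton not in repeated:
            repeated.append(singleton)
        seen.add(singleton)
    return repeated


print(repeated_singletons("pi-Latn-t-pi-Sinh-t-pra"))  # ['t']
print(repeated_singletons("pi-Latn-t-pra-Sinh"))       # []
```

Under this reading, the two-step tag is rejected because of the repeated singleton, while the merged tag is well-formed but carries the misleading source "pra-Sinh" described above.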

Andrew

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of CE Whitehead
Sent: Tuesday, July 12, 2011 11:35 AM
To: ltru at ietf.org; ietf-languages at iana.org
Subject: Language tags and (localization) processes (Re: [Ltru] draft-davis-t-langtag-ext)

Hi, once more.
Felix Sasaki felix.sasaki at fh-potsdam.de
Tue Jul 12 09:23:36 CEST 2011

> Language tags so far have described *states*: an object is in a language, a
> script etc. The proposed extension extends languages to describe the outcome
> of a *process*: objects have been transformed, with a source object as the
> basis for this process. According to the paragraph above, this
> transformation includes also translation.

I do personally agree that it's good to discuss and then document in the draft some of the concerns you have described.
And yes, translation/transliteration is a process.

That said, I personally see the "from" language as part of the "state" of the translation too --
it affects how things end up being translated; sometimes a translation even includes annotation of, say, a "pun" in the original.

The translator may translate structures, terms, and greetings directly from the original language. For example, translating from Arabic to English, do you translate "tahaya-t-an wa 'ihtaraama wa ba'ada" (I hope I've got my transliteration from the Arabic right here) at the beginning of a letter? If you opt to, you might begin your translation "Greetings and Respect, and now;" some translators may shorten this. In any case, when speakers of one language create texts in another, elements of their first language tend to slip in, and some of the grammar of the original language may be carried over into the new language.

I think this is even more true for transliteration; knowing the variety transliterated into Latin script is extremely important.

So it's useful to know the language of the original that the translation was made from, in my opinion; this gives you more details about the state.

I do think this is briefly mentioned (intro, last paragraph):

   "The usage of this extension is not limited to formal transformations,
   and may include other instances where the content is in some other
   way influenced by the source.  For example, this extension could be
   used to designate a request for a speech recognizer that is tailored
   specifically for 2nd-language speakers who are 1st-language speakers
   of a particular language (e.g. a recognizer for "English spoken with
   a Chinese accent")."

Maybe the draft could add very brief info, a sentence or so (in the intro or where the M0 part of the extension is discussed), on the methods/mechanisms used in transcription and why they are relevant to indicate?

Best,

--C. E. Whitehead
cewcathar at hotmail.com
