New extension for transformed languages

CE Whitehead cewcathar at
Tue Mar 6 01:12:42 CET 2012


Doug Ewell doug at
Mon Mar 5 03:05:10 CET 2012

> Michael Everson <everson at evertype dot com> wrote:

>> No! No! No! No! No! No! No!
>> Transliteration is one thing, which can be managed by an algorithm.
>> Translation is something else entirely. There is NO TAG WHATSOEVER that
>> can tell you why someone might choose to use "entirely" as opposed to
>> "wholly" for translating German "ganz".
>> "t" must only be for script transformation, in my view.

> Much of RFC 6497 would seem to indicate otherwise, including this, in 
> the Introduction:

> "This document defines an extension for specifying the source of content 
> that has been transformed, including text that has been transliterated, 
> transcribed, or translated, or in some other way influenced by the
> source.  It may be used in queries to request content that has been 
> transformed."

And the following (from the RFC) suggests that the -t extension is also for altered pronunciations. That is quite a can of worms, but o.k. by me because I think it's doable: much, but not all, of an accent is systematic. Things like an overbite's effect on an accent can't always be guessed in advance, and some speakers have peculiar influences on their accent other than that of their native language. Myself, for example: even in the U.S. as a kid no one could figure out where I came from, so they tried to put me in speech classes in grade school but could not figure out which dialect I spoke.

"For example, this extension could be
   used to designate a request for a speech recognizer that is tailored
   specifically for second-language speakers who are first-language
   speakers of a particular language (e.g., a recognizer for "English
   spoken with a Chinese accent")."

> Of course, some translation decisions have more to do with the 
> individual translator than with either the source or target language, 
> and of course there is no way to tag that thought process, nor should 
> there be.
Then I suppose Doug does not feel it would be wise to register a mechanism for a translation. (I am unsure myself as to whether it would be wise, because, for example, some people translating poetry ignore the rhythm, carefully translate the words, and then make the word order come out in the new language; other translations are just word-by-word without being rearranged; and some translations take a great deal of liberty, perhaps trying to convey sentence rhythms or the author's "voice".)

Otherwise I have to agree with Doug: I don't think you tag the thought 
process but you can indicate the from- and to-language at least.

Yes, there are many decisions made in terms of what tone in the target 
language best captures the author's tone in the original language.

(It might have been nice, though, if the m0 subtag could have first 
specified the general type of mechanism, whether for a transcription, a 
transliteration, or a translation. I'm not sure what [ungegn] is, 
however; the various country names are translations from one language to 
another, while the various city names are transliterations, I think. So 
someone feel free to explain this to me.)

> --
> Doug Ewell | Thornton, Colorado, USA | @DougEwell

* * *

Michael Everson everson at
Mon Mar 5 19:23:25 CET 2012

> On 5 Mar 2012, at 18:11, Doug Ewell wrote:

>> To me, a tag like "ru-t-it" does mean "translated from Italian into Russian."

> To me it does not.
To me it means a translation, because we have two different languages here. For this tag to indicate a transliteration, I would expect the language to remain the same in both instances and the script to change; thus the subtag:


I assume would indicate a transcription into Cyrillic script of Dante with no particular phonetic alphabet variant mentioned.  Is this correct?

(I'm not sure whether there would be any reason in this case to insert the script [Latn] after Italian.
But at least it would be necessary to specify [Cyrl] for the script the Italian had been transcribed into, since [Cyrl] is not the suppress-script for Italian.

In any case, the RFC suggests that for a transliteration 'und' would be used for both languages:
 "Where only the script is relevant (such as identifying a
   script-script transliteration), then 'und' is used for the primary language

Thus the correct tag for a transliteration, not a transcription, in this case would apparently be und-Cyrl-t-und-Latn, if I understand the draft correctly
that [und] is used for the language in transliterations because the language is irrelevant to the letter-by-letter approach.

However, for a phonetic transcription I suppose the variant subtags [fonipa] or [fonupa] would indicate the standard most likely used in the transcription,
so I don't know whether it's a transcription or not.)
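
To make the distinction above concrete, here is a minimal sketch in Python of how a tag consumer might split such tags into a target part and a source part and check whether scripts are given explicitly. The function names (script_of, describe) are my own and hypothetical, not from RFC 6497. The sketch assumes the tag carries only a -t- extension (no other singleton extensions or private-use subtags), and it treats any two-character letter-plus-digit subtag such as m0 as the start of the mechanism fields.

```python
import re

def script_of(subtags):
    """Return the explicit script subtag (4 letters), or None if absent."""
    # Skip the primary language; a script subtag is exactly four letters.
    for s in subtags[1:]:
        if len(s) == 4 and s.isalpha():
            return s.title()
    return None

def describe(tag):
    """Split a tag like 'und-Cyrl-t-und-Latn' around its -t- singleton."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "t" not in lowered[1:]:
        # No -t- extension: the whole tag describes the content itself.
        return {"target": tag, "target_script": script_of(subtags),
                "source": None, "source_script": None, "fields": []}
    i = lowered.index("t", 1)
    target, rest = subtags[:i], subtags[i + 1:]
    # Assumption: a field key is one letter followed by one digit (e.g. m0).
    j = next((k for k, s in enumerate(rest)
              if re.fullmatch(r"[A-Za-z][0-9]", s)), len(rest))
    source, fields = rest[:j], rest[j:]
    return {"target": "-".join(target),
            "target_script": script_of(target),
            "source": "-".join(source) or None,
            "source_script": script_of(source),
            "fields": fields}

print(describe("ru-t-it"))
print(describe("und-Cyrl-t-und-Latn"))
```

For "ru-t-it" neither side carries an explicit script, so a consumer would have to infer the scripts (or read the tag as a translation), whereas "und-Cyrl-t-und-Latn" names both scripts and leaves both languages undetermined.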

Hope I've got this right.


--C. E. Whitehead
cewcathar at 
>> If the sense of transcription or transliteration is intended, I would expect one or both
>> script subtags to be included, instead of making the tag consumer infer "Cyrillic" from
>> "Russian" and "Latin" from "Italian." But maybe that's just me.

> I wouldn't, because Russian and Italian ought to have script-suppress for Cyrl and Latn
> respectively.

Michael Everson *
