Machine Translation

CE Whitehead cewcathar at
Sat Sep 12 23:09:52 CEST 2009

If an extension is developed to indicate a machine translation, there should be a way to indicate the date the translation was done.


That would be my main concern.


An alternative is to develop an internationalization document describing best practices for handling/labeling translations.


(I hope what I am saying makes sense.)

Doug Ewell doug at
Fri Sep 11 15:13:23 CEST 2009
>> I did not propose the table as an input for variant subtags. I rather 
>> think that people who mostly need the use case discussed in this 
>> thread (the localization industry) already have the means they need - 
>> they just do not rely on language tags for the purpose of identifying 
>> machine translated content.

> In fact, the thread started with someone saying they did not have a
> means to identify machine-translated content, and were looking to
> language tags to fill the void.

Personally, when I retrieve a document, I'd like to be told whether it is an original, a human translation, or a machine translation, and, if it is a translation, where the original is located (if it is available online). This matters particularly before I put the document through an online translator. (For example, if I retrieved a document that had been machine-translated from French to English, and I were a native speaker of French, putting it back through a machine to get a French version would be rather dumb, I guess.)

The W3C's policy for translations requires, at the top or bottom of the document (in a header or footer that 'wraps' it), a statement that the document is a translation, that the original document is the normative version, and the URL of the original. This does result in the information appearing in search results, which is helpful. I've not been able to find much else online about translation policies, but note that Joomla translations has a translation policy:
The headers recommended by Joomla also indicate the year the translation was done. The year is relevant to any system for tagging or identifying machine translations; see: 
Online translators have improved over time, although they still have a way to go in identifying referents, handling ellipsis, and so on (this does now seem to be a focus of linguistic research). Prepositions are another issue, since their use varies greatly from language to language. Thus the year a machine translation was done is relevant. (It is also relevant for a human translation, since it may help identify the particular variety of a language being used, as Mark noted in his discussion of California speech habits. I must say, though, that I lived in L.A. for a year, always said 'I-5' and 'I-10', and never heard anything to correct me.)

Here's an example from an online translator:
Original English:
"I've read the story; I'm not sure what it's about."
Online translator I produced a relatively literal translation (there is a pronoun error, and of course if the 'I' is a woman there is a second error, since 'sûr' would then need to be 'sûre'):
"J'ai lu l'histoire, je ne suis pas sûr de quoi il s'agit."
(The second clause above translates back into English at this translator as: "I'm not sure what it is.")

Online translator II produced an awkward, unidiomatic translation with the same pronoun error as above, plus a very bizarre translation back into the original. I decided not to report the various results here in this thread (I only had time to get a few anyway).

C. E. Whitehead
cewcathar at 


More information about the Ietf-languages mailing list