Machine Translation

Debbie Garside debbie at
Wed Sep 9 15:25:01 CEST 2009



The following is part of a conversation I have been having with a couple of
colleagues and I was wondering if anyone had any ideas on whether a generic
tag could be registered for machine-translated text? In the past we have
steered away from generic tags (such as western).



However, we are concerned that a lot of MT-produced Welsh could appear
on the web. Google's translation into Welsh isn't perfect by a long
shot. Previous attempts at so-called translation have been a lot worse,
but have still been used:

s/benbore/240597433/ (just one of many)

and blogs MT'ed into Welsh outnumber those originally written in Welsh
when searching in Google.

    e.g. HYPERLINK

This would not be great news. We hope with this development that some
can be educated to use such a service responsibly :


However, at the very least, this could frustrate our (and others') work
and efforts, e.g. collecting original Welsh texts from the web as a corpus.

An idea we had, if this does not already exist for other languages
(though languages supported by MT to date have been 'larger' and more
robust), was whether ISO 639 could be used in the future to produce
codes (or extensions) for tagging text/language as being from an MT
system. Hopefully the provision of codes or metadata could make it easier
for MT providers to implement these, so that such texts can be excluded in
certain applications (including search engines!).


And further….




However, in our case we would welcome a further distinction of MT for Welsh.
Clearly there must be a distinction made between 'MT' from InterTrans and
original/proper Welsh. 

But we might one day want to distinguish even between MT providers -
cy-mt-intertrans, cy-mt-google and cy-mt-apertium. Some might be more
reliable than the others.
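
As a rough illustration of how tags of this shape might be used to exclude
MT text when collecting a corpus, here is a minimal Python sketch. It treats
the tags purely as hyphen-separated strings of the form lang-mt[-provider],
as suggested above; these are not registered tags, and the names MTTag,
parse_mt_tag and original_texts, as well as the sample documents, are
hypothetical and only there for illustration.

```python
from typing import NamedTuple, Optional


class MTTag(NamedTuple):
    language: str            # e.g. "cy"
    provider: Optional[str]  # e.g. "google"; None when no provider is named


def parse_mt_tag(tag: str) -> Optional[MTTag]:
    """Return an MTTag if the tag has the hypothetical
    '<lang>-mt[-<provider>]' shape, otherwise None."""
    parts = tag.lower().split("-")
    # Require 2 or 3 non-empty parts with "mt" in second position.
    if len(parts) in (2, 3) and all(parts) and parts[1] == "mt":
        return MTTag(parts[0], parts[2] if len(parts) == 3 else None)
    return None


def original_texts(documents):
    """Keep only (tag, text) pairs whose tag is not marked as MT."""
    return [(tag, text) for tag, text in documents
            if parse_mt_tag(tag) is None]


# Placeholder documents; the texts are stand-ins, not real data.
docs = [
    ("cy", "original text"),
    ("cy-mt-google", "MT output"),
    ("cy-mt-intertrans", "MT output"),
]

corpus = original_texts(docs)
```

Keeping the provider as a separate field means an application could also
choose to accept output from some providers and reject others, rather than
excluding all MT text.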


Your thoughts would be appreciated.


Best regards




Debbie Garside




More information about the Ietf-languages mailing list