petercon at microsoft.com
Wed Sep 9 19:41:13 CEST 2009
A generic tag ("machxlat"?) doesn't seem like a terrible idea. But it's also not clear to me how it would be used: would it only be reported to users in some UI, or would other automated processes be used on tags containing this subtag? Is MT important to distinguish from native speakers with bad (non-conventional) spelling and grammar, or from 2nd-language speakers with bad spelling and grammar, or even from non-standard dialects (which may differ considerably from standard usage)?
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Debbie Garside
Sent: Wednesday, September 09, 2009 6:25 AM
To: 'ietflang IETF Languages Discussion'
Subject: Machine Translation
The following is part of a conversation I have been having with a couple of colleagues, and I was wondering whether anyone had any ideas on whether a generic tag could be registered for machine-translated text. In the past we have steered away from generic tags (such as "western").
However, we are concerned that a lot of MT-produced Welsh could appear
on the web. Google's translation into Welsh isn't perfect by a long
shot. Previous so-called attempts have been a lot worse but have been used:
http://www.flickr.com/photos/benbore/240597433/ (just one of many)
and blogs MT'ed into Welsh outnumber those originally written in Welsh
when searching in Google.
This would not be great news. We hope with this development that some
can be educated to use such a service responsibly :
However, at the very least, this could frustrate our (and others') work
and efforts, e.g. collecting original Welsh texts from the web as a corpus.
An idea we had, if this does not already exist for other languages
(though languages supported by MT to date have been 'larger' and more
robust), was whether ISO 639 could be used in the future to produce
codes (or extensions) for tagging text/language as being from an MT
system. Hopefully the provision of codes or metadata could encourage
MT providers to implement these so that such texts can be excluded in
certain applications (including search engines!).
However, in our case we would welcome a further distinction of MT for Welsh. Clearly there must be a distinction made between 'MT' from InterTrans and original/proper Welsh.
But we might one day want to distinguish even between MT providers - cy-mt-intertrans, cy-mt-google and cy-mt-apertium. Some might be more reliable than others.
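To make the intended use concrete, here is a minimal sketch of how a corpus collector might exclude machine-translated text and read off the provider. It is only an illustration under stated assumptions: no such subtag is registered, the marker name "machxlat" is simply the one floated in the reply above, and the convention that the subtag following the marker names the provider is hypothetical. The function names are likewise invented for this example.

```python
# Hypothetical marker subtag: "machxlat" is only a name suggested in this
# thread, not a registered subtag.
MT_SUBTAG = "machxlat"


def is_machine_translated(tag):
    """Return True if the tag carries the hypothetical MT marker subtag."""
    subtags = tag.lower().split("-")
    # Skip the first subtag, which is the primary language (e.g. "cy").
    return MT_SUBTAG in subtags[1:]


def mt_provider(tag):
    """Return the subtag following the marker, assumed here to name the
    MT provider, or None if the tag has no marker or no provider."""
    subtags = tag.lower().split("-")
    if MT_SUBTAG not in subtags[1:]:
        return None
    i = subtags.index(MT_SUBTAG, 1)
    return subtags[i + 1] if i + 1 < len(subtags) else None


def original_texts(docs, language="cy"):
    """Yield texts in the given language that are not marked as MT.

    docs is an iterable of (language_tag, text) pairs.
    """
    for tag, text in docs:
        primary = tag.lower().split("-")[0]
        if primary == language and not is_machine_translated(tag):
            yield text
```

A collector would then call original_texts() on (tag, text) pairs harvested from the web, keeping only untagged Welsh, and could use mt_provider() to weight or discard text from individual providers.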
Your thoughts would be appreciated.