Machine Translation

Peter Constable petercon at microsoft.com
Fri Sep 11 17:28:37 CEST 2009


I made my opinion on script clear many years ago. When the question first came up in the context of 1766bis, I felt we had not yet worked out a sufficient conceptual framework to decide whether script should be incorporated into a language tag or handled as a separate metadata element; by the time we were working on 3066bis, I had satisfied myself (at least) in that regard. Language tags declare key language-related attributes of content and other language resources for use in common language-technology processes, and script is certainly such a key attribute. 

We have four options for indicating machine translation of content (or, more generally, translation quality):

1) handle as separate metadata, as in XLIFF, promoting its incorporation in more contexts than XLIFF
2) assume it is to be handled as separate metadata, but take no proactive steps: leave it to others to decide whether to adopt this additional element or attribute in their specifications
3) create a language tag extension
4) register a variant subtag

(Of course, there's also the option of doing nothing.) 

If the aim is to come up with a mechanism for indicating many different qualities, then I could understand why someone might suggest an extension, though I would be more inclined to handle this as a separate metadata element. That level of information about translation quality (or other qualitative assessments of content) is relevant to particular application contexts, not to content in general, so it makes more sense to handle such declarations in ways specific to those application contexts than in a general-purpose tag. If there were one set of qualitative attributions useful across a number of different application contexts, an extension might make more sense, but I suspect that is not the case. So I'm inclined to agree with Felix: qualitative attributes are needed in particular application contexts, and in those contexts people (likely) already have what they need.

Again, if the common-use scenario is simply to allow all MT content to be filtered out of the input to various kinds of processes, then an extension is overkill; a variant subtag is adequate. Michael suggested that this is an attribution of the authoring process rather than a linguistic attribution of the content, but I disagree: it is as much a linguistic attribution as saying that the content was authored by a 90-year-old from the homeland rather than by a youth in the diaspora. I do, however, question whether this is a realistic, common need of sufficient value to warrant a generic subtag.


Peter

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Friday, September 11, 2009 6:13 AM
To: ietf-languages at iana.org
Subject: Re: Machine Translation

Felix Sasaki <felix dot sasaki at fh dash potsdam dot de> wrote:

>> The problem with this is that it applies to XLIFF (XML Localization 
>> Interchange File Format) only. A language tag extension, in contrast, 
>> can be used anywhere language tags can already be used.
>
> Yes, but in XLIFF it is common to store such information in a 
> separate field, that is, not as part of a language identifier 
> (language identifiers are also used in XLIFF, via xml:lang). Having a 
> language identifier with similar information would create confusion.

The same argument was made against script subtags during the development of RFC 4646.

> I did not propose the table as input for variant subtags. Rather, I 
> think that the people who most need the use case discussed in this 
> thread (the localization industry) already have the means they need; 
> they just do not rely on language tags to identify 
> machine-translated content.

In fact, the thread started with someone saying they did not have a means to identify machine-translated content, and were looking to language tags to fill the void.

--
Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s

_______________________________________________
Ietf-languages mailing list
Ietf-languages at alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages