Machine Translation

Sat Sep 12 09:06:28 CEST 2009

Of course one could define several machine translation related extensions,
but at some point the suitability of language tags is really in question.
E.g. how would you represent a BLEU score as an extension? The point is that
proper metadata for evaluation of machine translation soon goes beyond a
simple "can be represented as a closed set of strings" pattern.
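
To make the difficulty concrete, here is a minimal sketch in Python. It
assumes a hypothetical extension singleton "m" for machine translation (no
such singleton is registered) and a made-up encoding that scales the score to
an integer out of 10000. Only the syntax check reflects RFC 5646, where an
extension is a singleton followed by one or more subtags of 2 to 8
alphanumeric characters.

```python
import re

# Syntax from the RFC 5646 ABNF:
#   extension = singleton 1*("-" (2*8alphanum))
#   singleton = any single alphanumeric character except "x"
EXTENSION_RE = re.compile(r"[0-9a-wy-z](?:-[0-9a-z]{2,8})+", re.IGNORECASE)


def is_wellformed_extension(extension: str) -> bool:
    """Return True if the string matches the RFC 5646 extension syntax."""
    return EXTENSION_RE.fullmatch(extension) is not None


def encode_bleu(score: float) -> str:
    """Hypothetical encoding: a BLEU score in [0, 1] scaled to an integer
    out of 10000, under the assumed (unregistered) singleton "m"."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("BLEU score must be in [0, 1]")
    return f"m-bleu-{round(score * 10000):04d}"


# A decimal point is not allowed inside a subtag, so the raw score fails.
print(is_wellformed_extension("m-bleu-0.3421"))
# The scaled form is well-formed, but its meaning depends entirely on a
# convention that the tag itself does not carry.
print(is_wellformed_extension(encode_bleu(0.3421)))
```

Even where such an encoding is syntactically possible, the consumer still has
to know the scale, the test set and the reference translations, none of which
fit into the tag.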

Felix

2009/9/11 Kent Karlsson <kent.karlsson14 at comhem.se>

> What you say is correct for a (single) variant subtag, as initially
> suggested, but extension subtags work differently. See
> http://tools.ietf.org/search/rfc5646#section-2.2.6. Data like what you
> refer to can be put in the part that follows the extension "singleton".
>
> Note also that section 2.2.6 starts:
> "Extensions provide a mechanism for extending language tags for use in
>    various applications.  They are intended to identify information that
>    is commonly used in association with languages or language tags but
>    that is not part of language identification.
> "
>
>         /kent k
>
>
>
> Den 2009-09-11 18.35, skrev "Felix Sasaki" <felix.sasaki at fh-potsdam.de>:
>
> I would agree with Yves Savourel that for translation tool developers, this
> kind of information is better provided via a different field. Other
> practical information which one could easily pack into a broad data category
> "machine translation" (to use Peter's terminology), but not easily into the
> "language tag" field, would be: name of the system that generated the
> translation (maybe several were used ...), quality of the input,
> quality rating of the system (e.g. BLEU score). IMO these fine-grained
> differences are necessary for making use of this kind of metadata, and I
> don't see a clear use case for a broad "machine translated" subtag.
>
> Felix
>
> 2009/9/11 Kent Karlsson <kent.karlsson14 at comhem.se>
>
>
> Den 2009-09-11 17.32, skrev "Peter Constable" <petercon at microsoft.com>:
>
> > From: ietf-languages-bounces at alvestrand.no
> > [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Felix Sasaki
> >
> >> There is a difference in the case of XLIFF. If the extension subtag is just
> >> similar, but not identical to MT related information in other technologies
> >> like XLIFF, you will end up with a mess of *values*. This is IMO different
> >> from the script subtag case: Here you have the same values, but different
> >> *occurrences*
> >
> > Expressed with different terminology: you end up with a mess of data
> > categories; in the script subtag case, you have a single data category with
> > many values.
>
> I don't think that should be a major issue. XLIFF, and other formats having
> separate attributes for this, could simply have that attribute take
> priority, even to the extent that "language extensions", in particular one
> that overlaps with an attribute, can be completely ignored in those
> formats.
>
>         /kent k