Machine Translation

Felix Sasaki felix.sasaki at
Sat Sep 12 09:06:28 CEST 2009

Of course one could define several machine translation related extensions,
but at some point the suitability of language tags is really in question.
E.g. how would you represent a BLEU score as an extension? The point is that
proper metadata for evaluation of machine translation soon goes beyond a
simple "can be represented as a closed set of strings" pattern.
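
To make the BLEU example concrete, here is a rough sketch in Python. It assumes the BCP 47 extension grammar (a singleton followed by one or more subtags of 2 to 8 ASCII letters or digits). The singleton "m", the "bleu" subtag, and the MTRecord fields are all hypothetical names made up for illustration; nothing like them is registered.

```python
import re
from dataclasses import dataclass

# BCP 47 grammar: extension = singleton 1*("-" (2*8alphanum)).
SINGLETON = re.compile(r"[0-9A-WY-Za-wy-z]")  # any alphanumeric except x/X
EXT_SUBTAG = re.compile(r"[0-9A-Za-z]{2,8}")  # 2 to 8 letters or digits


def is_valid_extension(singleton: str, subtags: list[str]) -> bool:
    """Check whether singleton plus subtags form a well-formed extension."""
    if SINGLETON.fullmatch(singleton) is None:
        return False
    return len(subtags) > 0 and all(EXT_SUBTAG.fullmatch(s) for s in subtags)


@dataclass
class MTRecord:
    """Hypothetical out-of-band metadata, kept outside the language tag."""
    systems: list[str]   # one or more MT systems that produced the text
    bleu: float          # quality rating of the system
    input_quality: str   # free-form description of the source quality


# A decimal score is not a well-formed subtag: "." is not allowed.
print(is_valid_extension("m", ["bleu", "0.2731"]))  # False

# An ad hoc encoding passes the grammar, but its meaning is pure convention.
print(is_valid_extension("m", ["bleu", "2731"]))    # True

# A separate field carries the same data without any encoding tricks.
record = MTRecord(systems=["system-a", "system-b"], bleu=0.2731,
                  input_quality="unedited")
```

The second call is well-formed, but every consumer would have to agree out of band that "2731" means 0.2731, which is the kind of convention a dedicated metadata field avoids.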


2009/9/11 Kent Karlsson <kent.karlsson14 at>

>  What you say is correct for a (single) variant subtag, as initially
> suggested, but extension subtags
> work differently. See
> Data like what you refer
> to can be put in the part that follows the extension "singleton".
> Note also that section 2.2.6 starts:
> "Extensions provide a mechanism for extending language tags for use in
>    various applications.  They are intended to identify information that
>    is commonly used in association with languages or language tags but
>    that is not part of language identification.
> "
>         /kent k
> On 2009-09-11 18:35, "Felix Sasaki" <felix.sasaki at> wrote:
> I would agree with Yves Savourel that for translation tool developers, this
> kind of information is better provided via a different field. Other
> practical information which one could easily pack into a broad data category
> "machine translation" (to use Peter's terminology), but not easily into
> the "language tag" field, would be: the name of the system that generated the
> translation (maybe several were used ...), the quality of the input, and a
> quality rating of the system (e.g. BLEU score). IMO these fine-grained
> differences are necessary for making use of this kind of metadata, and I
> don't see a clear use case for a broad "machine translated" subtag.
> Felix
> 2009/9/11 Kent Karlsson <kent.karlsson14 at>
> On 2009-09-11 17:32, "Peter Constable" <petercon at> wrote:
> > From: ietf-languages-bounces at [mailto:ietf-languages-bounces at]
> > On Behalf Of Felix Sasaki
> >
> >> There is a difference in the case of XLIFF. If the extension subtag is just
> >> similar, but not identical to MT related information in other technologies like
> >> you will end up with a mess of *values*. This is IMO different from the script
> >> subtag case: Here you have the same values, but different *occurrences*
> >
> > Expressed with different terminology: you end up with a mess of data
> > categories; in the script subtag case, you have a single data category with
> > many values.
> I don't think that should be a major issue. XLIFF, and other formats having
> separate attributes for this, could simply have that attribute take
> priority, even to the extent that "language extensions", in particular one
> that overlaps with an attribute, can be completely ignored in those
> formats.
>         /kent k